Funny scoring.

Yngvar Folling (yngvar.folling@login.eunet.no)
Sat, 8 Jun 1996 10:53:29 +0200 (MET DST)

I've had some odd results using score files on the newsgroup
rec.humor.funny. Take the article with the following header:

Newsgroups: rec.humor.funny
From: eschulma@nrao.edu (Eric Schulman)
Subject: The History of the Universe in 200 Words or Less
Keywords: chuckle
Approved: funny-request@clari.net
Path: Norway.EU.net!EU.net!gatech!gt-news!cc.gatech.edu!cssun.mathcs.emory.edu!swrinde!newsfeed.internetmci.com!news2.cais.net!news.cais.net!bofh.dot!nntp.crosslink.net!bofh.dot!en.com!in-news.erinet.com!bug.rahul.net!rahul.net!a2i!samba.rahul.net!rahul.net!a2i!news.PBI.net!nntp-hub2.barrnet.net!nntp-sc.barrnet.net!fugue.clari.net!looking!funny-request
Message-ID: <S9a2.2a6e@clarinet.com>
Date: Fri, 17 May 96 19:30:03 EDT
Lines: 39

Now, the following score file RHFUNNY.0:

8 pattern Keywords: funny|chuckle

gives the article the score 8, as it should, since the word "chuckle"
appears on the Keywords line.

But switch the line to

8 pattern Keywords: chuckle|funny

and the score suddenly jumps to 32. What's the matter? Aren't the two
expressions equivalent? And as long as there's only one Keywords field,
how can that line possibly give a score as high as 32?

Enclosing the two words in parentheses doesn't help, but concatenation
should have a higher precedence than | anyway.

And before you tell me to check for the two words in separate entries in
the score file: this is just a minimal example of a problem I had in a
much bigger score file, where the point was to count a line only once,
even if *both* keywords appeared.

(Ten minutes later.) Nothing like spelling out a problem to think of
possible solutions. I enclosed the entire regular expression in one
single set of parentheses instead. That worked, but I thought the
regular expression started *after* the field name.
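
One reading that fits both scores is that the whole text after "pattern",
field name included, is treated as one regular expression and tested
against every header line, with 8 added per matching line. Since |
binds more loosely than concatenation, "Keywords: chuckle|funny" then
means "Keywords: chuckle" OR "funny", and "funny" also occurs in the
Newsgroups, Approved and Path lines. The sketch below reproduces this in
Python. The score() helper and its per-line counting are assumptions made
for illustration, not the newsreader's documented behaviour, and Python's
regex dialect may differ from the newsreader's (here, grouping just the
two words does give 8, unlike what is reported above).

```python
import re

# Header lines of the article quoted above. The Path line is shortened to its
# last hops; the dropped hops contain neither "funny" nor "chuckle".
HEADER = [
    "Newsgroups: rec.humor.funny",
    "From: eschulma@nrao.edu (Eric Schulman)",
    "Subject: The History of the Universe in 200 Words or Less",
    "Keywords: chuckle",
    "Approved: funny-request@clari.net",
    "Path: fugue.clari.net!looking!funny-request",
    "Message-ID: <S9a2.2a6e@clarinet.com>",
    "Date: Fri, 17 May 96 19:30:03 EDT",
    "Lines: 39",
]


def score(pattern, weight=8):
    """Hypothetical scorer: add `weight` once per header line the pattern matches."""
    regex = re.compile(pattern)
    return weight * sum(1 for line in HEADER if regex.search(line))


# Parsed as "Keywords: funny" OR "chuckle": only the Keywords line matches.
print(score("Keywords: funny|chuckle"))      # 8
# Parsed as "Keywords: chuckle" OR "funny": four lines match (4 * 8).
print(score("Keywords: chuckle|funny"))      # 32
# Grouping only the alternatives ties both words to the field name.
print(score("Keywords: (?:chuckle|funny)"))  # 8
```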

Yngvar