Ok, here is all the information contained in the VSoup documentation.
If this is not enough, suggestions are always welcome (Barrie?):
--------------------------------------------------------------------------------
- It is possible to filter articles by header lines before retrieving
the body text; see the score file section. VSoup allows regular
expressions for both group and header patterns; article length is
treated specially.
:
8.2 The Score File
------------------
The score file specifies criteria used to exclude articles from the
VSoup packet. You can kill articles that have a specific subject,
are from a specific poster, or contain a particular string anywhere
in the header. Scoring even lets you kill articles whose header does
not contain a specific pattern.
%HOME%\SCORE is the default score file. The name and subdirectory
of the score file can be configured by specifying the -K<scorefile>
command line option. This allows one single score file for all news
servers you wish to access.
An entry in the score file, also called <score-section>, has the
format:
<score-group> "{"
<score-rule>
...
"}"
with <score-rule> being one of the following alternatives:
1. <score> "lines" ">" <number>
2. <score> "lines" "<" <number>
3. <score> "pattern" "header" <reg-exp>
4. <score> "pattern" <where> <reg-exp>
5. <score> "header" <string>
6. <score> <where> <string>
<score-rule>s 1. and 2. score the article length, 3. and 4. use
regular expressions (<reg-exp>), and 5. and 6. use literal strings
(<string>). 3. and 5. scan the complete header for the pattern; 4.
and 6. match the pattern only in the header line named by <where>.
Description of the score-section elements
-----------------------------------------
<score-group>
<score-group> is a regular expression. The contained
<score-rule>s are applied to newsgroups which are matched
by the <score-group> regular expression completely. If
<score-group> is the string "all", the <score-rule>s are
applied to all newsgroups.
<score-rule>
see above.
<score>
is the score which has been assigned to the <score-rule>.
The scores of all <score-rule>s matching an article's header are
added up and then compared to the <kill-threshold>. If the
total score is less than the <kill-threshold>, the article is
killed.
<where>
specifies where to search in the header: "from" searches the
From: line, "subject" searches the Subject: line, and so on.
The special <where>-pattern "header" searches all the lines
in the header of the article.
<reg-exp>
is the string in form of a regular expression to search for.
<string>
is the literal string to search for.
<number>
is the number of lines against which the actual article length
is compared.
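As an illustration only, the scoring described above can be sketched
in Python. This is not VSoup's code. The rule dictionaries, the
function names and the example addresses are made up for the sketch,
and it assumes that a rule counts once per article, that text is
searched as a case-insensitive substring, and that the article length
is already known as a line count.

```python
import re

def rule_matches(rule, header, n_lines):
    """Return True if one score rule applies to an article.

    header is a list of (field name, value) pairs; n_lines is the
    article length in lines. Both layouts are assumptions of this sketch.
    """
    if rule["kind"] == "lines":
        if rule["op"] == ">":
            return n_lines > rule["number"]
        return n_lines < rule["number"]
    if rule["where"] == "header":
        # "header" searches every header line, field name included
        candidates = [f"{name}: {value}" for name, value in header]
    else:
        # otherwise only the named header line is searched
        candidates = [value for name, value in header
                      if name.lower() == rule["where"]]
    if rule["regex"]:
        return any(re.search(rule["text"], c, re.IGNORECASE)
                   for c in candidates)
    needle = rule["text"].lower()
    return any(needle in c.lower() for c in candidates)

def is_killed(rules, header, n_lines, kill_threshold=0):
    """Add up the scores of all matching rules; kill if below threshold."""
    total = sum(r["score"] for r in rules
                if rule_matches(r, header, n_lines))
    return total < kill_threshold

# Hypothetical rules and articles
rules = [
    {"score": -100, "kind": "text", "where": "subject",
     "regex": True, "text": "make.*money"},
    {"score": 50, "kind": "text", "where": "from",
     "regex": False, "text": "alice"},
    {"score": -10, "kind": "lines", "op": ">", "number": 500},
]
spam = [("From", "someone@example.com"), ("Subject", "MAKE big MONEY fast")]
friend = [("From", "alice@example.com"), ("Subject", "Re: VSoup")]
```

With the default threshold of zero, the first article totals -100 and
is killed, while the second one totals 40 even at 600 lines and is
kept. An article matching no rule totals zero and is kept as well.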
Special score-sections
----------------------
There are two special <score-section>s:
"quit" stops reading of the score file immediately.
"killthreshold" <kill-threshold>
sets the threshold for killing articles, i.e. the minimum
total score an article must reach to avoid being killed. The
threshold is global (i.e. applies to all groups and rules)
and the last occurrence of "killthreshold" counts. Default
value is zero.
Lines beginning with a '#' in the first non-blank position are
treated as comments.
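The handling of comments, "killthreshold" and "quit" might be
pictured with the following Python sketch. It is not taken from
VSoup. The function name and the sample file are invented, and it
assumes that blank lines are skipped and that the "{" stands on the
same line as the <score-group>.

```python
def read_score_file(text):
    """Walk a score file line by line.

    Returns the kill threshold and a list of
    (score_group, [raw rule lines]) pairs.
    """
    threshold = 0            # documented default value
    sections = []
    current = None           # section being collected, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue         # blank line (assumed) or comment
        if current is not None:
            if line == "}":
                sections.append(current)
                current = None
            else:
                current[1].append(line)
            continue
        words = line.split()
        keyword = words[0].lower()
        if keyword == "quit":
            break            # nothing after "quit" is read
        if keyword == "killthreshold":
            threshold = int(words[1])   # the last occurrence counts
        elif line.endswith("{"):
            current = (line[:-1].strip(), [])
    return threshold, sections

# Hypothetical score file
SAMPLE = """\
# kill obvious junk everywhere
killthreshold 10
all {
-100 subject: make money fast
}
killthreshold 20
quit
killthreshold 99
comp.os.os2.* {
50 from: somebody
}
"""
```

For the sample above the sketch returns a threshold of 20 and only
the "all" section, because everything after "quit" is ignored.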
Remarks
-------
- All fields are case-insensitive.
- After "header" and <where> a colon (":") is allowed.
- News transmission speed decreases if a score file is used, because
each article is then fetched in two steps: HEAD <num>, then BODY
<num>. Otherwise an article is fetched in one single step:
ARTICLE <num>. Yarn's offline scoring is also more flexible
than this online scoring, because you can change your mind about the
scores after the articles have been received!
- It is perfectly legal to split the scoring criteria for a
<score-group> across several <score-section>s.
- If "header"-<score-rule>s are applied, the pattern is searched for
in the complete header, i.e. one can also match specific header
fields.
- The <score-group> "all" is transformed to the regular expression
".*".
- Although the "."s in newsgroup names are meta characters
of regular expressions this should not do any harm to
'normal' group matching. E.g. the <score-group> expression
"comp.os.os2.programmer.misc" will match only one group in real
life.
- The score file works on news only.
- Check the FAQ for example score files.
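The remarks on <score-group> matching can be tried out with a small
Python sketch. Python's re module stands in for VSoup's regexp
routines here, which is an assumption, as are the function name and
the made-up group name "compXos.os2.programmer.misc".

```python
import re

def group_matches(score_group, newsgroup):
    """True if the whole newsgroup name is matched by score_group."""
    # "all" is transformed to the regular expression ".*"
    pattern = ".*" if score_group.lower() == "all" else score_group
    # fullmatch() requires the complete name to match, not just a part
    return re.fullmatch(pattern, newsgroup, re.IGNORECASE) is not None

real = "comp.os.os2.programmer.misc"
print(group_matches("all", real))                          # any group
print(group_matches(real, real))                           # "." matches "."
print(group_matches(real, "compXos.os2.programmer.misc"))  # "." matches "X" too
print(group_matches("comp.os.os2", real))                  # partial match only
print(group_matches(r"comp\.os\.os2\..*", real))           # escaped dots
```

The third call shows why the unescaped "." is harmless in practice:
it would also accept other characters, but no such group exists. The
fourth call fails because the expression has to cover the complete
group name.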
Syntax
------
The exact syntax of the score file is as follows:
%HOME%\SCORE ::= {<line> "\n"}*
<line> ::= '#' <text> | <score-section>
A line beginning with a '#' is a comment; <text> is a literal
string.
<score-section> ::= "all" "{" <score-rules> "}"
| <score-group> "{" <score-rules> "}"
| "killthreshold" <number>
| "quit"
"all" indicates that the following <score-rules> are valid for
all newsgroups. The regular expression <score-group> limits the
<score-rules> to specific newsgroups.
<score-rules> ::= <score-rule> "\n" {<score-rules>}
<score-rule> ::= <score> "lines" ">" <number>
| <score> "lines" "<" <number>
| <score> "pattern" "header" <reg-exp>
| <score> "pattern" <where> <reg-exp>
| <score> "header" <string>
| <score> <where> <string>
<score> is a number (the score!), <number> too, <reg-exp> is a
regular expression, <where> and <string> are literal strings.
Note: "\n" is the newline character. It is mandatory!
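To make the six <score-rule> alternatives more tangible, here is a
rough Python sketch of how a single rule line could be taken apart.
It is not VSoup's parser. The names parse_rule and _pop, the
dictionary layout and the sample rules are invented, and the sketch
assumes that tokens are separated by blanks and that <score> is a
signed integer.

```python
def _pop(text):
    """Split off the first blank-separated word; return (word, rest)."""
    parts = text.split(None, 1) + ["", ""]
    return parts[0], parts[1]

def parse_rule(line):
    """Parse one <score-rule> line into a dictionary."""
    score_text, rest = _pop(line.strip())
    score = int(score_text)
    word, rest = _pop(rest)
    if word.lower() == "lines":
        # alternatives 1 and 2: article length
        op, number = _pop(rest)
        if op not in ("<", ">"):
            raise ValueError("expected '<' or '>' in: " + line)
        return {"score": score, "kind": "lines",
                "op": op, "number": int(number)}
    # alternatives 3 and 4 start with "pattern" and carry a <reg-exp>,
    # alternatives 5 and 6 carry a literal <string>
    is_regex = word.lower() == "pattern"
    if is_regex:
        word, rest = _pop(rest)
    # a colon is allowed after "header" and <where>
    where = word.lower().rstrip(":")
    return {"score": score, "kind": "text", "where": where,
            "regex": is_regex, "text": rest}

# Hypothetical rule lines, one per alternative
for sample in ["-10 lines > 500",
               "-10 lines < 5",
               "-100 pattern header make.*money",
               "-100 pattern subject: make.*money",
               "25 header OS/2",
               "50 from: Alice"]:
    print(parse_rule(sample))
```

A header field that happens to be called "lines" or "pattern" would
be misread by this sketch; how VSoup resolves that case is not
stated in the documentation.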
8.2.1 Regular expression syntax
-------------------------------
The following description has been taken from the man regexp pages
(BSD experimental).
A regular expression is zero or more branches, separated by |. It
matches anything that matches one of the branches.
A branch is zero or more pieces, concatenated. It matches a match
for the first, followed by a match for the second, etc.
A piece is an atom possibly followed by *, +, or ?. An atom followed
by * matches a sequence of 0 or more matches of the atom. An atom
followed by + matches a sequence of 1 or more matches of the atom.
An atom followed by ? matches a match of the atom, or the null string.
An atom is a regular expression in parentheses (matching a match for
the regular expression), a range (see below), . (matching any single
character), ^ (matching the null string at the beginning of the input
string), $ (matching the null string at the end of the input string),
a \ followed by a single character (matching that character), or a
single character with no other significance (matching that character).
A range is a sequence of characters enclosed in []. It normally
matches any single character from the sequence. If the sequence
begins with ^, it matches any single character not from the rest of
the sequence. If two characters in the sequence are separated by -,
this is shorthand for the full list of ASCII characters between them
(e.g. [0-9] matches any decimal digit). To include a literal ] in
the sequence, make it the first character (following a possible ^).
To include a literal -, make it the first or last character.
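A few of the constructs described above, tried out in Python as a
sketch. Python's re module is not the regexp library VSoup uses and
knows many additional constructs, so only the elements described
here are used. The helper names whole and found and the sample
strings are made up.

```python
import re

def whole(pattern, text):
    """True if the pattern matches the complete text."""
    return re.fullmatch(pattern, text) is not None

def found(pattern, text):
    """True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

# pieces: atom followed by *, + or ?
print(whole(r"ab*c", "ac"), whole(r"ab*c", "abbbc"))   # * allows zero b's
print(whole(r"ab+c", "ac"))                            # + needs at least one b
print(whole(r"colou?r", "color"))                      # ? makes the u optional

# branches and parentheses
print(whole(r"(foo|bar)+", "foobarfoo"))

# ^ and $ anchor at the beginning and the end of the input
print(found(r"^Re:", "Re: VSoup"), found(r"^Re:", "Fwd: Re: VSoup"))
print(found(r"fast$", "make money fast"))

# ranges
print(whole(r"[0-9]+", "1997"))      # shorthand for all digits
print(whole(r"[^0-9]", "x"))         # leading ^ negates the range
print(whole(r"[]x]", "]"))           # literal ] as first character
print(whole(r"[a-]", "-"))           # literal - as last character

# backslash removes the special meaning of a character
print(whole(r"\.", "."), whole(r"\.", "x"))
```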
--------------------------------------------------------------------------------
Don't blame me that the 'description' is a little bit long (and
cryptic). You asked for the details...
Hardy
-- Hardy Griech, Ernetstr. 10/1, D-77933 Lahr http://privat.swol.de/ReinhardGriech/