It reserves space that will be used anyway. Import won't find itself
short of disk space, because Expire keeps it for Import to pick up.
Now, the entire newsbase (barring overview and history files) is in
one single file. Defragment your hard disk, and the newsbase will be
literally in one single piece. Import and Export will not be able to
break it into fragments anymore, because Export does not try to
truncate the newsbase (unless it really makes sense to) and Import
does not try to append to it. As a result, newsbase operations will
be faster in general.
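To make the space-reuse idea concrete, here is a rough sketch in C of
the "Expire frees, Import reuses" bookkeeping. The names (free_slot,
reuse_or_append) and the fixed-size table are my own inventions for
illustration only; Yarn's actual news.dat bookkeeping is certainly
laid out differently.

    #include <stdio.h>

    struct free_slot {        /* a hole left behind by Expire        */
        long offset;          /* where the hole starts in news.dat   */
        long length;          /* how many bytes were freed           */
    };

    #define MAX_SLOTS 1024
    static struct free_slot slots[MAX_SLOTS];
    static int nslots = 0;

    /* Import asks for `need' bytes: reuse a hole if one is big
     * enough, otherwise append at the current end of file.  Either
     * way, no new file is created on disk.
     */
    long reuse_or_append(long need, long end_of_file)
    {
        int i;
        for (i = 0; i < nslots; i++) {
            if (slots[i].length >= need) {
                long where = slots[i].offset;
                slots[i].offset += need;   /* shrink the hole */
                slots[i].length -= need;
                return where;
            }
        }
        return end_of_file;                /* no hole fits: append */
    }

    int main(void)
    {
        /* pretend Expire freed 100 bytes at offset 4096 */
        slots[nslots].offset = 4096L;
        slots[nslots].length = 100L;
        nslots++;

        printf("60-byte article goes at offset %ld\n",
               reuse_or_append(60L, 20000L));  /* reuses the hole */
        printf("500-byte article goes at offset %ld\n",
               reuse_or_append(500L, 20000L)); /* appends at EOF  */
        return 0;
    }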
The two flavours of Expire (-o and -r) will be similarly efficient.
Before, the newsbase was in multiple files, one file for each
expiratory date. Expire -o was simple --- just delete the single
file designated for expiration today, and the single file designated
for expiration yesterday (if it exists --- if you forgot to run Expire
-o yesterday), and so on. Expire -r, on the other hand, was painful:
go through each file, identify articles marked read, and --- the slow
but you-asked-for-it part --- squeeze out the unwanted articles and
shrink the files. This is needlessly disk-intensive, and encourages
fragmentation. The two flavours also modified the history records in
different fashions, with Expire -r doing far more work. Expire -o
only needs to purge the obsolete records. Expire -r purges the
obsolete records and also modifies the surviving records, because
each record points to the exact location of the article --- which
file, which byte in the file; since Expire -r shrinks the newsbase,
*all* articles are moved, so *all* surviving records need to have
their pointers modified.
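To see why that pointer rewriting hurt so much, here is a toy C
illustration. The field names are my own, not the actual 0.7x/0.8x
history format; the point is only that a (file, byte offset) pointer
goes stale the moment the file ahead of the article is compacted.

    #include <stdio.h>
    #include <string.h>

    struct history_rec {
        char msgid[128];       /* Message-ID of the article        */
        int  file_no;          /* which per-date file it lives in  */
        long byte_offset;      /* where it starts inside that file */
    };

    int main(void)
    {
        struct history_rec r;
        long removed_ahead = 3500L;  /* bytes squeezed out before it */

        strcpy(r.msgid, "<example@somewhere.invalid>");
        r.file_no = 3;
        r.byte_offset = 9000L;

        /* the old Expire -r had to do this for *every* surviving
         * record in the history file */
        r.byte_offset -= removed_ahead;

        printf("%s now lives at file %d, offset %ld\n",
               r.msgid, r.file_no, r.byte_offset);
        return 0;
    }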
With the 0.9x newsbase format, both Expire -o and Expire -r do the
same thing in essence: identify unwanted articles in news.dat, mark
the space they occupy as "available for use by Import", done. Then
truncate news.dat if it makes sense, but this is a trivial operation;
any sensible OS (including DOS) can do it in a Split Second(TM).
The history records also need to be updated, but again both flavours
of Expire do the same thing in this case --- purge the obsolete
records. There is no need to modify any pointer in any history
record; no article is moved.
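Here is a rough sketch, again with a layout I made up purely for
illustration, of what both flavours of the 0.9x Expire boil down to:
flag the unwanted articles as free space, purge their history
records, and truncate only as far as the tail of news.dat is free.

    #include <stdio.h>

    #define NARTICLES 5

    struct entry {
        long offset;       /* position of the article in news.dat */
        long length;       /* its size in bytes                    */
        int  unwanted;     /* 1 if Expire decided to drop it       */
        int  is_free;      /* 1 once the space is handed to Import */
    };

    int main(void)
    {
        struct entry db[NARTICLES] = {
            { 0L,    400L, 0, 0 },
            { 400L,  300L, 1, 0 },   /* expired (or read, for -r) */
            { 700L,  500L, 0, 0 },
            { 1200L, 250L, 1, 0 },   /* expired                   */
            { 1450L, 600L, 1, 0 },   /* expired, and at the tail  */
        };
        long end = 2050L;            /* current size of news.dat  */
        int i;

        /* same loop for -o and -r; only the "unwanted" test differs */
        for (i = 0; i < NARTICLES; i++)
            if (db[i].unwanted) {
                db[i].is_free = 1;   /* Import may reuse this space */
                /* ...and the matching history record is purged; no
                 * surviving record needs its pointer changed.      */
            }

        /* truncate only while the very end of the file is free space */
        for (i = NARTICLES - 1; i >= 0 && db[i].is_free; i--)
            end = db[i].offset;

        printf("news.dat can be truncated to %ld bytes\n", end);
        return 0;
    }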
Granted, Expire -o now becomes a bit slower (before, just delete a
couple of files; now, scan the entire news.dat), but Expire -r now
becomes vastly faster.
There are a few tips I can offer to speed things up.
1. Ever since day one (so this is not the fault of 0.9x), Yarn has
benefited from disk caching. In the days of 0.7x, I was using a 286
with 640KB of RAM. Without caching, it would take more than 30
seconds of disk accesses to open a newsgroup like sci.math (expiratory
period set to 5 to 7 days). Then I installed a lowly 64KB disk cache,
and since then opening a newsgroup has taken only about 5 seconds and
just a couple of disk accesses.
2. Now that news.dat is in one single huge file, defragment it by all
means.
Today I am using OS/2 with HPFS. This combination provides nice
caching and fights fragmentation. Yarn looks fast to me.
>(I'm assuming that it's done
>something useful/timesaving on the programmer side.)
I can understand your feeling.
I say Chin takes pains to program what we desire.
IMHO, the "expire -r" habit is a very bad habit, carried over from the
BBSing days of packet-oriented reading. If I were an author of a
newsreader, I would tell you, "you want expire -r? Go back to your
BBS. Here on Usenet, database-oriented reading is the way to go."
But Chin is much less antagonistic and much more considerate than I
am. He was willing to design a newsbase format that is a superb
compromise between those who like Expire -o and those who like
Expire -r, after seeing that so many people use Expire -r. (Just
look at the 0.7x and 0.8x newsbase format, and you'll know which
flavour of Expire he likes to use.)
-- Albert Y.C. Lai trebla@io.org http://www.io.org/~trebla/