>Do the news servers store by group?
>
>I thought that servers stored only a single copy of a
>crossposted message, like Yarn does. (Just assuming here.)
Most news-servers in production still use a "traditional
spool" format.
On a "traditional spool" articles get stored by
group, each article being a separate ascii file
in one directory per group. Directories get split
into sub-directories at the dots in the newsgroup
name. Each article's filename is a sequentially
assigned number within the directory. That's what
the numbers in the "Xref:" header line refer to.
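
To make the layout concrete, here is a minimal Python
sketch of the mapping described above. The spool root
"/var/spool/news" and the function names are my own
assumptions for illustration, not taken from any
particular server; real installations differ.

```python
import os

# Assumed spool location; this varies by installation.
SPOOL_ROOT = "/var/spool/news"


def article_path(group, number, root=SPOOL_ROOT):
    """Map a newsgroup name and article number to a spool file path."""
    # Dots in the group name become directory separators.
    return os.path.join(root, *group.split("."), str(number))


def parse_xref(header_value):
    """Split an Xref header value into (group, number) pairs.

    The first token is the server's own name; every following
    token has the form group:number.
    """
    tokens = header_value.split()
    pairs = []
    for token in tokens[1:]:
        group, _, number = token.rpartition(":")
        pairs.append((group, int(number)))
    return pairs
```

So an Xref value listing news.software.nntp:1234
would point at a file named 1234 in the directory
news/software/nntp under the spool root.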
But the news-server still only stores a _single_
copy (under the default setup of most news-servers
that use a "traditional spool") of a crossposted
article. The copies of the article outside the
first newsgroup in the crosspost are created using
softlinks or hardlinks (depending on the setup
and whether you split the spool across different
filesystems) from the other newsgroup directories,
each link having the name appropriate to that
newsgroup/directory.
(Think of Win95/98 shortcuts as a sort of poor and
broken imitation of these Unix links.)
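
The linking trick can be sketched in a few lines of
Python. This is only an illustration under my own
assumptions (the function names and the temporary
directory standing in for a spool are made up); it
uses hard links, so everything has to sit on one
filesystem.

```python
import os
import tempfile


def store_crosspost(root, placements, text):
    """Write one article file and link it into the other groups.

    placements is a list of (group, number) pairs. The first pair
    gets the real file; each later pair gets a hard link to it.
    """
    paths = []
    for group, number in placements:
        directory = os.path.join(root, *group.split("."))
        os.makedirs(directory, exist_ok=True)
        paths.append(os.path.join(directory, str(number)))
    with open(paths[0], "w", newline="") as handle:
        handle.write(text)
    for extra in paths[1:]:
        # Across filesystems a server would need os.symlink instead.
        os.link(paths[0], extra)
    return paths


def demo():
    """Store a two-group crosspost in a throwaway directory."""
    with tempfile.TemporaryDirectory() as root:
        paths = store_crosspost(
            root,
            [("alt.test", 7), ("misc.test", 42)],
            "Subject: hi\n\nbody\n",
        )
        first, second = os.stat(paths[0]), os.stat(paths[1])
        # Same inode means both names refer to one stored copy.
        return first.st_ino == second.st_ino, first.st_nlink
```

Both names end up pointing at the same data on disk,
which is why the crosspost only costs one article's
worth of space.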
Indexing information is stored in "overview files"
(the Yarn spool also has these); in a large news
environment -- like your ISP -- these may well
be stored on separate physical disks from the spool
to increase performance. They certainly won't be
in the same directory that contains the contents
of the newsgroup they relate to.
However there is a general movement by authors
of server software away from the "traditional spool".
This is because the designers of Unix filesystems (and
indeed most filesystems) didn't optimise them for
storing very large numbers of relatively small files in
single directories. One of the greatest problems
is the length of time you need to get a complete
directory listing of a directory so you can access
a particular file.
Unfortunately you tend to get just these cases with
large amounts of news and a "traditional spool",
where news volumes continue to grow exponentially.
Oooops.
Therefore versions of INN >= 2.0 have the option
(I'm not sure whether they've made it a default) of
storing news in "cyclic buffers" (cycbuffs) grouped into
"meta cyclic buffers" (metacycbuffs). You can think
of cycbuffs as big files that fill up with news. When
they reach their full assigned size, the news software
goes back to the beginning of the file and writes the
new articles there. So the oldest articles in a
cycbuff get continually replaced with new incoming
articles, even as the amount of storage used stays
the same. Think of the end of the file getting taped
to the beginning of the file to make a loop and a
pointer continually moving round the loop indicating
where the software writes the next article.
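
Here is a toy Python version of that loop. It is a
sketch of the general idea only, not INN's actual
on-disk format: the class name, the in-memory
bytearray and the index dictionary are all my own
simplifications.

```python
class ToyCycbuff:
    """Fixed-size buffer that overwrites its oldest articles when full."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.size = size
        self.write_pos = 0   # where the next article will be written
        self.index = {}      # article id -> (offset, length)

    def store(self, article_id, payload):
        if len(payload) > self.size:
            raise ValueError("article larger than the whole buffer")
        # Not enough room before the end: go back to the beginning.
        if self.write_pos + len(payload) > self.size:
            self.write_pos = 0
        start = self.write_pos
        end = start + len(payload)
        # Forget any older article whose bytes are about to be overwritten.
        for old_id, (offset, length) in list(self.index.items()):
            if offset < end and start < offset + length:
                del self.index[old_id]
        self.data[start:end] = payload
        self.index[article_id] = (start, len(payload))
        self.write_pos = end

    def fetch(self, article_id):
        entry = self.index.get(article_id)
        if entry is None:
            return None      # never stored, or already overwritten
        offset, length = entry
        return bytes(self.data[offset:offset + length])
```

Note that nothing ever gets deleted explicitly. Old
articles simply vanish when the write pointer comes
round again, and the storage used never changes.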
Other recent news-server software both free, such
as Diablo, and commercial, such as Cyclone and Breeze,
use similar systems to avoid the inefficiency of the
traditional news spool layout.
This new generation of software also often stores
articles in "wire-format" rather than a straight
ascii format where the article is identical to what
you would see in your newsreader (if you set it
to show all the headers).
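
For the curious, "wire-format" here means the form an
article takes when sent over NNTP: CRLF line endings,
an extra dot in front of any line that starts with a
dot, and a lone dot on the last line. A rough Python
sketch of the conversion follows; the function names
are mine, and exactly what a given server keeps on
disk is something I'd check in its own docs.

```python
def to_wire(text):
    """Convert native text (LF line endings) to NNTP wire-format bytes."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()              # drop the empty piece after a final LF
    out = []
    for line in lines:
        if line.startswith("."):
            line = "." + line    # dot-stuffing
        out.append(line + "\r\n")
    out.append(".\r\n")          # terminating line with a lone dot
    return "".join(out).encode("ascii")


def from_wire(data):
    """Convert wire-format bytes back to native text with LF endings."""
    out = []
    for line in data.decode("ascii").split("\r\n"):
        if line == ".":
            break                # end of article
        if line.startswith("."):
            line = line[1:]      # undo dot-stuffing
        out.append(line + "\n")
    return "".join(out)
```

Storing articles this way means the server can hand
them to a peer or a reader without converting every
line on the way out.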
News servers never liked having other software --
such as news-readers -- monkeying around with their
spools, but you could get away with reading from
them even tho' writing directly to the spool screwed
things up. Nowadays the only generic way that it is
_possible_ to inject or read news to or from a server
is to open up a socket on port 119 and speak NNTP.
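
A bare-bones Python sketch of "open a socket and speak
NNTP" looks something like this. It only reads the
greeting and says QUIT; the function names are my own,
and the host is whatever server you actually have
access to.

```python
import socket


def parse_status(line):
    """Split an NNTP response line into (numeric code, text)."""
    text = line.decode("ascii", "replace").rstrip("\r\n")
    code, _, rest = text.partition(" ")
    return int(code), rest


def greet_and_quit(host, port=119, timeout=10.0):
    """Connect, read the server greeting, send QUIT, return both replies.

    host is a placeholder argument: supply a server you may use.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rwb") as stream:
            greeting = parse_status(stream.readline())
            stream.write(b"QUIT\r\n")   # NNTP commands end in CRLF
            stream.flush()
            farewell = parse_status(stream.readline())
    return greeting, farewell
```

A greeting code of 200 means posting is allowed, 201
means read-only, and the reply to QUIT should be 205.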
...
Probably waaaay more than you ever wanted to know
about news storage. I blame my hanging out on
news.software.nntp for far too long (-;
-- Kapusniak, Stefan m