Please note that the following data is collected from various sources and is not necessarily official or 100% correct. If you have any information to add, please send it in.

The official SOUP 1.2 specifications are published as soup12.zip.


Yarn 0.9x news.dat format

From: cthuang@io.org (Chin Huang)                
To: yarn-list@lists.Colorado.EDU
Subject: Re: format of yarn 0.90 newsbase
Date: Mon, 12 Feb 1996 01:09:07 -0500
Message-Id: <DmtHxc6r3DpO090yn@io.org>

News articles are stored in a file named news.dat, called the spool
file.  The spool file is composed of variable-sized blocks.  There are
two types of blocks.  One block type stores an article.  The other
block type marks unused storage.

Every block begins with this header:
(long means an integer stored in 32-bit binary)

    long prev;		// offset of previous block in file
    long size;		// byte size of data area in block
    long used;		// bytes used in data area

If the block stores an article, then <used> contains the article size
and the article is stored following the <used> field.

If the block is unused, then <used> is 0 and these two fields immediately
follow the <used> field:

    long prevFree;	// offset of previous block in free block list
    long nextFree;	// offset of next block in free block list

The <prevFree> and <nextFree> fields are pointers which link the free
block into a doubly-linked list of free blocks.
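
As a rough illustration of the layout above, the following C sketch decodes
one block from an in-memory copy of news.dat. The names rd32le, spool_block
and parse_block are made up for this example. It assumes the 32-bit values
are stored little-endian (the post does not state the byte order) and it
reads them as unsigned values, whereas the original fields are declared as
long.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value; little-endian byte order is an assumption. */
static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical in-memory view of one spool block. */
struct spool_block {
    uint32_t prev;       /* offset of previous block in file */
    uint32_t size;       /* byte size of data area in block */
    uint32_t used;       /* bytes used in data area, 0 if free */
    int      is_free;    /* 1 when used == 0 */
    uint32_t prev_free;  /* valid only when is_free is 1 */
    uint32_t next_free;  /* valid only when is_free is 1 */
};

/* Decode the block starting at buf[off].
   Returns 0 on success, -1 if the block does not fit in the buffer. */
static int parse_block(const unsigned char *buf, size_t len, size_t off,
                       struct spool_block *out)
{
    if (off > len || len - off < 12)
        return -1;
    out->prev = rd32le(buf + off);
    out->size = rd32le(buf + off + 4);
    out->used = rd32le(buf + off + 8);
    out->is_free = (out->used == 0);
    out->prev_free = 0;
    out->next_free = 0;
    if (out->is_free) {
        if (len - off < 20)
            return -1;
        out->prev_free = rd32le(buf + off + 12);
        out->next_free = rd32le(buf + off + 16);
    }
    return 0;
}
```

For an article block, the article text would start at buf + off + 12 and run
for <used> bytes.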

A special free block at the beginning (offset 0) of the spool file
stores the head node in a doubly-linked list of free blocks:

    long prev;		// offset of last block in file
    long size;		// = 8
    long used;		// = 0
    long prevFree;	// offset of last block in free block list
    long nextFree;	// offset of first block in free block list

When the expire program deletes an article, it marks the block used
by the article as free.  It also merges the freed block with adjacent
free blocks.  If the freed block is at the end of the spool file,
it truncates the spool file.
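
The free list can be followed from the head node at offset 0. The sketch
below counts the free blocks in an in-memory copy of the spool file. It is
only an illustration: count_free_blocks and rd32le are invented names,
little-endian byte order is assumed, and it assumes the walk ends when
<nextFree> leads back to offset 0 (which holds whether the list is circular
or simply terminated with 0).

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value; little-endian byte order is an assumption. */
static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Count free blocks by following nextFree from the head node at offset 0.
   Returns -1 on an out-of-range offset, a block that is not free,
   or a suspected cycle. */
static long count_free_blocks(const unsigned char *buf, size_t len)
{
    long count = 0;
    size_t max_blocks = len / 20;   /* a free block needs at least 20 bytes */
    uint32_t off;

    if (len < 20)
        return -1;
    off = rd32le(buf + 16);         /* head.nextFree */
    while (off != 0) {
        if (off > len - 20 || (size_t)count >= max_blocks)
            return -1;
        if (rd32le(buf + off + 8) != 0)   /* used must be 0 in a free block */
            return -1;
        count++;
        off = rd32le(buf + off + 16);     /* block.nextFree */
    }
    return count;
}
```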


Yarn Email Status Indicators

>If you're not too busy could you give me a quick list run-down on the
>X-status values and what they stand for? I'm suddenly curious.

A   Answered
D   Marked for deletion
N   New
O   Old and Unread
R   Read
U   Unread
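
A program that reads these flags could map them with a small lookup
function such as the one below. The function name xstatus_meaning is made
up for this sketch; only the six letters listed above are known.

```c
#include <stddef.h>

/* Map one X-Status flag character to its meaning.
   Returns NULL for characters not in the list above. */
static const char *xstatus_meaning(char flag)
{
    switch (flag) {
    case 'A': return "Answered";
    case 'D': return "Marked for deletion";
    case 'N': return "New";
    case 'O': return "Old and Unread";
    case 'R': return "Read";
    case 'U': return "Unread";
    default:  return NULL;
    }
}
```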


Yarn Folder format

Message-Id: <4/TL04uYOJoY089yn@stack.nl>
Date: Sat, 27 Sep 1997 19:18:48 +0200
From: galactus@stack.nl (Arnoud "Galactus" Engelfriet)
To: yarn-list@lists.colorado.edu
Subject: Re: File formats?

Perhaps you could also add documentation on the folder format? 

The format for Yarn folders is quite similar to the SOUP "binary clean
mail" format, although with one small difference.

In the SOUP format mentioned above, before each message is its length,
as a four-byte unsigned value, in big-endian order. This means that
if the four bytes you read are "B0 B1 B2 B3", then the length of the
message is

B0 * 256 * 256 * 256 + B1 * 256 * 256 + B2 * 256 + B3

Yarn uses the little-endian order, probably because that's what DOS
and OS/2 use. This way, Yarn can read the length with one read call.
Similar to the previous example, if you now read "B0 B1 B2 B3" from
a Yarn mail folder, the length is

B3 * 256 * 256 * 256 + B2 * 256 * 256 + B1 * 256 + B0

Note: the messages themselves are plain text, and line ends are Unix-style
LF characters (0x0A).
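
To make the difference between the two byte orders concrete, here is a
small C sketch. The function names len_be, len_le and count_messages are
invented for the example, and count_messages assumes a folder is nothing
more than length-prefixed messages stored back to back, with no file header.

```c
#include <stddef.h>
#include <stdint.h>

/* SOUP "binary clean mail": big-endian, first byte is most significant. */
static uint32_t len_be(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Yarn folder: little-endian, first byte is least significant. */
static uint32_t len_le(const unsigned char b[4])
{
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) |
           ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];
}

/* Count the messages in an in-memory Yarn folder image.
   Returns -1 if a length prefix is cut short or runs past the end. */
static long count_messages(const unsigned char *buf, size_t len)
{
    size_t pos = 0;
    long n = 0;

    while (pos < len) {
        uint32_t mlen;
        if (len - pos < 4)
            return -1;
        mlen = len_le(buf + pos);
        pos += 4;
        if (mlen > len - pos)
            return -1;
        pos += mlen;    /* skip the message text */
        n++;
    }
    return n;
}
```

For example, the bytes 00 00 01 02 decode to 258 as a SOUP length but to
0x02010000 as a Yarn folder length.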


Notes on History.pag


What I know about history.pag so far:

- divided into 2 KB blocks (0x0800 = 2048 bytes)
- the first two bytes of each block give the number of data bytes used in that block

- the third byte is the length of the first Message-ID
- then the Message-ID itself
- the first byte after the Message-ID gives the length of the extra data that follows
- the extra data follows the Message-ID (it always seems to be 12 bytes)
  - first 4 bytes: big-endian news.dat offset of the USED value in the block header
  - second 4 bytes: possibly the import date (format uncertain, probably
    the number of minutes since some date)
  - last 4 bytes: a 'supersedes' date, or 0 if none
- the Message-IDs in each block appear to be sorted alphabetically
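
Based on these notes, a single history.pag entry could be decoded roughly
as follows. This is only a sketch of the guessed layout: the names
hist_entry, rd32be and parse_hist_entry are invented, the length byte is
assumed not to count itself, and the second and third 4-byte values are
read as big-endian only because the first one is reported to be.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of one history.pag entry. */
struct hist_entry {
    char     msgid[256];   /* NUL-terminated copy of the Message-ID */
    uint32_t spool_off;    /* news.dat offset of the USED field */
    uint32_t imported;     /* raw value, meaning uncertain */
    uint32_t supersedes;   /* raw value, 0 if none */
};

/* Read a 32-bit big-endian value. */
static uint32_t rd32be(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the entry at p, with avail bytes readable.
   Returns the number of bytes consumed, or 0 if the entry is malformed. */
static size_t parse_hist_entry(const unsigned char *p, size_t avail,
                               struct hist_entry *e)
{
    size_t idlen, extra;
    const unsigned char *d;

    if (avail < 1)
        return 0;
    idlen = p[0];                       /* length of the Message-ID */
    if (avail < 1 + idlen + 1)
        return 0;
    extra = p[1 + idlen];               /* length of the extra data */
    if (extra < 12 || avail < 1 + idlen + 1 + extra)
        return 0;
    memcpy(e->msgid, p + 1, idlen);
    e->msgid[idlen] = '\0';
    d = p + 1 + idlen + 1;
    e->spool_off  = rd32be(d);
    e->imported   = rd32be(d + 4);
    e->supersedes = rd32be(d + 8);
    return 1 + idlen + 1 + extra;
}
```

Within a block, the first entry would start at the third byte, and calling
the function repeatedly with the returned length would step through the
rest of the used area.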


Overview files


Date: Sun, 22 Mar 1998 10:36:08 +1000
From: Ciaran Dunn

For those who are interested, here's what I know of the overview
file format.


The Overview Files

The overview file holds information about articles for display in
the article selection level.

The structures are as follows

Header of file (8 bytes)

First four bytes are the start Entry ID

Second four bytes are the end Entry ID

----------------------------------

Note: If the end entry is less than the start entry, the file is
empty.

----------------------------------
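
The empty-file check from the note can be written as a short C function.
This is a sketch only: overview_is_empty and rd32le are invented names, and
the Entry IDs are assumed to be signed 32-bit little-endian values, which
the description does not confirm.

```c
#include <stdint.h>

/* Read a 32-bit value; little-endian byte order is an assumption. */
static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the 8-byte header describes an empty overview file,
   that is, if the end Entry ID is less than the start Entry ID. */
static int overview_is_empty(const unsigned char hdr[8])
{
    int32_t start = (int32_t)rd32le(hdr);
    int32_t end   = (int32_t)rd32le(hdr + 4);
    return end < start;
}
```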

Each entry in the file then has the following format

First 4 bytes - Entry ID

0x20 - Marks start of subject

Subject

0x0a - Marks end of subject

0x20 - Marks start of mail address

Mail address

0x0a - Marks end of mail address

4 bytes of C-style time_t timestamp (i.e. seconds since 01/01/1970)

Message ID

0x0a - Marks end of message ID

Reference list (delimited by 0x20)

e.g. a ref list with three references

Ref1
0x20
Ref2
0x20
Ref3
0x0a - Marks end of ref list

7 bytes

xx - ??
xx - ??
xx - ??
xx - Length (MSB ???)
xx - Length (LSB)
xx - ??
xx - ?? - These last two bytes seem to commonly be 00 01 or 00 00

If anyone knows anything further about the last 7 bytes, please get
in touch with me.
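
The entry layout above could be parsed along these lines. This is a sketch
under assumptions, not a tested reader for real overview files: the names
ov_entry, take_line and parse_ov_entry are invented, the Entry ID and
timestamp are assumed to be little-endian, and the 7 trailing bytes are
returned undecoded because their meaning is mostly unknown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of one entry; pointers refer into the input buffer. */
struct ov_entry {
    uint32_t id;
    const unsigned char *subject; size_t subject_len;
    const unsigned char *address; size_t address_len;
    uint32_t timestamp;                 /* time_t value as stored */
    const unsigned char *msgid;   size_t msgid_len;
    const unsigned char *refs;    size_t refs_len;  /* 0x20-delimited */
    const unsigned char *tail;          /* the 7 trailing bytes */
};

/* Read a 32-bit value; little-endian byte order is an assumption. */
static uint32_t rd32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Take a field terminated by 0x0a.
   Returns a pointer just past the terminator, or NULL if none is found. */
static const unsigned char *take_line(const unsigned char *p,
                                      const unsigned char *end,
                                      const unsigned char **field,
                                      size_t *flen)
{
    const unsigned char *nl = memchr(p, 0x0a, (size_t)(end - p));
    if (nl == NULL)
        return NULL;
    *field = p;
    *flen = (size_t)(nl - p);
    return nl + 1;
}

/* Parse one entry at buf. Returns bytes consumed, or 0 if malformed. */
static size_t parse_ov_entry(const unsigned char *buf, size_t len,
                             struct ov_entry *e)
{
    const unsigned char *p = buf, *end = buf + len;

    if (len < 5 || p[4] != 0x20)        /* ID, then start-of-subject mark */
        return 0;
    e->id = rd32le(p);
    p = take_line(p + 5, end, &e->subject, &e->subject_len);
    if (p == NULL || p == end || *p != 0x20)   /* start-of-address mark */
        return 0;
    p = take_line(p + 1, end, &e->address, &e->address_len);
    if (p == NULL || end - p < 4)
        return 0;
    e->timestamp = rd32le(p);
    p = take_line(p + 4, end, &e->msgid, &e->msgid_len);
    if (p == NULL)
        return 0;
    p = take_line(p, end, &e->refs, &e->refs_len);
    if (p == NULL || end - p < 7)
        return 0;
    e->tail = p;
    return (size_t)(p + 7 - buf);
}
```

The returned length can be used to step from one entry to the next after
the 8-byte file header.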




Last Update: Friday, March 27th, 1998