Re: Readmail correction

From: rex (rex@ptw.com)
Date: Mon, 9 Aug 1999 00:56:46 -0700

On Sun, Aug 08, 1999 at 02:16:48PM -0700, Howard Schwartz wrote:
>
> I expect the way most software parses Soup headers is via a linked list
> approach: Read the first 4 bytes of the file to find the size of the
> first message, go to that offset, and read the next 4 bytes for the
> size of the next message, etc. With that approach, even a single byte
> missing or abnormally present will throw off the parsing of all the
> messages. Thus, SOUP mail programs will be unable to read a mail
> file after readmail deletes some messages from it.
>
> If I am correct on this, what is the advantage of such a demanding
> header format? Why is it not better to just pick a distinctive string,
> like the Unix ``From '' to start each header?

It's fast. I've written software to parse both formats and SOUP is
_much_ faster. Instead of having to read every byte looking for a LF
followed by a "From " to determine start-of-message, SOUP allows you
to directly go to the next message without reading any of the
intervening text. True, there's no easy way to recover from a
single-bit error in the 32-bit message length record, or a missing or added
byte in the message body, but these should be rare events unless the
hardware is faulty, or buggy software is munging the messages.
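
For illustration, the skip-ahead walk described above might look roughly
like the sketch below. It assumes each message is preceded by a 4-byte
big-endian length record (the binary SOUP message types); the function
name message_offsets is hypothetical and this is not the code from any
particular reader.

```python
import io
import struct

def message_offsets(f):
    """Return (body_offset, body_length) for each message in a
    length-prefixed file, reading only the 4-byte length records."""
    offsets = []
    while True:
        prefix = f.read(4)
        if not prefix:
            break  # clean end of file
        if len(prefix) < 4:
            raise ValueError("truncated length record")
        # Assumption: the length is stored big-endian, unsigned.
        (length,) = struct.unpack(">I", prefix)
        offsets.append((f.tell(), length))
        # Jump over the body without reading it. Note that a seek past
        # end-of-file is not an error, so a truncated final body is not
        # detected here.
        f.seek(length, io.SEEK_CUR)
    return offsets
```

Because every offset is derived from the previous length record, one
wrong length (or one stray byte in a body) shifts every later read,
which is the fragility Howard describes.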

My software is a post-processor for Yarn SOUP packets that reads
each message header and alters the "From: " line if certain
information is present. Since that changes the message length, a
new 32-bit length record has to be calculated, but that isn't a
problem. If Yarn were not paternalistic and allowed full header
editing (as Mutt does), I would not have needed to write it.
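
A minimal sketch of that kind of post-processing, not the actual program:
it assumes 4-byte big-endian length records, LF line endings, and a
header block that ends at the first blank line. The names rewrite_from
and rewrite_packet are hypothetical, and the replacement here is
unconditional rather than depending on other header information.

```python
import struct

def rewrite_from(body: bytes, new_from: bytes) -> bytes:
    """Replace the "From: " header line; the message text is untouched."""
    header, sep, text = body.partition(b"\n\n")
    lines = [
        b"From: " + new_from if line.startswith(b"From: ") else line
        for line in header.split(b"\n")
    ]
    return b"\n".join(lines) + sep + text

def rewrite_packet(data: bytes, new_from: bytes) -> bytes:
    """Rewrite every message and emit a fresh length record for each."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError("truncated length record")
        (length,) = struct.unpack_from(">I", data, pos)
        body = data[pos + 4 : pos + 4 + length]
        if len(body) != length:
            raise ValueError("truncated message body")
        new_body = rewrite_from(body, new_from)
        # The body length may have changed, so recompute the prefix.
        out += struct.pack(">I", len(new_body)) + new_body
        pos += 4 + length
    return bytes(out)
```

The only extra work compared with a plain copy is recomputing each
prefix from the length of the edited body.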

-rex

-- 
If you think C++ is not overly complicated, just what is a protected
abstract virtual base pure virtual private destructor, and when was
the last time you needed one?
       -- Tom Cargill, C++ Journal.