Celeste improvements!

danielv at netvision.net.il
Tue Oct 2 11:03:05 UTC 2001


The index file has two parts with different prices for recovery:
If you lose the id -> message file offsets information, you need to
rescan the messages file. This is unacceptable.

If you lose everything else in there, you only need to parse a message
from a known location, instead of reading a few specific lines from the
index. This isn't really a big deal.

So - let's split the file in two. 

1. An id -> offsets mapping for every single message, in binary form to
make it as fast as possible, and logged (as it is now) to make it hard
to lose information.

2. A cache for preparsed header fields. To make it fast, we'll keep it
small (the last 1000 entries). Maybe binary, maybe not. Not logged, it's
just a cache.
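
To make the split concrete, here is a rough sketch. It's in Python
rather than Squeak, purely for illustration, and every name in it
(RECORD, append_entry, load_offsets, HeaderCache) is made up. It also
assumes that message ids and offsets fit in 64-bit integers and that
"logged" means an append-only file of fixed-size records; the real
format could differ on all of these.

```python
import io
import struct
from collections import OrderedDict

# Hypothetical record layout: message id, start offset, end offset,
# each stored as an unsigned 64-bit big-endian integer.
RECORD = struct.Struct(">QQQ")


def append_entry(log, msg_id, start, end):
    """Append one id -> offsets record; earlier bytes are never rewritten."""
    log.write(RECORD.pack(msg_id, start, end))
    log.flush()


def load_offsets(log):
    """Replay the log; a later record for an id overrides an earlier one."""
    offsets = {}
    data = log.read()
    # Ignore a partial record left at the end by a crash mid-write.
    usable = len(data) - len(data) % RECORD.size
    for pos in range(0, usable, RECORD.size):
        msg_id, start, end = RECORD.unpack_from(data, pos)
        offsets[msg_id] = (start, end)
    return offsets


class HeaderCache:
    """Bounded cache of preparsed header fields; safe to throw away."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.entries = OrderedDict()

    def put(self, msg_id, fields):
        self.entries[msg_id] = fields
        self.entries.move_to_end(msg_id)
        while len(self.entries) > self.limit:
            self.entries.popitem(last=False)  # drop the oldest entry

    def get(self, msg_id):
        return self.entries.get(msg_id)
```

Loading is a single read plus a loop over fixed-size records, and a
crash costs at most the record that was being written.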

IndexFile should read the basic index, and the cached portions. Like Lex
said - if we need information that wasn't cached, we parse it from the
messages file. In this case, it applies to whole messages as well as
specific fields.
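
A sketch of that fallback, again in Python only for illustration. The
function name header_field is hypothetical, the cache is a plain
dictionary here, and the offsets are assumed to be (start, end) byte
positions into the messages file:

```python
import io
from email.parser import BytesHeaderParser


def header_field(msg_id, name, offsets, cache, messages):
    """Return one header field, preferring the cache over the messages file."""
    fields = cache.get(msg_id)
    if fields is not None and name in fields:
        return fields[name]
    # Cache miss: read the message from its known location and parse it.
    start, end = offsets[msg_id]
    messages.seek(start)
    parsed = BytesHeaderParser().parsebytes(messages.read(end - start))
    value = parsed[name]  # None if the message has no such header
    if value is not None:
        cache.setdefault(msg_id, {})[name] = value
    return value
```

The slow path only needs the offsets from the basic index, so losing the
cache never forces a rescan of the messages file.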

Price - applying filters other than categories on large numbers of
messages would be slow. Making the cache larger lets the user trade
load/save speed for faster filtering.

Gain - very fast loading, saving and shutdown, and a much lower price on
crashes. The cache portion of the index is now discardable, so changes
to it are cheap.

What do you guys think?
(If we come to agreement soon, I have some implementation time I could
give it. No promises, but...)

"Lex Spoon" <lex at cc.gatech.edu> wrote:
> It would be nice to have a filter that chooses a particular thread.  I
> think this would help with your goal.  Unfortunately, thread information
> can't be stored in the current index file format, at least not easily. 
> I guess one of the existing lines of the file could be extended with
> more info, as a hack....
> 
> -Lex


> We should really take a good look at what should go in that file before
> we update the format again.  Threading info is part of it, to be sure. 
> Pop's "UID" field would be nice as well, so that "leave messages on
> server" would work better.  Mailing lists would be nice, too, so that
> you can filter directly based on the mailing list (for mailing lists
> that have the required headers).
> 
> An extensible format would be especially nice.  The challenging part is
> to make it be faster than what we have now!  Historically, the index
> file has been saved in a text format so that it never goes wrong. 
> However, this doesn't seem so important when the index is essentially a
> cache of information that is already in the messages file.  I think the
> basic mechanism we'd want is, a way to save dictionaries whose keys are
> strings, and whose values are either strings or integers.  Then we could
> add fields as desired.  Perhaps a header would have a list of fields
> that are loaded at all -- attempts to access fields not in the index
> file could then fall back on reading the message and parsing the header
> out the (very) slow way.
> 
> 
> -Lex
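
For what it's worth, the dictionary idea could look something like the
sketch below (Python for illustration only). The byte layout and the
names save_index, load_index and field are all invented here: a header
lists the field names once, then every record stores one tagged value
(string or integer) per field, and anything not in the index goes
through a caller-supplied slow parser.

```python
import io
import struct


def _write_str(out, text):
    data = text.encode("utf-8")
    out.write(struct.pack(">I", len(data)))
    out.write(data)


def _read_str(src):
    (size,) = struct.unpack(">I", src.read(4))
    return src.read(size).decode("utf-8")


def save_index(out, field_names, records):
    """Write a header listing the fields, then one row of values per record."""
    out.write(struct.pack(">II", len(field_names), len(records)))
    for name in field_names:
        _write_str(out, name)
    for record in records:
        for name in field_names:
            value = record[name]
            if isinstance(value, int):
                out.write(b"i" + struct.pack(">q", value))
            else:
                out.write(b"s")
                _write_str(out, value)


def load_index(src):
    """Read back the field list and the records written by save_index."""
    count_fields, count_records = struct.unpack(">II", src.read(8))
    names = [_read_str(src) for _ in range(count_fields)]
    records = []
    for _ in range(count_records):
        record = {}
        for name in names:
            if src.read(1) == b"i":
                (record[name],) = struct.unpack(">q", src.read(8))
            else:
                record[name] = _read_str(src)
        records.append(record)
    return names, records


def field(record, name, slow_parse):
    """Use the indexed value if the field was saved, else parse the message."""
    return record[name] if name in record else slow_parse(name)
```

Adding a field later only means listing it in the header; older index
files simply fall back to the slow path for it.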



