[ENH] CelesteSpeedup

Stephan Rudlof sr at evolgo.de
Tue May 16 23:23:57 UTC 2000


Dear Celeste-Squeakers,

I think time is near to take a look at Celeste again to decide, if it
could replace Netscape Messenger for me....

Thank you all for your work!


Stephan

danielv at netvision.net.il wrote:
> 
> My changes basically remove limits on scalability in Celeste. If your
> mail database is small (for example, you just started using Celeste, and
> don't import your old mail) you probably won't notice them.
> 
> The only reason I'm spending alot of energy on fixing performance bugs
> is that I have imported all my old mail into Celeste, and now hold
> around 24,000 messages in it.
> 
> Let's get technical -
> Celeste uses Sets and Dictionaries quite a lot. It has Sets of messages
> (categories), and a Dictionary of messages (the Index - index number ->
> IndexFileEntry). For example, every time you enter a category, Celeste
> will go through every entry in the Dictionary (24,000 messages) and
> check whether the current category includes: each. If we have lots of
> messages to check, then each check must take very little time. Sets are
> implemented using hash tables, and hash tables are generally good at
> this - it usually takes fixed time, no matter how many entries are in
> the Set.
> 
> However, sometimes, hash tables "break".
> 
> Example 1 - SmallIntegers (like message index numbers) return themselves
> as their hash value. This is usually fine, except if alot of the numbers
> are consequtive. When you put values into a hash table which have
> consequtive hash numbers, it suddenly takes it alot of time to prove
> that a value is missing in it. In Celeste this happens if you import all
> the mail you used to have in one category at once, or generally, if you
> lots of mail entered at once that goes into a category. And then
> suddenly, changing categories becomes very slow.
> 
> Example 2 - Using an objects hash value to find it a place in a hash
> table requires some math on it's hash value. This is very cheap on
> SmallIntegers. However, IdentitySets have a tweak that makes better of
> the 12bits that most object types have. This same tweak multiplies the
> hash, making it bigger. Sometimes, it makes it overflow into
> LargeInteger land, where these tiny calculations suddenly become far
> more meaningful. Celeste messge indexes are often large numbers.
> Some tricks I used to solve the first problem also aggrevated the
> second.
> 
> The Set and Dictionary variations I sent make both kinds of problems
> very, very improbable, and thus are generally more robust than what we
> currently use. This has a small fixed price in performance, and make
> entering my Squeak category take 1.3 seconds, rather than >4 seconds.
> 
> BTW, I look forward to using your fix to take that further down. Another
> performance bug I want dead - to choose the 200 messages that should be
> displayed, Celeste currently checks all 24,000 messages in list -
> whether they're in the category, and for those that are, whether they
> fit the current filters, and then ignore all but the last 200. This
> means that entering categories is still dependent on their size, and the
> DB size, especially when we use filters...
> 
> The last batch of changes doesn't touch the on disk system at all. One
> of the changes before that made saves of the Index file incremental,
> making a very big difference on my system. It did this by introducing a
> new (.log) file with a format very similar to the index file.
> 
> mdr at scn.org (Mike Rutenberg) wrote:
> > Daniel,
> >
> > A couple of questions about your changes...
> >
> > What sorts of operations does it speed up in Celeste and by roughly how much?
> > Do they change the on-disk format of the mail files?  (It doesn't look like they do, but I wanted to check).
> >
> > Thanks for your work on Celeste!
> >
> > Mike

-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3





More information about the Squeak-dev mailing list