Moore's law and why persistence may not be necessary.

Scott A Crosby crosby at qwes.math.cmu.edu
Thu Jan 24 02:27:40 UTC 2002


On Wed, 23 Jan 2002, Tim Rowledge wrote:

> How fast does the indexing run? Fast enough to make it sensible to drop
> the indices when saving an image and regenerating on a
> lazy-initialisation basis?

Maybe. With the stock VM[*] and 10 MB of method text, the indexing
rate is about 25 KB of text per second, and the index comes to about 70%
of the size of the text.
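
To put those figures against the lazy-regeneration question, here is a rough back-of-envelope sketch. It is in Python purely for illustration; the function names are made up, and it assumes 1 MB = 1024 KB and that the rate stays constant over the whole text, which it may well not.

```python
# Back-of-envelope numbers from the figures quoted above.
# Assumptions: 1 MB = 1024 KB, and a constant indexing rate.
TEXT_KB = 10 * 1024       # 10 MB of method text
RATE_KB_PER_SEC = 25      # stock-VM indexing rate
INDEX_RATIO = 0.70        # index size relative to the text size

def rebuild_seconds(text_kb, rate_kb_per_sec):
    """Time to re-index text_kb of text from scratch."""
    return text_kb / rate_kb_per_sec

def index_size_kb(text_kb, ratio):
    """Approximate size of the index built over text_kb of text."""
    return text_kb * ratio

seconds = rebuild_seconds(TEXT_KB, RATE_KB_PER_SEC)
print(f"rebuild: {seconds:.0f} s (about {seconds / 60:.1f} min)")
print(f"index:   {index_size_kb(TEXT_KB, INDEX_RATIO) / 1024:.1f} MB")
```

So dropping the index on save would trade roughly 7 MB of image size for several minutes of re-indexing at this rate.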

According to results from Bijarn, he indexes 90 MB worth of Squeak email
in about 25 MB, and that's with full text.

One of the problems we're facing is that we're building collections far
bigger than normal. Because few collections in Squeak have even a hundred
elements, this code is exposing all kinds of latent performance issues in
Set, IdentitySet, Dictionary, #hash, and how they all interact.

Thus, performance can vary all over the map from what it should be.
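
As a toy illustration of how a hash that is fine for small collections can hurt large ones, the sketch below counts probes in a linear-probing table. It is Python, not Squeak, and everything in it is hypothetical: `wide_hash` and `narrow_hash` are made-up stand-ins, not Squeak's actual #hash or the real Set implementation.

```python
def count_probes(keys, hash_fn, capacity):
    """Insert keys into a linear-probing table; return the total probe count."""
    table = [None] * capacity
    probes = 0
    for key in keys:
        slot = hash_fn(key) % capacity
        while True:
            probes += 1
            if table[slot] is None:      # free slot: store the key
                table[slot] = key
                break
            if table[slot] == key:       # already present
                break
            slot = (slot + 1) % capacity # collision: try the next slot
    return probes

def wide_hash(key):
    # Hypothetical well-spread hash (multiplicative, 32-bit).
    return (key * 2654435761) % (1 << 32)

def narrow_hash(key):
    # Hypothetical hash with only 64 distinct values.
    return key % 64

keys = range(2000)
capacity = 4096
print("wide hash probes:  ", count_probes(keys, wide_hash, capacity))
print("narrow hash probes:", count_probes(keys, narrow_hash, capacity))
```

With a hundred elements the two hashes behave about the same; with a couple of thousand, the narrow one spends nearly all its time walking one long run of occupied slots.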

After profiling, the raw maximum indexing rate I get is about 35 KB/sec.
This is probably within 10-30% of the ideal for a stock VM on a P3-600.

Whether this is fast enough. You tell me. :)

Whether it is small enough. You tell me. :)

--

> Is there a filter/adaptor to exclude a (longish) list of 'common' words
> yet? It seems to me that an index of emails could sensibly exclude many

My cheesy, simple test filter has this for testing purposes, but someone
should write a new adaptor that does this cleanly.

Patience; the code has only existed for a day. :)
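
For what a cleaner stop-word adaptor might look like, here is a minimal sketch. It is Python rather than Squeak, and the names (`STOP_WORDS`, `index_terms`), the tiny word list, and the tokenizing rule are all assumptions for illustration, not the engine's actual interface.

```python
import re

# Hypothetical stop list; a real one would be much longer.
STOP_WORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})

def index_terms(text, stop_words=STOP_WORDS):
    """Yield lowercase words from text, skipping anything in stop_words."""
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word not in stop_words:
            yield word

print(list(index_terms("The index of the email is small")))
# prints ['index', 'email', 'small']
```

Keeping the stop list as a parameter means a mail indexer and a method-text indexer could each pass their own list without touching the engine.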

> words (d00d, l337, h4x0r among them :-) )and it might be useful to have
> a separate index for header content. The latter might be small enough to
> be useful even on ordinary machines with memory in the 64mb range.

I wrote the engine, which is now basically done; it just needs testing. I
lack the understanding of the system to actually integrate it into every
part of the system that should use it.

I believe Bijarn is working on an adaptor that only indexes mail headers.

Scott


[*] My VM with my method cache and using ITIMER is over twice as fast at
indexing. ITIMER could probably be replaced with Raab's fix to
primitiveResponse.



