Moore's law and why persistence may not be necessary.

Scott A Crosby crosby at qwes.math.cmu.edu
Thu Jan 24 04:28:35 UTC 2002


On Wed, 23 Jan 2002, Bijan Parsia wrote:

> On Wed, 23 Jan 2002, Scott A Crosby wrote:
>
> > On Wed, 23 Jan 2002, Tim Rowledge wrote:
>
> Much depends on the size of the material indexed, etc. etc. My *guess* is
> that ReferenceStreaming, given the right objects, is going to be faster
> than regeneration...but that's just a guess. I've not gotten profile cases
> together yet to prove this.
>

Fastest is to not store the index outside the image. :)


> > With the stock VM[*], and using 10 MB of method text, the indexing
> > rate is about 25 KB of text/second, with the index about 70% of the size of
> > the text.
>
> I suspect that this is with relatively unsophisticated dbs. I.e., the

I am correcting for that.

> > Thus, performance can vary all over the map from what it should be.
>
> I'll take Scott's word on this. I'm *very* sanguine about it all. It's
> looking good.
>

Heh. You might not want to be. Make one change to what you have:

Have it use Set (not IdentitySet) to store the Documents.

This is probably the right thing; otherwise, there's no way to #remove: a
document unless you can supply one that is #== to the one you earlier #add:'ed.
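
To make the #remove: point concrete, here is a minimal sketch in Python
rather than Squeak. The Document class and the IdentitySet stand-in below
are hypothetical, written only for illustration; they are not the Squeak
classes. Python's built-in set plays the role of Set (equality-based), and
the stand-in compares by object identity the way #== does.

```python
class Document:
    """Hypothetical document record; equal when the names match (like overriding #=)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return isinstance(other, Document) and self.name == other.name

    def __hash__(self):
        return hash(self.name)


class IdentitySet:
    """Illustrative stand-in for an identity-based set: membership uses 'is', not '=='."""

    def __init__(self):
        self._items = {}

    def add(self, obj):
        # Keyed on id(); the stored reference keeps the object (and its id) alive.
        self._items[id(obj)] = obj

    def remove(self, obj):
        # Succeeds only when handed the very same object that was added.
        return self._items.pop(id(obj), None) is not None

    def __len__(self):
        return len(self._items)


original = Document("doc-1")
lookalike = Document("doc-1")  # equal to original, but a different object

by_identity = IdentitySet()
by_identity.add(original)
removed_by_identity = by_identity.remove(lookalike)  # the lookalike is not found

by_equality = set()
by_equality.add(original)
by_equality.discard(lookalike)  # an equal object is enough to remove the entry
removed_by_equality = len(by_equality) == 0
```

With the identity-based container the document stays in the set unless the
caller still holds the original object; with the equality-based one, any
equal document removes it, at the cost of calling the hash method on every
operation.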

Make that change, and perhaps find yourself spending 99% of your time in
String>>hash.

And if a class doesn't have its own #hash or you use IdentitySet, and you
don't have my identityHash patch[*], and you have more than 4000 documents
in any intermediate set, you'll spend 99% of your time in Set>>scanFor:


[*] If you try it, rebuild all indexes afterwards. There is some fragility
in that patch when dealing with large IdentitySets.
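
A rough Python sketch (not Squeak code) of why a set can end up spending its
time scanning. It assumes a linear-probing table and a hash that can take
only 4096 distinct values; that range is an assumption chosen to match the
"more than 4000" figure above, and sequential keys are an idealization of
real hash values. The function name average_probes and all parameters are
made up for the illustration.

```python
def average_probes(num_items, hash_values, capacity):
    """Insert num_items keys into a linear-probing table and return
    the average number of slots examined per insert."""
    table = [None] * capacity
    total_probes = 0
    for key in range(num_items):
        # The hash can produce only hash_values distinct results.
        h = key % hash_values
        slot = h % capacity
        probes = 1
        # Linear probing: step to the next slot until a free one is found.
        while table[slot] is not None:
            slot = (slot + 1) % capacity
            probes += 1
        table[slot] = key
        total_probes += probes
    return total_probes / num_items


# 5000 items in a table with plenty of room (16384 slots).
narrow = average_probes(5000, 4096, 16384)     # hash limited to 4096 values
wide = average_probes(5000, 1 << 30, 16384)    # hash with a large range
```

Under these assumptions, once the number of items exceeds the number of
distinct hash values, every further insert has to walk past one long run of
occupied slots, so the narrow case examines hundreds of slots per insert on
average while the wide case examines one. Growing the table does not help,
because the starting slots stay confined to the same small range.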

> headers as anything else...although keys like From will point to *large*
> sets of documents (19000!), that takes up space and gets slow in the
> indexing I suspect, for the reasons Scott alludes to above; note that

So, make your adaptor not return header keywords. :)

> Scott feel confident that one can improve large collection handling; even
> if not, standard techniques can do a lot).

Unlikely. I've already got a patch to fix the flaw. The only extra stuff I
can do is to give people pointers on how to make their own stuff behave
better with the existing collections.

> easy to use. Scott, wanna do up a SqueakNews article? I'll help!

Maybe..

> > [*] My VM with my method cache and using ITIMER is over twice as fast at
> > indexing. ITIMER could probably be replaced with Raab's fix to
> > primitiveResponse.
>
> Er...can I get a copy?
>

It's been on the list for, oh, about 3 months. :)

Look for:   VMDev+commonSendFixup.cs

Or, I can send you a Linux binary.


Scott




