Set versus IdentitySet

Fri Jan 25 13:04:30 UTC 2002

On Fri, 25 Jan 2002, Göran Hultgren wrote:

> Hi guys!
> 
> --- Bijan Parsia <bparsia at email.unc.edu> wrote:
> [SNIP]
> > (Is there a way to get the bytesize of an object in memory?)
> 
> Again I refer to my CSOTD from the other day about ImageSegments.
> There is a method called #doSpaceAnalysis which will tell you instance
> counts, total size etc. Nice.

Yep, found that. Haven't yet refound your CSOTD, but will take a peek.

I'd love to target #doSpaceAnalysis a bit, but haven't dug in yet.

> > I'm not clear on RefStream vs. ImageSegments? Saving the image is, thus far,
> > the *clear* super speed winner. It takes seconds, whereas the alternatives
> > I've tried so far take minutes at the least.
> 
> ImageSegments are much faster than RefStreams, which is quite natural
> given how they are implemented.

But, it would seem, much slower than image saving? Really, image saving
just happens. Image loading is equally fast. I started an image segment
dump (I *think*) and it was spending more than 20 seconds on the analysis
phase before I aborted :)

I really expect to be able to do better with a tuned serializer, at least
for the case where the indexes are basically strings-->smallints. Scott's
new skip-list implementation may complicate that a tad, I don't know.
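To illustrate why a tuned serializer can be simple when the index is just strings mapped to small integers, here is a minimal sketch in Python rather than Squeak. The byte layout (a 32-bit entry count, then per entry a 16-bit key length, the UTF-8 key bytes, and a 32-bit value) and the names `dump_index`/`load_index` are hypothetical, chosen only for illustration. The sketch assumes keys under 64 KB and values that fit in a signed 32-bit integer. It makes no attempt to handle a skip-list structure like Scott's.

```python
import struct


def dump_index(index: dict[str, int]) -> bytes:
    """Serialize a string -> small-int index into one flat byte string.

    Hypothetical layout: uint32 entry count, then for each entry
    uint16 key length, UTF-8 key bytes, int32 value (little-endian).
    """
    parts = [struct.pack("<I", len(index))]
    for key, value in index.items():
        raw = key.encode("utf-8")
        parts.append(struct.pack("<H", len(raw)))  # key length in bytes
        parts.append(raw)                          # key bytes
        parts.append(struct.pack("<i", value))     # small-int value
    return b"".join(parts)


def load_index(data: bytes) -> dict[str, int]:
    """Rebuild the index by walking the flat byte string sequentially."""
    (count,) = struct.unpack_from("<I", data, 0)
    offset = 4
    index = {}
    for _ in range(count):
        (klen,) = struct.unpack_from("<H", data, offset)
        offset += 2
        key = data[offset:offset + klen].decode("utf-8")
        offset += klen
        (value,) = struct.unpack_from("<i", data, offset)
        offset += 4
        index[key] = value
    return index
```

For example, `dump_index({"squeak": 1, "swiki": 42})` produces 27 bytes, and `load_index` restores the same dictionary. The gain comes from never recording object identity or class information. A general-purpose serializer such as ReferenceStream has to track both, because it cannot assume the object graph is this flat.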

If I were running a server, I'd *seriously* consider a superduper shrunk
image (or even just a reasonably trimmed image) running on a separate VM,
with all the indices loaded in memory and persisted by snapshotting. I'm
sure something cleverer could do better, but *with less work*? (Especially
by *me me me*! ;)). If the data is reasonably stable (i.e., you aren't
adding large documents many times a second), as with a Swiki or a mail
database, this is a winning move. Well, so is keeping it in the *same*
image as your server, but I can see good reasons for keeping them separate
(e.g., snapshotting a server tends to interfere with its ability to handle
requests).

Scott's still tuning stuff. It's getting nicer ever...er...hour.

At some point, one hits ye olde squeakly trade-off...do you keep it simple
and clear, or do you speed/space optimize the heck out of it? There's
still plenty of room to move, I think.

Cheers,
Bijan Parsia.