Fwd: Undirect index update

Hilaire Fernandes hilaire at ofset.org
Wed May 20 07:39:46 UTC 2009


Chris, 

This email exchange about indexing is really worthwhile for all of us
using Magma and unsure about the various implications with indexes, not
to mention the discussion is archived and could be of good use for any
new future user of Magma.

To be sure we talk about the same things (excuse the misconceptions from
a non native English reader), I would like to reword to be sure I
understand correctly the issues.

Le mardi 19 mai 2009 à 17:32 -0500, Chris Muller a écrit :
> I only meant to caution about the limited scale associated with
> duplicate keys in a MaHashIndexes.  As duplicate keys accumulate it's
> a worst-case, O(n) linear search for adds, removes AND reads.

By two duplicate keys, do you mean two different index objects (in my
example lexical term objects) producing the same hash value from
indexHashForIndexObject: ?

Or do you mean two different objects indexed with an identical object?
If so, why is it a problem? (I see it as a wished feature)
And then I don't see how to avoid it.


> It may be tempting to add index on the #color of millions of objects.
> Which might be worthwhile there are 24-million possible colors, but if
> only 8 of those colors are used then, as one-million objects are added
> to that collection, the minimum number of collisions at each key is 1M
> / 8..  Access to this index is now slow and will continue to worsen as
> more duplicates are added..

So if I understand correctly, it is a problem when the index range is
limited in regard to the indexed object population extend.
Frankly I think I do have this exact problem with indexes defined over a
limit set of word.
The object I am indexing are LOM metadata of pedagogical resources.
The LOM norm comes with attributes defined from a fixed set of
vocabulary. For example 'cost' is a two values attribute ('yes' or
'no'). In your scenario it is a worst case, but from the user point of
view it is interesting to query only commercial (value 'yes') resources.
Is it a dead end?

Best regards,

Hilaire

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: Ceci est une partie de message
	=?ISO-8859-1?Q?num=E9riquement?= =?ISO-8859-1?Q?_sign=E9e?=
Url : http://lists.squeakfoundation.org/pipermail/magma/attachments/20090520/640eccd0/attachment.pgp


More information about the Magma mailing list