MagmaCollectionReader behavior

Chris Muller asqueaker at gmail.com
Wed Apr 1 02:55:02 UTC 2009


Hi Miguel, MagmaCollections can absolutely give accurate and exact
results and, by themselves, *can* be used for finding objects.  You
*don't* always have to apply some kind of searching over the already
reduced collection represented by a MagmaCollectionReader in order to
find the *exact* match you are trying to locate.

The key concept you've stumbled on here is that MagmaCollectionIndexes
provide only a *finite* key-space.  But how you decide to utilize
that key-space, i.e., how you convert your objects into an integral
key value, as well as the size of the key-space in bits, determines
whether duplicate keys will occur or not.

If you want to use a big, fat, e-mail address as a "unique
identifier", you will be better served to use most of a 64 or 128-bit
key-space than a small percentage of a 400-bit key-space.  Using only
the alpha range:

    (MaSearchStringIndex attribute: #email) keySize: 128; beAlpha; yourself

provides 27 meaningful characters, enough for probably 99% of e-mail
addresses.  A 256-bit alpha index would provide 54 meaningful
characters, but using a post-detect: on a 128-bit index is a better
choice, since even that will probably only have to detect through one
element 99% of the time.
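
A rough workspace sketch of the arithmetic behind those numbers, in case
it helps.  It assumes the alpha map has 26 symbols and that an ASCII
index spends 8 bits per character; both are assumptions made for the
calculation, not a description of how Magma packs keys internally.  The
block name charsThatFit is made up for this example.

```smalltalk
"Sketch only: how many characters fit in a key of a given bit width,
 assuming every character takes one of alphabetSize values."
| charsThatFit |
charsThatFit := [:bits :alphabetSize | | count |
	count := 0.
	[(alphabetSize raisedTo: count + 1) <= (2 raisedTo: bits)]
		whileTrue: [count := count + 1].
	count].
charsThatFit value: 128 value: 26.   "27, assuming a 26-letter alpha map"
charsThatFit value: 256 value: 26.   "54"
charsThatFit value: 32 value: 256.   "4, assuming 8 bits per character"
```

The last line is the default 32-bit case from your example: only the
first 4 characters take part in the key.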

Please don't say "but alpha does not support the @ or . character."
To maximize efficiency, you really need to make your own
MagmaEmailIndex subclass which defines its own character map and uses
it appropriately and efficiently.
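
To show what "its own character map" could mean, here is a plain Squeak
sketch that turns an e-mail into an integer key.  The alphabet, the
128-bit width and the block name keyFor are illustrative assumptions;
this is not the actual MaSearchStringIndex subclass protocol, only the
encoding idea such a subclass might use.

```smalltalk
"Sketch only: encode an e-mail as a base-41 integer using a custom map."
| alphabet base maxChars keyFor |
alphabet := 'abcdefghijklmnopqrstuvwxyz0123456789@.-_'.
base := alphabet size + 1.   "digit 0 is padding and any unmapped character"
maxChars := 0.
[(base raisedTo: maxChars + 1) <= (2 raisedTo: 128)]
	whileTrue: [maxChars := maxChars + 1].   "23 characters fit in 128 bits"
keyFor := [:email | | key lower |
	lower := email asLowercase.
	key := 0.
	1 to: maxChars do: [:i |
		key := key * base
			+ (i <= lower size
				ifTrue: [alphabet indexOf: (lower at: i)]
				ifFalse: [0])].
	key].
(keyFor value: 'miguel@domain1.com') = (keyFor value: 'miguel@domain2.com').   "false"
(keyFor value: 'a@b.com') < (keyFor value: 'b@a.com').   "true"
```

Every key has the same number of digits, so keys compare in the order
defined by the character map, and two addresses only collide when they
agree on their first maxChars mapped characters.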

Or, another thing you could do is break apart the email into three
separate entries and small key-space indexes for all three:

  miguel at gmail.com

becomes the entries #('miguel' 'gmail' 'com'), and you index each user
under all three.  Then, to find your user you could simply perform an
appropriately conjuncted where:

  myUsersMagmaCollection where: [ : each |
    (each first = 'miguel') & (each second = 'gmail') & (each third = 'com') ]
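
One possible way to produce those three entries, as a plain Squeak
sketch.  The block name partsOf is made up for this example, and it
assumes a well-formed address containing an @ followed later by a dot;
it splits at the last @ and the last dot, so a multi-part domain keeps
everything but its final label in the second entry.

```smalltalk
"Sketch only: split an e-mail into local part, domain and top-level part."
| partsOf |
partsOf := [:email | | atIndex dotIndex |
	atIndex := email lastIndexOf: $@.
	dotIndex := email lastIndexOf: $. .
	{ email copyFrom: 1 to: atIndex - 1.
	  email copyFrom: atIndex + 1 to: dotIndex - 1.
	  email copyFrom: dotIndex + 1 to: email size }].
partsOf value: 'miguel@gmail.com'.          "#('miguel' 'gmail' 'com')"
partsOf value: 'miguel.coba@domain3.com'.   "#('miguel.coba' 'domain3' 'com')"
```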

There are other solutions to be sure...

  - Chris


2009/3/31 Miguel Enrique Cobá Martínez <miguel.coba at gmail.com>:
> It is not clear from the magmaseaside tutorial, but the code from
> http://wiki.squeak.org/squeak/6021:
>
> initialize
>        | users |
>        users := MagmaCollection new.
>        users addIndex: (MaSearchStringIndex attribute: #email) beAscii.
>       self at: #users put: users
>
> findUserByEmail: anEmail
>        ^ (self users where: [ :each | each email equals: anEmail ])
>                firstOrNil
>
> without any doubt suggests that the where: method and the
>
> each email equals: anEmail
>
> give an *exact* or *equal* match, but that is not the case.
> In fact, the where: send returns a MagmaCollectionReader that stands for the
> *set* or *collection* of objects that matched the equals: method in direct
> relation with the index created for the MagmaCollection.
>
> In this example, the index is created with the default key size (no keySize:
> specified) of 32 bits, which gives you only 4 meaningful characters when
> searching for a string. For example, if you have users with emails like:
>
> user   email
> 1    'miguel at domain1.com'
> 2    'miguel at domain2.com'
> 3    'miguel.coba at domain3.com'
>
> a message send like:
>
> findUserByEmail: 'miguel at domain1.com'
>
> will give you a MagmaReader that represents the 3 users in the database,
> because they all share the same 4 initial characters. After that, the
> firstOrNil message ensures that user #1 will *always* be returned, no
> matter what argument you pass to findUserByEmail:. So, the answer from
>
> findUserByEmail: 'miguel at domain1.com'
> findUserByEmail: 'miguel at domain2.com'
> findUserByEmail: 'miguel.coba at domain3.com'
>
> will always be user #1.
>
> In summary, the method doesn't behave correctly, because it can't be
> used for finding a specific user, which is the intended action.
>
> After reading the Index documentation from the magma site, it was clear that
> this MagmaCollectionReader can't give accurate and exact results and by
> itself it can't be used for finding objects. You *always* have to apply some
> kind of searching over the already reduced collection represented by
> MagmaCollectionReader in order to find the *exact* match you are trying to
> locate.
>
> So the code should be something like:
>
> findUserByEmail: anEmail
>
>  | user |
>        "Here you are working over the entire magma repo"
>  user := (self users where: [:each | each email equals: anEmail])
>           "Here you are working over the reduced set
>            returned by the where and represented by a
>            MagmaCollectionReader"
>              detect: [:each |
>                "Here you are working on a plain Collection"
>                each email = anEmail ]
>              ifNone: [nil].
>        ^ user
>
>
> After changing the code this way, the example can correctly find the users
> with emails 'miguel at domain1.com', 'miguel at domain2.com' and
> 'miguel.coba at domain3.com'.
>
> Can someone confirm that this is the correct way to use a
> MagmaCollectionReader?
>
> P.S. I tried a larger keySize: at index creation (I even tried 400 bits),
> but this only postponed the point where the string matching stopped working.
> Also, it is not efficient, and with 400 Squeak throws an error.
> So that was not the way to go.
>
> Thanks for your comments,
> Miguel Cobá