Dear all,
I've been using the 'keywords' approach to creating indexes on a MagmaCollection. Some of the keywords I create share a prefix: for instance, the keywords can contain 'user story 1', 'user story 2', 'user story 3', and so forth (all of which denote the same object to be found).
If I then do a query on these keywords using:
magmacollection where: [:eachProductBacklog| eachProductBacklog keywords includesAllPrefixes: aString substrings]
and try to find 'user' as aString, then the object is returned multiple times, which is something I do not expect.
If I look at (part of) the implementation of includesAllPrefixes:
expression addTerm: (self copy from: eachString to: eachString maAlphabeticalNext)
I can understand why multiple results are returned. Magma loops over its keys from 'user' to 'user' maAlphabeticalNext, and the hash keys for 'user story 1', 'user story 2' and 'user story 3' all match, so the same result is returned 3 times.
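To make the mechanism concrete, here is a minimal sketch of a prefix range scan over a sorted keyword index. It is a simplified model, not Magma code: the index is a plain sorted list of (keyword, object id) pairs rather than Magma's hashed keys, and alphabetical_next, prefix_scan and the 'backlog-A' ids are hypothetical names invented for illustration.

```python
from bisect import bisect_left

def alphabetical_next(prefix):
    # Hypothetical stand-in for maAlphabeticalNext: bump the last character
    # so the result sorts after every string that starts with `prefix`.
    # Assumes a non-empty prefix whose last character is not the maximum code point.
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)

def prefix_scan(index, prefix):
    # Visit every index entry whose key lies in [prefix, alphabetical_next(prefix)).
    lo = bisect_left(index, (prefix,))
    hi = bisect_left(index, (alphabetical_next(prefix),))
    return [obj for _key, obj in index[lo:hi]]

# One object indexed under three keywords that share the prefix 'user'.
index = sorted([
    ("user story 1", "backlog-A"),
    ("user story 2", "backlog-A"),
    ("user story 3", "backlog-A"),
    ("velocity", "backlog-B"),
])

hits = prefix_scan(index, "user")   # one hit per matching key, not per object
unique_hits = set(hits)             # collapsing to a set removes the repeats
```

In this model the scan yields one entry per matching key, so an object that owns three matching keywords shows up three times until the hits are collapsed into a set.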
The easiest way to make them unique is to send the distinct message. Is this something that always should be done? The documentation on the wiki only mentions sending the distinct keyword when the query uses an 'or' clause. Of course, the includesAllPrefixes: keywords is a sort of 'or'. Are there any other ways to make this work (since distinct is slow)?
Thanks for any advice,
Kind Regards,
Bart
Hi,
> The easiest way to make them unique is to send the distinct message. Is this something that always should be done?
No, it's actually something that should almost never be done until the implementation can be changed. distinct and ORs can be convenient for doing conversions or other special-case queries, but I avoid using distinct: true as well as ORs in my #where: clauses for ad-hoc querying.
Instead, stick to simple where: conditions as much as possible, AND'ed conditions if you must, and then use select: blocks for OR conditions. To remove dups, send #asSet to the Reader.
I really need to change the OR implementation to be more "lazy", so Magma isn't working overtime to finish queries that don't need to be finished. But every time I try to find time to fix this, I discover the above solutions are almost the same and work well enough. Someday I will find the time and will to make using distinct: and OR's better.
> includesAllPrefixes: keywords is a sort of 'or'.
It's actually an 'and'. The object's keywords must include 'prefix1' AND 'prefix2', etc. So includesAllPrefixes: will not spawn any of those background server processes.
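As a rough illustration of the AND semantics described above, the predicate below models what "includes all prefixes" means for a single object's keywords. It is only a sketch of the meaning, not of how Magma evaluates the query against its index; the function name and the sample keywords are assumptions made for the example.

```python
def includes_all_prefixes(keywords, prefixes):
    # True only if EVERY prefix is matched by at least one keyword (an AND
    # across prefixes); a single unmatched prefix rejects the object.
    return all(
        any(keyword.startswith(prefix) for keyword in keywords)
        for prefix in prefixes
    )

keywords = ["user story 1", "user story 2", "sprint 4"]

both_match = includes_all_prefixes(keywords, "user spr".split())
one_missing = includes_all_prefixes(keywords, "user vel".split())
```

Several keywords matching the same prefix (the "any" part) is what makes a single prefix behave a little like an OR over keys, even though the prefixes themselves are AND'ed.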
> Are there any other ways to make this work (since distinct is slow)?
#asSet. Or, use your first-class Page to enumerate (1 to: lastKnownSize) until one page's worth of unique results is accumulated; remember the index for the start of each Page... something like that.
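The paging idea could look something like the sketch below. It is a plain model under stated assumptions, not Magma's Reader API: results stands in for the reader's contents as an ordinary list, indexes are 0-based rather than Smalltalk's 1-based, and unique_page and seen are hypothetical names.

```python
def unique_page(results, start, page_size, seen):
    # Scan forward from `start`, skipping items already shown, until one
    # page's worth of unique items is collected or the results run out.
    # Returns the page and the index at which the next page should start.
    page = []
    i = start
    while i < len(results) and len(page) < page_size:
        item = results[i]
        if item not in seen:
            seen.add(item)
            page.append(item)
        i += 1
    return page, i

# Raw results with duplicates, as a prefix query might return them.
results = ["A", "A", "A", "B", "C", "B", "D"]
seen = set()
page_starts = [0]                       # remembered start index of each page

page1, next_start = unique_page(results, page_starts[-1], 2, seen)
page_starts.append(next_start)
page2, next_start = unique_page(results, page_starts[-1], 2, seen)
```

Only as many raw results are read as are needed to fill the current page, which is the point of the approach; paging backwards in this model would mean rebuilding seen for the earlier pages.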
- Chris
magma@lists.squeakfoundation.org