Searcheable BTree
Avi Bryant
avi at beta4.com
Wed Sep 15 20:52:18 UTC 2004
On Sep 15, 2004, at 7:21 PM, Sebastian Sastre wrote:
> I was reading about tries. I think the tries are pretty much like we
> need but the implementation is not trivial enough to develop it now.
> Perhaps at the long run.
> Anyway I think the IndexedCollection could help a lot by now.
>
> Please let me know I you has the range key selection of the BTree.
Well, I didn't touch BTree today, but I did release to SM a new version
of the BTree package: it now includes a TernarySearchTree (class
TSTree), which strikes a nice balance between Tries and BTrees. From
the reading I've done, it's likely the best data structure to use
whenever you have string keys stored in an OODB. As well as the usual
#at: and #do:, it has #matchesForPrefix: (and #matchesForPrefix:do:),
as well as #matchesForString:distance: and
#matchesForString:distance:do:. These last two are interesting - they
give you the values for any keys within a specified Levenshtein
distance (adds/removes/modifies of chars) of the provided string. So,
for example,
words matchesForString: 'squeak' distance: 2
might give the values for the keys 'squeaker' and 'squawk', and
words matchesForString: 'squeak' distance: 1
would give the values for 'squeal' and 'squeaky'.
The basics are there, but feel free to fill out any further protocol
you need. For example, the article on Ternary Search Trees below
describes implementing a pattern-matching search, where you can leave
some characters as wildcards:
http://home.od.ua/~relayer/algo/data/tern/lead.htm
It may also be useful to have some keysAndValues variants of the
methods which I've implemented as values-only.
Avi
More information about the Squeak-dev
mailing list
|