Searcheable BTree

Avi Bryant avi at beta4.com
Wed Sep 15 20:52:18 UTC 2004


On Sep 15, 2004, at 7:21 PM, Sebastian Sastre wrote:

> I was reading about tries. I think the tries are pretty much like we
> need but the implementation is not trivial enough to develop it now.
> Perhaps at the long run.
> Anyway I think the IndexedCollection could help a lot by now.
>
> Please let me know I you has the range key selection of the BTree.

Well, I didn't touch BTree today, but I did release to SM a new version 
of the BTree package: it now includes a TernarySearchTree (class 
TSTree), which strikes a nice balance between Tries and BTrees.  From 
the reading I've done, it's likely the best data structure to use 
whenever you have string keys stored in an OODB.  As well as the usual 
#at: and #do:, it has #matchesForPrefix: (and #matchesForPrefix:do:), 
as well as #matchesForString:distance: and 
#matchesForString:distance:do:.  These last two are interesting - they 
give you the values for any keys within a specified Levenshtein 
distance (adds/removes/modifies of chars) of the provided string.  So, 
for example,

words matchesForString: 'squeak' distance: 2

might give the values for the keys 'squeaker' and 'squawk', and

words matchesForString: 'squeak' distance: 1

would give the values for 'squeal' and 'squeaky'.

The basics are there, but feel free to fill out any further protocol 
you need.  For example, the article on Ternary Search Trees below 
describes implementing a pattern-matching search, where you can leave 
some characters as wildcards:

http://home.od.ua/~relayer/algo/data/tern/lead.htm

It may also be useful to have some keysAndValues variants of the 
methods which I've implemented as values-only.

Avi




More information about the Squeak-dev mailing list