[Seaside] Fast text search and proper design

Adrian Lienhard adi at netstyle.ch
Sun Dec 12 17:04:09 CET 2004


A cheap way to get full text search with GOODS (or any other OODB which 
does not support it) could be to additionally use a RDBMS to store the 
strings and perform the search query. The objects with searchable 
strings can be stored unstructured in one table - just the strings and 
an identifier to get the matching obejects out of the OODB would be 
needed.

Has anybody used something like that?

Cheers,
Adrian

On Dec 12, 2004, at 12:17 PM, Avi Bryant wrote:

> On Sat, 11 Dec 2004 19:50:59 -0500, Benjamin Pollack
> <benjamin.pollack at gmail.com> wrote:
>
>> For this particular project, I would like to use GOODS rather than an
>> RDBMS, but I need the ability to search very quickly through large
>> amounts of text and return appropriate matches. I cannot find any
>> effective data structure to use for that. Avi's B-tree won't really
>> help, as near as I can tell, because I want to search *within*
>> strings, and the default string operators likewise seem to be too
>> primitive to be effective for this.
>
> Scott Crosby wrote a fulltext search engine for Squeak a while back; I
> suspect it would take a little bit of tweaking to make it work with
> recent Squeak versions, but it would be a lot less work than starting
> from scratch.  AFAIK it was only ever posted as a series of
> enhancements to the mailing list, starting with this one:
>
> http://impara.de/~bert/swiki.gsug.org/sqfixes/2203.html
>
> If you were rolling your own, you certainly wouldn't want to use BTree
> for text; instead, use the TSTree that is included in the BTree
> package.  This does allow simple prefix and near-match searches on
> individual strings, but doesn't provide a fulltext facility out of the
> box.  What I've done in the past is to tokenize the text, and store a
> set of objects in the TSTree at each token - if you're searching for
> multiple tokens, you just intersect their sets.  If you expect these
> sets to be big, and they have some immutable attribute you can sort
> them on, you may want to use a TreeSet (also in the BTree package) to
> hold them.
>
> For lots of good information on the basics behind fulltext searching,
> see Tim Bray's "on search" series of articles:
>
> http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC
>
> HTH,
> Avi
> _______________________________________________
> Seaside mailing list
> Seaside at lists.squeakfoundation.org
> http://lists.squeakfoundation.org/listinfo/seaside
>
>
___________________
Adrian Lienhard
www.adrian-lienhard.ch
www.netstyle.ch



More information about the Seaside mailing list