[BETA][DOCS] Design decisions for the full text indexers.

Scott A Crosby crosby at qwes.math.cmu.edu
Mon Jan 28 16:25:56 UTC 2002


On Sun, 27 Jan 2002, Les Tyrrell wrote:

>
> > Sorry, no luck. The way the indexer works is that, for each word, it keeps
> > track of which documents it's in. Thus, there is no way to do regexp searching.
> >
> > The only way I know of to get full regexp searching is to scan each
> > document... which is what we wish to avoid.
>
> You should be able to support some regular expression searching by
> matching the indexed words against the search terms, then finding the
> documents that lie in the intersection of the matching search terms.
>

(Presuming that the adaptor is some variant of a substring extractor.)

Not really. The moment you see a '.*' or '[^abcde]', the regexp may
match a space, which makes it infeasible: the index only records
individual words, so a match that spans a word boundary can't be found
from it. Yes, in theory, a conservative approximation of a regexp could be
done, but in practice, IMHO, it's so limited as to be almost utterly
useless. The closest I could come to faking it would be to walk
''index allTerms'' and test each term against the pattern.
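A minimal Python sketch of the point above, assuming plain-text documents split on whitespace (Scott's indexers are more general than this). It builds a word-to-documents index, implements the term-matching-and-intersection approximation Les proposes, and shows it missing a match once the pattern has to span a space. The names `build_index` and `term_regex_search` are hypothetical, not part of Scott's engine.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each whitespace-separated word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def term_regex_search(index, patterns):
    """Approximate regexp search (hypothetical helper): each pattern must fully
    match some single indexed term; return the documents that have a matching
    term for every pattern, i.e. the intersection of the per-pattern results."""
    result = None
    for pattern in patterns:
        rx = re.compile(pattern)
        matching_docs = set()
        # Walking every term, roughly what ''index allTerms'' would give you.
        for term in index:
            if rx.fullmatch(term):
                matching_docs |= index[term]
        result = matching_docs if result is None else result & matching_docs
    return result or set()


docs = {
    1: "squeak full text indexer",
    2: "full regexp search",
    3: "text search engine",
}
index = build_index(docs)

# Word-sized patterns work: only document 1 has a term matching each pattern.
print(term_regex_search(index, [r"te.t", r"ind.*"]))
# A pattern that must cross a space matches no single term, so document 2
# is missed even though its text does match the regexp.
print(term_regex_search(index, [r"full.*search"]))
```

The second query is the '.*' case: no individual word can contain the space, so the index alone has nothing to return.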

More sophisticated search engines can be built by writing a searcher
on top of the indexer. Such a searcher, which presumes that it only has to
deal with text (my indexers *do not* make that assumption), would have
access to rescan the text. This would allow 'nearby' searches and, through
scanning, even regexp searches.
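A rough Python sketch of such a searcher, assuming plain-text documents split on whitespace and that the caller supplies the literal words a match must contain (extracting those from an arbitrary regexp is left out). The index only narrows the candidate set; the regexp and 'nearby' checks then rescan the candidates' text. All names here (`build_index`, `regexp_search`, `nearby_search`) are hypothetical.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each whitespace-separated word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def regexp_search(index, docs, required_words, pattern):
    """Narrow to documents containing every required word via the index,
    then rescan only those documents' full text with the regexp."""
    candidates = set(docs)
    for word in required_words:
        candidates &= index.get(word, set())
    rx = re.compile(pattern)
    return {d for d in candidates if rx.search(docs[d])}


def nearby_search(index, docs, a, b, window):
    """Documents where words a and b occur within `window` words of each other.
    The index picks documents containing both; the text is rescanned for positions."""
    hits = set()
    for d in index.get(a, set()) & index.get(b, set()):
        words = docs[d].split()
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        if any(abs(i - j) <= window for i in pos_a for j in pos_b):
            hits.add(d)
    return hits


docs = {
    1: "squeak full text indexer",
    2: "full regexp search",
    3: "text search engine",
}
index = build_index(docs)

print(regexp_search(index, docs, ["full", "search"], r"full.*search"))
print(nearby_search(index, docs, "text", "search", 1))
```

Only documents that survive the index lookup are rescanned, which keeps the cost of scanning proportional to the candidate set rather than the whole collection.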

Let's wait until my engine is out and integrated before worrying about
that. :)

Scott
