Full-text search package?

Cees de Groot cg at home.cdegroot.com
Tue Jan 22 23:37:44 UTC 2002


Scott A Crosby <crosby at qwes.math.cmu.edu> said:
>The complexity is in dealing with the disk. If you can assume
>everything is in RAM, things get a *lot* easier. I'm assuming this.
>
The Squeak-dev mailing list archive for January so far is 2.5MB worth of text;
the collected archives since squeak-dev moved to squeakfoundation.org last
July are around 35MB.

Now, there is something to be said for running squeak -memory 512Mb and
letting Linux sort out the swapping, but it'd probably be better to be able
to write the thing to disk right away.

Luckily, Linux has ReiserFS, which is extremely good at dealing with
large directories (I've got a couple with tens of thousands of files
in them on our systems) and very small files (it does partial block
allocation). So a simplistic way of just storing bits and pieces with
the oid as filename would work fine at least under Linux and would
not necessarily *not* work on other systems (at worst sub-optimal, but it
would work until someone comes up with a better storage mechanism).
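
As a rough illustration of the oid-as-filename idea, here is a minimal
sketch. It is in Python rather than Squeak purely for brevity, and the
class and method names (OidFileStore, put, get) are hypothetical, as is
the assumption that oids are non-negative integers. It is not a tuned or
definitive implementation.

```python
import os
import tempfile


class OidFileStore:
    """Keeps each stored object in its own small file, named after its oid."""

    def __init__(self, root):
        # One flat directory; this leans on the filesystem coping well
        # with many small files, as described above.
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, oid):
        # Assumption: oids are non-negative integers. Hex keeps names short.
        return os.path.join(self.root, format(oid, "x"))

    def put(self, oid, data):
        # Write to a temporary file first, then rename it into place, so a
        # reader never sees a half-written object.
        fd, tmp = tempfile.mkstemp(dir=self.root, prefix=".tmp-")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, self._path(oid))

    def get(self, oid):
        with open(self._path(oid), "rb") as f:
            return f.read()

    def __contains__(self, oid):
        return os.path.exists(self._path(oid))
```

Something like store.put(42, b"posting list bytes") followed by
store.get(42) would round-trip the data. On a filesystem that handles
large directories poorly, the same interface could later be backed by
hashed subdirectories or a single-file store without touching callers.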

I once wrote a simple Java package to store data in a single file with
transactions. I need to port it (volunteers now all run to
jdbm.sourceforge.net ;-)). Or maybe MinneStore can be used to stash stuff away
(both, IMHO, preferable to BDB because of the additional complexity of
having to ship C code around).

>Oh, will I have neat things for that. :)
>
Can't wait for it; the query language looks cool :-)

-- 
Cees de Groot               http://www.cdegroot.com     <cg at cdegroot.com>
GnuPG 1024D/E0989E8B 0016 F679 F38D 5946 4ECD  1986 F303 937F E098 9E8B


