[Newbies] Pre-Getting started info: Unicode, utf8, large memory need

Herbert König herbertkoenig at gmx.net
Thu Apr 29 04:31:46 UTC 2010


Hi Charles,

It seems you are on top of things, so just a few remarks. My experience
is from Squeak 3.8, so you should check whether what I say still holds
for current Squeak.

Check the UTF8 speed. I combine tab-delimited files from disparate
sources into more complex objects and write out new files. The first
thing I did was switch to a non-UTF8 encoding for speed reasons. It
seems you can't do that.

CH> I looked at Magma, and couldn't figure out whether it would be useful or
CH> not.  I've no idea just how fast it is, how capacious it is, or how much

Chris Muller is on Squeak dev and I'm sure he will be able to tell you
whether you would hit the limits of Magma. Gjallar (www.Gjallar.se)
uses Magma in a commercial project (last time I looked).

CH> ahead of time.  And I want locally separate files, so I guess I'd
CH> probably use sqlite or Firebird.  With Sqlite I might need to have
CH> multiple databases to handle the final system, so it would probably be
CH> best to partition things early.  (Either that or build some sort of
CH> hierarchical storage system that rolled things from database to database
CH> depending of how recently it was accessed.)

SqueakDbx (or openDbx in other languages) might be of interest. I use
MySQL from Squeak in a commercial setting with no problems.

CH> I'm guessing that FileStream would handle file BOM markers gracefully.
CH> (Most of my files are utf8 with BOM markers at the head.)  This isn't

Just try it to be sure.
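
To show what such a test could look like, here is a minimal sketch in
Python (not Squeak; the helper name first_char_after_read is made up
for illustration). It writes a small UTF8 file with a BOM at the head,
reads it back, and returns the first character. The same experiment in
Squeak would tell you whether FileStream strips the BOM or passes it
through as a character.

```python
import codecs
import os
import tempfile


def first_char_after_read(encoding: str) -> str:
    """Write a BOM-prefixed UTF-8 file, read it back, return the first character."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Raw bytes: the UTF-8 BOM followed by one tab-delimited line.
        with os.fdopen(fd, "wb") as f:
            f.write(codecs.BOM_UTF8 + "name\tvalue\n".encode("utf-8"))
        # Decode with the encoding under test.
        with open(path, "r", encoding=encoding) as f:
            return f.read()[0]
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Plain "utf-8" keeps the BOM as the character U+FEFF.
    print(repr(first_char_after_read("utf-8")))
    # "utf-8-sig" drops a leading BOM if one is present.
    print(repr(first_char_after_read("utf-8-sig")))
```

If the first character comes back as U+FEFF, the reader did not handle
the BOM and you have to skip it yourself before parsing the first field.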

CH> (I wouldn't need any fancy mapper.  If I weren't dealing with LOTS of
CH> variable length arrays of variable length strings, I could just fit the
CH> data into a simple C struct without any pointers whatsoever.  So all I
CH> need is to be able to save a list of lists of chars, plus a few integers
CH> that would all fit comfortably into 32 bits.  [Many of them would fit
CH> into 8 bits.])

CouchDB has caught my attention for inhomogeneous data, scalability,
and replication. But then I consider JavaScript a nice functional
language and I like JSON (available in Squeak). Whatever language you
choose, at least look at the map reduce algorithm, so you can make use
of multiple cores or multiple boxes.
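
As a rough illustration of the map reduce idea, here is a minimal
single-process sketch in Python. The function names and the choice of
counting the first tab-separated field are assumptions made for the
example, not anything from CouchDB or Squeak. Each chunk of lines is
mapped to a partial count independently, and the partial counts are
then reduced into one result.

```python
from collections import Counter
from functools import reduce


def map_chunk(lines):
    """Map step: count the first tab-separated field of each non-blank line."""
    return Counter(line.split("\t", 1)[0] for line in lines if line.strip())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def map_reduce(chunks, mapper=map):
    """Run the map step over every chunk, then fold the partial results.

    `mapper` defaults to the builtin map; any callable with the same
    (function, iterable) shape can be passed instead.
    """
    partials = mapper(map_chunk, chunks)
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    chunks = [["a\t1", "b\t2"], ["a\t3", ""]]
    print(map_reduce(chunks))
```

Because the map step touches only its own chunk, the mapper argument
can be replaced by something that spreads the chunks over several
processes or machines (for example the map method of a
multiprocessing.Pool) without changing the reduce step.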

CH> later, and D doesn't have much in the way of concurrency handling.  I'm
CH> not sure that Hydra counts...though it sounds like I need to look into
CH> it.  The question would be how to programs running on separate virtual
CH> machines communicate with each other.

Those are two different issues. Hydra addresses a single machine and
does not support current Squeak (recent discussion on Squeak dev). The
other issue is communicating over the network, and this is where you
will end up.


-- 
Cheers,

Herbert   


