Re: [Newbies] Pre-Getting started info: Unicode, utf8, large memory need - Beginners

29 Apr 2010


I think you might benefit from looking at GemStone, especially the
free version.  You haven't mentioned the total size of your
planned DB, but up to 4GB is free. After that you pay, but it's
sufficient to prove what you're doing. They seem to have the features
you're looking for.
See:
http://seaside.gemstone.com/
for their free version.
They have a mailing list here:
http://seaside.gemstone.com/mailman/listinfo/beta
...
Message: 7
Date: Thu, 29 Apr 2010 11:26:41 -0700
From: Charles Hixson charleshixsn@earthlink.net
Subject: Re: [Newbies]  Pre-Getting started info: Unicode, utf8, large
   memory	need
To: "A friendly place to get answers to even the most basic questions
   about	Squeak." beginners@lists.squeakfoundation.org
Message-ID: 4BD9CF61.1050101@earthlink.net
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 04/28/2010 09:31 PM, Herbert König wrote:
...
Hi Charles,
seems you are on top of things. So just a few remarks. My experience
is from Squeak 3.8 so you should check if what I say holds true for
current Squeak.
Check out the UTF8 speed. I combine tab delimited files from
disparate sources into more complex objects and write out new
files. First thing was to change to non UTF8 for speed reasons.
Seems you can't do this.
I'm not worried about speed for this first part, and for the
follow-up I'm more worried about computational speed than utf8
reading speed.  If I can't depend on virtual memory and automatic
roll-in/out (nobody seems to offer that!) then it means LOTS of
database interaction.  Which is where I get worried about Magma...as
apparently it holds a partial reference to everything in RAM.
...
CH>  I looked at Magma, and couldn't figure out whether it would be
CH> useful or not.  I've no idea just how fast it is, how capacious
CH> it is, or how much
Chris Muller is on Squeak dev and I'm sure he will be able to tell
you if you would hit the limits of Magma. Gjallar (www.Gjallar.se)
uses Magma in a commercial project (last time I looked).
CH>  ahead of time.  And I want locally separate files, so I guess
CH> I'd probably use sqlite or Firebird.  With Sqlite I might need
CH> to have multiple databases to handle the final system, so it
CH> would probably be best to partition things early.  (Either that
CH> or build some sort of hierarchical storage system that rolled
CH> things from database to database depending on how recently they
CH> were accessed.)
SqueakDbx (or openDbx in other languages) might be of interest. I
use MySQL from Squeak in a commercial setting, no problems.
That is of interest, but MySQL is in the same boat as PostgreSQL in
having a system-level database rather than separate database files.
This makes many of the uses that I intend problematical...and
difficult at best.  Both Firebird and Sqlite, however, allow
specified db files. Sqlite is more common, so that's probably what
I'll choose, even though Firebird has a reputation for being more
efficient.  (However I think both are supported by openDbx, so
probably also by SqueakDbx.)
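
As a rough illustration of the "specified db files" point and of rolling rows from one database file to another, here is a minimal sketch in Python using the standard sqlite3 module, not Squeak or SqueakDbx. The file names, the `entries` table, and the `last_used` column are all made up for the example; only `ATTACH DATABASE` itself is standard SQLite.

```python
import sqlite3


def open_partitioned(hot_path, cold_path):
    """Open one SQLite file and attach a second one under the name 'archive'."""
    conn = sqlite3.connect(hot_path)
    # ATTACH lets a single connection address several database files.
    conn.execute("ATTACH DATABASE ? AS archive", (cold_path,))
    for schema in ("main", "archive"):
        # Hypothetical table layout, chosen only for this sketch.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {schema}.entries ("
            "id INTEGER PRIMARY KEY, body TEXT, last_used INTEGER)"
        )
    return conn


def roll_out(conn, cutoff):
    """Move rows not used since `cutoff` from the hot file to the archive file."""
    with conn:  # one transaction covering both statements
        conn.execute(
            "INSERT OR REPLACE INTO archive.entries "
            "SELECT id, body, last_used FROM main.entries WHERE last_used < ?",
            (cutoff,),
        )
        conn.execute("DELETE FROM main.entries WHERE last_used < ?", (cutoff,))
```

Calling `roll_out(conn, cutoff)` periodically would give a crude version of the hierarchical storage idea; how well this scales to the final system is untested here.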
...
CH>  I'm guessing that FileStream would handle file BOM markers
CH> gracefully. (Most of my files are utf8 with BOM markers at the
CH> head.)  This isn't
Just try it to be sure.
Yeah, that will be a part of the first test.
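
For comparison while running that test, this is how the same check looks in Python, where the "utf-8-sig" codec strips a leading BOM if one is present and otherwise behaves like plain UTF-8. It says nothing about what Squeak's FileStream does; the function name is just an illustrative choice.

```python
import codecs


def read_utf8_text(path):
    """Read a UTF-8 text file, dropping a leading BOM if the file has one."""
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        return f.read()


def has_utf8_bom(path):
    """Report whether the file starts with the three-byte UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

A file read this way yields the same string whether or not the BOM was there, which is the "graceful" behaviour to look for in the Squeak test.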
...
CH>  (I wouldn't need any fancy mapper.  If I weren't dealing with
CH> LOTS of variable length arrays of variable length strings, I
CH> could just fit the data into a simple C struct without any
CH> pointers whatsoever.  So all I need is to be able to save a
CH> list of lists of chars, plus a few integers that would all fit
CH> comfortably into 32 bits.  [Many of them would fit into 8
CH> bits.])
CouchDB has caught my attention for inhomogeneous data, scalability,
and replication. But then I consider JavaScript a nice functional
language and I like JSON (available in Squeak). At least look at the
map-reduce algorithm for utilizing multiple cores or multiple boxes,
whatever language you choose.
Multiple boxes isn't particularly interesting, but I'm expecting the 
number of cores/box to ramp up quickly over the next decade...and
that *is* interesting.
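
To make the map-reduce suggestion concrete for the many-cores-in-one-box case, here is a small sketch in Python. Word counting stands in for the real computation, and the function names and the choice of multiprocessing.Pool are assumptions for illustration only.

```python
from collections import Counter
from functools import reduce
from multiprocessing import Pool


def count_words(chunk):
    """Map step: count the words in one chunk of text."""
    return Counter(chunk.split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def parallel_word_count(chunks, workers=None):
    """Run the map step in worker processes, then reduce in the parent."""
    with Pool(processes=workers) as pool:
        partials = pool.map(count_words, chunks)
    return reduce(merge_counts, partials, Counter())
```

Called from under an `if __name__ == "__main__":` guard, `parallel_word_count(chunks, workers=4)` spreads the map step over four processes. The map and reduce functions stay ordinary pure functions, so the same pair also runs sequentially.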
...
CH>  later, and D doesn't have much in the way of concurrency
CH> handling.  I'm not sure that Hydra counts...though it sounds
CH> like I need to look into it.  The question would be how
CH> programs running on separate virtual machines communicate with
CH> each other.
Two different issues: Hydra addresses a single machine and does
not support current Squeak (recent discussion on Squeak dev). The
other issue is communicating via network. This is where you'll end
up.
I don't expect to end up "communicating via network", except,
perhaps, via localhost.  But I do expect to end up running several
processes, probably on different cores.  This causes many, but not
all, of the same problems.  (Current support is less important, as
this is something a bit off in the future.  But it needs to be
planned for now, before I start writing the code.)  Guess I'll see if
I can find that "Squeak dev" discussion.  Perhaps Dbus is the correct
answer...I've only skimmed over its specs, but it looks plausible.
(Getting info back from separate processes seems a major problem with
most of the approaches.  It may well turn out that TCP over
UnixSockets is the best approach available...though I *would* like
something better.)
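
One way to picture the "getting info back from separate processes" problem on a single box is a stream-oriented Unix domain socket shared between a parent and a child process. The sketch below is Python and POSIX-only. The upper-casing child and the name `ask_child` are placeholders, and the protocol (send everything, half-close, read the reply until EOF) is just one simple choice, not a recommendation over Dbus or anything else.

```python
import socket
import subprocess
import sys

# Source of the child process: read the request until EOF, reply, exit.
CHILD_SOURCE = """
import socket, sys
sock = socket.socket(fileno=int(sys.argv[1]))
chunks = []
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
sock.sendall(b"".join(chunks).upper())
sock.close()
"""


def ask_child(message):
    """Send bytes to a separate process over a socket pair and return its reply."""
    parent_end, child_end = socket.socketpair()  # connected stream sockets
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE, str(child_end.fileno())],
        pass_fds=(child_end.fileno(),),  # let the child inherit its end
    )
    child_end.close()  # the parent keeps only its own end
    parent_end.sendall(message)
    parent_end.shutdown(socket.SHUT_WR)  # signal end of request
    reply = []
    while True:
        chunk = parent_end.recv(4096)
        if not chunk:
            break
        reply.append(chunk)
    parent_end.close()
    proc.wait()
    return b"".join(reply)
```

Here `ask_child(b"hello")` hands the bytes to a second interpreter process and collects whatever it sends back. A real system would need message framing and long-lived workers rather than one process per request.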


End of Beginners Digest, Vol 48, Issue 34