On 04/28/2010 09:31 PM, Herbert König wrote:
Hi Charles,
Seems you are on top of things, so just a few remarks. My experience is from Squeak 3.8, so you should check whether what I say holds true for current Squeak.
Check the UTF-8 speed. I combine tab-delimited files from disparate sources into more complex objects and write out new files. The first thing I did was switch away from UTF-8 for speed reasons. It seems you can't do this.
I'm not worried about speed for this first part, and for the follow-up I'm more worried about computational speed than UTF-8 reading speed. If I can't depend on virtual memory and automatic roll-in/roll-out (nobody seems to offer that!), then it means LOTS of database interaction, which is where I get worried about Magma... as apparently it holds a partial reference to everything in RAM.
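For what hand-rolled roll-in/roll-out might look like, here is a minimal sketch in Python (not Squeak), assuming a simple key/value access pattern: the most recently used items stay in RAM and the rest are moved to a slower store. `RollOutCache` and its methods are hypothetical names, and the plain dict standing in for the backing store would in practice be a database table. It says nothing about how Magma actually manages memory.

```python
from collections import OrderedDict


class RollOutCache:
    """Keep the most recently used items in RAM; roll the rest out to a slower store."""

    def __init__(self, capacity, backing_store=None):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.hot = OrderedDict()  # in-RAM items, least recently used first
        # Hypothetical stand-in for a database: any dict-like object works here.
        self.cold = backing_store if backing_store is not None else {}

    def put(self, key, value):
        self.cold.pop(key, None)  # keep each key in exactly one tier
        self.hot[key] = value
        self.hot.move_to_end(key)  # mark as most recently used
        while len(self.hot) > self.capacity:
            old_key, old_value = self.hot.popitem(last=False)  # least recently used
            self.cold[old_key] = old_value  # roll out

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as most recently used
            return self.hot[key]
        value = self.cold.pop(key)  # roll in; raises KeyError if the key is unknown
        self.put(key, value)
        return value
```

Because a rolled-in item is removed from the backing store, each key lives in exactly one tier at a time; a real system would more likely keep the database copy and just drop the in-RAM reference.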
CH> I looked at Magma, and couldn't figure out whether it would be useful or
CH> not. I've no idea just how fast it is, how capacious it is, or how much
Chris Muller is on Squeak dev, and I'm sure he can tell you whether you would hit the limits of Magma. Gjallar (www.Gjallar.se) uses Magma in a commercial project (last time I looked).
CH> ahead of time. And I want locally separate files, so I guess I'd
CH> probably use sqlite or Firebird. With Sqlite I might need to have
CH> multiple databases to handle the final system, so it would probably be
CH> best to partition things early. (Either that or build some sort of
CH> hierarchical storage system that rolled things from database to database
CH> depending on how recently it was accessed.)
SqueakDbx (or openDbx in other languages) might be of interest. I use MySQL from Squeak in a commercial setting with no problems.
That is of interest, but MySQL is in the same boat as PostgreSQL: it has a system-level database rather than separate database files. This makes many of my intended uses problematic... or difficult at best. Both Firebird and SQLite, however, let you specify the database file. SQLite is more common, so that's probably what I'll choose, even though Firebird has a reputation for being more efficient. (However, I think both are supported by openDbx, so probably also by SqueakDbx.)
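One way to "partition things early" with file-based databases is to hash each key to one of a fixed number of SQLite files. Below is a minimal sketch using Python's standard `sqlite3` module; the file naming, the `items` table, and the partition count of 4 are assumptions for illustration, and the same idea would apply through SqueakDbx or any other binding.

```python
import os
import sqlite3
import zlib


def partition_path(directory, key, n_partitions):
    """Choose one of n_partitions database files for a key, stably across runs."""
    # zlib.crc32 is deterministic; Python's built-in hash() of a str is not.
    index = zlib.crc32(key.encode("utf-8")) % n_partitions
    return os.path.join(directory, f"part{index:02d}.sqlite")


def store(directory, key, value, n_partitions=4):
    # sqlite3.connect creates the database file if it doesn't exist yet.
    conn = sqlite3.connect(partition_path(directory, key, n_partitions))
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute("CREATE TABLE IF NOT EXISTS items (key TEXT PRIMARY KEY, value TEXT)")
            conn.execute("INSERT OR REPLACE INTO items (key, value) VALUES (?, ?)", (key, value))
    finally:
        conn.close()


def fetch(directory, key, n_partitions=4):
    path = partition_path(directory, key, n_partitions)
    if not os.path.exists(path):
        return None  # nothing has ever been stored in this partition
    conn = sqlite3.connect(path)
    try:
        row = conn.execute("SELECT value FROM items WHERE key = ?", (key,)).fetchone()
    finally:
        conn.close()
    return row[0] if row else None
```

Changing the number of partitions later moves most keys to a different file, which is one reason to settle on it before the data grows.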
CH> I'm guessing that FileStream would handle file BOM markers gracefully.
CH> (Most of my files are utf8 with BOM markers at the head.) This isn't
Just try it to be sure.
Yeah, that will be a part of the first test.
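For that first test, it helps to know what to look for at the byte level: a UTF-8 BOM is the three bytes EF BB BF at the start of the file, and a reader that doesn't strip them hands back U+FEFF as the first character. A minimal sketch in Python, which says nothing about how Squeak's FileStream behaves; the function names are hypothetical, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are standard library features.

```python
import codecs


def decode_utf8_with_optional_bom(data: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading byte-order mark if present."""
    if data.startswith(codecs.BOM_UTF8):  # b'\xef\xbb\xbf'
        data = data[len(codecs.BOM_UTF8):]
    return data.decode("utf-8")


def read_utf8_file(path):
    """Read a text file that may or may not start with a UTF-8 BOM."""
    # The "utf-8-sig" codec strips a leading BOM if there is one.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return f.read()
```

A quick check for the test is whether the first character read back is "\ufeff"; if it is, the reader is passing the BOM through.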
CH> (I wouldn't need any fancy mapper. If I weren't dealing with LOTS of
CH> variable length arrays of variable length strings, I could just fit the
CH> data into a simple C struct without any pointers whatsoever. So all I
CH> need is to be able to save a list of lists of chars, plus a few integers
CH> that would all fit comfortably into 32 bits. [Many of them would fit
CH> into 8 bits.])
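Data shaped like that (a list of lists of strings plus a few small integers) can be stored without any pointer-based object mapping by writing length-prefixed binary records. A minimal sketch using Python's `struct` module; the layout (little-endian 32-bit counts and lengths, then the UTF-8 bytes of each string) is an assumption for illustration, not a format any of the databases mentioned here uses.

```python
import struct


def encode_record(fields, numbers):
    """Pack a list of lists of strings plus some unsigned 32-bit ints into bytes.

    Hypothetical layout, all counts/lengths as little-endian u32:
      count of numbers, then each number;
      count of lists; per list: count of strings;
      per string: byte length, then the UTF-8 bytes.
    """
    out = bytearray()
    out += struct.pack("<I", len(numbers))
    for n in numbers:
        out += struct.pack("<I", n)  # struct.error if n doesn't fit in 32 unsigned bits
    out += struct.pack("<I", len(fields))
    for strings in fields:
        out += struct.pack("<I", len(strings))
        for s in strings:
            encoded = s.encode("utf-8")
            out += struct.pack("<I", len(encoded)) + encoded
    return bytes(out)


def decode_record(data):
    """Inverse of encode_record: returns (fields, numbers)."""
    offset = 0

    def read_u32():
        nonlocal offset
        (value,) = struct.unpack_from("<I", data, offset)
        offset += 4
        return value

    # range(read_u32()) is evaluated first, so the count is read before the items.
    numbers = [read_u32() for _ in range(read_u32())]
    fields = []
    for _ in range(read_u32()):
        strings = []
        for _ in range(read_u32()):
            length = read_u32()
            strings.append(data[offset:offset + length].decode("utf-8"))
            offset += length
        fields.append(strings)
    return fields, numbers
```

Integers known to fit in 8 bits could be packed with the `"<B"` format instead of `"<I"` to save space, at the cost of a slightly more complicated layout.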
CouchDB has caught my attention for inhomogeneous data, scalability, and replication. But then, I consider JavaScript a nice functional language, and I like JSON (available in Squeak). Whatever language you choose, at least look at the MapReduce algorithm as a way to utilize multiple cores or multiple boxes.
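The MapReduce shape itself is small: map each record to key/value pairs, group the values by key, then reduce each group. A minimal single-machine sketch in Python, assuming tab-delimited input lines and a field-counting job chosen purely for illustration (`count_fields` and `add` are hypothetical helpers); it is not CouchDB's view API.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(records, mapper, reducer, map_fn=map):
    """Map each record to (key, value) pairs, group by key, reduce each group.

    map_fn defaults to the built-in map; a process pool's map can be passed
    instead to run the map phase on several cores.
    """
    grouped = defaultdict(list)
    for pairs in map_fn(mapper, records):  # map phase
        for key, value in pairs:  # shuffle: group values by key
            grouped[key].append(value)
    # reduce phase: fold each key's values into a single result
    return {key: reduce(reducer, values) for key, values in grouped.items()}


def count_fields(line):
    """Example mapper: emit (field, 1) for each non-empty tab-delimited field."""
    return [(field, 1) for field in line.rstrip("\n").split("\t") if field]


def add(a, b):
    return a + b
```

To spread the map phase over several cores, pass a pool's `map`, e.g. `with multiprocessing.Pool() as pool: map_reduce(lines, count_fields, add, map_fn=pool.map)`, inside an `if __name__ == "__main__":` guard. The mapper returns a list rather than a generator so its results can be pickled back from the worker processes.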
Multiple boxes aren't particularly interesting, but I expect the number of cores per box to ramp up quickly over the next decade... and that *is* interesting.
CH> later, and D doesn't have much in the way of concurrency handling. I'm
CH> not sure that Hydra counts...though it sounds like I need to look into
CH> it. The question would be how two programs running on separate virtual
CH> machines would communicate with each other.
These are two different issues. Hydra addresses a single machine and does not support current Squeak (see the recent discussion on Squeak dev). The other issue is communicating over a network, and that is where you'll end up.
I don't expect to end up "communicating via network", except perhaps via localhost. But I do expect to end up running several processes, probably on different cores. This causes many, but not all, of the same problems. (Current support is less important, as this is something a bit off in the future. But it needs to be planned for now, before I start writing the code.) I guess I'll see if I can find that "Squeak dev" discussion. Perhaps D-Bus is the correct answer... I've only skimmed its specs, but it looks plausible. (Getting information back from separate processes seems to be a major problem with most of the approaches. It may well turn out that plain sockets, either Unix domain sockets or TCP over localhost, are the best approach available... though I *would* like something better.)
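Getting answers back from another process over a socket mostly comes down to message framing. A minimal sketch in Python, assuming length-prefixed JSON messages (an illustrative choice, not part of D-Bus or any Squeak library): `socket.socketpair()` returns a connected pair of Unix domain sockets on Unix-like systems, and a thread stands in for the separate process. A real setup would hand one end to a child process or connect to a named socket path. The `worker` and `ask_worker` functions and the "sum the numbers" job are hypothetical.

```python
import json
import socket
import struct
import threading


def send_msg(sock, obj):
    """Send one JSON message prefixed with its 4-byte big-endian length."""
    payload = json.dumps(obj).encode("utf-8")
    sock.sendall(struct.pack(">I", len(payload)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def recv_msg(sock):
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


def worker(sock):
    """Stand-in for a separate process: answer requests until told to stop."""
    try:
        while True:
            request = recv_msg(sock)
            if request.get("op") == "stop":
                break
            send_msg(sock, {"result": sum(request["numbers"])})
    finally:
        sock.close()


def ask_worker(requests):
    parent, child = socket.socketpair()  # AF_UNIX where the platform supports it
    t = threading.Thread(target=worker, args=(child,), daemon=True)
    t.start()
    try:
        replies = []
        for req in requests:
            send_msg(parent, req)
            replies.append(recv_msg(parent)["result"])
        send_msg(parent, {"op": "stop"})
        t.join()
    finally:
        parent.close()  # on an error, this also unblocks the worker's recv()
    return replies
```

The length prefix is what lets the reader know where one reply ends and the next begins; a stream socket by itself only delivers bytes, not messages.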