I think you *did* answer my questions. In a way that means a lot of extra work for me. Too much of what I want to do depends on things that are currently experimental in Smalltalk. It sounds like the image can't load lazily, which would probably be necessary if this were to work at all. (Yeah, the 64-bit image could hold enough, but I don't have the RAM to hold it all, and getting that much RAM is ridiculous, when most of it would be rolled out most of the time.)
If I'm going to need to use a database, and handle my own rolling in and out anyway, then Smalltalk isn't a good choice. And while multiprocessing is only a speed-up, that's a pretty important thing in and of itself.
GemStone isn't a good choice as I need a FOSS distributable. (Actually, if I'm reading the web site properly they don't mention what their license is, and it seems as if their Smalltalk version is Pharo...which we've already covered.)
FWIW, I'm well aware that I'm trying to run too much program on too small a system. I know this implies a massive speed penalty. But that's true whatever approach I take. I was hoping that I could avoid doing my own memory management, and for that Smalltalk appeared the only feasible choice. Apparently, however, I'm trying something a bit beyond the bleeding edge at the current state of the art.
As for more details of what I'm planning: what I'm going to need to do is connect the graph nodes by id#s, roll them in from a database, and stick them in a dictionary (indexed by id#, as most of the nodes won't have any other unique and persistent id). This is necessary as each node will link to up to around 80 other nodes, with some of the links being bidirectional, but not dependably so. And I'll need another index of "words", which are indexes from external symbols into nodes. Doing it this way, most of it can be kept rolled out most of the time, but there's an obvious speed penalty. So I'll need to track which references are stale and roll them out to disk (or just drop them, if they aren't dirty). Etc. Much of this would have been handled automatically in Smalltalk, but not the automatic roll out, apparently. (In Smalltalk I'd use references rather than id#s; in fact id#s wouldn't have been needed.)

I'll probably write the first version in Python (rather than Ruby, because Doxygen documentation for Python is better than I can generate for Ruby, though Ruby is in some other ways better). Then, when it's working, I'll translate it into D or Ada. (Not yet decided, though D has the inside track. Ada has wider support, but D is garbage collected and has variable-sized arrays and built-in hash tables. Ada currently has a better interface to databases, but D is improving much more rapidly. And D program design structures are more similar to those of Python. Of course Vala is an outside chance, but it's been developing quite slowly. And Go seems headed in a different direction, even though it has easier support for concurrency.)
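To make the roll-in/roll-out scheme above concrete, here is a rough Python sketch of how it might look. It is only an illustration under several assumptions that are not part of the plan as stated: SQLite as the database, JSON as the stored node format, and least-recently-used order standing in for "stale". The names `Node`, `NodeStore`, `define_word` and `lookup` are hypothetical.

```python
import json
import sqlite3
from collections import OrderedDict


class Node:
    """Graph node that names its neighbours by id#, not by object reference."""

    def __init__(self, node_id, links=None, payload=None):
        self.node_id = node_id
        self.links = list(links or [])  # id#s of linked nodes (up to ~80 each)
        self.payload = payload
        self.dirty = False  # set once the node differs from its stored copy

    def add_link(self, other_id):
        if other_id not in self.links:
            self.links.append(other_id)
            self.dirty = True


class NodeStore:
    """Resident nodes in a dict keyed by id#; everything else stays on disk."""

    def __init__(self, path=":memory:", max_resident=1000):
        self.db = sqlite3.connect(path)
        # Hypothetical schema: one JSON blob per node plus a word index.
        self.db.execute("CREATE TABLE IF NOT EXISTS nodes "
                        "(id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
        self.db.execute("CREATE TABLE IF NOT EXISTS words "
                        "(word TEXT PRIMARY KEY, node_id INTEGER NOT NULL)")
        self.max_resident = max_resident
        self.resident = OrderedDict()  # id# -> Node, least recently used first

    def new_node(self, payload=None):
        body = json.dumps({"links": [], "payload": payload})
        cur = self.db.execute("INSERT INTO nodes (body) VALUES (?)", (body,))
        node = Node(cur.lastrowid, payload=payload)
        self.resident[node.node_id] = node
        self._evict()
        return node

    def get(self, node_id):
        node = self.resident.get(node_id)
        if node is None:  # roll the node in from the database
            row = self.db.execute("SELECT body FROM nodes WHERE id = ?",
                                  (node_id,)).fetchone()
            if row is None:
                raise KeyError(node_id)
            body = json.loads(row[0])
            node = Node(node_id, body["links"], body["payload"])
            self.resident[node_id] = node
        self.resident.move_to_end(node_id)  # mark as most recently used
        self._evict()
        return node

    def _write(self, node):
        body = json.dumps({"links": node.links, "payload": node.payload})
        self.db.execute("UPDATE nodes SET body = ? WHERE id = ?",
                        (body, node.node_id))
        node.dirty = False

    def _evict(self):
        while len(self.resident) > self.max_resident:
            _, node = self.resident.popitem(last=False)
            if node.dirty:  # dirty nodes are rolled out, clean ones just dropped
                self._write(node)

    def flush(self):
        for node in self.resident.values():
            if node.dirty:
                self._write(node)
        self.db.commit()

    def define_word(self, word, node_id):
        self.db.execute("INSERT OR REPLACE INTO words (word, node_id) "
                        "VALUES (?, ?)", (word, node_id))

    def lookup(self, word):
        row = self.db.execute("SELECT node_id FROM words WHERE word = ?",
                              (word,)).fetchone()
        return None if row is None else self.get(row[0])


store = NodeStore(max_resident=2)
apple = store.new_node("apple")
banana = store.new_node("banana")
apple.add_link(banana.node_id)
store.define_word("apple", apple.node_id)
store.new_node("cherry")          # cache is full: "apple" is written and dropped
reloaded = store.lookup("apple")  # rolled back in through the word index
print(reloaded.links == [banana.node_id])
```

One weakness of this sketch: code that still holds a `Node` after it has been evicted can change it without the store noticing, so those changes would be lost. A real version would need some way to pin nodes that are in use, or to hand out only id#s and re-fetch through `get` each time.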
P.S.: Were Smalltalk suitable I'd be needing to repartition my disk to give me a much larger virtual memory space. Currently I'm only set up for around 1.5 Gigabytes, which should be enough for the first few months, but would limit what else I could be doing towards the end of that time.
P.P.S.: I also considered a graph database, Neo4j, but it doesn't support enough information on the links. I could coerce integers into floating point, but the loss of precision was worrying. This isn't a problem that would show up until the id#s started to get large, but that's not very reassuring. Also, too much appears to need to be decided at compile time rather than at run time, and this is a very dynamic system (or it had better be!).
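The precision worry can be checked directly. The following is a small Python sketch, assuming the floating-point values are IEEE 754 singles or doubles (which of the two would apply here is not stated above). The helper names are made up for the illustration.

```python
import struct


def survives_double(n):
    """True if integer n is unchanged by a round trip through a 64-bit float."""
    return int(float(n)) == n


def survives_single(n):
    """True if integer n is unchanged by a round trip through a 32-bit float."""
    packed = struct.pack("f", float(n))  # rounds to single precision
    return int(struct.unpack("f", packed)[0]) == n


# A double has a 53-bit significand and a single has a 24-bit one, so the
# first integer that cannot be stored exactly is 2**53 + 1 or 2**24 + 1.
print(survives_single(2**24), survives_single(2**24 + 1))
print(survives_double(2**53), survives_double(2**53 + 1))
```

So with doubles the id#s would have to pass about 9 * 10**15 before two of them could collide, while with singles the trouble would start just above 16.7 million.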
Thank you for your help, and good reporting of the current state of the environment.
On 10/31/2013 11:40 AM, Levente Uzonyi wrote:
On Thu, 31 Oct 2013, Charles Hixson wrote:
I'm contemplating a project that would benefit greatly by a persistent memory image, though I'll eventually (in a year or so) need the 64-bit image, but: The image will be a lot larger than RAM. It would include a directed graph
The current garbage collector is not suitable for large images. GC delays become noticeable when the image grows over a few hundred MBs. Eliot is working on a better one, but we don't know how it performs until it's ready.
I don't see how your image could be a lot larger than RAM. It's technically possible, but it's pretty likely that it would be too slow to be practical.
that had an index of a million or so entries, and most nodes wouldn't be indexed. So in order to even load it would need to use some sort of lazy access. And I'm not even sure that a Dictionary of over a million items is reasonable. (Naturally none of the examples address this problem.)
The performance of Dictionary mainly depends on the implementation of #hash and #= of the objects you want to store in it.
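The same principle can be shown with Python's dict, used here only as an analogy to a Smalltalk Dictionary, with `__hash__` and `__eq__` playing the roles of #hash and #=. This sketch counts how often the table has to fall back on equality tests when the hash is good and when it is useless. The class and function names are invented for the example.

```python
class GoodKey:
    """Key whose hash spreads out, so lookups rarely need to call __eq__."""
    eq_calls = 0

    def __init__(self, ident):
        self.ident = ident

    def __eq__(self, other):
        type(self).eq_calls += 1  # count how often the dict falls back to ==
        return isinstance(other, GoodKey) and self.ident == other.ident

    def __hash__(self):
        return hash(self.ident)


class BadKey(GoodKey):
    """Same equality, but every key hashes alike, so all keys collide."""
    eq_calls = 0

    def __hash__(self):
        return 1


def comparisons(key_class, n=200):
    """Build an n-entry dict and report how many == tests that took."""
    key_class.eq_calls = 0
    table = {key_class(i): i for i in range(n)}
    assert len(table) == n
    return key_class.eq_calls


print(comparisons(GoodKey), comparisons(BadKey))
```

With the constant hash, each insertion has to be compared against the keys already stored, so the work grows roughly with the square of the table size. That effect, not the raw entry count, is what would make a million-entry table slow.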
Additionally, all of my (written) documentation is so old that it doesn't even discuss multi-processor systems, so I don't know whether modern Smalltalks make any use of additional available processors.
Squeak/Pharo don't support them from a single image. There are experimental VMs designed for multi-processor systems (RoarVM, HydraVM), but AFAIK none of them is ready for production use.
I'd really like some advice, and possibly some references. I know that Smalltalk has the reputation for being slow (yes, I've been reading about the recent speed-ups), but much of what I'd need to write in any other language seems like it may already be present in Smalltalk, so if it would work, I'd like to choose it. But I won't be able to test this until the application has been running for quite a while, so it would be very desirable to know ahead of time.
It's hard to tell more without knowing more details about the project.
Levente
P.S.: you might want to check out GemStone/S http://gemtalksystems.com/index.php/products/gemstones/
-- Charles Hixson
Beginners mailing list Beginners@lists.squeakfoundation.org http://lists.squeakfoundation.org/mailman/listinfo/beginners