[Newbies] Re: Is Squeak/Pharo an appropriate language choice?

Levente Uzonyi leves at elte.hu
Thu Oct 31 20:42:53 UTC 2013

On Thu, 31 Oct 2013, Charles Hixson wrote:

> I think you *did* answer my questions. In a way that means a lot of extra 
> work for me.
> Too much of what I want to do depends on things that are currently 
> experimental in Smalltalk.  It sounds like the image can't load lazily, which 
> would probably be necessary if this were to work at all.  (Yeah, the 64-bit 
> image could hold enough, but I don't have the RAM to hold it all, and getting 
> that much RAM is ridiculous, when most of it would be rolled out most of the 
> time.)
> If I'm going to need to use a database, and handle my own rolling in and out 
> anyway, then Smalltalk isn't a good choice.  And while multiple processing is 
> only a speed-up thing, that's a pretty important thing in and of itself.

If I understand correctly, Magma[1] and Glorp[2] can both help you with 
this project. The former is a pure Smalltalk object database, the latter 
is an ORM using PostgreSQL.

> Gemstone isn't a good choice as I need a FOSS distributable. (Actually, if 
> I'm reading the web site properly they don't mention what their license is, 
> and it seems as if their Smalltalk version is Pharo...which we've already 
> covered.)

GemStone/S is not open source, but there's a free version with some 
resource limitations (1 CPU/16 GB IIRC). They use their own Smalltalk 
implementation, but it has no GUI, so they wrote tools in Squeak/Pharo, 
which let you develop your code.

> FWIW, I'm well aware that I'm trying to run too much program on too small a 
> system.  I know this implies a massive speed penalty. But that's true 
> whatever approach I take.  I was hoping that I could avoid doing my own 
> memory management, and for that Smalltalk appeared the only feasible choice.

Magma and Glorp can both help you with this.


[1] http://wiki.squeak.org/squeak/2665
[2] http://glorpwiki.wikispaces.com/How+Glorp+Works

> Apparently, however, I'm trying something a bit beyond the bleeding edge at 
> the current state of the art.
> As to more details as to what I'm planning:
> So what I'm going to need to do is connect the graph nodes by id#s, and roll 
> them in from a database and stick them in a dictionary (indexed by id#, as 
> most of the nodes won't have any other unique and persistent id).  This is 
> necessary as each node will link to up to around 80 other nodes, with some of 
> the links being bidirectional, but not dependably so.  And I'll need another 
> index of "words" which are indexes from external symbols into nodes.  Doing 
> it this way, most of it can be kept rolled out most of the time, but there's 
> an obvious speed penalty.  So I'll need to track which references are stale 
> and roll them out to disk (or just drop them, if they aren't dirty).  Etc. 
> Much of this would have been handled automatically in Smalltalk, but not the 
> automatic roll out, apparently.  (In Smalltalk I'd use references rather than 
> id#s, in fact id#s wouldn't have been needed.)
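
A minimal sketch of the roll-in/roll-out scheme described above, in Python 
since that is the language named for the first version. It assumes SQLite as 
the store and JSON as the node encoding; neither is specified in this thread, 
and the names NodeCache, capacity, get, mark_dirty and flush are hypothetical. 
Nodes refer to each other by id# only, the least recently used node is rolled 
out first, and clean nodes are dropped without a write.

```python
import json
import sqlite3
from collections import OrderedDict


class NodeCache:
    """Keep at most `capacity` nodes in memory; the rest live in SQLite."""

    def __init__(self, path=":memory:", capacity=1000):
        # Assumption: one table, node body stored as JSON text.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS nodes (id INTEGER PRIMARY KEY, body TEXT)"
        )
        self.capacity = capacity
        self.live = OrderedDict()  # id# -> node, least recently used first
        self.dirty = set()         # resident id#s changed since last write

    def get(self, node_id):
        """Return the node, rolling it in from disk if it is not resident."""
        if node_id in self.live:
            self.live.move_to_end(node_id)
            return self.live[node_id]
        row = self.db.execute(
            "SELECT body FROM nodes WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:
            node = {"links": []}     # unknown id#: create an empty node
            self.dirty.add(node_id)  # it has never been written
        else:
            node = json.loads(row[0])
        self.live[node_id] = node
        self._evict()
        return node

    def mark_dirty(self, node_id):
        """Record that a resident node was modified."""
        if node_id not in self.live:
            raise KeyError(node_id)
        self.dirty.add(node_id)

    def _evict(self):
        """Roll out the stalest nodes; clean ones are simply dropped."""
        while len(self.live) > self.capacity:
            node_id, node = self.live.popitem(last=False)
            if node_id in self.dirty:
                self._write(node_id, node)

    def _write(self, node_id, node):
        self.db.execute(
            "INSERT OR REPLACE INTO nodes (id, body) VALUES (?, ?)",
            (node_id, json.dumps(node)),
        )
        self.dirty.discard(node_id)

    def flush(self):
        """Write every dirty resident node and commit."""
        for node_id in list(self.dirty):
            self._write(node_id, self.live[node_id])
        self.db.commit()
```

In this sketch a caller has to call mark_dirty right after changing a node 
and before the next get, because any get may evict the node it just changed. 
The "words" index would be a second table mapping each external symbol to an 
id#.
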
> I'll probably write the first version in Python (rather than Ruby, because 
> Doxygen documentation for Python is better than I can generate for Ruby, 
> though Ruby is in some other ways better). Then, when it's working I'll 
> translate it into D or Ada. (Not yet decided, though D has the inside track. 
> Ada has wider support, but D is garbage collected and has variable sized 
> arrays and built-in hash tables.  Ada currently has a better interface to 
> databases, but D is improving much more rapidly.  And D program design 
> structures are more similar to those of Python.  Of course Vala is an outside 
> chance.  But it's been developing quite slowly.  And Go seems headed in a 
> different direction, even though it has an easier support for concurrency.)
> P.S.:  Were Smalltalk suitable I'd be needing to repartition my disk to give 
> me a much larger virtual memory space.  Currently I'm only set up for around 
> 1.5 Gigabytes, which should be enough for the first few months, but would 
> limit what else I could be doing towards the end of that time.
> P.P.S:  I also considered a graph database, Neo4j, but they don't support 
> enough information on the links...though I could coerce integers into 
> floating point, the loss of precision was worrying. This isn't a problem that 
> would show up until the id#s started to get large, but that's not very 
> reassuring.  Also too much appears to need to be decided at compile time 
> rather than at run time, and this is a very dynamic system (or it had better 
> be!).
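
The precision worry above can be checked directly. This is a small sketch 
that assumes the floating point type is a 64-bit IEEE 754 double, which is 
what a Python float is on common platforms; it says nothing about how any 
particular database stores its values. The names LIMIT and 
survives_float_round_trip are hypothetical.

```python
# A double has a 53-bit significand, so every integer of magnitude
# up to 2**53 is represented exactly; beyond that, gaps appear.
LIMIT = 2 ** 53


def survives_float_round_trip(n: int) -> bool:
    """True if the id# comes back unchanged after being stored as a float."""
    return int(float(n)) == n


def first_unsafe_id(start: int = LIMIT) -> int:
    """Smallest integer >= start that a double cannot represent exactly."""
    n = start
    while survives_float_round_trip(n):
        n += 1
    return n
```

So id#s coerced into doubles stay exact as long as they never exceed 2**53, 
which is roughly 9 * 10**15.
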
> Thank you for your help, and good reporting of the current state of the 
> environment.
> On 10/31/2013 11:40 AM, Levente Uzonyi wrote:
>> On Thu, 31 Oct 2013, Charles Hixson wrote:
>>> I'm contemplating a project that would benefit greatly by a persistent 
>>> memory image, though I'll eventually (in a year or so) need the 64-bit 
>>> image, but:
>>> The image will be a lot larger than RAM.  It would include a directed 
>>> graph
>> The current garbage collector is not suitable for large images. GC delays 
>> become noticeable when the image grows over a few hundred MBs. Eliot is 
>> working on a better one, but we don't know how it performs until it's 
>> ready.
>> I don't see how your image could be a lot larger than RAM. It's technically 
>> possible, but it's pretty likely that it would be too slow to be practical.
>>> that had an index of a million or so entries, and most nodes wouldn't be 
>>> indexed.  So in order to even load it would need to use some sort of lazy 
>>> access.  And I'm not even sure that a Dictionary of over a million items 
>>> is reasonable.  (Naturally none of the examples address this problem.)
>> The performance of Dictionary mainly depends on the implementation of #hash 
>> and #= of the objects you want to store in it.
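
The same point holds for hash tables in general, and can be illustrated in 
Python, where __hash__ and __eq__ play the roles of #hash and #=. This is 
only a sketch of the effect, not a measurement of Squeak's Dictionary; the 
classes Key and BadKey and the function count_eq_calls are hypothetical.

```python
class Key:
    """Key whose hash spreads entries across the table."""

    eq_calls = 0  # shared counter of equality comparisons

    def __init__(self, ident):
        self.ident = ident

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.ident == other.ident

    def __hash__(self):
        return hash(self.ident)


class BadKey(Key):
    """Same equality, but every instance hashes to the same value."""

    def __hash__(self):
        return 0


def count_eq_calls(key_type, n):
    """Fill a dict with n distinct keys and report how often __eq__ ran."""
    Key.eq_calls = 0
    table = {key_type(i): i for i in range(n)}
    assert len(table) == n
    return Key.eq_calls
```

With a constant hash every insertion has to be compared against the keys 
already stored, so the number of comparisons grows roughly with the square 
of the table size. With a well distributed hash, a table of a million 
entries needs very few comparisons per lookup.
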
>>> Additionally, all of my (written) documentation is so old that it doesn't 
>>> even discuss multi-processor systems, so I don't know whether modern 
>>> Smalltalks make any use of additional available processors.
>> Squeak/Pharo don't support them from a single image. There are experimental 
>> VMs designed for multi-processor systems (RoarVM, HydraVM), but AFAIK none 
>> of them is ready for production use.
>>> I'd really like some advice, and possibly some references.  I know that 
>>> Smalltalk has the reputation for being slow (yes, I've been reading about 
>>> the recent speed-ups), but much of what I'd need to write in any other 
>>> language seems like it may already be present in Smalltalk, so if it would 
>>> work, I'd like to choose it.  But I won't be able to test this until the 
>>> application has been running for quite a while, so it would be very 
>>> desirable that I know ahead of time.
>> It's hard to tell more without knowing more details about the project.
>> Levente
>> P.S.: you might want to check out GemStone/S 
>> http://gemtalksystems.com/index.php/products/gemstones/
>>> -- 
>>> Charles Hixson
>>> _______________________________________________
>>> Beginners mailing list
>>> Beginners at lists.squeakfoundation.org
>>> http://lists.squeakfoundation.org/mailman/listinfo/beginners
> -- 
> Charles Hixson
