[Newbies] Re: Is Squeak/Pharo an appropriate language choice?
asqueaker at gmail.com
Sat Nov 2 18:00:36 UTC 2013
Hi Charles, when I saw the description of what you were looking for --
- an object model larger than the size of available RAM
- transparent access
- keyword access
- multi-core access and updates
I wanted to let you know Magma fits that problem domain like a glove.
Whether it can meet your performance requirements -- only you can
decide, but maybe I can at least clarify some of your questions.
> Long answer (excuse the rambling, I was thinking it through as I wrote it):
> If I'm understanding http://wiki.squeak.org/squeak/2639 correctly, which I
> may not be, I'd still need to recode the entire graph structure to be
> designed in terms of id#s (keys) rather than direct references.
> I.e., I'd need to code it in terms of two collections one of which would
> contain keys that, when interpreted, referenced itself. This does appear to
> move the plan into the area of the possible, but at the cost of the
> advantage that I'd hoped Smalltalk would provide of a large persistent
> image. I thought at first when it was talking about transparency that this
> wouldn't be necessary, but:
No. There is no inherent requirement for any object to have ids.
You can give them ids, of course, but ODBMSs, whether GemStone or Magma,
access their objects transparently via direct pointers.
> Magma can maintain and quickly "search" large, flat structures, but the
> normal Smalltalk collections such as Bag or OrderedCollection are not
> suitable for this. The contiguous ByteArray records Magma uses to store and
> transport Smalltalk objects would be impractical for a large Smalltalk
> Seems to mean that the Graph couldn't be stored as something that Magma
> would recognize as a graph.
I'm not sure what you mean by "recognize as a graph", but I don't think
that's correct. MagmaCollections are treated the same as regular
collections, except that they can be very large and allow greater
concurrency between sessions.
> So does "Objects are persisted by
> reachability", though that has other possible interpretations. But since
> the graph would contain a very large number of cycles in multiple
> "dimensions"... OTOH http://wiki.squeak.org/squeak/2638 on Read Strategies
> appears to mean that it wouldn't automatically (or rather could be set to
> not automatically) pull in items that are references within the object being
ReadStrategies are a performance optimization only. You should never
use them except in very special cases, after observing and diagnosing
a specific performance problem.
> Again, http://wiki.squeak.org/squeak/5722 , may mean that a class with named
> variables holding 4 arrays of arrays of length 3 (reference float float) and
> a few other variables containing things like bools and strings and ints,
> would be handled without problem. But note that each of those references is
> to an item of the same type, and it could include cycles. So I can't decide
> WHAT it means. Do I need to recode the references as id#s? Does that even
> suffice? (If it does, then it's still a good deal. But if I must name each
> entry separately, it's not a good deal at all, as the number of entries in
> each of the 4 outer level arrays is highly variable, and though I intend to
> apply an upper limit, only experiment can determine what a reasonable upper
> limit is.)
Sorry, I'm having trouble understanding your question here. Why
would you need to "recode the references as id#s"? ODBMSs preserve
the graph in exactly the shape in which it was committed, including cycles.
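As a rough analogy only (this uses Python's standard `pickle` module, not Magma), the sketch below shows what "persisted by reachability" means for a cyclic graph. You store a root, everything reachable from it comes along, and the direct references come back in the same shape, cycle included, with no id# recoding. The `Node` class and its `links` field are hypothetical.

```python
import pickle

class Node:
    """Hypothetical graph node holding direct references to other nodes."""
    def __init__(self, name):
        self.name = name
        self.links = []  # direct object references, not id numbers

# Build a small graph with a cycle: a -> b -> c -> a
a, b, c = Node("a"), Node("b"), Node("c")
a.links.append(b)
b.links.append(c)
c.links.append(a)

# Storing the root pulls in everything reachable from it.
data = pickle.dumps(a)
a2 = pickle.loads(data)

# The restored graph has the same shape, including the cycle.
b2 = a2.links[0]
c2 = b2.links[0]
print(c2.links[0] is a2)            # True: the cycle leads back to the same object
print([a2.name, b2.name, c2.name])  # ['a', 'b', 'c']
```

A real ODBMS adds transactions, concurrency, and lazy loading on top of this, but the point is the same: the object graph itself is what gets persisted.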
> And yet again (if I'm understanding correctly) I'm going to need to violate
> just about every one of the hints on performance in
> http://wiki.squeak.org/squeak/2985 . I'm not sure how much MagmaArray keeps
> in RAM of things that aren't currently in use. At one point it sounded like
> 6 bytes. This is actually a lot of overhead in this kind of a system.
MagmaArrays keep just one "page" of objects in memory at a time. The
default page size is 125, meaning 125 of the objects it references. You
can change that to any value you want, as long as it's greater than 0.
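To picture what "one page in memory at a time" means, here is a hypothetical Python sketch. It is not Magma's implementation. A large array-like object keeps only the current page of elements in RAM and fetches other pages from a backing store on demand. The `PagedArray` class, the `load_page` callback, and the `page_size` parameter are illustrative assumptions. `page_size` defaults to 125 to mirror the default mentioned above.

```python
from typing import Callable, List, Optional

class PagedArray:
    """Hypothetical array that holds only one page of elements in memory."""

    def __init__(self, length: int, load_page: Callable[[int, int], List[object]],
                 page_size: int = 125):
        if page_size <= 0:
            raise ValueError("page_size must be > 0")
        self.length = length
        self.page_size = page_size
        self._load_page = load_page          # fetches elements [start, end) from storage
        self._page_index: Optional[int] = None
        self._page: List[object] = []
        self.loads = 0                       # counts how many pages were fetched

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, i: int) -> object:
        if not 0 <= i < self.length:
            raise IndexError(i)
        index = i // self.page_size
        if index != self._page_index:
            # Replace the resident page; the previous one is dropped from RAM.
            start = index * self.page_size
            end = min(start + self.page_size, self.length)
            self._page = self._load_page(start, end)
            self._page_index = index
            self.loads += 1
        return self._page[i - index * self.page_size]

# Usage: a "storage" that just computes values, standing in for disk reads.
arr = PagedArray(1000, lambda start, end: [n * n for n in range(start, end)])
print(arr[3], arr[124], arr[125])  # 9 15376 15625
print(arr.loads)                   # 2: indexes 3 and 124 share page 0; 125 starts page 1
```

The trade-off is the one Charles raises: only the resident page costs RAM, but jumping between pages costs a fetch each time.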
> Additionally, it appears that Magma doesn't have any way to detect that a
> reference is "stale" (i.e., hasn't been referenced in a long time), and use
> that to decide to roll it out. It looks as if this needs to be done by the
> program...but that time-stamp (and a few other items mustn't (well,
> needn't...but I sure would need to overwrite it when I read it in) itself be
> included in the items rolled out. So I need to solve THAT problem.
When you said "hasn't been referenced in a long time", I assume you
meant "hasn't been ACCESSED in a long time". By "roll it out", I assume
you mean remove it from memory so the RAM can be recovered. If so, you
should know that Magma references retrieved objects only via weak
collections. If your app is no longer referencing them, they'll get
"rolled out" automatically. If your app still is, obviously they'll
stay in memory.
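Python's `weakref.WeakValueDictionary` offers a rough analogy for the weak-collection behaviour described above. It is not Magma's code. A session-like cache maps keys to loaded objects, but only weakly, so once the application drops its last strong reference, the entry disappears without any explicit roll-out logic. The `Record` class and the `cache` name are hypothetical. The immediate removal relies on CPython's reference counting, and other Python implementations may collect later.

```python
import gc
import weakref

class Record:
    """Hypothetical persistent object loaded from the database."""
    def __init__(self, key):
        self.key = key

# The "session" remembers loaded objects only weakly.
cache = weakref.WeakValueDictionary()

kept = Record(1)        # the application still holds this one
cache[1] = kept
cache[2] = Record(2)    # nothing else refers to this one

gc.collect()            # CPython usually frees it at once; this makes sure
print(sorted(cache.keys()))  # [1]: record 2 was "rolled out" automatically

del kept
gc.collect()
print(len(cache))       # 0: once the app lets go, record 1 goes too
```

With this design the application never tracks staleness itself. It only stops referencing what it no longer needs.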
> Magma seems to be a good object database, but I can't see that it makes
> Smalltalk a desirable choice for this project (It may, this could be a
> documentation problem...either my not understanding it or the information
> not being clear.) If I'm going to recode the references into id#s, then
> either Ruby or Python make it trivial to turn the object into a string (and
> to reconstitute it later), and they also make it trivial to leave out any
> volatile variables. Perhaps Magma does the latter, but this wasn't clear.
> Definitely a part of my problem is that I don't have a clear image of how I
> would proceed. The only examples given were small fragments, extremely
> useful in clarifying points, but insufficient to yield a larger idea of how
> to use things. (E.g., I have no idea how to do Ma Object Serialization, but
> I may need to implement it anyway.)
You could install it and experiment with it. That's the Smalltalk way.
> Perhaps this is all because I don't really know Smalltalk well...which I
> assuredly don't. I was hoping to use Smalltalk to avoid the database
> problem, trading RAM (including virtual RAM) consumption for capacity, but
> it looks as if I end up at a database anyway. And in that case I should use
> a language that I'm already familiar with. (I'd really been hoping that the
> persistent image would be the answer.) If I do a decomposition I could even
> get away with using a key-value store. The only problem is that the id#
> requires lookup via an indirect reference. (Is it in the Directory? If
> not, get it from the database, if not, it's a new value.) Once I do the
> recoding of references to id#s, the database portion is "trivial, but
> annoying". But now I've added thousands of additional indirections/second.
> However, IIUC, Magma would be doing that under the hood anyway (as opposed
> to the image, which would be handled in hardware memory translation), and if
> I code it, I can put in things like automatically rolling out when it's
> stale. (By the way, does "stub" mean remove from memory, or remove from the
> database? From context I decided it probably meant remove from memory, but
> I couldn't decide whether dirty data would be written before being removed
> from memory, and I couldn't be really sure it wasn't just being deleted.
> That needs rephrasing by someone who knows what it's supposed to mean.)
#stubOut: is something I have rarely used myself. It converts the
object back into a proxy, and it does NOT write anything to the DB.
So if there's a chance the object has been changed, use it only right
after a commit.
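The difference between stubbing out and committing can be sketched with a hypothetical Python proxy. These are not Magma's classes or API. Here, `stub_out` simply discards the in-memory copy so that the next access re-reads the stored version, and it writes nothing. In this sketch, any change made since the last `commit` is therefore lost when stubbing. `Proxy`, `commit`, and `stub_out` are illustrative names, and a plain dict stands in for the database.

```python
class Proxy:
    """Hypothetical proxy that materializes an object from a store on demand."""

    def __init__(self, store: dict, key: str):
        self._store = store
        self._key = key
        self._target = None  # None means "stubbed": not in memory

    def _materialize(self) -> dict:
        if self._target is None:
            # Read a fresh copy from the store (simulating a DB read).
            self._target = dict(self._store[self._key])
        return self._target

    def get(self, field: str):
        return self._materialize()[field]

    def set(self, field: str, value) -> None:
        self._materialize()[field] = value

    def commit(self) -> None:
        # Write the in-memory state back to the store.
        if self._target is not None:
            self._store[self._key] = dict(self._target)

    def stub_out(self) -> None:
        # Drop the in-memory copy WITHOUT writing anything.
        self._target = None

db = {"acct": {"balance": 10}}
p = Proxy(db, "acct")

p.set("balance", 20)
p.stub_out()                 # stubbed before committing
print(p.get("balance"))      # 10: the uncommitted change was discarded

p.set("balance", 30)
p.commit()
p.stub_out()                 # safe: stubbed right after a commit
print(p.get("balance"))      # 30
```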
> To me this appears to be, again, not the project that justifies
> implementation in Smalltalk. Perhaps if I were already experienced in
> Smalltalk I wouldn't see things that way, as Magma clearly means that
> Smalltalk *can* handle doing the project.
Ok, good luck.
> Thank you for your suggestion.
> Charles Hixson
> Beginners mailing list
> Beginners at lists.squeakfoundation.org