[squeak-dev] binary development (was: 3.11 and the trunk)

Tue Aug 25 19:35:23 UTC 2009

2009/8/25 Eliot Miranda <eliot.miranda at gmail.com>:
>
>
> On Tue, Aug 25, 2009 at 3:57 AM, Igor Stasenko <siguctua at gmail.com> wrote:
>>
>> 2009/8/25 Eliot Miranda <eliot.miranda at gmail.com>:
>> >
>> >
>> > On Wed, Aug 19, 2009 at 6:56 PM, Igor Stasenko <siguctua at gmail.com>
>> > wrote:
>> >>
>> >> 2009/8/20 Eliot Miranda <eliot.miranda at gmail.com>:
>> >> > Hi Igor,
>> >> >
>> >> > On Wed, Aug 19, 2009 at 6:00 PM, Igor Stasenko <siguctua at gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> 2009/8/20 Jecel Assumpcao Jr <jecel at merlintec.com>:
>> >> >> > Colin Putney wrote on Wed, 19 Aug 2009 14:25:21 -0700:
>> >> >> >> On 19-Aug-09, at 10:15 AM, Jecel Assumpcao Jr wrote:
>> >> >> >>
>> >> >> >> > For example, I would far prefer to see Squeak move to a binary
>> >> >> >> > based development model (I would mention Projects and Etoys
>> >> >> >> > here) than the current source based things we are doing
>> >> >> >> > (trunk, bob or whatever).
>> >> >> >>
>> >> >> >> Forgive me for seizing on a throw-away comment like this, but
>> >> >> >> would you mind expanding on this a bit? Are you saying you
>> >> >> >> prefer something spoonish, where CompiledMethods are passed
>> >> >> >> directly from image to image? Something else?
>> >> >> >
>> >> >> > Heh, I got asked about this on IRC as well. Though I had actually
>> >> >> > started to explain this a little in the original email, I ended up
>> >> >> > deleting it to keep on topic. With a new subject line I don't feel
>> >> >> > I have to worry about that. Some details about this (with a few
>> >> >> > drawings) can be found in the Chunky Squeak wiki page:
>> >> >> >
>> >> >> > http://wiki.squeak.org/squeak/584
>> >> >> >
>> >> >> > The idea is to be more like the Etoys users which can load binary
>> >> >> > projects containing not only the code they need but also hand
>> >> >> > crafted objects which have no source (like a drawing, some nested
>> >> >> > Morphs or even some text). This is very simplistic compared to
>> >> >> > Spoon, and my proposal was even more simplistic. In particular,
>> >> >> > this doesn't handle the case where any changes to bytecodes or
>> >> >> > object format are needed.
>> >> >> >
>> >> >>
>> >> >> The central question, which arises immediately, is: what is a
>> >> >> credible way to reproduce such artifacts?
>> >> >> When we have source code, we can (re)compile it on a different
>> >> >> system. But what do you propose to do with pure binary data, a soup
>> >> >> of objects, given that it is incredibly hard to understand which
>> >> >> bits you need and which you don't when you need to clean up,
>> >> >> refactor, rewrite or simply analyze what is happening?
>> >> >> This is what makes a huge difference, for instance, between
>> >> >> applications with open source code and applications shipped in
>> >> >> binary form: with the latter you can only report bugs, but can't
>> >> >> really make any suggestions about what is happening.
>> >> >> I don't think that developers of Squeak should be victims of such
>> >> >> situations.
>> >> >
>> >> >     It is possible to have your cake and eat it too.  One can create
>> >> > a binary format that includes source and includes the meta-source
>> >> > for its creation.  But including a binary representation allows much
>> >> > faster loading, loading without a compiler, and source hiding if one
>> >> > chooses not to include the source.
>> >> >
>> >> >
>> >> > There are other advantages, such as not cluttering up the changes
>> >> > file when one loads a package.  In the VW parcel system, to which I
>> >> > added source management, we replaced the SourceFiles with a
>> >> > SourceFileManager whose job was to manage the sources and changes
>> >> > file and an arbitrary number of source files for parcels, the
>> >> > binary format.  In the parcel file the source pointers of compiled
>> >> > methods are the positions of their source in the parcel source file.
>> >> > When one loads a parcel the SourceFileManager adds the file to its
>> >> > set of managed files and assigns an index for the source file.  The
>> >> > parcel loader then swizzles all the source pointers so that they
>> >> > include the source file index along with the position.  So accessing
>> >> > the source for a method loaded from a parcel accesses that parcel's
>> >> > source file.  We used a floating-point-like format for source
>> >> > pointers, where the exponent was the source file index, and the
>> >> > mantissa was the position in the file.
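
The floating-point-like source pointer described above can be sketched roughly as follows. This is only an illustration in Python, not the VisualWorks implementation: the 24-bit position field and the names POSITION_BITS, encode_source_pointer, decode_source_pointer and swizzle are assumptions made for the example, since the message does not give the actual field widths or selectors.

```python
# Hypothetical field width: the message does not say how many bits were
# used for the position ("mantissa") part of a source pointer.
POSITION_BITS = 24
POSITION_MASK = (1 << POSITION_BITS) - 1


def encode_source_pointer(file_index, position):
    """Pack a source file index (the "exponent") and a byte position in
    that file (the "mantissa") into one integer."""
    if file_index < 0:
        raise ValueError("file index must be non-negative")
    if not 0 <= position <= POSITION_MASK:
        raise ValueError("position does not fit in the position field")
    return (file_index << POSITION_BITS) | position


def decode_source_pointer(pointer):
    """Return (file_index, position) for a packed source pointer."""
    return pointer >> POSITION_BITS, pointer & POSITION_MASK


def swizzle(positions, file_index):
    """Rewrite position-only pointers, as stored in a parcel file, so that
    they also carry the index the source file was assigned at load time."""
    return [encode_source_pointer(file_index, p) for p in positions]
```

With this layout, a method whose source starts at byte 1024 of the file registered under index 3 gets the pointer encode_source_pointer(3, 1024), and decoding it yields the pair (3, 1024) again.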
>> >> > We didn't create a single file format, having two separate files for
>> >> > binary and source, which is probably a mistake.  A format with a
>> >> > short header, followed by source, followed by binary, followed by
>> >> > metasource, would be easier to manage than three separate files.
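
A single-file layout of the kind suggested above (short header, then source, then binary, then metasource) might look something like this. It is a sketch under stated assumptions, not a format that VW or Squeak defines: the magic bytes, the header fields and the names pack_package and unpack_package are all invented for the example.

```python
import struct

MAGIC = b"PRCL"                    # hypothetical file signature
HEADER = struct.Struct(">4sIII")   # magic + byte lengths of the 3 sections


def pack_package(source, binary, metasource):
    """Concatenate header, source, binary and metasource into one blob."""
    header = HEADER.pack(MAGIC, len(source), len(binary), len(metasource))
    return header + source + binary + metasource


def unpack_package(data):
    """Split a blob written by pack_package back into its three sections."""
    magic, n_src, n_bin, n_meta = HEADER.unpack_from(data, 0)
    if magic != MAGIC:
        raise ValueError("not a package file")
    src_start = HEADER.size
    bin_start = src_start + n_src
    meta_start = bin_start + n_bin
    return (data[src_start:bin_start],
            data[bin_start:meta_start],
            data[meta_start:meta_start + n_meta])
```

Because the header records the length of every section, a loader could skip straight to the binary part and read the source or metasource only on demand.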
>> >> > We didn't include any metasource, but we did include pre-read, load
>> >> > and unload actions.  I did a very bad job on version numbering and
>> >> > prerequisite selection.
>> >> > That's not the whole story but enough to start answering your
>> >> > question.  If there is a well-defined definition of the objects in a
>> >> > package and that definition is included in the package as
>> >> > metasource, then one can comprehend the binary package's contents by
>> >> > examining the metasource and can reproduce creating the package,
>> >> > provided that the tools are careful to impose ordering, etc.
>> >> > best
>> >> > Eliot
>> >>
>> >> I think you inevitably made wrong decisions, because you went this way
>> >> by allowing arbitrary binary data to be held by a package.
>> >> In such situations it is much easier to make mistakes.
>> >> But sure, the only one who makes no mistakes is the one who does
>> >> nothing :)
>> >
>> > We didn't disallow representation of arbitrary data but we also didn't
>> > support it.  The only thing the Parcel system supports (as in the tool
>> > set, rather than what one can extend the framework to do in specific
>> > circumstances) is to represent code, which it does very well.
>> > What are these mistakes?  Can you be specific?  I think the parcel
>> > system has been a major success.  VW is now deployed as a system of
>> > components, the base image and a much larger suite of parcels.  Parcels
>> > are not tied to a particular version or implementation and yet are
>> > still fast to publish and load.  What's not to like?
>>
>> I referred mainly to your own statements about mistake(s).
>
> Ah, ok,  Sorry :)
>
>>
>> I don't know enough about parcels to tell exactly where the flaws are.
>> I'm still wondering how you could unload a parcel if it's no longer
>> needed, but there are still objects used or created by the parcel
>> sitting in the image.
>
> Smalltalk has this problem with or without binary loading; they're called
> obsolete classes :)  However, the problem of knowing what to remove when the
> user says "unload" means that a loaded parcel requires a data structure that
> names the classes and methods it loaded.  In addition we maintain overrides,
> the older versions of methods and class definitions, in a stack, so that
> these can be restored when unloading a parcel.  I made lots of mistakes here
> (not allowing the tools to publish a parcel that has code overridden by
> others, not integrating source management and browsing queries with
> overridden code, not compressing the changes correctly with overridden code,
> etc, etc).  Tests would have helped :/
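
The bookkeeping described above (a record of what each parcel loaded, plus a stack of overridden definitions to restore on unload) can be sketched as below. This is a toy model under stated assumptions, not the VW parcel code: the Image class, its dictionaries and the load/unload methods are hypothetical names, and method definitions are stood in for by plain strings.

```python
class Image:
    """Toy stand-in for an image: maps (class name, selector) to a
    method definition. All names here are hypothetical."""

    def __init__(self):
        self.methods = {}     # (class name, selector) -> definition
        self.overrides = {}   # (class name, selector) -> stack of older definitions
        self.loaded = {}      # parcel name -> keys that parcel installed

    def load(self, parcel_name, definitions):
        installed = []
        for key, definition in definitions.items():
            if key in self.methods:
                # Remember the definition being overridden so that it
                # can be restored when the parcel is unloaded.
                self.overrides.setdefault(key, []).append(self.methods[key])
            self.methods[key] = definition
            installed.append(key)
        self.loaded[parcel_name] = installed

    def unload(self, parcel_name):
        for key in self.loaded.pop(parcel_name):
            stack = self.overrides.get(key)
            if stack:
                self.methods[key] = stack.pop()   # restore the older version
            else:
                del self.methods[key]             # the parcel introduced it
```

The sketch only restores the right definitions when parcels are unloaded in the reverse of their load order; handling other orders is one of the harder cases the paragraph above alludes to.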
> VW did (does?) test for open instances of applications when we unload a
> parcel, so that if the parcel contains subclass(es) of ApplicationModel
> (VW's top-level GUI app class) all open applications are tested to see if
> they contain instances of the class(es) and a warning is issued.
>>
>> A basic use case is: a developer needs some specific tool (like a UI
>> design tool) while he is working on an application. But at the moment
>> he ships the application, it is no longer needed.
>
> Right.  I don't know of an automatic solution, but a good convention is to
> split all packages into a development and deployment pair where
> the deployment half is a prerequisite of the development half.  Sticking to
> the convention and using good names makes it easier to remember to remove
> development components and to guess which parts of someone else's
> components are development only.

Yes, and this is what I really miss in Smalltalk-80 based
environments: a distinction between development
and deployment modes & models.
It would be cool to have some basic things behave differently when in
deployed mode (like preventing access & data overrides).
The main problem in an open system (such as a Smalltalk object memory)
is that when something goes wrong, you often
have two choices: reboot the system, or debug and fix the problem in
a living environment.
Often, neither choice is acceptable, because if we are talking
about an end-user application, we don't expect
the user to be able to debug & fix the issue. And rebooting an image
means loss of data and/or interruption of other jobs being served.

But if the system were modelled in modular layers, like kernel ->
services -> interfaces -> working set, then things
would be much easier to handle.

> I added a bulk instancesOf primitive that answered all instances of an Array
> of classes that my colleague Steve Dahl wanted to use in instance migration
> on class redefinition.  This could be used to look for all instances of the
> classes defined by a parcel prior to unload.  Do a GC, collect all instances
> of classes defined (rather than redefined) by a parcel and warn if non-empty
> (if in a dev image).
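
The pre-unload check described above (do a GC, collect all instances of the classes a parcel defines, warn if any remain) can be approximated in Python with the standard gc module. This is only an analogy to the bulk instancesOf primitive, not its implementation; live_instances and check_before_unload are hypothetical names, and gc.get_objects() only sees objects tracked by CPython's collector, which includes instances of ordinary user-defined classes.

```python
import gc
import warnings


def live_instances(classes):
    """Return all GC-tracked objects whose class is, or inherits from,
    one of the given classes, after forcing a collection."""
    gc.collect()
    wanted = tuple(classes)
    return [obj for obj in gc.get_objects()
            if issubclass(type(obj), wanted)]


def check_before_unload(parcel_name, defined_classes, dev_image=True):
    """Warn (in a development image only) if instances of the classes
    defined by the parcel are still alive. Returns the instances found."""
    instances = live_instances(defined_classes)
    if instances and dev_image:
        warnings.warn("%s: %d live instance(s) of its classes remain"
                      % (parcel_name, len(instances)))
    return instances
```

As in the description above, only classes defined by the parcel (rather than redefined by it) should be passed in, since instances of redefined classes are expected to survive the unload.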

I think that independent tiny layers (islands/vats) are the future of
system organization in Smalltalk-like VMs.
First, this gives a strong answer to the question of what belongs to what.
There is no possibility of referencing a foreign object
other than by a far ref. You can count/enumerate them easily, and this
approach also makes it possible to run code in vats concurrently.
The problem here is how to handle shared behavior, like Arrays,
Collections etc., in order to avoid duplication. Since in Smalltalk
everything is an object, methods & classes included, they can belong
only to a single island/vat, and therefore only the owning island can
manipulate them. This creates a major bottleneck for an effective
implementation of concurrently (and independently) running code.
Trade space for speed? Allow each island to have its own Array class
with its own implementation?
This question remains open for me.

>>
>> >> Obviously one side of this problem is the uniform object memory,
>> >> where each object can reference any other object, limited only by
>> >> people's imagination.
>> >> There are no layers or any other means which could establish certain
>> >> barriers (which we call modules) in Smalltalk.
>> >> It means that once you have integrated a parcel into the image and
>> >> started using it, you may have a hard time trying to unload it.
>> >> It is possible to develop an image as an artifact which contains both
>> >> binary & sources, but such an approach has drawbacks, which we, by
>> >> the way, are trying to overcome nowadays.
>> >> Practice shows that such an approach is credible only for a small
>> >> group of individuals, but becomes a bottleneck if you adopt such a
>> >> scheme for a wider community.
>> >>
>> >> So I think that before entering this domain (allowing binary data),
>> >> we should first solve more basic problems of Smalltalk & its design:
>> >> modularity, name spaces, layering etc. Only then could we return
>> >> to the original question and solve it.
>> >>
>> >> --
>> >> Best regards,
>> >> Igor Stasenko AKA sig.

-- 
Best regards,
Igor Stasenko AKA sig.