Modular != minimal (was Re: [squeak-dev] Loading FFI is broken)

Frank Shearar frank.shearar at gmail.com
Fri Nov 15 21:02:49 UTC 2013


On 15 November 2013 20:38, Eliot Miranda <eliot.miranda at gmail.com> wrote:
> Hi Frank,
>
>
> On Fri, Nov 15, 2013 at 2:14 AM, Frank Shearar <frank.shearar at gmail.com>
> wrote:
>>
>> On 15 November 2013 02:54, Chris Muller <asqueaker at gmail.com> wrote:
>> > On Thu, Nov 14, 2013 at 4:27 PM, Frank Shearar <frank.shearar at gmail.com>
>> > wrote:
>> >> We talk past each other every time we have this argument.
>> >
>> > Not every time -- I've learned a few things from y'all in this
>> > community.  :)
>> >
>> >> On 14 November 2013 20:47, Chris Muller <asqueaker at gmail.com> wrote:
>> >>> I know module-heads like to say it's all about modularity and not size
>> >>> but I think it being about size is unavoidable.  (And, when I say
>> >>> "size" I'm not only talking about disk and memory but also
>> >>> coherence, which is a valuable thing).
>> >>>
>> >>> Because otherwise "so what" if FFI includes the constants and VMMaker
>> >>> depends on it solely for that portion of it?  How many methods making
>> >>> up FFI are we talking about?  There are plenty of _other_ methods in
>> >>> the image which are not being used by VMMaker, what about them?
>> >>>
>> >>> Acknowledged or not, at some point we're forced to assume a balance
>> >>> between number of extra methods and number of extra packages.  The
>> >>> hand-made-micro-packages approach puts these two metrics in inverse
>> >>> proportion to each other, trading domain complexity for package complexity.
>> >>
>> >> We can argue about the granularity of the packages. I don't really
>> >> care about that. I argue about small packages in the base image only
>> >> because you cannot break the cycles without distinguishing between the
>> >> parts.
>> >
>> > Yes, we're in agreement that should be a criterion for determining
>> > package boundaries / granularities.
>> >
>> >> Please, please show me that I'm wrong so that I stop tilting at
>> >> the tangled web of windmills. Just take System. That would be a good
>> >
>> > Ha, I knew it!  You ALWAYS pick "System" every time we have this
>> > argument.   :)
>>
>> It's one of the most egregious offenders, so I'm bound to pick on it :)
>>
>> >> start. Show me how System makes sense as a package. Because all I see
>> >> is a big fat mess of separate things that have no business being
>> >> together. Projects? Change notifications? UI? Serialisation?
>> >
>> > "big fat mess" and "no business being together" are size / coherency
>> > judgements.  Busted!  :)
>>
>> I use a pejorative term here only because System's (probably) the
>> worst entangler we have. I want the Squeak base image to be like a
>> layer cake. If you draw the dependencies between packages, and Kernel
>> sits at the bottom, then all the dependency arrows point either
>> sideways (with no cycles) or downwards. System is like a giant
>> pineapple sitting in the middle of that cake. It cuts across these
>> various layers, because it provides high level functionality (no good
>> examples off the top of my head because I'm at work and don't have an
>> image open - Project maybe?), low level functionality (change
>> notification), and "support" stuff (which largely looks like a "useful
>> things that we don't know where else to put" bucket).
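The layer-cake rule above can be sketched as a simple check over a package dependency graph: give each package a layer number and flag any arrow that points upwards. The package names and layer assignments below are illustrative only, not Squeak's actual structure.

```python
# A minimal sketch of the "layer cake" rule: assign each package a layer
# and require every dependency arrow to point sideways or downwards.
# Package names and layer numbers are made up for illustration.
layers = {"Kernel": 0, "Collections": 1, "System": 2, "Tools": 3}
deps = {
    "Collections": ["Kernel"],
    "System": ["Kernel", "Collections"],
    "Tools": ["System", "Collections"],
}

def upward_arcs(layers, deps):
    """Return the dependency arcs that point upwards, violating layering."""
    return [(pkg, dep)
            for pkg, targets in deps.items()
            for dep in targets
            if layers[dep] > layers[pkg]]

print(upward_arcs(layers, deps))  # an empty list means the cake is intact
```

A "pineapple" package like System shows up here as a package whose arcs land in layers both above and below it, which is exactly what the check reports.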
>
>
> First let me stress that I agree strongly with your desire to see the system
> properly modularized.  Personally I like the image of an onion, but then I
> like clams in white wine with onions and coriander much more than I like
> cake.  Second, I *think* it's impossible with a system like Smalltalk to
> meaningfully onionize the core of the system (i.e. System).  That's because
> it is recursively implemented.  None of the core libraries (arithmetic,
> collections) can exist without the kernel execution classes (behavior,
> method dictionary, compiled method).  None of the core execution classes can
> function without the core libraries (method dictionary is a collection,
> arithmetic is used throughout the core system).

Well, we know it can be done with 3 objects and 5 methods: Ian
Piumarta and Alex Warth showed that years ago already, in C no less.
We can certainly do at least an order of magnitude better than the
current state of affairs.

> Observe that having a separate development environment such as Spoon doesn't
> change much here.  We can easily extract the compiler (and a binary loader)
> from the system; it is then inert (cannot add more code) but still
> functional.  We can use Spoon to create methods in one image and upload them
> to another.  But within an image there will always be circularity which is
> fundamentally to do with the system being implemented in itself, with
> everything (including code) being objects.  And this property of everything
> being an object is the single most valuable property of the system; it leads
> to the system's liveness.  So at some stage we have to accept the tangle
> that lies at the heart of the system.  It doesn't have to be a tangled mess,
> and can have clean boundaries.  But IMO at the core of the system
> there will inevitably be some small number of packages which are
> interrelated, inseparable and cannot be unloaded.
>
> I'm probably teaching you to suck eggs but I had to let that brain fart
> free.

That is a very disturbing image that I will try to forget as quickly
as possible!

>> > The truth is, I'm pretty sure we've agreed about System for a while.
>> > If all dependency cycles could be removed, I wouldn't care so much
>> > about System being "big and fat" because I see it as the "Smalltalk
>> > programming system", but I think the cycles probably won't be able to
>> > be eliminated without breaking it up and so it's moot to disagree on
>> > System anyway.
>>
>> Parts of System depending on other parts of System in a cyclic manner
>> ("intra-package" cycles) don't matter. Unless you try to break up the
>> package, of course, in which case you can't load the parts without
>> weird preambles and non-MC-friendly things.
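One way to picture the intra- versus inter-package distinction: collapse class-level dependencies to package granularity, at which point intra-package cycles vanish as self-edges and only inter-package cycles remain to block clean loading. A sketch, with made-up class and package names:

```python
# Sketch: intra-package cycles vanish when dependencies are viewed at
# package granularity; only inter-package cycles block clean loading.
# Class and package names are illustrative, not Squeak's real graph.
def has_package_cycle(class_deps, package_of):
    # Collapse class-level edges to package-level edges, dropping
    # self-edges (these are the harmless intra-package cycles).
    packages = set(package_of.values())
    edges = {p: set() for p in packages}
    for cls, targets in class_deps.items():
        for t in targets:
            if package_of[cls] != package_of[t]:
                edges[package_of[cls]].add(package_of[t])
    # Kahn's algorithm: if not every package can be topologically
    # sorted, the package graph contains a cycle.
    indegree = {p: 0 for p in packages}
    for srcs in edges.values():
        for dst in srcs:
            indegree[dst] += 1
    ready = [p for p in packages if indegree[p] == 0]
    seen = 0
    while ready:
        pkg = ready.pop()
        seen += 1
        for dst in edges[pkg]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return seen < len(packages)

# Two classes depending on each other inside one package: no problem.
intra = has_package_cycle(
    {"Array": ["OrderedCollection"], "OrderedCollection": ["Array"]},
    {"Array": "Collections", "OrderedCollection": "Collections"})
print(intra)  # False
```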
>>
>> I understand why you see System as being "the Smalltalk programming
>> system". I'm trying to untangle what exactly "the Smalltalk
>> programming system" actually means, and how it's built. I suppose I'm
>> looking at the packages through a microscope?
>
>
> As you well know that won't work.

I don't know if I missed my mark with that sentence. What I'm trying
to say is that when I look at System I don't see "the Smalltalk
programming system". I see a loose collection of parts. I see the
trees, not the forest. And the trees are bound together with Spanish
moss. Or something.

>  We need to look at it in large chunks.
> For me it's core class libraries (essentially Object (so one can add new
> classes that integrate with the system), arithmetic, collections and
> streams), the execution classes, environments (Smalltalk, SharedPools),
> exceptions and some base error reporting facility.  Above that one can add
> System (managing the evolution of the environment) & Compiler.  Above that
> the programming tools.  Etc.

No arguments here.

> This works (I think) by looking at functionality.  Smalltalk is a
> programming system used to express programs.  The most elemental programs
> use arithmetic and/or collections; the next most elemental programs add new
> classes rooted at Object.  All these elements are themselves expressed as
> objects.  These programs may run, and in doing so may raise errors which
> need to be reported.  Note that I *haven't* included how those programs are
> created in the elemental soup.  Whether the image just is, or whether code
> is created by the Compiler/ClassBuilder, or loaded via a binary loader isn't
> elemental; the fact that there are at least three ways to go about it proves
> this.
>
> So IMO if you want to break the system into modules you first come up with a
> model of the system's functionality, and you design modules around that
> model. All forms of shrinkage, unloading, loading, compiling, remote
> debugging, etc, etc are merely arcs in the functionality model.  IMO an
> elegant functionality model is one where the atom of functionality is small
> and can easily be used to create other compound functionalities in as few
> steps as possible.
>
> So a base headless image with arithmetic, collections, file streams,
> execution classes, exceptions, compiler, read-eval-print-loop and error
> reporting to standard out seems close.  Another could be arithmetic,
> collections, file streams, execution classes, exceptions, error reporting to
> standard out, a binary loader and a command-line argument parser such that
> one can specify packages to load.  Another might be arithmetic, collections,
> file streams, execution classes, exceptions, and error reporting to standard
> out, that requires Spoon to load code into it.
>
> Once one's chosen one or more of these bases then other functionalities such
> as a squeak trunk image with the programming tools and morphic, or a squeak
> trunk with MVC, or a headless scripting environment with a
> read-eval-print-loop and lots of file system utilities, can be constructed
> by loading modules, and hence those modules can be derived from what it
> would take to construct a functionality.
>
> If I'm missing the point of this discussion forgive me, but it seems to me
> that without a clear notion of what the atom is there's endless scope for
> confusion and delay.

I see discussions about models and "what should the layers look like"
as having endless scope for confusion and delay. I'd much rather stick
my knife in a crack and see how the stone splits. At least as a first
pass. Finessing the module boundaries is bound to happen _anyway_, and
I'd rather have poor/ok boundaries now than theoretically perfect ones
years from now.

I would dearly love to have a Kernel that did _not_ depend on any
other package. This Kernel would of necessity include certain
collections, and various other bits that might at first blush not look
Kernel-y. But it shouldn't contain ToolBuilder references, or
TextConverter, or Compiler. It probably ought to contain
SystemChangeNotifier though, if only because that needs to log events
that happen right in the guts of the Kernel. (I can imagine other ways
of doing this, but you still have to have _something_ that notifies
interested parties of new classes, etc.)
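That "something" can be very small: a subscription list and a broadcast method. A minimal sketch in Python, not Squeak's actual SystemChangeNotifier API, of the shape such a notifier could take:

```python
# A minimal sketch of Kernel-level change notification: interested
# parties register callbacks, and the kernel broadcasts events such as
# class creation. Names here are illustrative, not Squeak's API.
class ChangeNotifier:
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        """Register a callback taking (event_kind, subject)."""
        self._subscribers.append(callback)

    def class_added(self, class_name):
        """Broadcast that a class was added, e.g. from the class builder."""
        for cb in self._subscribers:
            cb("classAdded", class_name)

notifier = ChangeNotifier()
events = []
notifier.subscribe(lambda kind, name: events.append((kind, name)))
notifier.class_added("Point")
print(events)  # [('classAdded', 'Point')]
```

The point is that the Kernel only needs the broadcast side; tools and loggers higher in the cake subscribe without the Kernel depending on them.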

As an example of what I'm _not_ looking for: it seems foolish to lump
all Collections together just because they're collections.

frank

>> > You had brought up a one-class package which is what got this going
>> > this time..  :)
>>
>> Yes, I did kinda deliberately do that, didn't I? :)
>>
>> >>> This is why I want Spoon to make micro-packaging less important.  Let
>> >>> the machine imprint a truly "optimal", application-specific image
>> >>> that no amount of human-wrangling could ever come close to.
>> >>
>> >> Shrinking is useless. You have no idea what you deploy. I _do not
>> >
>> > Dabble DB wanted to run hundreds of images.  I also have cases
>> > where I need to run many images.  For that, shrinking is not useless.
>>
>> By "useless" I mean "you have no idea what makes up a running process,
>> except by actually inspecting that process."
>>
>> > The idea of Spoon is to deploy only one single "fat" image (with
>> > everything you know you need and more) from which many minimal
>> > images can imprint.  Since they only download methods they need,
>> > as they get called, memory usage is optimized.
>>
>> Well. That inflate-as-needed approach requires you to have a persistent,
>> reliable network connection between the deployed artifact and some
>> server somewhere. That's pretty much exactly the opposite of what I
>> consider sane deployment practice. I know I'm taking a really strong
>> stance here, and I apologise if I end up sounding harsh. (I can
>> sort of half-see a possible use case where the "single fat image" is
>> on the same machine as the mini images... in which case I'd rather see
>> the mini images constructed explicitly.)
>>
>> In particular, the kind of thing that I'd like to see is an automated
>> and explicit build process that assembles some binary artifact. That
>> goes into CI, which throws stones at it. If that binary artifact
>> passes muster, it's turned into a Debian package (replace with
>> suitable replacement concept for your platform) with a hard version.
>> That package is then deployed to the target machine, and in prod you
>> know _exactly_ what your server's running. (An alternative approach
>> would be to use Docker, in which case you assemble and test a "virtual
>> machine lite" that you can just start running on your target machine.)
>>
>> The first step here is actually being able to assemble that artifact,
>> and that comes down to a ConfigurationOf/Installer script that takes a
>> well-known base image and builds it up to whatever you need.
>>
>> Now my obsession is applying this same process to the base image itself.
>>
>> >> care_ about this theoretically minimal image, because otherwise I'd
>> >> just copy what Guille Polito's doing, and build up a whole new
>> >> object space starting with nil. _That_ is minimal.
>> >
>> > I'm not aware of his work..
>>
>> This looks like a good starting point:
>>
>> http://playingwithobjects.wordpress.com/2013/05/06/bootstrap-revival-the-basics/
>>
>> >> I've been rolling out clearly versioned code, with well-understood
>> >> dependencies, for years now in every language I know except for the
>> >> one I love the most. And in every one of these languages (C#, Ruby,
>> >> Java, Scala) I have had _no_ serious pain in managing dependencies.
>> >> You define your immediate dependencies, and you're _done_.
>> >
>> > Yes, we seem to be getting there.
>>
>> We are, slowly.
>>
>> frank
>>
>
>
>
> --
> best,
> Eliot

