Comments on Smalltalk block closure designs, part 1

Stephan Rudlof sr at evolgo.de
Sat Apr 28 18:17:48 UTC 2001


Ian Piumarta wrote:
> 
> I just *know* I'm about to say something stupid.  (I haven't been
> following this thread, so I'm completely lost for context.)  But
> thanks for `CC'ing me anyway: I was wondering what to do with a spare
> 15 minutes while the washer finishes washing...

I think the machine likes me ;-)

> > But in spite of your criticism of an evolution of the classical ST80 design
> > towards block closure semantics, I'm continuing to resist giving up my design
> > for the moment. For that case you have given a neat cookbook for implementing
> > block closure semantics, which I could then follow... ;-)
> 
> AFAICMO, Allen's "cookbook" corresponds pretty closely to the way
> closures are (and have been) done in the vast majority of functional
> languages (including various Smalltalk implementations).  Apart from
> anything else, you're arguing against an awful lot of experience gained
> by an awful lot of people over an awfully long time spent implementing
> (and fine-tuning) an awfully large number of dynamic languages.  (By
> "dynamic language" I guess I mean any language that implements
> closures as fully-reentrant first-class fully-upward anonymous
> functions with static [lexical] scoping and free variables.)

I don't want to argue against
> an awful lot of experience gained
> by an awful lot of people over an awfully long time spent implementing
> (and fine-tuning) an awfully large number of dynamic languages.

> 
> On the other hand, I do believe that many great breakthroughs have
> been made simply because nobody bothered to tell the people involved
> that what they were trying to do was "impossible".
> 
> So my position, generally speaking, would be: don't listen to any of
> us bumble-headed know-nothings -- just go right ahead and try it.  If
> it works, great!  If it doesn't (or if it's abysmally slow) throw it
> away and move on to something entirely different (and hopefully more
> promising, in the light of experience gained).  A negative result can
> yield information that is every bit as useful as a positive result.

I've already learned a lot by
- making the design;
- starting to implement it while improving it;
- posting the design, thereby
  - provoking this thread, and
  - getting other reactions.

> (And the time and effort spent trying to convince us that it's The
> Right Way is probably comparable to the time and effort required to
> implement it, measure it, and know quantitatively whether or not it's
> right.)

My intent wasn't to convince you of my design, but to get insight into where
the possible problems with my design lie.
And now I've gotten much more, namely a rough draft of a more consistent design
from a very experienced man (thanks to Allen).

> > Agreed. But the special logic isn't very complicated here and we
> > save some other logic needed by another design.
> 
> Most of the logic needed to efficiently implement closures in the
> "traditional manner" (split allocation between captured and
> non-captured variables, etc...) happens in the compiler, not in the
> runtime.  One needs to look very closely at what gets figured out
> where, and by whom.  Complex logic in the compiler is preferable, if
> it can eliminate even a little "trivial" logic in the runtime.
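
For my own understanding, here is a rough C sketch of what I take "split
allocation" to mean: only the variables a block actually captures are hoisted
into a heap-allocated environment, everything else stays in the machine stack
frame. (The names and the scheme are purely my interpretation, not Allen's
cookbook or the VisualWorks design.)

  #include <stdio.h>
  #include <stdlib.h>

  typedef struct { long counter; } Env;  /* only captured state lives here */

  typedef struct {
      Env *env;                /* shared, heap-allocated environment */
      long (*code)(Env *);     /* the block's compiled body */
  } Closure;

  static long bump(Env *env) { return ++env->counter; }

  /* "method": 'counter' ends up captured, 'checked' does not */
  static Closure makeCounter(long start)
  {
      long checked = start < 0 ? 0 : start;  /* non-captured: stack only */
      Env *env = malloc(sizeof *env);        /* captured: goes to the heap */
      env->counter = checked;
      return (Closure){ env, bump };
  }

  int main(void)
  {
      Closure c = makeCounter(10);  /* the block outlives its defining frame */
      long a = c.code(c.env);       /* 11 */
      long b = c.code(c.env);       /* 12 */
      printf("%ld %ld\n", a, b);
      free(c.env);
      return 0;
  }

The point of the split is that 'checked' never touches the heap; whether that
pays off in practice is exactly what I would have to measure.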
> 
> > > You can probably get by with static links that chain together activation
> > > records (MethodContexts) but this has the reification problem that
> > > ultimately is a performance killer.
> >
> > This is a key point: Is this really a performance killer?
> 
> In VisualWorks: I know that big savings were made by redesigning the
> way closures (and free variables in particular) work, along lines
> similar to those that Allen is suggesting.

J4?

> In J3: not at the moment, but only because other parts of the system
> are sooooo hopelessly inefficient.  (Some of the inefficiency is
> caused by the runtime logic required to cope with blocks that aren't
> real closures, and to deal with block bodies that are inline in their
> defining methods -- when truth and beauty are both crying out for them
> to be compiled as separate "anonymous methods".  But that's another
> story, for another bedtime.)

Named CompiledBlocks in VW, I think.

> > I don't know Jitter technology, so I'm very interested in a rationale for
> > this.
> 
> Converting between different representations of the same thing
> (machine-friendly frames on the stack, closure-friendly contexts in
> the heap) is just wasted cycles.  Not to mention additional memory
> traffic, which is to be avoided as often as possible (in any context).

Without putting contexts onto the stack at all, we avoid these reifying
conversions entirely and have only heap accesses (hopefully well cached, see
below) instead.

> Just having the possibility for reification is costly too, since
> potentially each and every return has to check whether the exiting
> frame needs to be reified in the heap.  (Activation might even be more
> complex too, if it has to plan ahead for possible reification.)
> 
> Plus every time you reify something, you create an object.  Even
> though it's probably quite a small contribution to the big picture,
> the GC has more work to do.

Same as previous point.
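
Just to be sure we mean the same thing, here is the kind of per-return check I
understand Ian to describe, as a rough C sketch of one possible VM return path
(my guess at such a scheme, not how any real VM actually does it):

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  enum { FRAME_SLOTS = 8 };

  typedef struct {
      int  captured;            /* set when a block closes over this frame */
      long slots[FRAME_SLOTS];  /* receiver, arguments, temporaries, ... */
  } Frame;

  typedef struct {
      long slots[FRAME_SLOTS];  /* heap copy that can outlive the activation */
  } Context;

  /* the check that runs on *every* return in such a design */
  static Context *return_from(Frame *fp)
  {
      if (!fp->captured)
          return NULL;                      /* fast path: plain C-style return */
      Context *ctx = malloc(sizeof *ctx);   /* slow path: reify into the heap */
      memcpy(ctx->slots, fp->slots, sizeof ctx->slots);
      return ctx;
  }

  int main(void)
  {
      Frame f = { 1, { 42 } };              /* pretend a block captured this frame */
      Context *ctx = return_from(&f);
      printf("reified: %s, slot 0 = %ld\n",
             ctx ? "yes" : "no", ctx ? ctx->slots[0] : 0L);
      free(ctx);
      return 0;
  }

In the heap-only design I have in mind this check simply doesn't exist, at the
price of always paying for heap contexts.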

> Apart from that, the closer you can get to the execution model that
> the architects of your processor had in mind while they were designing
> your processor, the faster your programs will run.  Since most
> processors are designed to run C (or maybe Fortran), activation
> records were meant to be managed (in the vast majority of
> architectures) by specific hardware features implementing strict LIFO
> behaviour.  Anything (such as non-LIFO behaviour) that causes traffic
> between the stack (where the processor's designers decreed that all
> activations should live) and the heap (where reified contexts have to
> live) is pure overhead (and in some cases completely invalidates lots
> of information, that the processor has carefully cached on your behalf
> yet will throw away mercilessly the instant it suspects that you've
> violated any of its implicit LIFO laws).

That seems to be the critical performance issue (also see below).

> > Let's take a simple example, which I guess occurs most often, in this or a
> > similar form, when running an ST application:
> > >          [:x :y | x >= y]
> >
> > Let's assume a collection consisting of SmallIntegers (immediate objects):
> > For a Jitter, is creating and using an activation record on the stack faster
> > than reusing an activation record on the heap for every evaluation in this
> > case?
> 
> No, there's probably only a few instructions of difference between
> them.  (Assuming the way that contexts are handled inside the VM is
> changed a little -- see "Managing Stack Frames in Smalltalk-80" [Moss
> et al] for one possibility [but watch out for the typo in the example
> code, if you decide to benchmark their algorithm].)  (But the required
> number of memory references is another matter entirely!!)
> 

Memory or just *heap* memory references?
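
To make the comparison concrete for myself, a heavily simplified C sketch of
the two alternatives as I picture them (my framing only, not how a Jitter
really evaluates blocks):

  /* (a) stack: each evaluation of [:x :y | x >= y] runs as an ordinary call
         whose activation record is a machine stack frame;
     (b) heap: one context object is allocated once and its argument slots
         are rewritten for every evaluation. */

  #include <stdio.h>

  /* (a) stack-frame evaluation: arguments live in the C stack frame */
  static int greaterOrEqual(long x, long y) { return x >= y; }

  /* (b) reused heap-style context: arguments are stored into one object */
  typedef struct { long x, y; } BlockContext;

  static int evalInContext(BlockContext *ctx, long x, long y)
  {
      ctx->x = x;               /* extra memory writes per evaluation ... */
      ctx->y = y;
      return ctx->x >= ctx->y;  /* ... and the operands are read back */
  }

  int main(void)
  {
      static BlockContext ctx;  /* stands in for the single reused heap context */
      long data[] = { 3, 1, 4, 1, 5 };
      int i, a = 0, b = 0;
      for (i = 0; i + 1 < 5; i++) {
          a += greaterOrEqual(data[i], data[i + 1]);
          b += evalInContext(&ctx, data[i], data[i + 1]);
      }
      printf("%d %d\n", a, b);  /* same results, different memory traffic */
      return 0;
  }

Both give the same answers; the question is how many of those reads and writes
actually hit memory rather than registers or the cache.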

<snipped>

> > Principal question: How intelligent are the caching mechanisms of typical
> > processors (for me it's an Intel)
> 
> For Intel (and clones), depends on the manufacturer.  (For the most
> part, fairly dumb.)
> 
> > regarding non-localized memory addresses? Is this a problem or not?
> 
> For most caches, dynamic locality is what matters, not static
> locality.  Your program can generate access patterns that span
> gigabytes without any adverse effects, provided that only a reasonably
> small number of addresses are in the "working set" at any time (where
> "reasonably small" is relative to some function of the number of
> distinct cache lines that you have).

As a consequence, using only activation records that live in *heap*-allocated
contexts shouldn't be a performance problem, as long as not too many are in use
at once (a 'reasonably small number of addresses').

Is this interpretation correct?

<snipped>

Greetings,

Stephan
-- 
Stephan Rudlof (sr at evolgo.de)
   "Genius doesn't work on an assembly line basis.
    You can't simply say, 'Today I will be brilliant.'"
    -- Kirk, "The Ultimate Computer", stardate 4731.3




