[Vm-dev] [Draft Article] Optimizing Contexts

Wed Jul 11 08:57:50 UTC 2007

Cross-posted to: 
    strongtalk-general at googlegroups.com
    vm-dev at lists.squeakfoundation.org
	exupery at lists.squeakfoundation.org
    news at lists.squeakfoundation.org
    comp.lang.smalltalk

Hello lists. This is a draft of an article I am writing for an upcoming
series at the Weekly Squeak. It is about optimizing context creation, I
would like comments on its technical correctness, as well as suggestions
for related content I should include. I would especially like to know if
this article covers the way this is handled in systems I did not
specifically research, such as Strongtalk, and Exupery, and
non-smalltalk systems like Common Lisp, Erlang, Haskell, and OCaml,
since this topic is not really specific to object-oriented systems.

This will not be the first article of the series; however, the first two
paragraphs are the preliminary series introduction. I put them in so
that you can get an idea of what I am writing. Sections enclosed in 
... tags are parts I am especially unsure of; I would appreciate
comments (I used i tags, because my editor, vim, highlights the enclosed
text when in HTML mode).

Look forward to this appearing in the Weekly Squeak:

http://news.squeak.org/

Thank you for your help, and enjoy the article!

=== Speeding Up Smalltalk Series, part 42: Context Optimizations ===

Ever since object-oriented systems became popular, they have had the
image of being very easy to program, but very slow in terms of execution
speed. This image and reality has plagued every popular object-oriented
system, starting with Smalltalk in the 80's and continuing on to Java
and Ruby today. However, this does not have to be the case. Early-bound
languages have gotten much faster thanks to lots of research on
compilers and computer architecture. Similarly, much research has been
done on making late-bound object-oriented systems fast, both with
software and with hardware. This research has not seen as much exposure
as processor research, and so is probably less familiar to you. This
series, "Speeding Up Smalltalk" attempts to summarize a lot of that
research and present it for your enjoyment.

Computation in Smalltalk and Self is done by sending messages to
objects. This seemingly simple operation of sending a message is done so
often and is so flexible that it is always the execution
bottleneck in any implementation of Smalltalk or similar object-oriented
system. Why is that?

---- Description of Context Objects ----

Every time a message is sent, four things happen:
- A search is done through the message receiver for a method that has
  been sent. This was the topic of the previous article, and will not be
  discussed here. 
- A context object is allocated on the heap. 
- The context object is initialized and filled with:
  - local objects, including:
    - pointers to the message arguments
    - pointers for each local variable
    - pointers for temporary storage needed for intermediate results
  - A way to reference variables defined in other scopes. This is a
    property blocks have, but methods don't
    - In Smalltalk, by a pointer to the "home context"
    - In Self, by delegate slots to the enclosing context(s)
  - A return address at which to resume execution
    - Squeak does not need this, since every context has it's own PC, so
      when returning, the PC of the returned-to context is already in
      the right place. This is equivalent to a return address, since a
      child's return address is the parents PC. 
  - A return context that will become current upon return
- The newly-created context is made the "current" context for this
  thread/process

In Squeak, MethodContexts and BlockContexts are subclasses of
ContextPart. ContextPart has three instance variables:
- sender: The context that sent the message being executed. This context
  will become current when the current context returns
- pc: A pointer to the currently executing instruction
- stackp: The pointer to the execution stack. The smalltalk virtual
  machine is a stack processor. local variables are stored in the
  indexed fields of the context. stackp keeps track of which OOP is at
  the top of the stack.  For MethodContexts:

MethodContext extends it with 3 instance variables, and a variable-sized
area for holding the temporaries:
- method: the CompiledMethod holding the code being executed
- receiverMap: <i> Something to do with the names of the variables.  Not
  really sure </i>
- receiver: a pointer to "self", the receiver of the message.

---- Squeak optimization: Static Contexts ----

In Squeak, BlockContexts are partial closures (sometimes called static
blocks/closures [3]). Rather than hold their own code and environment,
BlockContexts piggyback off of another context/code pair. Home is the
environment that the block piggybacks off of. This usually works, but
there are some cases where this scheme will not work:
- <i> like what? Why not? I don't understand why this doesn't always
  work. Something to do with re-entrancy or circular references. Not
  sure why either is a problem. </i>

For BlockContexts:
- nargs: the number of arguments that the block must be called with.
- startpc: The index into <code>self home method</code> where the code
  for the block starts.
- home: the home environment

Squeak's NewCompiler has a different representation of blocks. These are
full closures, storing their own code and environment. They are
sometimes called dynamic closures to distinguish them from the above
"static closures" [3]. They contain:
- method: The code of the block (CompiledMethod2)
- environment: Captured temps   (ClosureEnvironment)

---- Squeak optimization: Context Recycling ----

To avoid having to create a new MethodContext on every message send,
Squeak keeps a pool of pre-allocated MethodContexts, and simply fills
one up when a new one is needed. To support this optimization,
MethodContexts are only created with two sizes, rather than any size
that they could support. The small size has room for a few 
temporaries, and the large size has room for a bunch [4]

---- VisualWorks optimization: Context Caches ----

The most natural way to store contexts is as a linked list of objects,
but that does not map well to the hardware of many processors. Many
processors (including all the Intel varieties) have several 
built-in instructions to mechanize the creation of stack frames, which
are similar to context objects, but have several important differences.
VisualWorks uses these native stack frames for most operations, meaning
that context creation is identical to a C function send in many cases.
Thus, the hardware stack acts as a stand-in for the context objects;
this optimization is called a context cache.

However, stack frames cannot handle context responsibilities beyond a
single call and return; when more is needed, a context object must be
allocated on the heap. The context object can either be a context in
it's own right (a stable context), or act as a proxy to the hardware
stack frame (hybrid context). There are several complications to this
dual scheme; curious readers should refer to [5].

VW and Squeak does not use a stack of context frames; instead it uses a
linked list of real context objects. This is easier, but perhaps slower.
To save on allocation costs, contexts are cached and recycled for
various method invocations.

<i> Not sure where to incorporate this </i>

There are three types of blocks:
- open/clean blocks (pure function with no closure)
- copying blocks
- full blocks 

---- Or you can cheat... ----

The most popular object-oriented languages (C++, Java) chose to dump the
flexibility of blocks in exchange for runtime speed. These languages use
a native stack frame without providing the escape mechanism of a context
cache, so that blocks and other context behavior that does not follow
stack principles is not supported. This is a tradeoff that eliminates a
good deal of flexibility, and so will not be explored further.

---- Final words ----

Inlining solves many of the same problems as a Context cache does. If
the method body is inlined into the caller, no new context need be
created. However, inlining solves many more problems than just context
elimination; I will dedicate an entire article to it.

---- References ----

- context recycling may be discussed in tims nublue book:
  http://www.rowledge.org/tim/squeak/OE-Tour.html

- context caches are discussed in section 5.2.3 of 
  http://www.kampjes.demon.co.uk/articles/exuperyDesign.pdf

[3] <i> add reference to squeak mailing list or Ian's something to
discuss the difference between static and dynamic closures </i>

[4] Squeak Class comments of <code>InstructionStream</code>,
<code>ContextPart</code>, <code> MethodContext</code>. and
<code>BlockContext</code>.

[5]: Context Management in VisualWorks 5i:
     http://wiki.cs.uiuc.edu/VisualWorks/DOWNLOAD/papers/oopsla99-contexts.pdf

-- 
Matthew Fulmer -- http://mtfulmer.wordpress.com/
Help improve Squeak Documentation: http://wiki.squeak.org/squeak/808