[squeak-dev] Falsehoods programmers believe about Smalltalk

Tue Jan 22 02:26:36 UTC 2019

Hernán Morales Durand wrote on Mon, 21 Jan 2019 18:13:45 -0300
> I have some possible myths, but I'd like to confirm or reject:
> 
> - All Smalltalk bytecode sets are stack-based VM. (?)
> - Bytecodes are always fixed-size. (?)

SOAR (Smalltalk On A RISC, now renamed as RISC-III) used a 32-bit
register-based instruction set, and Smalltalk sources were translated to
that. They later regretted dumping bytecodes, as some things became
more complicated and the increase in memory use was very expensive at
that time.

https://apps.dtic.mil/dtic/tr/fulltext/u2/a172800.pdf
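
To make the stack versus register distinction concrete, here is a
minimal Python sketch that evaluates a + b both ways. The opcode names
and tuple encodings are made up for illustration; they are not SOAR's
instruction set nor any real Smalltalk bytecode set.

```python
# Hypothetical opcodes for illustration only; neither matches a real VM.

def run_stack(code, variables):
    """Stack machine: operands are implicit (the top of the stack)."""
    stack = []
    for op, *args in code:
        if op == "push":
            stack.append(variables[args[0]])
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack.pop()

def run_register(code, registers):
    """Register machine: every instruction names its operands."""
    regs = dict(registers)
    for op, dst, src1, src2 in code:
        if op == "add":
            regs[dst] = regs[src1] + regs[src2]
    return regs

stack_code = [("push", "a"), ("push", "b"), ("add",)]
register_code = [("add", "r3", "r1", "r2")]

stack_result = run_stack(stack_code, {"a": 3, "b": 4})
register_result = run_register(register_code, {"r1": 3, "r2": 4})["r3"]
```

The stack form takes three short instructions and the register form a
single wide one that has to name all of its operands.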

SOM (Simple Object Machine) is a family of Smalltalk VMs, some of which
represent code as abstract syntax trees instead of bytecodes.

http://som-st.github.io/
http://www.hpi.uni-potsdam.de/hirschfeld/projects/som/

> - Most of the time spent by a VM is in the instruction interpreter. (actually it's in the GC right?)

That will vary from one VM to another. On page 179 of the "green book"
you can see a nice graph of the space and time used by different parts
of Apple Smalltalk (from which Squeak evolved) and on page 177 you can
find the numbers used to create the chart.

http://sdmeta.gforge.inria.fr/FreeBooks/BitsOfHistory/

10.2% of the time in the fetch loop, 16.0% in the bytecode interpreter,
39.2% in sends and returns, 22.6% in the memory management and 10% in
primitives.

Adding the first 3 numbers you get 10.2+16.0+39.2 = 65.4% in the
instruction interpreter, while the GC is part of the 22.6% spent in
memory management. That said, in Squeak gcc tricks really helped with
the fetch loop and the stack VM greatly reduced the send/return
overhead. So it might be the case that the GC dominates performance. Or
not - we have to measure and see.
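
As a quick check of that arithmetic, here is a small Python sketch using
only the percentages quoted above (the dictionary keys are just labels
chosen for this example):

```python
# Percentages as quoted above from the "green book" figures.
breakdown = {
    "fetch loop": 10.2,
    "bytecode interpreter": 16.0,
    "sends and returns": 39.2,
    "memory management": 22.6,
    "primitives": 10.0,
}

interpreter_parts = ("fetch loop", "bytecode interpreter", "sends and returns")

# Round to one decimal place to hide binary floating point noise.
interpreter_total = round(sum(breakdown[k] for k in interpreter_parts), 1)
listed_total = round(sum(breakdown.values()), 1)

print(interpreter_total, breakdown["memory management"], listed_total)
```

The five figures as quoted add up to 98.0, so about 2% is not accounted
for in the numbers given here.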

> - You cannot serialize objects containing blocks. (IIRC one can use MessageSends)

Given that the image contains blocks, that can't be true. Obviously
serializing a subset of objects is a harder problem than just dumping
memory, but I consider images a proof of existence.

> - Image cannot be bootstrapped. (This is possible in ST/X and now in Pharo I think).

Little Smalltalk is a good example of taking a textual representation
and bootstrapping an image from it. GNU Smalltalk didn't even use images
the last time I looked at it. I consider Self to be a Smalltalk (just
not a Smalltalk-80) and it can start with either a snapshot (its name
for an image) or with an empty world and load text files (possible
because the source-to-bytecode compiler is included in the VM).

> - All Smalltalks includes UI classes. (GemStone doesn't have AFAIK)

The MS-DOS port of Squeak had no GUI, just a command line prompt. That
was also the case for GNU Smalltalk and Little Smalltalk.

> - All implementations uses direct pointers, (GST?)

The RoarVM for Squeak uses object tables. In fact, the lack of direct
pointers in early implementations is what led to the use of #become:
which complicated the adoption of direct pointers. VisualWorks has an
indirection pointer in the header - see slide 7 of

https://www.slideshare.net/esug/spur-a-new-object-representation-for-cog
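
To show why an object table makes #become: cheap, here is a toy Python
model. It is only a sketch under my own assumptions: the class and
method names are invented, and this is not how RoarVM or any real VM
lays out memory.

```python
# Toy model only: not how RoarVM or any real VM lays out memory.

class ObjectTable:
    def __init__(self):
        self.entries = []  # index (oop) -> object body

    def allocate(self, body):
        self.entries.append(body)
        return len(self.entries) - 1  # the oop is just a table index

    def fetch(self, oop):
        return self.entries[oop]

    def become(self, oop_a, oop_b):
        # Swap the two table entries; every holder of either oop now
        # sees the other body, with no scan of the heap.
        self.entries[oop_a], self.entries[oop_b] = (
            self.entries[oop_b],
            self.entries[oop_a],
        )

table = ObjectTable()
a = table.allocate({"name": "first"})
b = table.allocate({"name": "second"})
holder = [a, b]  # references are oops, not direct pointers

table.become(a, b)
names = [table.fetch(oop)["name"] for oop in holder]
```

With direct pointers there is no single table entry to swap, so the same
operation has to find and update every reference or rely on some other
form of indirection.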

> - All implementations uses green threads. (VAST? MT?)

I would say this was a side effect of patching the original Smalltalk,
which was its own operating system (so the idea of green threads
doesn't apply), to run on top of Unix on commercial workstations. All
the old code assumed a mix of cooperative and preemptive multithreading
that breaks down if you have multiple native threads.

Some from-scratch Smalltalks copied this model, while others (I am
pretty sure this was the case for MT, as you mentioned) had their
libraries written with native threads in mind.

-- Jecel