Complexity and starting over on the JVM (VM issues)

Paul D. Fernhout pdfernhout at kurtz-fernhout.com
Sat Feb 9 00:40:52 UTC 2008


Ignacio Vivona wrote:
> Did you take a look at http://pypysqueak.blogspot.com/?

I don't know if this prompted your reply, but I just noticed the "my" in "my
making it something awesomely better" was a typo in the previous post; it
was intended to be a more inclusive "by" as in "And not by making it a free
slow VisualWorks, but *by* making it something awesomely better than VW
(like by leveraging on the Java ecosystem)?" Freudian slip, perhaps? :-)

I came across PyPySqueak while composing these replies, but I have trouble
even figuring out what it is supposed to be. :-)  My guess is it somehow
compiles Smalltalk code to run on CPython's VM?

While I think a Python-compiler-in-Python-sort-of-with-types is a great idea
(to allow Jython and CPython to share the same codebase), I guess I'm not
seeing the value of PyPy in that particular application for dealing with
Smalltalk code over, say, ANTLR.
  http://www.antlr.org/
or Bistro (which uses ANTLR to parse a Smalltalk syntax (enriched or
polluted with types, depending on your perspective :-) to interoperate with
Java):
  http://www.educery.com/papers/bistro/intro/
or even my previously mentioned homebrew code in Jython, given Smalltalk is
so easy to parse (one of the nice things about it).

But maybe I am missing something? It sure looks like it has attracted
significant interest and effort, judging from the blog pictures.

Ah, here is something more (still a bit confusing to me):
  "First day, Discussions"
  http://pypysqueak.blogspot.com/2007/10/first-day-discussions.html
"We identified the following options as possible main goals for this week's
sprint
* to implement a Squeak-bytecode interpreter in RPython
* to define and implement an RSqueak as PyPy frontend
* to write a Squeak backend for PyPy"

I'll admit I still don't get everything they intended, other than as a fun
exercise for getting the PyPy and RSqueak people to share knowledge?

No offense intended to either the PyPy people or the RSqueak people, but I
still question the whole "language implemented in itself" ideal for
Squeak going onto the JVM; see:
  "Design Principles Behind Smalltalk, Revisited"
http://lists.squeakfoundation.org/pipermail/squeak-dev/2006-December/112306.html
Excerpt:
"And, if it is so often the conceptual barrier that is the ultimate hurdle,
then are technical barriers of different syntax and different tools really
so big? Many humans become fluent in multiple human languages and their
accompanying cultures, which is typically a harder thing than learning new
computer languages. If one needs to switch mental gears conceptually to
work on the VM, [given the need to think about native types with Smalltalk]
then is it *really* so bad if the VM is written directly
in C like GNU Smalltalk does? Now, I know a lot of positive arguments can
be made for the utility and convenience of the Squeak VM written in
Smalltalk (especially given C's quirks as a not-quite-cross-platform
language), but in the end, my point is, keeping everything in one syntax
may not really save that much time for the community, all things
considered. Even when the syntax is the same, the underlying domain
semantics may be very different, and those semantics, or meaning of all
the objects, are what take the time to learn. To build a new VM, one still
needs to spend a long time understanding what a VM is and how it could
work, and no choice of familiar tools or use of one single syntax will
make that extremely easier (a little easier, yes). A better choice of
abstraction perhaps might make maintain a VM easier for those who get the
abstraction, but not a choice of language by itself all other things being
equal. Were the Squeak VM coded in some other portable language (like Java
or Free Pascal or OCaml) then it might not take very much more trouble to
maintain -- and such a VM might even be easier to develop, as one could
use the complete tool set of that other system to debug the VM in the
language it was maintained in, rather than facing a technical barrier :-)
of seeing C code for the VM in the debugger instead of the original
Smalltalk source which was translated from. Granted, if the Squeak VM was
coded in, say, OCaml, one would have a barrier to an VM maintainer of
learning that language and its paradigms, but I would argue that the
barrier would remain more conceptual than technical, and the syntax
problem would be the lesser issue."

I think the advantage of the Squeak VM being in Smalltalk is also related to
the disadvantage of trying to debug pointer-using C code for any purpose
(having spent too many years of my youth doing that), especially given C
compilers varying in the size of basic types across architectures. :-) If
the Squeak VM were in, say, Java, I think it would be fairly
straightforward to debug the Squeak VM natively, since Java code using
JVM-ish tools is just easier to debug than C code using C-ish tools. Still,
there is nothing that prevents one from writing and debugging any sort of VM
in Smalltalk and translating it later to Java or whatever in the future,
even if the first VM is handwritten Java. And it sounds like Dan and some
others already have variations on that anyway. So it looks like several
options could co-exist in the Squeak/JVM world.

If I did a port of Smalltalk to the JVM just by myself, frankly, I would
just write the VM directly in Java or possibly Scala (a typed JVM language
which is functionally oriented).
 http://www.scala-lang.org/
"Scala is a general purpose programming language designed to express common
programming patterns in a concise, elegant, and type-safe way. It smoothly
integrates features of object-oriented and functional languages. It is also
fully interoperable with Java."

Or maybe even just put it on top of "Clojure/JVM" if I was really lazy.
  http://clojure.sourceforge.net/
"Clojure provides easy access to the Java frameworks, with optional type
hints and type inference, to ensure that calls to Java can avoid reflection."
Clojure already supports dynamic programming, so there might be a lot of
leverage there (just put a Smalltalk->Lisp translator there, add some method
dispatching and "doesNotUnderstand:" code for glue, and start worrying about
semantic mismatches for objects or exceptions, or working around Clojure
bugs, as it is a new system. :-)

Or, if I felt industrious and hard-working, maybe I might even try it
several ways (Java, Scala, Clojure, JVM bytecodes, maybe even others) and
compare, since a VM is not really *that* hard to code. This seems especially
likely if you build a Squeak VM on another VM like the JVM and take
advantage of that VM's services and don't try to avoid peculiarities for
cross-platform reasons (given the JVM is already cross-platform). For me,
beyond learning Scala and Clojure, the biggest problem would be to modify
the Smalltalk compiler to spit out any of Java, Scala, Clojure, or plain
JVM bytecodes on demand (which might take some thinking about, for semantic
mismatch issues). But Smalltalk is such an elegantly simple language that
even that would likely not be too hard.
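
As a rough illustration of why the "compile Smalltalk to Java source" option
looks tractable, here is a minimal sketch that turns one flat keyword message
send into Java source text. It is only a toy under stated assumptions: the
class name KeywordSendTranslator and the generated send(...) call are made up
for this example (not from Bistro, Squeak, or any existing compiler), and it
handles only identifier tokens, with no literals, nesting, cascades, or
unary/binary sends.

```java
import java.util.ArrayList;
import java.util.List;

class KeywordSendTranslator {
    /**
     * Translate one flat keyword message send, e.g. "dict at: key put: value",
     * into Java source text. Tokens are split on whitespace, so only plain
     * identifiers are supported.
     */
    static String toJava(String smalltalk) {
        String[] tokens = smalltalk.trim().split("\\s+");
        // Expect a receiver followed by one or more (keyword:, argument) pairs.
        if (tokens.length < 3 || tokens.length % 2 == 0) {
            throw new IllegalArgumentException("expected: receiver (keyword: arg)+");
        }
        StringBuilder selector = new StringBuilder();
        List<String> args = new ArrayList<>();
        for (int i = 1; i < tokens.length; i += 2) {
            if (!tokens[i].endsWith(":")) {
                throw new IllegalArgumentException("not a keyword: " + tokens[i]);
            }
            selector.append(tokens[i]);   // keyword parts concatenate into the selector
            args.add(tokens[i + 1]);
        }
        // send(...) is a hypothetical runtime helper, not an existing API.
        return tokens[0] + ".send(\"" + selector + "\", " + String.join(", ", args) + ")";
    }

    public static void main(String[] args) {
        System.out.println(toJava("dict at: key put: value"));
    }
}
```

A Scala or Clojure backend would presumably differ mostly in the final string
it builds, which is part of why comparing several targets seems feasible.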

It would be interesting to see the similarities and differences needed in
compilers among Algol-inspired (Java), ML-inspired (Scala), Lisp-inspired
(Clojure), and Assembler-inspired (JVM bytecode) targets for compiling
Squeak code. Would one intermediate language be a lot more elegant? Would
another be faster? Would one be more verbose? Another harder to debug? And
so on? Might make a good PhD thesis for someone. If there isn't one like
that already. :-)

A Smalltalk/JVM using Java objects (and Java threading and other libraries)
would only really be managing the Smalltalk message dispatching event loop
(perhaps via Scala Actors?). It would not even be a full VM, as it would
just manage calling in and out of Smalltalk compiled to the intermediate
language or JVM bytecodes. And, if using native JVM objects as Squeak
objects, then only the method dispatching support infrastructure code would
need to be generated somehow and debugged (an expansion on the Java I
first posted at the start of this thread), and I would think it would be
*much* easier to just write that first in plain old Java (or Scala or
Clojure), given that we are not talking about that much code. I'm guessing
probably less than ten pages worth for the core dispatching algorithm (maybe
even derived from Jython's Java implementation :-), but even if it was a lot
more code I doubt it would be any easier to maintain in Smalltalk because it
would be so specific to those language domains.
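
To make "core dispatching algorithm" a bit more concrete, here is a minimal
sketch in plain Java of a message send with a "doesNotUnderstand:" fallback.
This is not the code posted earlier in the thread and not derived from Jython;
the names StMethod, StClass, StObject, and send are all invented for
illustration, and a real implementation would also need method caching,
a proper Message object for "doesNotUnderstand:", and a mapping onto native
JVM objects.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical: a method body compiled from Smalltalk source. */
interface StMethod {
    Object invoke(StObject receiver, Object... args);
}

/** Hypothetical class object: a method dictionary plus a superclass link. */
class StClass {
    final String name;
    final StClass superclass;   // null at the root of the hierarchy
    final Map<String, StMethod> methods = new HashMap<>();

    StClass(String name, StClass superclass) {
        this.name = name;
        this.superclass = superclass;
    }

    /** Walk up the superclass chain; return null if no class defines the selector. */
    StMethod lookup(String selector) {
        for (StClass c = this; c != null; c = c.superclass) {
            StMethod m = c.methods.get(selector);
            if (m != null) return m;
        }
        return null;
    }
}

class StObject {
    final StClass stClass;

    StObject(StClass stClass) { this.stClass = stClass; }

    /** Send a message; fall back to doesNotUnderstand: when lookup fails. */
    Object send(String selector, Object... args) {
        StMethod m = stClass.lookup(selector);
        if (m != null) return m.invoke(this, args);
        StMethod dnu = stClass.lookup("doesNotUnderstand:");
        if (dnu == null) {
            throw new IllegalStateException(stClass.name + " does not understand " + selector);
        }
        // Simplification: pass only the selector, not a full Message object.
        return dnu.invoke(this, selector);
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        StClass object = new StClass("Object", null);
        object.methods.put("doesNotUnderstand:", (self, a) -> "DNU: " + a[0]);
        StClass point = new StClass("Point", object);
        point.methods.put("describe", (self, a) -> "a Point");

        StObject p = new StObject(point);
        System.out.println(p.send("describe"));     // found in Point
        System.out.println(p.send("frobnicate"));   // falls back to Object's handler
    }
}
```

Even with caching and proper Message objects added, this kind of glue seems
likely to stay small, which is the point of the page estimate above.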

So, I would expect that a native coding approach could make the system
easier to get running on the JVM. It would also get rid of even more
original Squeak code which might remain under potentially problematical
license constraints. Between a new (but small) native glue layer or Squeak
VM, and using GNU Smalltalk code for core classes, we are just left
with licensing issues for Squeak tools and applications, which would be
around the edges of this system (so easier to replace or discard). And many
of those applications, like Croquet, already have straightforward licenses.
And others, like the refactoring browser, have free versions from other
Smalltalks. Back in 2000, when I was most interested in solving the Squeak
license issue (as I saw it), the option of using a truly "Free as in
freedom" JVM was not around (plus the JVM then was very buggy across
platforms), and also there was not as much relicensing work done to give
stuff at the edges of Squeak clearly OSI-approved "open source" licenses,
and GNU Smalltalk was not as far along. So this approach to creating a
"Free and open source" Squeak-derivative on the JVM has really only become
feasible fairly recently (since 2007 at the earliest) compared to the state
of everything back in 2000.

I think a legitimate objection to what I have outlined is that I am just
letting support for the JVM on various hardware be somebody else's
problem (and I am also stuck with depending on these people). I think the
JVM is also written in Java these days, as well, so it is a "language in a
language" to a high degree now, and didn't I just say I didn't care about
that? :-) Java is really more a systems programming language these days
IMHO, so I'd argue its verbose typed syntax maps onto low level VM coding
better, and that's about the only thing Java is really good for as a
language (not a VM) compared to alternatives (although Scala might be even
better for making JVMs, maybe. :-) It is true, I'm relying on other people
to support the JVM, but that's also part of the *benefit* of not trying to
do everything else yourself. There are dozens of people (maybe hundreds)
paid to worry about making the JVM work right, so I can worry instead about
other things. Still, I would not advance that position if the JVM was not
free, but since recently it *is* now free, there is nothing (in theory) to
stop me from modifying the JVM however I wanted (even though in practice I
would never do that :-). Sure the JVM is still a bit buggy. But likely so is
the Squeak VM on some platforms and in some situations. Everything has bugs
or limits relative to some purpose, if not technical limits, then conceptual
limits. Nothing is a perfect choice (in part because we all have different
priorities and so different definitions of perfection). It's more a question
of "is it good enough and does it free me to focus on things I find more
interesting or important?" Ironically, just as many people are jumping ship
from Java and the JVM (to dotnet (or mono) languages or Ruby or Haskell or
Python or the Common Lisp resurgence or whatever), I think the JVM is
finally "good enough" (even if Java still isn't for most high-level purposes
IMHO. :-)

And, in practice, when using the Squeak VM, I'm just as dependent on Tim and
other Squeak VM maintainers (past, present or future), and it's true they
have done and still do a great job and are more responsive than Sun will
likely be. I'm not saying there are not some bad aspects of such a switch
over. But it is also not like the current Squeak VM in C will go away
anytime soon or stop working if there is also a Squeak on the JVM.

It would be true that you no longer could test and debug the Squeak VM
development for the JVM in emulation under Squeak with such an approach
(without slowly emulating the JVM in Smalltalk, which I think has already
been done to some extent by Paolo Bonzini for GNU Smalltalk, although maybe
he just cross-compiles?). Still, I feel the tradeoff of having the new VM
under analysis go 100X to 1000X faster (essentially, real time!) under the
JVM might more than make up for having to use the JVM's debugging tools on
the new Squeak VM (or perhaps it might also use a lightweight Spoon-inspired
communications infrastructure above the JVM for most debugging and remote
development). At some point, these JVM debugging tools presumably could be
accessed or implemented through Squeak/JVM to make one's life even easier.
And Spoon-like VM-to-VM communication facilities could ride above that,
making for debugging at a higher level as long as they were operational.

Another important issue here is there are tens of thousands of people who
like free software and know Java debugging who might in theory be able to
help with such a VM debugging process (including at most universities).

There is also lots of related development documentation and tools at this
point (like for Eclipse) now that the Java ecosystem has matured. Java is in
this sense just the new "C" (with many strengths and, yes, a few weaknesses
relative to C). For a point by point language comparison, see:
  http://www.cs.princeton.edu/introcs/faq/c2java.html
or, as to performance:
  http://www.idiom.com/~zilla/Computer/javaCbenchmark.html
"Five composite benchmarks listed below show that modern Java has acceptable
performance, being nearly equal to (and in many cases faster than) C/C++
across a number of benchmarks."
  Or:
  http://www.stefankrause.net/wp/?p=4
"It’s hard to draw a conclusion because the results don’t speak just one
language.
But a few things can be said without regretting:
    ...
    * Saying that C is generally several times faster than java is -
according to those benchmarks - simply wrong. If you’re allowed to choose
the fastest JVM the worst case for java was 30%. In other benchmarks Sun and
JRockit were even able to beat ICC. Not by much but I guess it’s
nevertheless just very remarkable that it’s possible to beat ICC. Another
interesting figure is that Bea was less than 14% slower than GCC in the
worst case (nbody) and was in two cases faster than GCC.
    * Saying that Java is faster than C can also be pretty wrong, especially
if you have to stick with one JVM. The worst case in these benchmarks were
30% slower for JRockit to 2.44 times slower for Sun’s JDK 6U2."

For me, given most (not all) cross-platform issues go away, especially for
turbo-boosted Smalltalk->Java code, I am willing to take even a worst case
50% performance hit in VM performance. Imagine knowing that on *any*
platform where Squeak/JVM runs, you could just hit a button on your
(carefully written) Smalltalk code in the Class browser and it would
suddenly run 10X faster and still be debuggable by JVM-oriented tools
written in Squeak. :-) For me, that would be one of the biggest wins with
Squeak on the JVM, even with a 50% loss of GUI speed, since I am interested
in numerical simulations (but with fancy GUIs). I'll grant that other
people's needs may differ. RSqueak (which I take it is essentially Slang?)
  http://wiki.squeak.org/squeak/2267
or PyPy just can't hope for that kind of possible instant speed up on *all*
supported platforms, at least not without a lot of hard-to-maintain work
wrapping C compilers and C-oriented debuggers across many platforms, just
trying to duplicate functionality that essentially comes for free with any
JVM and has been heavily tested. Granted Python does have install tools
which do some of that across all platforms for C extensions, so it's not
impossible to imagine it being doable, but even then, just think about the
testing permutations for the C solution if you wanted to ship an application
relying on the performance boost.

For free software with unpaid community support, as a designer I'd much
rather sacrifice 30% or 50% of the potential speed by using Java vs.
optimized C if that meant having fewer low-level weirdness support headaches
for the community (now that a decade of suffering later most of the JVM bugs
are out -- I would not have said this in 2000).

I could also hope that a smoothly running Squeak/JVM might even potentially
entice Stéphane into thinking about doing Squeak maintenance again someday.
:-)  And even if it doesn't, then it hopefully still would make it easier
for everyone else.

I know this sounds like a "sales pitch", but the biggest potential
"customer" (or "sucker" :-) I am targeting is really just myself. :-) This
documentation is all part of my own decision making process (but a bit in
public this time, to invite agreement or disagreement on the substance of
it). If I myself remain unpersuaded (it remains to be seen what I do), it will
be purely from weighing the use of an existing JVM solution (like Scala,
Clojure, or Jython) instead of doing something in Smalltalk.

--Paul Fernhout
P.S. There was also an "is" that was supposed to be an "if", as in "the
biggest problem with Chandler is if they [had used Squeak that effort might
have worked out a little better. :-) ]"

P.P.S. Another Smalltalk->Java link:
  http://per.bothner.com/papers/smalltalk.html


