On Friday, July 19, 2002, at 10:53, David Griswold wrote:
The big issue is the speed, which you aptly captured with the word "lumbered". There are two speed issues:
- Startup time: Strongtalk already doesn't start as quickly as Squeak because of the initial compilation "bulge". Snort would start even more slowly, not only because the compiler is being interpreted, but also because it is being compiled. This might not be too bad a problem, or it might be. The critical issue is the performance locality profile of the compiler: does it spend most of its time in some small fraction of its code, or does most of the compiler get exercised pretty heavily? I don't have an answer for that, but the more locality it has, the faster it would speed up.
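One way to get a feel for that locality, at least on the Squeak side, would be to spy on the compiler while it chews through a bulk workload. A minimal sketch using Squeak's MessageTally profiler (the bulk recompile is just an arbitrary stand-in workload):

    "Does compilation time pile up in a small fraction of the
     compiler, or is most of it exercised heavily? The tally tree
     answers that for this workload."
    MessageTally spyOn: [OrderedCollection compileAll].

If a handful of methods dominate the tally, a compiled compiler would warm up quickly; if the time is spread evenly across it, it wouldn't.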
Just an idea: maybe too much effort is being spent on speeding up the whole system, and doing it both transparently and dynamically during execution. Maybe speeding up the whole system dynamically by some factor isn't all that important. Maybe it is more important to know you can speed up certain portions to within an epsilon of C without going through strange contortions?
Would anyone care if it ran 10 times faster? And how much work would this be?

Yes, yes, yes. I think the Smalltalk community has always underestimated the extent to which performance matters in the real world (and the extent to which proponents of other languages use it as a weapon against Smalltalk). Of course it is true that first you need to write your algorithms correctly, and that *most* applications don't need ultimate speed. However, it can be very hard to predict ahead of time whether your application will need some critical algorithm to run at top speed, and once you are in that situation, it can be incredibly painful to have to write some algorithms in another language, especially if by chance they need to extensively access, create, or modify shared data structures. So this makes it a gamble: are you willing to *bet* your success on the hope that your application will never be compute-bound?
My experience with Objective-C very much confirms this. Speed really isn't that much of an issue for most of the code, which can actually be written in WebScript (a slow interpreter of a language very close to Objective-C). However, knowing that you can get to the metal without a drastic rewrite (and I do consider Slang + pluginization fairly drastic) is both psychologically comforting and tremendously useful in practice.
I also wouldn't mind having to make annotations in order to get the last bit of performance out of specific pieces of code.
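Something as lightweight as a method pragma would do for that. A hypothetical sketch, borrowing the angle-bracket syntax Smalltalk already uses for primitive declarations (the optimize: annotation is invented; nothing reads it today):

    innerProduct: aFloatArray
        <optimize: #aggressive>    "hypothetical annotation"
        | sum |
        sum := 0.0.
        1 to: self size do: [:i |
            sum := sum + ((self at: i) * (aFloatArray at: i))].
        ^sum

The fast compiler could simply skip such methods and queue them for the slow, good one.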
- Asymptotic compiler speed: once the compiler is running at essentially full speed, is that fast enough? Even with all the Strongtalk optimizations in place, such code is still a lot slower than fully optimized C++ (in principle it could get fairly close to C++ performance, but the limiting factor is that the best algorithms for more advanced optimizations, like scheduling and register allocation, are usually too slow to run in real time, as well as being a giant pain to write efficiently). So I would expect that a compiler similar to the Strongtalk one written in Smalltalk would run about 5-10 times slower than the Strongtalk compiler (although with tuning you might eventually be able to speed it up a lot).
So Strongtalk code is still 5-10 times slower than C/C++? Then doing the compile off-line would probably be a better idea. Could C be generated? That way, there would be a fall-back that would at least allow static compilation.
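For straight-line arithmetic code the mapping to C is almost mechanical, which is what Slang already exploits. A toy workspace sketch of the idea, turning a tiny expression tree into C text (the three-element-array encoding of the tree is made up purely for illustration):

    | emit |
    emit := nil.
    emit := [:node |
        node isSymbol
            ifTrue: [node asString]
            ifFalse: ["node = {receiver. operator. argument}"
                '(' , (emit value: node first) , ' ' ,
                node second , ' ' ,
                (emit value: node third) , ')']].
    Transcript show: (emit value: {#a. '+'. {#b. '*'. #c}}); cr.
    "prints: (a + (b * c))"

The hard parts, of course, are message sends, blocks, and the object model, not the arithmetic.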
If you have to recompile all the performance-critical code every time the system starts up, all this might mean very slow startup, and noticeable pauses for compilations thereafter. But one way to get around all of this might be to keep the compiled code in ObjectMemory and save it with the image, in which case you could start the system in an already-optimized state (complicates image portability, but worth it).
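And since in Smalltalk such a cache can be a perfectly ordinary object, saving it with the image comes for free. A sketch, assuming a hypothetical Compiler nativeCodeFor: entry point (the rest is standard Squeak):

    "Native code kept in an ordinary global Dictionary is saved and
     restored with the image, so the system starts already warm."
    Smalltalk at: #NativeCodeCache put: Dictionary new.
    (Smalltalk at: #NativeCodeCache)
        at: Float >> #sin                                 "any hot method"
        put: (Compiler nativeCodeFor: Float >> #sin).     "hypothetical"
    Smalltalk snapshot: true andQuit: false.
    "after the next startup the cache is still populated"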
Why not store the generated code as a dynamically generated, dynamically loaded shared library, instead of storing it within the image? For example, why not have a compiled binary associated with each module? I'd wager that the core modules would not need to be recompiled much after they've settled. That way, the core classes would really start becoming part of the VM, or rather, the distinction between VM and objects would start to disappear (with much of the VM also being written 'in objects'). It all becomes more of a 'substrate' for running user-defined objects.
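The startup side of that could be quite simple. A hypothetical sketch of a module's startup method (every selector in it, codeLibrary, timestamp, sourcesChangedSince:, loadNativeCode:, recompileToLibrary, is invented for illustration):

    startUp
        "Prefer the module's saved binary; fall back to the compiler
         only when the sources have changed underneath it."
        | lib |
        lib := self codeLibrary.    "e.g. the file 'Collections.so'"
        (lib notNil and: [(self sourcesChangedSince: lib timestamp) not])
            ifTrue: [self loadNativeCode: lib]
            ifFalse: [self recompileToLibrary]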
If you work a little on the calling conventions, you might even be able to make the whole thing a lot more interoperable with other compiled code on the system.
In this case, people would generally want to do 'training' runs of their code to get it mostly optimized, and then save the image. Compilation then becomes a fairly rare thing after training is done.
That would be more of a static compiler, very similar to the way Slang is used today, just without the language limitations we currently have with Slang.
Then, if compilation becomes a rare thing, you could even have different compiler modes for fast vs. good compilation: run really good optimizations to get closer to C performance during the training runs, and use a faster compiler (perhaps even a really simple, fast non-inlining one) during normal execution, if the really good one is too slow.
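The switch between the two could hinge on something as crude as an invocation counter. A minimal sketch of such a tier-up policy (InvocationCounts, GoodCompiler, and the surrounding machinery are all hypothetical):

    tierUpIfHot: aCompiledMethod
        "Count invocations; past a threshold, hand the method from
         the fast baseline compiler to the slow optimizing one."
        | count |
        count := (InvocationCounts at: aCompiledMethod ifAbsent: [0]) + 1.
        InvocationCounts at: aCompiledMethod put: count.
        ^count > 10000
            ifTrue: [GoodCompiler recompile: aCompiledMethod]
            ifFalse: [aCompiledMethod]    "keep the fast-compiled code"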
Depending on your needs, you might not even need a compiler during execution, for example for a 'packaged' binary...
This would let your good compiler eventually do better optimizations than we can afford today, with the sky as the limit!
Yeah!
Marcel
--
Marcel Weiher                    Metaobject Software Technologies
marcel@metaobject.com            www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.