On Friday, July 19, 2002, at 10:53, David Griswold wrote:
The big issue is the speed, which you aptly captured with the word "lumbered". There are two speed issues:
- Startup time: Strongtalk already doesn't start as quickly as Squeak because of the initial compilation "bulge". Snort would start even more slowly, not only because the compiler is being interpreted, but also because it is being compiled. This might not be too bad a problem, or it might be. The critical issue is the performance locality profile of the compiler: does it spend most of its time in some small fraction of its code, or does most of the compiler get exercised pretty heavily? I don't have an answer for that, but the more locality it has, the faster it would speed up.
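One way to get a feel for that locality, at least on the Squeak side, would be to spy on the compiler while it chews through a bulk workload. A minimal sketch using Squeak's MessageTally profiler (the bulk recompile is just an arbitrary stand-in workload):

    "Does compilation time pile up in a small fraction of the
     compiler, or is most of it exercised heavily? The tally tree
     answers that for this workload."
    MessageTally spyOn: [OrderedCollection compileAll].

If a handful of methods dominate the tally, a compiled compiler would warm up quickly; if the time is spread evenly across it, it wouldn't.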
Just an idea: maybe too much effort is being spent on speeding up the whole system, and doing it both transparently and dynamically during execution. Maybe speeding up the whole system dynamically by some factor isn't all that important. Maybe it is more important to know you can speed up certain portions to within an epsilon of C without going through strange contortions?
Would anyone care if it ran 10 times faster? And how much work would this be?

Yes, yes, yes. I think the Smalltalk community has always underestimated the extent to which performance matters in the real world (and the extent to which proponents of other languages use it as a weapon against Smalltalk). Of course it is true that first you need to write your algorithms correctly, and that *most* applications don't need ultimate speed. However, it can be very hard to predict ahead of time whether your application will need some critical algorithm to run at top speed, and once you are in that situation, it can be incredibly painful to have to write some algorithms in another language, especially if by chance they need to extensively access, create, or modify shared data structures. So this makes it a gamble: are you willing to *bet* your success on the hope that your application will never be compute-bound?
My experience with Objective-C very much confirms this. Speed really isn't that much of an issue for most of the code, which can actually be written in WebScript (a slow interpreter of a language very close to Objective-C). However, knowing that you can get to the metal without a drastic rewrite (and I do consider Slang + pluginization fairly drastic) is both psychologically comforting and tremendously useful in practice.
I also wouldn't mind having to make annotations in order to get the last bit of performance out of specific pieces of code.
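Something as lightweight as a method pragma would do for that. A hypothetical sketch, borrowing the angle-bracket syntax Smalltalk already uses for primitive declarations (the optimize: annotation is invented; nothing reads it today):

    innerProduct: aFloatArray
        <optimize: #aggressive>    "hypothetical annotation"
        | sum |
        sum := 0.0.
        1 to: self size do: [:i |
            sum := sum + ((self at: i) * (aFloatArray at: i))].
        ^sum

The fast compiler could simply skip such methods and queue them for the slow, good one.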
- Asymptotic compiler speed: once the compiler is running at essentially full speed, is that fast enough? Even with all the Strongtalk optimizations in place, such code is still a lot slower than fully optimized C++ (in principle it could get fairly close to C++ performance, but the limiting factor is that the best algorithms for more advanced optimizations, like scheduling and register allocation, are usually too slow to run in real time, as well as being a giant pain to write efficiently). So I would expect that a compiler similar to the Strongtalk one written in Smalltalk would run about 5-10 times slower than the Strongtalk compiler (although with tuning you might eventually be able to speed it up a lot).
So Strongtalk code is still 5-10 times slower than C/C++? Then doing the compile off-line would probably be a better idea. Could C be generated? That way, there would be a fall-back that would at least allow static compilation.
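For straight-line arithmetic code the mapping to C is almost mechanical, which is what Slang already exploits. A toy workspace sketch of the idea, turning a tiny expression tree into C text (the three-element-array encoding of the tree is made up purely for illustration):

    | emit |
    emit := nil.
    emit := [:node |
        node isSymbol
            ifTrue: [node asString]
            ifFalse: ["node = {receiver. operator. argument}"
                '(' , (emit value: node first) , ' ' ,
                node second , ' ' ,
                (emit value: node third) , ')']].
    Transcript show: (emit value: {#a. '+'. {#b. '*'. #c}}); cr.
    "prints: (a + (b * c))"

The hard parts, of course, are message sends, blocks, and the object model, not the arithmetic.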
If you have to recompile all the performance-critical code every time the system starts up, all this might mean very slow startup, and noticeable pauses for compilations thereafter. But one way to get around all of this might be to keep the compiled code in ObjectMemory and save it with the image, in which case you could start the system in an already-optimized state (complicates image portability, but worth it).
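And since in Smalltalk such a cache can be a perfectly ordinary object, saving it with the image comes for free. A sketch, assuming a hypothetical Compiler nativeCodeFor: entry point (the rest is standard Squeak):

    "Native code kept in an ordinary global Dictionary is saved and
     restored with the image, so the system starts already warm."
    Smalltalk at: #NativeCodeCache put: Dictionary new.
    (Smalltalk at: #NativeCodeCache)
        at: Float >> #sin                                 "any hot method"
        put: (Compiler nativeCodeFor: Float >> #sin).     "hypothetical"
    Smalltalk snapshot: true andQuit: false.
    "after the next startup the cache is still populated"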
Why not store the generated code as a dynamically generated, dynamically loaded shared library, instead of storing it within the image? For example, why not have a compiled binary associated with each module? I'd wager that the core modules would not need to be recompiled much after they've settled. That way, the core classes would really start becoming part of the VM, or rather, the distinction between VM and objects would start to disappear (with much of the VM also being written 'in objects'). It all becomes more of a 'substrate' for running user-defined objects.
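The startup side of that could be quite simple. A hypothetical sketch of a module's startup method (every selector in it, codeLibrary, timestamp, sourcesChangedSince:, loadNativeCode:, recompileToLibrary, is invented for illustration):

    startUp
        "Prefer the module's saved binary; fall back to the compiler
         only when the sources have changed underneath it."
        | lib |
        lib := self codeLibrary.    "e.g. the file 'Collections.so'"
        (lib notNil and: [(self sourcesChangedSince: lib timestamp) not])
            ifTrue: [self loadNativeCode: lib]
            ifFalse: [self recompileToLibrary]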
If you work a little on the calling conventions, you might even be able to make the whole thing a lot more interoperable with other compiled code on the system.
In this case, people would generally want to do 'training' runs of their code to get it mostly optimized, and then save the image. Compilation then becomes a fairly rare thing after training is done.
That would be more of a static compiler, very similar to the way Slang is used today, just without the language limitations we currently have with Slang.
Then, if compilation becomes a rare thing, you could even have different compiler modes for fast vs. good compilation: run really good optimizations to get closer to C performance during the training runs, and use a faster compiler (perhaps even a really simple, fast non-inlining one) during normal execution, if the really good one is too slow.
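The switch between the two could hinge on something as crude as an invocation counter. A minimal sketch of such a tier-up policy (InvocationCounts, GoodCompiler, and the surrounding machinery are all hypothetical):

    tierUpIfHot: aCompiledMethod
        "Count invocations; past a threshold, hand the method from
         the fast baseline compiler to the slow optimizing one."
        | count |
        count := (InvocationCounts at: aCompiledMethod ifAbsent: [0]) + 1.
        InvocationCounts at: aCompiledMethod put: count.
        ^count > 10000
            ifTrue: [GoodCompiler recompile: aCompiledMethod]
            ifFalse: [aCompiledMethod]    "keep the fast-compiled code"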
Depending on your needs, you might not even need a compiler during execution, for example for a 'packaged' binary...
This would let your good compiler eventually do better optimizations than we can afford today, with the sky as the limit!
Yeah!
Marcel
--
Marcel Weiher                    Metaobject Software Technologies
marcel@metaobject.com            www.metaobject.com
Metaprogramming for the Graphic Arts. HOM, IDEAs, MetaAd etc.