Folks,
I've been running Tim Rowledge's NCM VM and image (see http://sumeru.stanford.edu/tim) on win32 successfully for a while now. I'd like to offer a short performance report, because I think (and feel) that I get appreciable results.
The two machines I have infected with Squeak are: A Toshiba Satellite 2520 (K6-2 at 300 MHz, 64 MB, Win98) and a Dell XPS system (P-II at 300 MHz, 128 MB, NT4SP4). The results have not failed to impress me:
<Tosh/Win98>
[approx.]        benchFib [sends/s]   benchmark [bytecodes/s]
-------------------------------------------------------------
Squeak 2.3b                 595,000               10,245,000
NCMVM 2.3                   734,453               10,869,000
Jitter 2.3b                 994,000               16,233,000

<Dell/NT4SP4>
[approx.]        benchFib [sends/s]   benchmark [bytecodes/s]
-------------------------------------------------------------
Squeak 2.3b                 654,000               11,111,000
NCMVM 2.3                   768,757               12,165,000
Jitter 2.3b                 913,000               16,666,000
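For anyone who hasn't looked at the benchmark itself: as I recall, the sends/s column comes from Integer>>benchFib, defined as `^ self < 2 ifTrue: [1] ifFalse: [(self-1) benchFib + (self-2) benchFib + 1]`. Every activation adds exactly 1 to the result, so the result equals the number of benchFib calls made, and result divided by elapsed time gives an approximate send rate. (The bytecodes/s column comes from the separate `benchmark` method, not sketched here.) Below is a rough C transcription under that assumption; the function names are made up, and the (result * 1000) // milliseconds rate formula is my assumption about how the figure is reported.

```c
#include <assert.h>

/* Rough C transcription of Integer>>benchFib as I recall it:
 *   ^ self < 2 ifTrue: [1] ifFalse: [(self-1) benchFib + (self-2) benchFib + 1]
 * Each activation contributes exactly 1 to the result, so the result
 * equals the number of benchFib activations performed. */
long bench_fib(long n)
{
    return n < 2 ? 1 : bench_fib(n - 1) + bench_fib(n - 2) + 1;
}

/* Same recursion with an explicit activation counter, used to check
 * the "result == number of calls" property. */
long bench_fib_counted(long n, long *calls)
{
    ++*calls;
    return n < 2 ? 1
                 : bench_fib_counted(n - 1, calls) + bench_fib_counted(n - 2, calls) + 1;
}

/* Approximate sends per second, assuming (result * 1000) // elapsed ms.
 * C integer division matches Smalltalk's // for positive operands. */
long long sends_per_second(long result, long elapsed_ms)
{
    assert(result >= 0 && elapsed_ms > 0);
    return (long long)result * 1000 / elapsed_ms;
}
```

Because the count comes out of the recursion itself, the figure measures call/return overhead almost exclusively, which is why it is a reasonable probe for a change in method format.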
Please note that NCM ("new compiled method format") does not (yet) try to improve upon Jitter; it is an improvement upon the non-Jitter base VM/image. I included the Jitter figures for relative comparison. As I said, I'm impressed.
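To put the relative comparison into numbers, here is a small C helper that computes the percentage gain of NCMVM 2.3 over stock Squeak 2.3b from the two tables. It is plain arithmetic on the figures reported above, not a new measurement; the struct and function names are only for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage improvement of a candidate rate over a baseline rate
 * (higher is better for both sends/s and bytecodes/s). */
double pct_gain(double baseline, double candidate)
{
    assert(baseline > 0.0);
    return (candidate - baseline) / baseline * 100.0;
}

/* Squeak 2.3b vs NCMVM 2.3 figures, transcribed from the tables. */
struct bench_row {
    const char *machine;
    double squeak_sends, ncm_sends;
    double squeak_bytecodes, ncm_bytecodes;
};

static const struct bench_row rows[] = {
    { "Tosh/Win98",  595000.0, 734453.0, 10245000.0, 10869000.0 },
    { "Dell/NT4SP4", 654000.0, 768757.0, 11111000.0, 12165000.0 },
};

/* Prints one line per machine with the NCM gain for each benchmark. */
void print_ncm_gains(void)
{
    for (size_t i = 0; i < sizeof rows / sizeof rows[0]; ++i) {
        printf("%-12s sends/s %+.1f%%   bytecodes/s %+.1f%%\n",
               rows[i].machine,
               pct_gain(rows[i].squeak_sends, rows[i].ncm_sends),
               pct_gain(rows[i].squeak_bytecodes, rows[i].ncm_bytecodes));
    }
}
```

Worked out, that is about +23.4% sends/s and +6.1% bytecodes/s on the Toshiba, and about +17.5% sends/s and +9.5% bytecodes/s on the Dell; the send figure moves noticeably more than the raw bytecode figure on both machines.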
I have measured neither memory consumption nor J2 performance, though I'd be very curious if others have. At these significant rates, I wonder what a J2-NCM VM could do...
All in all, I'd vote for merging Tim's improvements into mainline Squeak (2.5?). Not only does NCM enhance the malleability of the VM, it's faster too. What more could we want?
Regards, Helge
Helge -
> I've been running Tim Rowledge's NCM VM and image (see http://sumeru.stanford.edu/tim) on win32 successfully for a while now. I'd like to offer a short performance report, because I think (and feel) that I get appreciable results.
> <figures snipped>
> All in all, I'd vote for merging Tim's improvements into mainline Squeak (2.5?). Not only does NCM enhance the malleability of the VM, it's faster too. What more could we want?
Thanks for your performance report. We have been considering the NCM changes since Tim first got them working. Simplicity and malleability are certainly positive features. Performance, if it's real, is as well.
The main reason I have balked at this change so far is the cost in space for small implementations. I believe the current design, even if it makes maximum use of compact headers, requires two extra pointer fields and two extra objects, each with a one-word header. With 4-byte words, that is 8 bytes of pointers plus 8 bytes of headers, or 16 bytes per method. Taking the 2.2 Mini image as a benchmark (4615 CompiledMethods), this would add 73k to an image that is only 530k, or subtract 73k from what you could do with it on a PDA. I consider this to be material.
A competing change we have been weighing for a long time is to do away with compact headers. This would add about the same amount of space (70k) to the mini image, but would yield greater simplifications and speedups in the virtual machine (IMO). Moreover, if we decide to drop compact headers, the penalty for NCMs suddenly increases to 110k. At that point we would have added 180k to the 530k mini image. This is why I am being conservative.
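The space arithmetic can be checked with a few lines of C. The compact-header case reproduces the 16 bytes per method and roughly 73k for 4615 methods. For the no-compact-headers case I am assuming that each of the two extra objects would then need a two-word header (class word plus base header); under that assumption the same formula gives about 110k, which matches the figure above, but the two-word reading is mine, not something stated in the thread.

```c
#include <assert.h>

enum { WORD_BYTES = 4 };  /* 32-bit object memory */

/* Extra bytes per CompiledMethod under NCM as described: two extra
 * pointer fields plus two extra objects, each with a header of
 * `header_words` words (1 with compact headers; 2 assumed without). */
long ncm_bytes_per_method(int header_words)
{
    assert(header_words >= 1);
    const int extra_pointers = 2;
    const int extra_objects = 2;
    return (long)(extra_pointers + extra_objects * header_words) * WORD_BYTES;
}

/* Total bytes added to an image holding `methods` CompiledMethods. */
long ncm_image_overhead(long methods, int header_words)
{
    return methods * ncm_bytes_per_method(header_words);
}
```

For the 2.2 Mini image this gives 4615 x 16 = 73,840 bytes with compact headers and 4615 x 24 = 110,760 bytes under the two-word assumption; adding the roughly 70k cost of dropping compact headers themselves brings the total to about 180k.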
Since you have published the numbers (and since I am certainly curious about them), I have to ask if they represent an apples-to-apples comparison. Are the VMs generated from the same compiler, and are the only differences in the handling of methods? I would be surprised to find more than a nominal increase in speed due to this change, since there is very little difference once a method is in the cache.
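The point about the cache can be illustrated with a generic (selector, class) to method cache of the kind an interpreter keeps for sends. On a hit, finding the method costs a hash, two compares and a load, and the method's internal format plays no part; any format-related cost is confined to misses and to activating the method. This is a simplified sketch with made-up names, sizes and hash, not Squeak's interpreter code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic send cache; everything here is an illustrative assumption. */
#define CACHE_ENTRIES 512u                /* power of two, so we can mask */

typedef struct {
    uintptr_t selector;                   /* selector oop */
    uintptr_t klass;                      /* receiver class oop */
    uintptr_t method;                     /* 0 means "empty slot" */
} CacheEntry;

static CacheEntry method_cache[CACHE_ENTRIES];
static unsigned long full_lookups = 0;

/* Stand-in for the slow path (walking method dictionaries up the
 * superclass chain). Returns a fake, non-zero "method" for the sketch. */
static uintptr_t slow_lookup(uintptr_t selector, uintptr_t klass)
{
    ++full_lookups;
    return selector * 31u + klass + 1u;
}

/* Hit: hash, two compares and a load. Miss: slow lookup, then refill. */
uintptr_t lookup_method(uintptr_t selector, uintptr_t klass)
{
    CacheEntry *e = &method_cache[((selector ^ klass) >> 2) & (CACHE_ENTRIES - 1u)];
    if (e->method != 0 && e->selector == selector && e->klass == klass)
        return e->method;
    e->selector = selector;
    e->klass = klass;
    e->method = slow_lookup(selector, klass);
    return e->method;
}
```

In a send-heavy loop nearly every lookup is a hit, which is why one would expect only a nominal difference from the method format alone.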
Finally, and I am glad you raise the point, I am interested in how this all will appear when seen through the eyes of Jitter. One of the beauties of Jitter is that it affords an opportunity to lay a completely different set of speed/space tradeoffs on top of the existing system. While it has consumed a lot of Ian's brain cells (fortunately there are plenty ;-), it is exhibiting remarkable freedom from the apparent costs of our current CompiledMethods. Ian may want to say more about this, and about compact classes, which are somewhat more of an issue.
I know Tim has probably been frustrated at our apparent recalcitrance in not immediately adopting NCMs, so I'm glad you have induced me to articulate the issues.
- Dan
At 22:14 24.03.99 -0800, Dan Ingalls wrote:
> The main reason I have balked at this change so far is the cost in space for small implementations. [...] At that point we would have added 180k to the 530k mini image. This is why I am being conservative.
Thanks for the rationale. I had "paged out" this important limiting factor.
On the other hand, if such a speedup were reproducible on a PDA, it might still be worth considering NCM to squeeze more sends/s out of the processor. It depends on the specific hardware.
> Since you have published the numbers (and since I am certainly curious about them), I have to ask if they represent an apples-to-apples comparison. Are the VMs generated from the same compiler,
Touché. I have to admit that I just *assumed* they were comparable. Andreas' off-the-shelf VM was compiled by the current Visual-C++, correct?
> and are the only differences in the handling of methods?
I see that Tim jumped to my help with a full explanation. :)
> I know Tim has probably been frustrated at our apparent recalcitrance in not immediately adopting NCMs, so I'm glad you have induced me to articulate the issues.
I did not want to sound "demanding"; it's just that I was happy with the results I got and wondered whether others shared the same experience, and what the general reception was.
Dan, Tim, Ian: thank you all for your explanations and clarifications.
Cheers, Helge
Dan's worries about image size in the recent NCM thread reminded me to ask a question I had almost let slip out of my mind: when/who/how is the VM going to get cleaned out? By which I mean: is anyone actively pursuing moving many of the 'extra' prims out into plugin land, thus making a minimum VM much easier to produce?
It seems to me that all the serial, sound, socket, midi, etc stuff ought not be in the base VM code. Now that we can extend the VM with plugins, perhaps we can get back to a smaller prim table. I'd suggest that the basic VM ought to have no prims that are not required for the image to do a startup. There may be some exceptional prims that would be less useful with the small overhead of a plugin-call and that have to stay as normal prims. But otherwise even the file prims seem good candidates to move out!
Along with minimum images, headlessness and loadable partial-images, a small VM might make a practical scripting utility possible. It would certainly load and startup more rapidly.
So, is anyone already doing this?
tim
> Dan's worries about image size in the recent NCM thread reminded me to ask a question I had almost let slip out of my mind: when/who/how is the VM going to get cleaned out? By which I mean: is anyone actively pursuing moving many of the 'extra' prims out into plugin land, thus making a minimum VM much easier to produce?
> [...]
> Along with minimum images, headlessness and loadable partial-images, a small VM might make a practical scripting utility possible. It would certainly load and startup more rapidly.
> So, is anyone already doing this?
Before assigning that task, let's see if we can't tune the primitive plugins somewhat better first. I haven't studied the code for a while, but as I recall, the present primitive plugin mechanism (at least for the Mac) performs a table lookup of the plugin name in the module tables with EACH AND EVERY CALL to the primitive. Accordingly, primitives deep in an inner loop can face substantial overhead from the lookup alone.
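To make the cost concrete, here is a sketch of what an uncached named-primitive call looks like if, as described, the module and primitive are found by name on every invocation. The tables, module names and primitive names are made up, and a linear strcmp search is my assumption; the actual Mac code may be organised differently. The counter shows that the lookup work grows with the number of calls, not with the number of distinct primitives used.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical module/primitive tables; all names are illustrative. */
typedef int (*PrimFn)(void);
typedef struct { const char *name; PrimFn fn; } PrimEntry;
typedef struct { const char *module; const PrimEntry *prims; size_t count; } ModuleEntry;

static int prim_stub(void) { return 0; }

static const PrimEntry sound_prims[]  = { { "primStart", prim_stub }, { "primStop", prim_stub } };
static const PrimEntry socket_prims[] = { { "primOpen",  prim_stub }, { "primSend", prim_stub } };

static const ModuleEntry module_table[] = {
    { "SoundPlugin",  sound_prims,  2 },
    { "SocketPlugin", socket_prims, 2 },
};

static unsigned long name_compares = 0;   /* cost counter for the sketch */

/* Uncached lookup: linear search by name through modules, then primitives. */
static PrimFn find_named_primitive(const char *module, const char *name)
{
    for (size_t m = 0; m < sizeof module_table / sizeof module_table[0]; ++m) {
        ++name_compares;
        if (strcmp(module_table[m].module, module) != 0)
            continue;
        for (size_t p = 0; p < module_table[m].count; ++p) {
            ++name_compares;
            if (strcmp(module_table[m].prims[p].name, name) == 0)
                return module_table[m].prims[p].fn;
        }
    }
    return NULL;
}

/* Every call repeats the full search before doing any real work. */
int call_named_primitive(const char *module, const char *name)
{
    PrimFn fn = find_named_primitive(module, name);
    assert(fn != NULL);
    return fn();
}
```

Calling "primSend" 1000 times here costs 4000 string comparisons before any primitive work is done; with realistic module and primitive counts the ratio only gets worse.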
So some sort of caching arrangement needs to be accomplished to assure:
(1) that modules are loaded exactly once for each session in which they are used (but the caches must be cleared on image startup/shutdown);
(2) that each primitive is looked up exactly once for each session in which it is used (likewise, with caches cleared on image startup/shutdown); and
(3) that the overhead of this protocol isn't itself large in comparison with the overhead of present-day primitive calls once the module is loaded and the primitive has been called at least once.
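One way to meet all three requirements is sketched below in C. Each call site keeps its resolved function pointer, and modules remember which session loaded them. Rather than walking every cache to clear it at image startup/shutdown, this sketch uses a swapped-in technique: a global session counter ("epoch") is bumped, and any cache stamped with an older epoch counts as empty. All names are hypothetical; the library loading and symbol lookup are only marked by comments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*PrimFn)(void);

/* Hypothetical per-call-site cache; stale epoch means "not resolved". */
typedef struct {
    const char *module;
    const char *name;
    PrimFn      fn;
    unsigned    epoch;
} NamedPrimCache;

typedef struct { const char *name; unsigned loaded_epoch; } ModuleState;

static unsigned session_epoch = 1;
static ModuleState modules[] = { { "SocketPlugin", 0 } };  /* made-up module */
static unsigned long module_loads = 0, resolutions = 0;

static int prim_send(void) { return 42; }

/* (1) Load each module at most once per session. */
static int ensure_loaded(const char *module)
{
    for (size_t i = 0; i < sizeof modules / sizeof modules[0]; ++i) {
        if (strcmp(modules[i].name, module) != 0)
            continue;
        if (modules[i].loaded_epoch != session_epoch) {
            ++module_loads;                  /* real VM: load the library here */
            modules[i].loaded_epoch = session_epoch;
        }
        return 1;
    }
    return 0;
}

/* (2) Full name lookup, reached at most once per call site per session. */
static PrimFn resolve(const char *module, const char *name)
{
    if (!ensure_loaded(module))
        return NULL;
    ++resolutions;                           /* real VM: symbol lookup here */
    return strcmp(name, "primSend") == 0 ? prim_send : NULL;
}

/* (3) Fast path once resolved: one integer compare plus an indirect call. */
int call_cached(NamedPrimCache *site)
{
    if (site->epoch != session_epoch) {
        site->fn = resolve(site->module, site->name);
        site->epoch = session_epoch;
    }
    assert(site->fn != NULL);
    return site->fn();
}

/* Image startup/shutdown: invalidates every module and call-site cache at once. */
void begin_new_session(void) { ++session_epoch; }
```

After the first call in a session, the extra cost over a built-in primitive is a single compare, which addresses point (3); the epoch bump makes clearing the caches O(1) regardless of how many call sites exist.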
If this is already in place and I missed it, mea culpa. But without it, let's leave critical inner-loop primitives inside the VM until we have polished and refined the primitive plugin process somewhat; otherwise Tim's project (one that ought to be undertaken) may lead to a substantial degradation of performance.
Some thought ought to go to plugin library naming conventions and version-testing conventions as well.
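As one possible shape for such conventions, here is a hypothetical sketch: a plugin exports a descriptor with its module name and a major/minor interface version, the VM accepts it only if the major version matches and the minor version is at least what the image needs (a semantic-versioning-style rule), and library files follow a "<Module>Plugin<suffix>" naming pattern. None of this is an existing Squeak API; the struct, functions and naming pattern are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical descriptor a plugin might export; not an existing API. */
typedef struct {
    const char *module_name;   /* e.g. "SocketPlugin" */
    unsigned    major;         /* bumped on incompatible interface changes */
    unsigned    minor;         /* bumped on backward-compatible additions */
} PluginVersion;

/* Accept a plugin if the name matches, the major version matches exactly,
 * and it provides at least the minor version the image requires. */
int plugin_compatible(const PluginVersion *have, const char *want_name,
                      unsigned want_major, unsigned want_minor)
{
    assert(have != NULL && want_name != NULL);
    return strcmp(have->module_name, want_name) == 0
        && have->major == want_major
        && have->minor >= want_minor;
}

/* Hypothetical file naming: <Module>Plugin<suffix>, e.g. "SocketPlugin.dll".
 * Returns 1 on success, 0 if the buffer was too small. */
int plugin_file_name(char *buf, size_t len, const char *module, const char *suffix)
{
    int n = snprintf(buf, len, "%sPlugin%s", module, suffix);
    return n >= 0 && (size_t)n < len;
}
```

Keeping the version in the plugin itself, rather than only in its file name, lets the VM reject a stale library even when it has been renamed or copied around.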
From: Dan Ingalls [mailto:DanI@wdi.disney.com]
Sent: 25 March 1999 06:15
> [...]
> We have been considering the NCM changes since Tim first got them working. Simplicity and malleability are certainly positive features. Performance, if it's real, is as well.
> The main reason I have balked at this change so far is the cost in space for small implementations
> [...]
> A competing change we have been weighing for a long time is to do away with compact headers. This would add about the same amount of space (70k) to the mini image, but would yield greater simplifications and speedups in the virtual machine (IMO).
[I know, I'm late as usual. I've been off-site for a couple of weeks and, while I could get at my mailbox, I couldn't mail out]
For interest, I tried bodging the SystemTracer to write out an image with 3-word headers for every object in the distribution 2.3 image. If I made the correct changes, it took a 3.5 MB image to a shade over 5 MB, a large penalty on a PDA. The flip side is that some of us [hi Adrian] trying to run multi-user systems on Squeak are willing to pay that space for a large speedup, if we could get one. I've had my head in the object allocation and GC stuff, so I may have a biased view, but it looks like considerable speed could be gained by doing away with the twisty turny maze of tests on object formats. Has anyone done any experiments in this area? If not, it looks like I'll have to do this anyway at some point, as adding an optional fourth security word to the header causes all kinds of merry hell if you try to keep the existing variable-length headers.
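The "maze of tests" is easiest to see in how much work it takes just to find an object's header size. The sketch below assumes the header type lives in the low two bits of the base header word, with 0 = 3-word header, 1 = 2-word header, 2 = free chunk and 3 = 1-word (compact class) header; that is my understanding of Squeak's object memory, so treat the constants and names as assumptions. With fixed 3-word headers the branch disappears, and the per-object cost of widening is what drives the 3.5 MB to 5 MB growth.

```c
#include <assert.h>
#include <stdint.h>

/* Header-type tags in the low two bits of the base header word, as I
 * understand Squeak's object memory; treat the values as assumptions. */
enum {
    HEADER_TYPE_SIZE_AND_CLASS = 0,  /* 3-word header */
    HEADER_TYPE_CLASS          = 1,  /* 2-word header */
    HEADER_TYPE_FREE           = 2,  /* free chunk, not an object */
    HEADER_TYPE_SHORT          = 3   /* 1-word header (compact class) */
};

enum { WORD_BYTES = 4, FIXED_EXTRA_HEADER_WORDS = 2 };

/* Variable-length headers: every heap walk, allocation or class fetch
 * has to branch on the tag to know where the object really starts. */
int extra_header_words(uint32_t base_header)
{
    switch (base_header & 3u) {
    case HEADER_TYPE_SIZE_AND_CLASS: return 2;
    case HEADER_TYPE_CLASS:          return 1;
    case HEADER_TYPE_SHORT:          return 0;
    default:                         return -1;  /* free chunk */
    }
}

/* Bytes one object would grow by if every header were a fixed 3 words
 * (the fixed layout needs no branch: always FIXED_EXTRA_HEADER_WORDS). */
int widening_cost_bytes(uint32_t base_header)
{
    int extra = extra_header_words(base_header);
    assert(extra >= 0);
    return (FIXED_EXTRA_HEADER_WORDS - extra) * WORD_BYTES;
}
```

Compact-class objects pay 8 extra bytes each under fixed headers, which is why an image dominated by small compact-class objects grows so much.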
All thoughts welcome!
- Peter
squeak-dev@lists.squeakfoundation.org