Hi, Kevin,
It's incredible how critical the cache is for Squeak performance.
Comparing Tim's figures with yours, it looks like the larger
cache of the SA110 nearly doubles Squeak performance at the same clock speed. Of course, some of the increase is from the C code optimization in Tim's VM, but it's hard to imagine that accounting for more than 20-30% of the difference.
For comparison, here are the figures for two Sharp Zaurus PDAs, both with Hitachi SH-3 processors:
~120 MHz: 2941176 bytecodes/sec, 133739 sends/sec
~60 MHz: 1801801 bytecodes/sec, 61584 sends/sec
I can't fire up my Casio E-105 right now, but I think it had performance similar to the faster SH-3 above.
If you manage to get the compiler to do some optimization, let me know how much difference that makes.
We actually have an iPaq and I am looking forward to getting Squeak up on it. You mentioned that you were writing down the steps needed to put Squeak on the iPaq. Let me know when that's available (but no rush; it's the holidays!). If I can get the proper development environment set up, I may try to compile a VM myself. Perhaps there is some level of C optimization that produces correct but faster code.
One question: How much RAM is available for the Squeak object heap? (See the VM statistics available under the "help" menu.)
-- John
Kevin wrote:
I managed to get the image on a read/write segment of memory and I got that benchmark for you...the result of "0 tinyBenchmarks" is: 4667444 bytecodes/sec; 155544 sends/sec
Note that this is on an unoptimized VM (compiling with optimizations has problems on ARM).
Tim Rowledge wrote:
That's not too bad for an unoptimized VM; my 202MHz SA110 Acorn (2 x bigger cache than iPaq SA1100, but much slower main memory bus - hey, it's six years old!) gets 10m & 374k on the same test. I _think_ I got up to about 8m bc/sec on the original Itsy port. On a 276MHz SA110 NetWinder I think it's more like 12m & 600k.
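The "same clock speed" comparison above can be checked by dividing each quoted figure by its clock rate. The following is a minimal sketch only: Tim's numbers are his rounded recollections ("10m & 374k", "more like 12m & 600k"), and the 206 MHz clock used for the iPaq is an assumption, since the thread never states it.

```python
# Figures quoted in this thread for "0 tinyBenchmarks".
# Each entry: (label, clock in MHz, bytecodes/sec, sends/sec)
RESULTS = [
    # ASSUMPTION: 206 MHz for the iPaq; the thread does not give its clock.
    ("iPaq SA1100 (unoptimized VM)", 206, 4_667_444, 155_544),
    # Tim's figures are rounded ("10m & 374k", "12m & 600k").
    ("Acorn SA110", 202, 10_000_000, 374_000),
    ("NetWinder SA110", 276, 12_000_000, 600_000),
    # The Zaurus clocks are approximate ("~120 MHz", "~60 MHz").
    ("Zaurus SH-3 (faster)", 120, 2_941_176, 133_739),
    ("Zaurus SH-3 (slower)", 60, 1_801_801, 61_584),
]


def per_mhz(rate, mhz):
    """Normalize a benchmark rate by clock speed."""
    return rate / mhz


def report(results):
    """Return one formatted line per machine with clock-normalized rates."""
    lines = []
    for label, mhz, bytecodes, sends in results:
        lines.append(
            f"{label:30s} {per_mhz(bytecodes, mhz):9.0f} bytecodes/sec/MHz"
            f" {per_mhz(sends, mhz):7.0f} sends/sec/MHz"
        )
    return lines


if __name__ == "__main__":
    for line in report(RESULTS):
        print(line)
```

Under these assumptions the Acorn comes out at a little over twice the iPaq's bytecode rate per MHz, which is the gap being attributed to the cache plus C optimization.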
John.Maloney@disney.com wrote:
Hi, Kevin,
It's incredible how critical the cache is for Squeak performance.
Comparing Tim's figures with yours, it looks like the larger
cache of the SA110 nearly doubles Squeak performance at the same clock speed. Of course, some of the increase is from the C code optimization in Tim's VM, but it's hard to imagine that accounting for more than 20-30% of the difference.
Certainly is - my CC is now five years old with no great likelihood of a replacement :-( I DO have a gcc for Acorn but the libraries don't seem to be very helpful at the moment. Maybe I'll have time to play with it soon.
I've claimed for years that the key determinant of speed for Smalltalk is memory bandwidth. Any trick you can do to provide more actual or virtual bandwidth will help. Caches simply fake out the cpu to make it think you have faster memory than is really the case. JITters pre-process to remove some of the bandwidth usage. Better C compilers likewise. What I really want is a nice simple machine with a gigaHertz clock and memory that can keep up :-)
We actually have an iPaq and I am looking forward to getting Squeak up on it. You mentioned that you were writing down the steps needed to put Squeak on the iPaq. Let me know when that's available (but no rush; it's the holidays!). If I can get the proper development environment set up, I may try to compile a VM myself. Perhaps there is some level of C optimization that produces correct but faster code.
There's a reasonable chance that you could use higher optimisation for most of the vm and just drop it for the parts that have problems. Ought to be a relatively simple makefile hack, surely? Particularly if all the plugins are made external.
tim
At 6:42 PM -0800 12/28/00, Tim Rowledge wrote:
I've claimed for years that the key determinant of speed for Smalltalk is memory bandwidth. Any trick you can do to provide more actual or virtual bandwidth will help. Caches simply fake out the cpu to make it think you have faster memory than is really the case. JITters pre-process to remove some of the bandwidth usage. Better C compilers likewise. What I really want is a nice simple machine with a gigaHertz clock and memory that can keep up :-)
That matches my experience with Squeak. It was surprisingly slow on the PPC 603, which had small caches, and unexpectedly fast on the G3, which has a very high-bandwidth second-level cache.
Re:
There's a reasonable chance that you could use higher optimisation for most of the vm and just drop it for the parts that have problems. Ought to be a relatively simple makefile hack, surely? Particularly if all the plugins are made external.
Sounds plausible. I don't have a development environment for StrongARM, so I'll leave it up to Kevin for now.
In a private message, Joern Eyrich said he got over 10M bytecodes/sec on the iPaq with a WinCE-based VM, so C optimization must make a bigger difference than I expected. Either that, or WinCE is more efficient than Linux... no, let's not even *think* about that! :->
-- John
[snip]
Hi John, Tim:
Just got back from some much-needed holiday time so I'm just wading thru my email now:
Re:
There's a reasonable chance that you could use higher optimisation for most of the vm and just drop it for the parts that have problems. Ought to be a relatively simple makefile hack, surely? Particularly if all the plugins are made external.
Sounds plausible. I don't have a development environment for StrongARM, so I'll leave it up to Kevin for now.
Yes, I think this should be easy to do. The ARM GCC only has problems with the sqUnixSound plugin, from what I can tell. I'll try rebuilding with optimizations on everything except the sound and see how it goes. I was just in a rush to get something working for the holidays. ;)
I'm building it on a native ARM system provided by the handhelds.org/Compaq folks so I won't be able to upgrade the GCC to a 'fixed' version...that'll have to happen at their convenience, I'm afraid. :) Perhaps I'll try setting up the cross-compiler environment...it's been quite a while since I've set up GCC as a cross compiler but I don't remember it being as nightmarish as the handhelds.org site claims.
In a private message, Joern Eyrich said he got over 10M bytecodes/sec on the iPaq with a WinCE-based VM, so C optimization must make a bigger difference than I expected. Either that, or WinCE is more efficient than Linux... no, let's not even *think* about that! :->
-- John
Eww! Well that throws the gauntlet down now, doesn't it? :) I'll try and get a new VM compiled as soon as I get my post-holiday sorting and putting-away done.
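The per-module approach discussed above (optimize everything except the file that miscompiles) can be sketched as a small script that picks a flag per source file. This is only an illustration, not the actual Squeak build: the source list is hypothetical apart from the sound module named in the thread, and the choice of `-O2` versus `-O0` is an assumption.

```python
from pathlib import Path

# HYPOTHETICAL source list; only the sound module is named in the thread.
SOURCES = ["interp.c", "sqUnixWindow.c", "sqUnixSound.c"]

# Files that miscompile with optimization on this toolchain.
PROBLEM_FILES = {"sqUnixSound.c"}


def opt_flag_for(source, default_opt="-O2", fallback_opt="-O0",
                 problem_files=PROBLEM_FILES):
    """Pick the optimization flag for one source file."""
    return fallback_opt if source in problem_files else default_opt


def compile_commands(sources, cc="gcc"):
    """Build one compile command (as an argument list) per source file."""
    commands = []
    for source in sources:
        obj = str(Path(source).with_suffix(".o"))
        commands.append([cc, opt_flag_for(source), "-c", source, "-o", obj])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe
    # to try on a machine without the sources or an ARM compiler.
    for command in compile_commands(SOURCES):
        print(" ".join(command))
```

The same effect in a makefile would be a per-target override of the compiler flags for the one problem object file.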
Hi John:
[snip]
If you manage to get the compiler to do some optimization, let me know how much difference that makes.
I'll try rebuilding with optimization turned off only for the problem module in question...that hopefully will solve the problem until they upgrade the toolchain on handhelds.org.
We actually have an iPaq and I am looking forward to getting Squeak up on it. You mentioned that you were writing down the steps needed to put Squeak on the iPaq. Let me know when that's available (but no rush; it's the holidays!). If I can get the proper development environment set up, I may try to compile a VM myself. Perhaps there is some level of C optimization that produces correct but faster code.
Yes, I will try and finish the howto as soon as possible. I'll put it up on minnow when it's done. :) (It won't happen today as I'm currently trying to rest off the effects of a nine-hour drive...yikes!)
For now, check out the handhelds.org 'howto' section...that will pretty much get you started with Linux on the iPaq.
They recommend not using a cross-compiling setup because they claim it makes life miserable in the long run...I dunno, I've set up GCC in cross-compiling environments before and it wasn't THAT bad. :) Of course you'd have to cross-build all the X11 libraries as well.
You can just telnet to their 'skiff' cluster of ARM boxen and compile whatever you want without the hassle, however. I think the link on how to log into it is on the howto page as well. If you do this, I've already got the squeak 2.9 build sources in the 'kgf@golden.net' subdirectory.
One question: How much RAM is available for the Squeak object heap? (See the VM statistics available under the "help" menu.)
On my iPaq it says I've got 13,615,464 bytes total memory, with about 12 megs free.
Mind you, I've got the image on read-only flash and not in the read-write DRAM filesystem (which would be ideal in the future, once they reduce the power that sleep/suspend mode uses in the kernel).
Actually, this brings up a point I was wondering about. The DRAM filesystem is the 'live' filesystem where any user data to be kept and synchronized would exist. The FLASH filesystem has all the stuff that doesn't change at all (such as executables and config files). There's only 32 megs of DRAM available, and I imagine this gets split between running applications and the DRAM filesystem. So, as far as maximizing space goes, is there any way to keep the image on the read-only filesystem, but have any changes go to the DRAM filesystem? Is this making any sense? :) (My head is still road-addled and filled with road salt).
squeak-dev@lists.squeakfoundation.org