iteration - don't optimize it

Scott A Crosby crosby at qwes.math.cmu.edu
Fri Feb 1 03:37:51 UTC 2002


On Thu, 31 Jan 2002, Mark Mullin wrote:

> Ahh - that got your attention.
>
> No, I'm not really opposed to improving iteration speeds, by all means have
> at it.  Terrific idea. My own vote is cast for a hashing class based on
> SHA5.  I think you could get interesting results from using a perfect
> hash,even if the thing is 120 bits.

Given how memory operates, and assuming a 32-bit limit on addresses, no
more than 28 bit hashes are necessary. Extra bits are just useless drivel.

The existing dictionaries are perfectly scalable, the problems are either
slow hash functions, or misusing ProtoObject>>hashBits.

> For the record, C++ iteration is what we use to deal with problems that
> really vex the computer but that the user could care less about.  Smalltalk
> has a richer and more powerful set of iterators and collections than STL,
> but the costs of such a dynamic flexible solution are more than we can bear.

Careful.. Please show a profiling output before making this claim. I've
gotten 30%, 10x, even 1000x better iteration collection performance
because of fixing bugs, or identifying other performance problems.

My VM is 30% faster overall, about 50% faster iteration performance.

My identityHash patches can handle certain large dictionaries 1000x
faster.

I've fixed other flaws (avoiding expensive hash functions) for 5x
speedups.

My refactored skiplist can handle random inserts at 1k/sec, in a
collection with 100,000 objects.

I don't care how much faster C is at the lowlevel. If its easy to use
superior collections in smalltalk, and hard in C, given sufficient data,
I'll beat C in coding performance and algorithms every time.

Superior algorithms, if they're easy to implement, will beat naive
algorithms on any sufficiently large dataset every time.

>
> I really only keep responding to this because I do feel it's
> counterproductive to go "I don't care what you got, Squeak can do it
> better".  I'm just arguing for a slight modification -  If you don't care
> enough to profile it to death, then Squeak will do perfectly.   Recognizing
> this means we can say, we don't care what your problem is, what Smalltalk
> can't do well directly it can do by being extended.  Legacy systems,
> databases, whatever.

Subscribe to the lazy profiling approach. I've been lazy; going for easy
and low-handing fruit, and done well. It may be easier to, for example,
write it in smalltalk, and improve the C translater and/or the interpreter
than to reimplement it in C. So, don't forget that C profiling of the
intpreter. Usually any screwup is very obvious in the profile.

Ingenuity can do amazing things. Imaging what, say, Squeak would be like
if it had even 1/10 the time spent on it as Java has? If we had, for
example, even a few first-rate compiler guys. A few first-rate PL guys.

Java's come a long way with people like that working on it. Thats why I'd
say the inherent problems aren't in Smalltalk, but in the lack of people
working on it.

I've seen the output of the slang interpreter, and it almost makes me want
to cry. As the interpreter is in flux with the BC patches. (and my new
method cache), I don't plan on really looking at the VM for another few
months..

But its looking like the next big win may be in working on the GC.

Finally,

Smalltalk is not necessarily inherently slow. How many millions of man
hours have been spent on C compilers? Many of the smartest people in the
world have been working for decades on C and static-language compilation.
Contrast this to Smalltalk.

I'll bet you that if you got together a couple of profs here and about a
half-dozen grad students on the project, you'd have a dynamic
compiler/interpreter combo that'd be fully dynamic, yet come very close to
C++, *perhaps* even exceed it. (Dynamic recompilation is very very cool.)

Smart people can do amazing things; the trick is attracting them to
squeak/smalltalk.

> it's using static storage.  For example, in such a world one has a tendancy
> to use a vector to aquire data of unknown length, and then immediately
> reconstruct it as a fixed size array as soon as the length is known.

Actually, the extra overhead of squeak's allocate-an-array.  Fill-it-up.
If it fills up, make a new array larger array and copy it over scheme is
just as fast as the C one.. Actually squeak's technique, can be slightly
faster ~10% (in terms of array write operations.)

> On reflection, if there are those who think I'm arguing for recoding the
> collection classes or using plugins to do collection --  neat idea, but no.

Hell no! :)



Scott





More information about the Squeak-dev mailing list