Ahh - that got your attention.
No, I'm not really opposed to improving iteration speeds, by all means have at it. Terrific idea. My own vote is cast for a hashing class based on SHA5. I think you could get interesting results from using a perfect hash, even if the thing is 120 bits.
On the other hand, maybe I didn't make my original point too well. My thread started over "can one language do everything" and my response was "no, languages are tools and you shouldn't pound in a nail with a power saw".
From my own perspective, I live in a world where we drop down to native
platform machine code in a few places, and C++ code may often be written to deal with little issues like "Pentiums usually predict you'll take a backward branch over a forward branch." When performance gets to be that much of an issue (we have clients that require many billions of high-level linear algebra calcs per second) you tend to survey your language tools very carefully.
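To make that "little issue" concrete, here is a minimal sketch (my own illustration, not code from the thread) of how one writes a hot loop to cooperate with static branch prediction, assuming the classic Pentium heuristic of backward-taken, forward-not-taken: keep the common case on the fall-through path so the only taken branch is the loop's back-edge.

```cpp
#include <cstddef>

// Hypothetical sketch: on CPUs with static prediction (backward branch
// predicted taken, forward branch predicted not taken), the common case
// is written as the fall-through path, and the rare case as a forward
// branch, so the predictor's default guesses are usually right.
double sum_positive(const double* v, std::size_t n) {
    double acc = 0.0;
    for (std::size_t i = 0; i < n; ++i) {   // loop back-edge: backward, predicted taken
        if (v[i] < 0.0)                     // rare case: forward branch, predicted not taken
            continue;
        acc += v[i];                        // common case: falls through
    }
    return acc;
}
```

On modern dynamically predicting hardware this matters far less, which is exactly the kind of platform detail that gets surveyed when choosing tools.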
For the record, C++ iteration is what we use to deal with problems that really vex the computer but that the user couldn't care less about. Smalltalk has a richer and more powerful set of iterators and collections than STL, but the costs of such a dynamic, flexible solution are more than we can bear. Nothing I've said should be construed to mean I'm talking about collections the user interacts with directly. In my first post I observed that C/C++ was preferred over Smalltalk when the computer's oddest peripheral wasn't directly involved :-)
I really only keep responding to this because I do feel it's counterproductive to go "I don't care what you got, Squeak can do it better". I'm just arguing for a slight modification: if you don't care enough to profile it to death, then Squeak will do perfectly. Recognizing this means we can say: we don't care what your problem is; what Smalltalk can't do well directly, it can do by being extended. Legacy systems, databases, whatever.
But I rather violently disagree with those that claim you should just use Smalltalk 'cause its iterators aren't as fast as C/C++'s. Bull. A C/C++ iterator/loop construct is a lot more expensive to write, a lot more expensive to maintain, and a lot less accessible. It's also a hell of a lot faster, 'cause it's compiled, and 'cause if performance is the central thread, it's using static storage. For example, in such a world one has a tendency to use a vector to acquire data of unknown length, and then immediately reconstruct it as a fixed-size array as soon as the length is known.
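A minimal sketch of the acquire-then-freeze pattern described above (my own illustration; the `freeze` helper and its signature are hypothetical): a growable `std::vector` collects data of unknown length, then the contents are copied once into flat fixed-size storage for the hot loops.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical sketch: accumulate data of unknown length in a growable
// vector, then, once the length is known, freeze it into a fixed-size
// array so performance-critical loops index plain static-style storage.
std::unique_ptr<int[]> freeze(const std::vector<int>& acquired,
                              std::size_t& out_len) {
    out_len = acquired.size();
    auto fixed = std::make_unique<int[]>(out_len);  // exact-size allocation
    std::copy(acquired.begin(), acquired.end(), fixed.get());
    return fixed;
}
```

The payoff is that the working array carries no growth bookkeeping and no excess capacity during the computation itself.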
On reflection, if there are those who think I'm arguing for recoding the collection classes or using plugins to do collection -- neat idea, but no. I like Smalltalk collections just fine; I was only using collections/iteration as an example of why languages are tools. What is particularly interesting these days is that Smalltalk and its close relatives seem to be the only ones that truly recognize that the class hierarchy is in fact a derivative language and, implicitly, a tool in its own right. In the real world of C++ it's hard to keep two class libraries from either stepping on each other or being incapable of efficient interoperation, depending on namespace usage. In Smalltalk land, we regularly seem to throw 30 people's work into one unified whole with a lot less sweat.
I wrote a book in '89 under duress about how to use C++, and my primary focus was on 'if you do these things this way, you too can have the ability to create and share class libraries like we do every day in Smalltalk'. Funny, things haven't worked out that way. Yes, we have more libraries, but nowhere near what I had expected. Now that's what I think is most unique of all about the Smalltalk tool. Why is it so damn good at collecting, merging, and sharing the work of many participants, while everything else just sucks soooo bad?
Mark
On Thu, 31 Jan 2002, Mark Mullin wrote:
> Ahh - that got your attention.
> No, I'm not really opposed to improving iteration speeds, by all means have at it. Terrific idea. My own vote is cast for a hashing class based on SHA5. I think you could get interesting results from using a perfect hash, even if the thing is 120 bits.
Given how memory operates, and assuming a 32-bit limit on addresses, no more than 28-bit hashes are necessary. Extra bits are just useless drivel.
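The arithmetic behind that claim can be sketched as follows (my own illustration; the 16-byte object granularity and all names are assumptions, not from the thread): with 32 address bits and objects aligned to at least 16-byte boundaries, at most 2^28 objects can coexist, so an identity hash never needs more than 28 useful bits.

```cpp
#include <cstdint>

// Sketch of the bit-budget argument: 2^32 bytes of address space divided
// by an assumed minimum object granularity of 2^4 = 16 bytes leaves room
// for at most 2^28 live objects, so 28 hash bits suffice.
constexpr std::uint32_t kAddressBits    = 32;
constexpr std::uint32_t kMinObjectShift = 4;   // log2 of 16-byte granularity
constexpr std::uint32_t kUsefulHashBits = kAddressBits - kMinObjectShift;

constexpr std::uint32_t clampHash(std::uint32_t h) {
    return h & ((1u << kUsefulHashBits) - 1u);  // keep the low 28 bits
}
```

Any wider hash just spends storage on bits that can never distinguish two live objects.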
The existing dictionaries are perfectly scalable, the problems are either slow hash functions, or misusing ProtoObject>>hashBits.
> For the record, C++ iteration is what we use to deal with problems that really vex the computer but that the user couldn't care less about. Smalltalk has a richer and more powerful set of iterators and collections than STL, but the costs of such a dynamic, flexible solution are more than we can bear.
Careful... please show profiling output before making this claim. I've gotten 30%, 10x, even 1000x better collection iteration performance by fixing bugs or identifying other performance problems.
My VM is 30% faster overall, about 50% faster iteration performance.
My identityHash patches can handle certain large dictionaries 1000x faster.
I've fixed other flaws (avoiding expensive hash functions) for 5x speedups.
My refactored skiplist can handle random inserts at 1k/sec, in a collection with 100,000 objects.
I don't care how much faster C is at the low level. If it's easy to use superior collections in Smalltalk, and hard in C, then given sufficient data, I'll beat C in coding performance and algorithms every time.
Superior algorithms, if they're easy to implement, will beat naive algorithms on any sufficiently large dataset every time.
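As one concrete instance of that point (my own illustration, not from the thread; `countDistinct` is a hypothetical name): counting distinct values with a hash set is O(n) expected, while the naive pairwise scan is O(n^2), so past some dataset size the better algorithm wins no matter how tightly the quadratic one is coded.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Sketch: an easy-to-reach superior collection (a hash set) turns a
// naive O(n^2) "have I seen this value before?" scan into an O(n)
// expected-time pass over the data.
std::size_t countDistinct(const std::vector<int>& xs) {
    std::unordered_set<int> seen(xs.begin(), xs.end());
    return seen.size();
}
```

In a language where such collections are one expression away, the asymptotically better version is also the lazy version.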
> I really only keep responding to this because I do feel it's counterproductive to go "I don't care what you got, Squeak can do it better". I'm just arguing for a slight modification: if you don't care enough to profile it to death, then Squeak will do perfectly. Recognizing this means we can say: we don't care what your problem is; what Smalltalk can't do well directly, it can do by being extended. Legacy systems, databases, whatever.
Subscribe to the lazy profiling approach. I've been lazy, going for easy, low-hanging fruit, and done well. It may be easier, for example, to write it in Smalltalk and improve the C translator and/or the interpreter than to reimplement it in C. So don't forget C-level profiling of the interpreter. Usually any screwup is very obvious in the profile.
Ingenuity can do amazing things. Imagine what, say, Squeak would be like if it had even 1/10 the time spent on it that Java has had. If we had, for example, even a few first-rate compiler guys. A few first-rate PL guys.
Java's come a long way with people like that working on it. That's why I'd say the inherent problems aren't in Smalltalk, but in the lack of people working on it.
I've seen the output of the Slang interpreter, and it almost makes me want to cry. As the interpreter is in flux with the BC patches (and my new method cache), I don't plan on really looking at the VM for another few months.
But it's looking like the next big win may be in working on the GC.
Finally,
Smalltalk is not necessarily inherently slow. How many millions of man hours have been spent on C compilers? Many of the smartest people in the world have been working for decades on C and static-language compilation. Contrast this to Smalltalk.
I'll bet you that if you got together a couple of profs here and about a half-dozen grad students on the project, you'd have a dynamic compiler/interpreter combo that'd be fully dynamic, yet come very close to C++, *perhaps* even exceed it. (Dynamic recompilation is very very cool.)
Smart people can do amazing things; the trick is attracting them to squeak/smalltalk.
> it's using static storage. For example, in such a world one has a tendency to use a vector to acquire data of unknown length, and then immediately reconstruct it as a fixed-size array as soon as the length is known.
Actually, Squeak's allocate-an-array, fill-it-up, and if-it-fills-up-allocate-a-larger-array-and-copy-it-over scheme is just as fast as the C one, extra overhead and all. In fact, Squeak's technique can be slightly faster, ~10% (in terms of array write operations).
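The reason the grow-and-copy scheme keeps up is the standard amortized argument, which can be sketched by simply counting writes (my own illustration; `totalWrites` and the doubling policy are assumptions for the sketch): doubling capacity on overflow means n appends cost fewer than 3n element writes in total, i.e. O(1) amortized per append.

```cpp
#include <cstddef>

// Sketch of the amortized cost of the allocate/fill/grow-by-doubling
// scheme: count one write per appended element, plus one write per
// element moved whenever a full array is copied into a doubled one.
std::size_t totalWrites(std::size_t n) {
    std::size_t capacity = 1, size = 0, writes = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {   // full: allocate a doubled array
            writes += size;       // copy the existing elements over
            capacity *= 2;
        }
        ++size;
        ++writes;                 // write the newly appended element
    }
    return writes;                // always < 3n
}
```

The copies at sizes 1, 2, 4, ... sum to less than 2n, so the scheme stays within a small constant factor of writing into a preallocated C array.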
> On reflection, if there are those who think I'm arguing for recoding the collection classes or using plugins to do collection -- neat idea, but no.
Hell no! :)
Scott
Scott A Crosby writes: <snip>
> Smalltalk is not necessarily inherently slow. How many millions of man hours have been spent on C compilers? Many of the smartest people in the world have been working for decades on C and static-language compilation. Contrast this to Smalltalk.
> I'll bet you that if you got together a couple of profs here and about a half-dozen grad students on the project, you'd have a dynamic compiler/interpreter combo that'd be fully dynamic, yet come very close to C++, *perhaps* even exceed it. (Dynamic recompilation is very very cool.)
> Smart people can do amazing things; the trick is attracting them to squeak/smalltalk.
People are amazed when they watch me use my old PPC MacOS box, or even older 68k MacOS box, with Macintosh Common Lisp.
Oh, they say, it's interpreted. Um, no, I respond, compiled. When do you compile it? When I type; there is no choice. No, that can't be true.
One defun and disassemble later, and their view of the world changes :-)
squeak-dev@lists.squeakfoundation.org