How I wasted one hour of my time :-)
Stefan Matthias Aust
sma at 3plus4.de
Thu Jun 15 17:12:19 UTC 2000
We need to be more careful about the base system. A lot of methods perform
very poorly but don't have to. I'd like to tell this story of profiling and
code exploration.
It all began with this harmless expression I wanted to use to count the
number of polymorphic methods in my system.
| b |
b := Bag new.
Smalltalk allClasses do: [:e | b addAll: e selectors].
b sortedCounts
It however took forever to compute. I got impatient, aborted the execution
and tried
... from: 1 to: 100 do: ...
instead of
... do: ...
which is a nice way to enumerate over only a fraction of a
collection. That was much better, and I figured out that my impatience
threshold was around 400.
Profiling that expression, I noticed that about 98% of the execution time
was spent in Dictionary>>scanFor: and Symbol>>= as well as in
primitives. Looking at #scanFor: and thinking about the result, I realized
that Symbol>>hash probably causes the problem. "But why," I asked
myself. Symbol uses a primitive (number 75) which should be well tested
and fast.
It might be a good idea to check the distribution, I thought, and wrote the
following code to test.
| m a h |
m := 0.
Smalltalk allClasses do: [:class |
	class selectors do: [:sel |
		m := m max: sel hash]].
a := Array new: m + 1 withAll: 0.
Smalltalk allClasses do: [:class |
	class selectors do: [:sel |
		h := sel hash + 1.
		a at: h put: (a at: h) + 1]].
a
I looked at the resulting array. First, I was irritated that I got only
4096 elements; then suddenly I remembered that hash values have only 12
bits. Yes,
1 bitShift: 12
answered 4096, so that was why storing my 24489 methods in one big
bag performed so poorly. On average, six symbols should have the same
identical hash value. Let us see:
a count --> error
grr, it's called #sum, again
a sum --> 24489
(You read that number already)
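That sum also lets us check the clash estimate directly:

```smalltalk
24489 / 4096.0   "≈ 5.98, i.e. on average about six symbols per 12-bit hash value"
```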
a min --> 0
Oh, some values are never chosen. Does this happen often?
a count: [:each | each = 0] --> 99
Well, only 3997 different values are generated by Symbol>>hash. Are these
values distributed equally? Unfortunately, I didn't find a good
statistical method (chi-square test etc.), but I tried
a max --> 315
Oops? That's bad. Let's compile a small table showing, for several
thresholds d, how many hash values occur at least d times:
#(10 15 20 50 100 200) collect: [:d |
d -> (a count: [:e | e >= d])]
-->
#(10->536 15->172 20->79 50->14 100->2 200->1)
It's a lousy distribution which might become a problem.
But back to my story. With better hash values my initial expression should
be much faster. I'm not the first one who noticed this problem, so there's
already a solution: PluggableSets and PluggableDictionaries. Unfortunately,
there's no PluggableBag, but the following code should do the trick:
| b |
b := Bag new.
b setContents: ((PluggableDictionary new: 32000)
	hashBlock: [:obj | ((obj hash bitShift: 8)
		bitXor: (obj first asciiValue bitShift: 4))
		bitXor: obj last asciiValue] fixTemps).
Smalltalk allClasses do: [:e | b addAll: e selectors].
b
It defines a new hash value for symbols with a range 256 times
larger. This was nearly two orders of magnitude faster than the first
approach. Actually, I noticed that
[:obj | obj hash bitShift: 3]
is already sufficient, as the values are now much better spread over the
dictionary and resolving the average of 6 clashes is fast enough (this
theoretically allows for up to 8 clashes, or some 32000 symbols).
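For illustration, the arithmetic behind that remark: the 12-bit hash gives
4096 distinct values, and shifting left by 3 bits multiplies the range by 8:

```smalltalk
1 bitShift: 12                        "4096 distinct Symbol hash values"
(1 bitShift: 12) * (1 bitShift: 3)    "32768, just enough to spread over a ~32000-slot dictionary"
```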
It might be a good idea to do this by default for all symbols. I'm not
sure whether any part of the Smalltalk source depends on the fact
that hash values are < 4096. Anybody who knows may speak up.
I then inspected the resulting bag and made the mistake of clicking on the
'contents' instance variable slot. My system froze. I worked in Morphic,
so obviously the inspector tried to display the complete - quite large -
PluggableDictionary not only once but periodically. Calling a debugger was
therefore no real help, so I had to close that inspector. Something that
only succeeds with real patience; you unfortunately notice here that
Squeak is a polling system.
After an eternity or two, I tried to figure out what had happened. The expression
MessageTally spyOn: [contents printString]
should give the answer. I always use this tool if I need to figure out
where time is spent, as guessing often, if not always, leads down the
wrong path. Obviously, it was the sorting of the dictionary keys that
takes place before printing.
I also noticed that printStringLimitedTo: is still broken. I noticed that
a few weeks ago but never filed a bug report. Now I remembered. The funny
thing is that it has been broken since 1998. Try
(contents printStringLimitedTo: 5000) size
--> 6481
I'd have expected 5000 but not more. In this case it's not that bad, but
sometimes the limiting doesn't work at all because WriteStream>>nextPutAll:
bypasses the limiting code (consider this a bug report). If you want to
reproduce the bug, test
(String
	streamContents: [:s |
		s nextPutAll: (String new: 100)]
	limitedTo: 10) size --> 100
But 99 percent of the execution time was spent in sorting, so that was the
code to attack. I identified OrderedCollection>>insert:before: as the
"Übelkübel" (learn that word, it's like angstlust or zeitgeist ;-), the
most problematic method. What happens here is that when the SortedCollection
is built using calls to #add:, an insertion sort algorithm is used to insert
the keys at the right place. Assuming keys are not randomly distributed
(as a good hash value would make them) or are already preordered, insertion
sort is better than quicksort. But as I said, the assumptions are
probably wrong. So this algorithm is at least questionable.
But even if we accept the algorithm, it's IMHO questionable that for
insertion we always move all elements before the insertion point one step
down - even if moving all elements after the insertion point up would copy
fewer elements. No such test is performed. So even if the keys are ordered
and thus always added at or near the end, all elements are always copied.
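Such a test could look like this - a sketch only, assuming
OrderedCollection's internal array/firstIndex/lastIndex representation and
that the chosen side still has a free slot:

```smalltalk
"Move whichever side of the insertion point has fewer elements."
spotIndex - firstIndex <= (lastIndex - spotIndex)
	ifTrue: [
		"shift the head one step down"
		firstIndex - 1 to: spotIndex - 2 do: [:i |
			array at: i put: (array at: i + 1)].
		firstIndex := firstIndex - 1.
		array at: spotIndex - 1 put: anObject]
	ifFalse: [
		"shift the tail one step up"
		lastIndex to: spotIndex by: -1 do: [:i |
			array at: i + 1 put: (array at: i)].
		lastIndex := lastIndex + 1.
		array at: spotIndex put: anObject]
```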
And we're talking about 15469 keys here; assuming random order, we'll have
to copy around 119 _million_ elements. If we could at least use a
primitive for that copy loop... Fortunately, we can. Here's a better
implementation:
...
array
	replaceFrom: firstIndex
	to: spotIndex - 2
	with: array
	startingAt: firstIndex + 1.
array at: spotIndex - 1 put: anObject.
...
This still moves elements down every time, but the speedup was again about
two orders of magnitude. But the printing still took about 3 seconds. If
we look at Dictionary>>keysSortedSafely, we see a very expensive sort block
in which we spend about 40% of all time (50% is #insert:before: and 10%
printing). Replacing #isKindOf: and #& with #is... and #and: saved 25% of
execution time. Now more than 70% of the time was spent in
#insert:before:. What if I replaced
self keysDo: [:aKey | sortedKeys add: aKey].
with
sortedKeys addAll: self keys.
Not bad! An additional 18% speedup. Unfortunately, a PluggableSet of some
15000 elements now gets created just to add its elements. So I tried
self keysDo: [:each | sortedKeys addLast: each].
^ sortedKeys reSort
which increases the speedup to 40% in total.
A system with the contents slot accidentally selected now stays barely
usable, but still, it's nothing you'd want to work with.
So I dug into class Inspector and found that in stepAt:in: the system tries
to generate a new printString every second - which takes more than 1.5
seconds on my system! So the whole Morphic world does nothing else but
generate print strings... #@!?%$
As a first thing, I added this method to class Inspector
stepTimeIn: aSystemWindow
^ 5000
(after I noticed that stepTime didn't work. Why can't Object>>stepTimeIn:
default to Object>>stepTime!?)
What a relief! The system reacts sluggishly (and you'll again get annoyed
by the effects of polling), but most of the time (3.5 seconds of 5, or 70%)
it is as responsive as ever.
So I had the perhaps crazy idea to let the inspector measure the time needed
to generate the print string and then automatically adapt itself so that at
most 10% of the time is spent generating stupid print strings. I added an
instance variable "selectionUpdateTime", wrapped #selectionPrintString in a
selectionUpdateTime _ [text _ ...] timeToRun.
^ text
block, and changed the new #stepTimeIn: to
^ (selectionUpdateTime ifNil: [0]) * 10 max: 1000
Yeah! An alternative might be to disable automatic refreshing if a certain
time threshold is reached and to have a menu setting to switch refreshing
on/off.
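Put together, the adaptation described above might read like this (a sketch
only; the "..." stands for the existing print-string computation, as in the
fragment quoted earlier):

```smalltalk
selectionPrintString
	| text |
	selectionUpdateTime _ [text _ ...] timeToRun.
	^ text

stepTimeIn: aSystemWindow
	"Step at most once a second, and back off further when generating
	the print string is slow, so at most ~10% of time goes into it."
	^ (selectionUpdateTime ifNil: [0]) * 10 max: 1000
```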
So I could eventually ask my bag for #sortedCounts to get the hit list of
the most polymorphic methods. Why did I want to know that, you ask?
Well, I was interested in how many different categories the most often used
methods are put. #initialize (which is implemented 311 times) can be found
in 26 different categories.
175->#initialization 126->#'class initialization' 67->#'as yet unclassified'
53->#initialize 33->#'initialize-release' 7->#private 6->#all
4->#'private-initialization' 2->#'initialize-reset-release' 1->#events
1->#misc 1->#'class variables' 1->#'simple keyboard' 1->#'parser support'
1->#'initialization-release' 1->#simulation 1->#'ST initialize'
1->#'like fullCopy' 1->#public 1->#'common requests' 1->#initializing
1->#accessing 1->#'instance creation' 1->#application 1->#project
1->#Initializing
This makes the core system really untidy IMHO. If I look for a certain
method, I should only need to look into one or a few obvious
categories. It seems however that people live out their creativity (or
laziness, in the case of 'as yet unclassified') in picking strange category
names.
Who would expect to find printOn: (156 occurrences) in all of these categories?
127->#printing 9->#'as yet unclassified' 5->#accessing 3->#misc 3->#private
3->#other 2->#access 1->#'double dispatching' 1->#display 1->#'fileIn/Out'
1->#'file in/out'
To make it easier to detect, I then changed this method in MessageSet
initializeMessageList: anArray
	messageList := anArray collect: [:each |
		MessageSet parse: each toClassAndSelector: [:class :sel |
			class name , ' ' , sel , ' {' ,
			((class organization categoryOfElement: sel) ifNil: ['']) , '}']].
	messageListIndex := 0.
	contents := ''
so that all implementors and senders browsers automatically show the method
category. One can now ask questions like "Andreas, why did you categorize
B3DMatrix4x4>>printOn: as 'double dispatch'?" ;-)
I think we should reorganize those parts of the system where polymorphic
methods (especially if overridden in subclasses) end up in different categories.
Thank you for reading. I'll issue a change set with my next bunch of
update candidates.
bye
--
Stefan Matthias Aust // Bevor wir fallen, fallen wir lieber auf