How I wasted one hour of my time :-)

Stefan Matthias Aust sma at 3plus4.de
Thu Jun 15 17:12:19 UTC 2000


We need to be more careful about the base system.  A lot of methods perform 
very poorly even though they don't have to.  I'd like to tell this story of 
profiling and code exploration.

It all began with this harmless expression I wanted to use to count the 
number of polymorphic methods in my system.

  | b |
  b := Bag new.
  Smalltalk allClasses do: [:e | b addAll: e selectors].
  b sortedCounts

It however took forever to compute.  I got impatient, aborted the execution 
and tried

  ... from: 1 to: 100 do: ...

instead of

  ... do: ...

which is a nice way to enumerate only a fraction of a collection.  That was 
much better, and I figured out that my impatience threshold was around 400 
classes.

Profiling that expression, I noticed that about 98% of the execution time 
was spent in Dictionary>>scanFor: and Symbol>>= as well as in 
primitives.  Looking at #scanFor: and thinking about the result, I realized 
that Symbol>>hash was probably causing the problem.  "But why?" I asked 
myself.  Symbol uses a primitive (number 75) which should be well tested 
and fast.
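
For the record, the profiling was just the limited expression wrapped in 
MessageTally spyOn:, roughly like this (a sketch; the 400-class cutoff is 
the impatience threshold mentioned above):

  MessageTally spyOn: [
    | b |
    b := Bag new.
    "Only the first 400 classes - enough to see where the time goes."
    Smalltalk allClasses from: 1 to: 400 do: [:e | b addAll: e selectors].
    b sortedCounts]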

It might be a good idea to check the distribution, I thought, and wrote the 
following code to test.

  | m a h |
  m := 0.
  "Find the largest symbol hash to size the histogram."
  Smalltalk allClasses do: [:class |
    class selectors do: [:sel |
      m := m max: sel hash]].
  a := Array new: m + 1 withAll: 0.
  "Count how many selectors fall into each hash bucket."
  Smalltalk allClasses do: [:class |
    class selectors do: [:sel |
      h := sel hash + 1.
      a at: h put: (a at: h) + 1]].
  a

I looked at the resulting array.  First, I was irritated that I got only 
4096 elements, then I suddenly remembered that hash values have only 12 
bits.  Yes,

  1 bitShift: 12

answers 4096, so that was why the 24489 methods I tried to store in one big 
bag performed so poorly.  On average, six symbols must share the same hash 
value (24489 / 4096 is about 6).  Let us see:

  a count --> error

grr, it's called #sum, again

  a sum --> 24489

(You read that number already)

  a min --> 0

Oh, some values are never chosen.  Does this happen often?

  a count: [:each | each = 0] --> 99

Well, only 3997 different values (4096 minus 99) are actually generated by 
Symbol>>hash.  Are these values distributed equally?  Unfortunately, I 
didn't find a good statistical method (chi-square test etc.), but I tried

  a max --> 315

Oops!  That's bad.  Let's compile a small table of how many hash values 
occur at least a given number of times:

  #(10 15 20 50 100 200) collect: [:d |
    d -> (a count: [:e | e >= d])]
  -->
  #(10->536 15->172 20->79 50->14 100->2 200->1)

It's a lousy distribution which might become a problem.
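
If you want a rough number instead of eyeballing, a quick-and-dirty 
chi-square statistic can be computed over the histogram (my own sketch, not 
something from the image; a is the array from above and the expected count 
per bucket is simply the mean):

  "Pearson's chi-square statistic against a uniform expectation.
   The farther it lies above ~4095 (the degrees of freedom), the worse."
  | expected chi2 |
  expected := a sum / a size.
  chi2 := a inject: 0 into: [:sum :observed |
    sum + ((observed - expected) squared / expected)].
  chi2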

But back to my story.  With better hash values my initial expression should 
be much faster.  I'm not the first one to notice this problem, so there's 
already a solution: PluggableSets and PluggableDictionaries.  Unfortunately, 
there's no PluggableBag, but the following code should do the trick:

  | b |
  b := Bag new.
  "Back the bag with a pluggable dictionary whose hash block spreads
   symbol hashes over a 256 times larger range."
  b setContents: ((PluggableDictionary new: 32000)
    hashBlock: [:obj | ((obj hash bitShift: 8)
      bitXor: (obj first asciiValue bitShift: 4))
      bitXor: obj last asciiValue] fixTemps).
  Smalltalk allClasses do: [:e | b addAll: e selectors].
  b

It defines a new hash value for symbols with a range 256 times 
larger.  This was nearly two orders of magnitude faster than the first 
approach.  Actually, I noticed that

  [:obj | obj hash bitShift: 3]

is already sufficient, as the values are now spread much better over the 
dictionary and resolving the average of 6 clashes is fast enough (shifting 
by 3 bits theoretically allows for up to 8 clashes, or some 32000 symbols).
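
Spelled out, the simpler variant is the same pluggable setup with just the 
shifted hash (a sketch; only the hash block differs from the code above):

  | b |
  b := Bag new.
  "Same trick as before, but the hash block just widens the 12-bit
   symbol hash by three bits."
  b setContents: ((PluggableDictionary new: 32000)
    hashBlock: [:obj | obj hash bitShift: 3]).
  Smalltalk allClasses do: [:e | b addAll: e selectors].
  b sortedCounts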

It might be a good idea to do this by default for all symbols.  I'm not 
sure whether any part of the Smalltalk source depends on the fact that hash 
values are < 4096.  Anybody who knows, please speak up.

I then inspected the resulting bag and made the mistake of clicking on the 
'contents' instance variable slot.  My system froze.  I was working in 
Morphic, so obviously the inspector tried to display the complete - quite 
large - PluggableDictionary not just once but periodically.  Calling a 
debugger was therefore no real help, but I had to close that inspector. 
Something that only succeeds with real patience; you unfortunately notice 
here that Squeak is a polling system.

After an eternity or two, I tried to figure out what had happened.  The 
expression

  MessageTally spyOn: [contents printString]

should give the answer.  I always use this tool when I need to figure out 
where time is spent, as guessing often, if not always, leads down the wrong 
path.  It was obviously the sorting of the dictionary keys that happens 
before the printing.

I also noticed that printStringLimitedTo: is still broken.  I noticed that 
a few weeks ago but never filed a bug report.  Now I remembered.  The funny 
thing is that it has been broken since 1998.  Try

  (contents printStringLimitedTo: 5000) size
   --> 6481

I'd have expected 5000 and not more.  In this case it's not that bad, but 
sometimes the limiting doesn't work at all because WriteStream>>nextPutAll: 
bypasses the limiting code (consider this a bug report).  If you want to 
reproduce the bug, test

  (String
    streamContents: [:s |
      s nextPutAll: (String new: 100)]
    limitedTo: 10) size  -->  100
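
A fix would probably be to let the limited stream class behind 
streamContents:limitedTo: funnel bulk writes through the element-wise path 
where the limit is checked - something along these lines (a sketch only; 
I'm assuming the limit check lives in #nextPut: and that the class to 
change is LimitedWriteStream):

  nextPutAll: aCollection
    "Sketch of a fix: write element by element so the limit check in
     nextPut: is not bypassed by WriteStream's bulk-copy implementation.
     Slower, but the limit actually holds."
    aCollection do: [:each | self nextPut: each].
    ^ aCollection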

But 99 percent of the execution time was spent in sorting, so that was the 
code to attack.  I identified OrderedCollection>>insert:before: as the 
"Übelkübel" (learn that word, it's like angstlust or zeitgeist ;-), the 
most problematic method.  What happens here: when the SortedCollection is 
built using calls to #add:, an insertion sort is used to insert each key at 
the right place.  If the keys are not randomly distributed (though that's 
exactly what a good hash value should make them) or are already preordered, 
insertion sort is better than quicksort.  But as I said, those assumptions 
are probably wrong here.  So this algorithm is at least questionable.

But even if we accept the algorithm, it's IMHO questionable that for an 
insertion we always move all elements before the insertion point one step 
down - even when moving all elements after the insertion point up would 
copy fewer elements.  No such test is performed.  So even if the keys are 
ordered and therefore always added at or near the end, all elements are 
copied every time.

And we're talking about 15469 keys here; assuming random order, we'll have 
to copy around 119 _million_ elements.  If we could at least use a 
primitive for that copy loop...  Fortunately, we can.  Here's a better 
implementation:

  ...
  array
    replaceFrom: firstIndex
    to: spotIndex - 2
    with: array
    startingAt: firstIndex + 1.
  array at: spotIndex - 1 put: anObject.
  ...
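
The win comes from #replaceFrom:to:with:startingAt: doing the whole copy in 
one primitive call instead of a per-element Smalltalk loop.  A tiny 
standalone illustration of the downward shift it performs (my example, not 
from the change set):

  | a |
  a := #(1 2 3 4 5) copy.
  "Shift elements 2..4 one slot down, in a single bulk copy."
  a replaceFrom: 1 to: 3 with: a startingAt: 2.
  a  -->  #(2 3 4 4 5)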

This still moves elements down every time, but the speedup was again about 
two orders of magnitude.  Still, the printing took about 3 seconds.  If we 
look at Dictionary>>keysSortedSafely, we see a rather expensive sort block 
in which we spend about 40% of the total time (50% goes to #insert:before: 
and 10% to printing).  Replacing #isKindOf: and #& with #is... and #and: 
saved 25% of the execution time.  Now more than 70% of the time was spent 
in #insert:before:.  What if I replaced

  self keysDo: [:aKey | sortedKeys add: aKey].

with

  sortedKeys addAll: self keys.

Not bad!  An additional 18% speedup.  Unfortunately, a PluggableSet of some 
15000 elements now gets created just so its elements can be added and then 
thrown away.  So I tried

  self keysDo: [:each | sortedKeys addLast: each].
  ^ sortedKeys reSort

which increases the speedup to 40% total.
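
Put together, the adjusted Dictionary>>keysSortedSafely looks roughly like 
this (a reconstruction from the pieces above - the exact sort block of the 
original method is my assumption; the interesting parts are the 
#isString/#isNumber tests with #and:, and the addLast:/reSort tail):

  keysSortedSafely
    "Answer my keys as a SortedCollection.  Strings compare with strings
     and numbers with numbers directly; everything else falls back to
     comparing print strings."
    | sortedKeys |
    sortedKeys := SortedCollection sortBlock: [:x :y |
      ((x isString and: [y isString])
        or: [x isNumber and: [y isNumber]])
          ifTrue: [x < y]
          ifFalse: [x printString < y printString]].
    "Bulk-add and sort once at the end instead of insertion-sorting
     every single key."
    self keysDo: [:each | sortedKeys addLast: each].
    ^ sortedKeys reSort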

A system with the contents slot accidentally selected now stays barely 
usable, but still, it's nothing you'd want to work with.

So I dug into class Inspector and found that in stepAt:in: the system tries 
to generate a new printString every second - which takes more than 1.5 
seconds on my system!  So the whole Morphic world does nothing but generate 
print strings... #@!?%$

As a first thing, I added this method to class Inspector

  stepTimeIn: aSystemWindow
    ^ 5000

(after I noticed that stepTime didn't work.  Why can't Object>>stepTimeIn: 
default to Object>>stepTime!?)

What a relief!  The system reacts sluggishly (and you'll again get annoyed 
by the effects of polling), but most of the time (3.5 seconds of 5, or 70%) 
it is as responsive as ever.

So I had the perhaps crazy idea of letting the inspector measure the time 
needed to generate the print string and then automatically adapt itself so 
that at most 10% of the time is spent on generating stupid print strings.  I 
added an instance variable "selectionUpdateTime" and wrapped 
#selectionPrintString in a

  selectionUpdateTime _ [text _ ...] timeToRun.
  ^ text

block and changed the new #stepTimeIn: to

  ^ (selectionUpdateTime ifNil: [0]) * 10 max: 1000

Yeah!  An alternative might be to disable automatic refreshing if a certain 
time threshold is reached and to have a menu setting to switch refreshing 
on/off.
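
For completeness, the two methods together look roughly like this (a 
sketch: the instance variable is the one described above, and what exactly 
happens inside the timed block in #selectionPrintString is my assumption):

  stepTimeIn: aSystemWindow
    "Step at most once a second, and otherwise back off to ten times the
     last measured print-string time, so at most about 10% of the time
     goes into regenerating it."
    ^ (selectionUpdateTime ifNil: [0]) * 10 max: 1000

  selectionPrintString
    "Time the print-string generation and remember the result for
     stepTimeIn:."
    | text |
    selectionUpdateTime := [text := self selection printString] timeToRun.
    ^ text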

So I could eventually ask my bag for #sortedCounts to get the hit list of 
the most polymorphic methods.  Why did I want to know that, you ask?

Well, I was interested in how many different categories the most often 
implemented methods are put into.  #initialize (which is implemented 311 
times) can be found in 26 different categories:

175->#initialization 126->#'class initialization' 67->#'as yet 
unclassified' 53->#initialize 33->#'initialize-release' 7->#private 6->#all 
4->#'private-initialization' 2->#'initialize-reset-release' 1->#events 
1->#misc 1->#'class variables' 1->#'simple keyboard' 1->#'parser support' 
1->#'initialization-release' 1->#simulation 1->#'ST initialize' 1->#'like 
fullCopy' 1->#public 1->#'common requests' 1->#initializing 1->#accessing 
1->#'instance creation' 1->#application 1->#project 1->#Initializing
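
For reference, such a per-selector category tally can be computed with 
something like this (my own sketch, not from the original experiment; the 
metaclass pass picks up the class-side implementations):

  | selector tally |
  selector := #initialize.
  tally := Bag new.
  Smalltalk allClasses do: [:class |
    "Look at both the instance side and the class side."
    (Array with: class with: class class) do: [:behavior |
      (behavior includesSelector: selector) ifTrue: [
        tally add: ((behavior organization categoryOfElement: selector)
          ifNil: [#'as yet unclassified'])]]].
  tally sortedCounts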

This makes the core system really untidy IMHO.  If I look for a certain 
method, I should only need to look into one or a few obvious 
categories.  It seems, however, that people live out their creativity (or 
laziness, in the case of 'as yet unclassified') in picking strange category 
names.

Who would expect to find printOn: (156 occurrences) in all of these categories?

127->#printing 9->#'as yet unclassified' 5->#accessing 3->#misc 3->#private 
3->#other 2->#access 1->#'double dispatching' 1->#display 1->#'fileIn/Out' 
1->#'file in/out'

To make such cases easier to spot, I then changed this method in MessageSet

  initializeMessageList: anArray
    "Show each method's category in curly braces behind the class and selector."
    messageList := anArray collect: [:each |
      MessageSet parse: each toClassAndSelector: [:class :sel |
        class name , ' ' , sel , ' {' ,
        ((class organization categoryOfElement: sel) ifNil: ['']) , '}']].
    messageListIndex := 0.
    contents := ''

so that all implementors and senders browsers automatically show the method 
category.  One can now ask questions like "Andreas, why did you categorize 
B3DMatrix4x4>>printOn: as 'double dispatch'?" ;-)

I think we should reorganize those parts of the system where polymorphic 
methods (especially when overridden in subclasses) live in different 
categories.

Thank you for reading.  I'll issue a change set with my next bunch of 
update candidates.


bye
--
Stefan Matthias Aust  //  Bevor wir fallen, fallen wir lieber auf




