Pocket PC Performance

Torge.Husfeldt at gmx.de Torge.Husfeldt at gmx.de
Mon Dec 30 08:41:47 UTC 2002


Hi Andreas,
You're right, my tests don't deal with hard numbers.
I just made a guess as to what may cause the bad
_image_ performance in later Squeak versions.
Given that this performance hit came rather
gradually and nobody has yet been able to tie it
to a particular change, I thought, let's give it a try
and ask people how performance 'feels' with reduced
lookup lengths (btw, in my 3.4b image I encountered
lookup lengths of up to 46, and I think this is quite a lot).
Once we have two or three votes from people who say
that it helped (or didn't help), we can decide whether
to go on chasing hard numbers.
But then again, since you are such an expert on i386
VMs you can go chasing hard numbers right away ;)
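
For anyone unsure what "lookup length" means here, the following is a toy model of linear probing in a hash table. It is only a sketch under stated assumptions: the block name probesFor, the sample hashes and the table sizes are made up for illustration, and this is not the VM's actual lookup code. It shows how keys that collide in a small table need several probes, and how the same keys need none once the table is larger, which is roughly what growing a method dictionary is meant to achieve.

```smalltalk
"Toy model of linear probing (hypothetical, not the VM's lookup code).
 Answers the worst number of extra probes needed to place the given
 hashes in a table with the given number of slots."
| probesFor |
probesFor := [:hashes :size | | table worst |
	table := Array new: size.
	worst := 0.
	hashes do: [:h | | index probes |
		index := h \\ size.	"preferred slot, 0-based"
		probes := 0.
		[(table at: index + 1) notNil] whileTrue: [
			index := (index + 1) \\ size.	"slot taken, try the next one"
			probes := probes + 1].
		table at: index + 1 put: h.
		worst := worst max: probes].
	worst].
"Four hashes that all land in slot 0 of an 8-slot table."
Transcript show: (probesFor value: #(0 8 16 24) value: 8) printString; cr.	"3"
"The same hashes land in four different slots of a 32-slot table."
Transcript show: (probesFor value: #(0 8 16 24) value: 32) printString; cr.	"0"
```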
My proposal of images to compare (with the same VM,
since the problems reported are independent of VMs):
#(2.6 2.8 3.0) do:[:imageVersion |
	#(mvc morphic) do:[:guiMode |
		#(true false) do:[:withLookupLengthsReduced |
			imageVersion startUp.
			project := twm open... <guiMode> Project.
			project enter.
			withLookupLengthsReduced ifTrue:[applyTorgeSPatch].
			"Do some testing"]]]
... is my pseudo-smalltalk program that describes the test cases.
This could give us some good hints on what the change in this particular
area is and then we may discuss what benefits we can get from which
change.
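
One way the "Do some testing" step could yield hard numbers is sketched below. This is only a sketch: the workload (printString sends over a small interval) and the repeat count are arbitrary assumptions, not part of the test plan above, and a Morphic-heavy workload would probably be more telling. The idea is to evaluate it once before and once after growing the method dictionaries, in the same image and on the same VM, and compare the two timings.

```smalltalk
"Hypothetical timing harness; the workload and counts are placeholders."
| workload ms |
workload := [10000 timesRepeat: [
	(1 to: 50) do: [:i | i printString size]]].
Smalltalk garbageCollect.	"start each run from a comparable heap state"
ms := Time millisecondsToRun: workload.
Transcript show: 'send-heavy workload: ', ms printString, ' ms'; cr.
```

Running it several times and taking the best figure should reduce noise from garbage collection and other background activity.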
Note that I'm talking about a performance hit that occurred on all
CPUs with all (applicable) VMs between the image versions. I'd be glad
to give you the numbers on my not-so-up-to-date K6-2/350 (I think I can
make it run as slow as 233 if this helps), but I have to make it run
first.

Best Regards,
Torge
"Andreas Raab" <andreas.raab at gmx.de> wrote:
> Hi Torge,
> 
> If you're right with your assumption the difference should be measurable
> from within the VM with different images. I don't know if the devices
> we're talking about have something equivalent to the RDTSC instruction
> on i386 processors but if there's a _really_ cheap way of measuring
> sub-microsecond units (RDTSC, for example, measures clock cycles) then
> it might be really worthwhile to attribute a VM and see if you can find
> out a difference (e.g., time spent in critical areas such as "full"
> method lookup as a percentage of overall time spent).
> 
> The problem I have with your measurements is that (I think) they are not
> really giving you any "good enough" evidence to make a case here. Even
> if it is true that faster images show smaller lookup lengths the
> differences could still be attributed to many other factors - lots of
> things have changed and chasing VM inefficiencies is typically very hard
> if you don't have any "hard numbers" to go along with.
> 
> Personally, my feeling is "half and half" here. Yes, there could be a
> problem with the mcache size as well as the speed of the full method
> lookup. But then, it's _really_ hard to tell without hard numbers.
> 
> Cheers,
>   - Andreas
> 
> > -----Original Message-----
> > From: squeak-dev-admin at lists.squeakfoundation.org 
> > [mailto:squeak-dev-admin at lists.squeakfoundation.org] On 
> > Behalf Of Torge.Husfeldt at gmx.de
> > Sent: Sunday, December 29, 2002 6:38 PM
> > To: squeak-dev at lists.squeakfoundation.org
> > Cc: squeak-dev at lists.squeakfoundation.org
> > Subject: Re: Pocket PC Performance
> > 
> > 
> > Hi All,
> > Can someone who encounters the performance problems mentioned in this
> > thread please try out the following code snippets and report on the
> > outcome?
> > 
> > First try in a workspace:
> > | lookupLengths |
> > lookupLengths _ SortedCollection new.
> > Behavior allSubInstancesDo:[:class | | md |
> > 	md _ class methodDict.
> > 	lookupLengths addAll:(md keys asSortedCollection collect:[:sel |
> > 		(((md scanFor: sel) - sel identityHash) \\ md basicSize)
> > 			-> (class -> sel)])
> > 	].
> > lookupLengths asBag sortedCounts inspect.
> > (lookupLengths last: 100) inspect.
> > 
> > This will give you two inspectors.
> > The first will show the sorted counts of a bag whose entries should be
> > interpreted as follows:
> > #occurrences -> #lookupLength -> sampleClass -> sampleSelector
> > Please report on the differences between a slow image and an acceptable
> > image (preferably on the same system).
> > The second will give you the details of the 100 methods with the highest
> > lookup lengths.
> > Please look quickly over this list to see whether you can detect any
> > Morphic-specific selectors with long lookup lengths.
> > 
> > The second thing I want you to try is to grow all your
> > MethodDictionaries that have excessive lookup lengths. The following
> > code snippet will do this for you.
> > 
> > | lookupLengths |
> > Behavior allSubInstancesDo:[:class | | md |
> > 	md _ class methodDict.
> > 	md isEmpty ifFalse:[
> > 		lookupLengths _ SortedCollection new.
> > 		lookupLengths addAll:(md keys asSortedCollection collect:[:sel |
> > 				(((md scanFor: sel) - sel identityHash) \\ md basicSize)
> > 					-> (class -> sel)]).
> > 		(lookupLengths last key > 9) ifTrue:[md grow]]]
> > 
> > 
> > Please report if your image "feels" any swifter after this operation.
> > Note#0:
> > Be sure not to have any PackagePaneBrowser (aka 5-pane browser)
> > open when you do your tests because these beasts will stop all Morphic
> > updating (and maybe event dispatch) for up to one second every second
> > on a slow machine. This is due to a design bug which can very easily be
> > avoided using a changeset I once posted to the list but don't have the
> > patience to dig up right now.
> > Note#1:
> > These operations might take a _very_ long time (especially on a slow
> > system), so be patient (on my 1700+ it was in the range of seconds, but
> > since you're especially encountering problems on slow systems you will
> > probably do the tests there, too -- so don't say I didn't warn you).
> > Note#2:
> > Lookup lengths stand for the number of probes the VM has to do in _a
> > single method dictionary_ to find the method corresponding to a
> > selector. This is just a minimum measure because it doesn't count the
> > number of probes spent while following the superclass chain. These
> > numbers are typically small for almost empty method dictionaries but
> > may become huge when all superclasses have long probe chains and the
> > selector is only implemented in ProtoObject.
> > Note#3:
> > It is nowhere near guaranteed that this will change anything because
> > lookup lengths aren't _supposed_ to make a difference. It is widely
> > believed that the VM's lookup cache mechanism should deal with the
> > performance hit that would result from long probe chains.
> > I have, however, two strong hints that lookup lengths _might_ be part
> > of the problem you encounter. These are:
> > Hint#1: The problem has arisen rather gradually and no one has yet been
> > able to find any particular change that made the difference.
> > Hint#2: The lookup cache (as I understand it) seems to be rather small
> > for as big a system as Morphic (I only saw space for 512 entries last
> > time I looked) and gets flushed on several occasions (such as GCs).
> > 
> > Looking forward to your feedback,
> > Torge
> >


