measuring the system (was: Re: gcd:, optimization)

Daniel Vainsencher danielv at netvision.net.il
Sun Feb 15 02:36:48 UTC 1998


On Sat, 14 Feb 1998 16:48:48 -0300, Leandro Caniglia wrote:

>Your implementation of #gcd: is faster than the binary one (that found in
>Squeak) provided that the receiver and the argument are SmallIntegers. The
>binary algorithm is faster for LargeIntegers, but there is an alternative
>algorithm for LargeIntegers that runs even faster in Squeak (you can found
>it in the Vol. 2 of Knuth). Thus, while there is only one #gcd:
>implementation in the Integer hierarchy, the current choice seems to be the
>right one. (BTW, the current implementation admits a minor change doing it
>to run a bit faster)
>
>Saludos,
>Leandro

It is precisely for this reason that I ask how one goes about finding
out the real-world useage and costs for algorithms. If 90% of gcd:
calls operate of SmallIntegers then the better algorithms you
describe might not apply.
(See Tim's comments below.)

For example, I suspect that as with Fractions, a significant part of
the costs working with LargeIntegers will be allocation costs, thus
what the gcd: algorithm is doesn't matter that much. Or maybe I am
mistaken.

Tim olson said -
>So creating a good benchmark test requires that you know not only the 
>approximate number and distribution of methods used, but also the 
>distribution of data types.
>
>You might be able to get some of this info by modifying the existing 
>Fraction class methods to tally the message counts and data types while 
>running whatever code you are interested in.

The way I see would be to add gcd: methods to the concrete subclasses
of Integer, each of which inc's a counter and then super-calls.

I don't like the intrusiveness of this method. Is there a more
general way of tackling this? maybe somewhere in the method lookup
mechanism?

Another issue - if choosing which algorithm to implement depends on
my benchmarks, then it should really depend on my useage of the
system. The variations in how different people use their machines
might make all the effort useless. Is there some good rule of thumb
here, or do we just choose something that works fine for us now, and
that when useage patterns change someone will remake the decision?

Dan says - 
>To take Tim's example, you simply use the message tallySends: in place or >spyOn:, and there is no need to wrap it in a 5000x loop, because the >tallies are exact:
[snip statistics print out]
>The second part of the report is especially interesting in complex examples >because, for every routine of significance, you get a breakdown of how many >calls come from where:

That seems to be the really important part - now, if we could get
that sort of info about everything that happens in the system...

In general, a barrier I seem to run into often is that it's much
harder to find out what happens when specific user events happen than
when I run some code explicitly. For example, how keyboard input is
handled from the keypress. Any suggestions, tips?
-----------
Daniel Vainsencher 
A member of the First Millennial Foundation,
making space a place to live in. Visit our CyberHabitat at
http://www.millennial.org





More information about the Squeak-dev mailing list