[Newbies] Re: How to shorten a method (using inject: into: ?)

Mon Jul 21 16:41:21 UTC 2008

Randal L. Schwartz <merlyn <at> stonehenge.com> writes:

> 
> This looks even uglier.  How about first gathering the testSets, then
> getting what you need from those:
> 
> | testSets mTotal pTotal |
> 
> testSets := (Forecaster testMale meiose) collect: [:strand | strand testRun ].
> mTotal := (testSets collect: [:each | each maternalCount]) sum.
> pTotal := (testSets collect: [:each | each paternalCount]) sum.
> 
> Unless you're talking about 5000 elements in testSets, this is likely in the
> same ballpark for speed, and that's probably dwarfed by the cost of #testRun
> anyway.
> 
> Sometimes an obsession with #inject:into: as a shiny object leads you down a
> dark path.  Especially with sums.  Consider #sum first.
> 

Premature optimization can make our code obscure for no good reason.
That means hard to read, hard to maintain, hard to test, and possibly flawed.

However, if you deal with really huge collections, problems are more likely to
come from space than from time.

nLoop := 500000.
example1 := [(1 to: nLoop) detectSum: [:each | each log: 10]].
example2 := [((1 to: nLoop) collect: [:each | each log: 10]) sum].

The main difference between 1 and 2 won't come from iterating twice
(as this is partially masked by the cost of evaluating #log:).
It will come mainly from the pressure put on the garbage collector:
- the second example allocates a lot of new Float objects and keeps them in an
  intermediate collection,
- so, unlike in the first example, these objects cannot be reclaimed immediately.
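
Since the thread started with #inject:into:, here is a minimal sketch of the
single-pass accumulation written that way. It is only an illustration of the
point above, not a recommendation over detectSum:, and the small nLoop value
is my own choice to keep it quick to try in a workspace.

```smalltalk
| nLoop total |
nLoop := 1000.  "illustrative size, not the 500000 used in the timings"
"Single pass: each Float answered by log: is added to the running total
 and is not stored anywhere, so no intermediate collection is built."
total := (1 to: nLoop) inject: 0 into: [:sum :each | sum + (each log: 10)].
```

The result should match what detectSum: answers for the same block; the only
thing that changes is how the accumulation is spelled.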

{
[Smalltalk garbageCollect. example1 timeToRun] value.
[Smalltalk garbageCollect. example2 timeToRun] value.
}
This shows a factor of 12 on my 3.10 image, not a factor of 2.

If you plot it, example2 timeToRun is not proportional to nLoop.
It can get worse still once the OS starts swapping pages to disk.
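
To check the non-proportionality without plotting, one could time the two-pass
version at a few sizes and compare the numbers in the Transcript. This is a
rough sketch: the three sizes are arbitrary picks of mine, and the timings
will depend entirely on your image and machine.

```smalltalk
"Time the collect:-then-sum version at a few arbitrarily chosen sizes."
#(100000 200000 400000) do: [:n |
	| ms |
	Smalltalk garbageCollect.
	ms := [((1 to: n) collect: [:each | each log: 10]) sum] timeToRun.
	Transcript show: n printString; show: ' -> '; show: ms printString; show: ' ms'; cr].
```

If the time grew linearly, doubling n would roughly double the milliseconds
printed; swapping detectSum: into the timed block gives the comparison for
example1.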

I remember that old ST-80 implementations could eventually crash on this kind
of example, because the lowSpace process could be activated too late (every
50 ms or so). Fortunately, I think this is no longer the case.

Of course, up to 10000 elements, you will hardly notice a difference between
1 and 2. Better to leave the code simple.
Here detectSum: is both simple and efficient, though I do not like the name.
(I usually name the message sumOf:).

Nicolas