[Vm-dev] Array new: SmallInteger maxVal

Fri Oct 23 01:36:59 UTC 2009

Thanks Henrik,

I took your suggestions and found the following:

- Using your suggested test:
	[1 to: 200 do: [:e | 1 to: 25185 do: [:t | Array new: e]]] timeToRun -
	[1 to: 200 do: [:e | 1 to: 25185 do: [:t | ]]] timeToRun.
  Unfortunately I was not able to get any useful data from a TimeProfileBrowser
  on my system (though there was no indication that time was being spent in GC).
  Overall time to run showed the updated VM (with allocation checks) performing
  12% better in primitives than the prior VM without checks (!?!).

- Going back to my original test, and looking at it with a TimeProfileBrowser,
  I saw about 91-95% of the time was spent in primitives under Collection>>add:
  so the time was presumably being spent largely in array allocation. That
  presumably included garbage collection, but it was nonetheless primarily
  exercising #primitiveNewWithArg.

- Comparing just the time spent in primitives, the VM with the new object
  allocation checks was 3.8% faster than the VM without those checks. I would
  not attribute much precision to this, but it is still consistent with my
  original smoke test, which showed the VM with checks being slightly ( < 1% )
  faster than the prior version without the checks.
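
A minimal sketch of how the differential test above could be repeated to
gauge run-to-run spread before reading much into a single number. The names
allocBlock, emptyBlock and samples are illustrative and not part of the
original test; the calls used (Time millisecondsToRun: and Smalltalk
garbageCollect) are the ones already appearing in this thread.

	| allocBlock emptyBlock samples |
	"Loop that allocates an Array on every inner iteration."
	allocBlock := [1 to: 200 do: [:e | 1 to: 25185 do: [:t | Array new: e]]].
	"Same loop structure with an empty body, to estimate loop overhead."
	emptyBlock := [1 to: 200 do: [:e | 1 to: 25185 do: [:t | ]]].
	"Five samples of (allocation run - empty run), in milliseconds."
	samples := (1 to: 5) collect: [:i |
		Smalltalk garbageCollect.
		(Time millisecondsToRun: allocBlock) - (Time millisecondsToRun: emptyBlock)].
	samples

Evaluating this in a workspace on each VM gives one Array of samples per VM,
so the spread within each VM can be compared against the difference between
the two.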

I cannot explain why the updates seem to make the VM slightly faster, but
it does seem to be the case on my machine (AMD, 64-bit Linux). My best SWAG
(a speculative and probably wrong guess) would be that the variable declaration
updates included in the change set may have had the unintended side effect
of eliminating some inefficiencies somewhere.

I suspect that I am making a mistake somewhere. Really, there's just no
way that the added checks should make things go *faster*. Can anyone
else confirm or deny a performance difference between a VM built with
VMMaker-dtl.143 (including the allocation checks) versus a VM built with
VMMaker-dtl.142 or earlier? 

Dave

On Thu, Oct 22, 2009 at 02:47:36PM +0200, Henrik Johansen wrote:
> 
> That's more of a GC-test :) (93% GC, 5% OrderedCollection>>add: on my machine)
> I found it's usually a good idea to first do a
> TimeProfileBrowser onBlock: testBlock
> just to check the time is actually spent doing what you want to measure a
> difference in, before switching to millisecondsToRun to get the number
> without tally overhead.
> 
> Measuring single primitives can be rather hard though, since any overhead
> can be a big part of total runtime...
> Also, do:, timesRepeat: etc. should be avoided for looping when measuring
> performance until the Stack VM is out, since they create additional
> BlockContexts (and thus more time spent in gc) that weren't there before
> closures.
> 
> It's also good to avoid computations other than the one you're testing  
> in the inner loop, so a better test might be something like:
> 
> [1 to: 200 do: [:e | 1 to: 25185 do: [:t | Array new: e]]] timeToRun -  
> [1 to: 200 do: [:e | 1 to: 25185 do: [:t | ]]] timeToRun.
> Then open a TimeProfileBrowser on the first block and subtract the GC time
> listed there.
> (The 25185 was 1000000//27 from your test; I replaced 27 with 200 since
> the ms runtime with 27 was in the double digits...)
> 
> If any of my assumptions are incorrect, I'd like to know :)
> 
> Cheers,
> Henry
> 
> On Oct 22, 2009, at 3:23:15 AM, David T. Lewis wrote:
> 
> >Regarding performance associated with the changes, I was not able to measure
> >any loss of performance. Actually, my crude test showed a slight improvement,
> >which I can only attribute to random variation in the results.
> >
> >Here is an example of one of the informal tests that I tried:
> >
> > block := [oc := OrderedCollection new.
> > (1 to: 1000000) do: [:e | oc add: (Array new: (e \\ 27) + 1)]].
> >
> > "Stock VM:"
> > Smalltalk garbageCollect.
> > before := (1 to: 5) collect: [:e | Time millisecondsToRun: block] ==> #(21393 20582 21511 21101 20761)
> >
> > "VM with my Array alloc changes:"
> > Smalltalk garbageCollect.
> > after := (1 to: 5) collect: [:e | Time millisecondsToRun: block] ==> #(21582 20737 20693 20691 20725)
> >
> > slowdownDueToTheChanges := (after sum - before sum / before sum) asFloat ==> -0.008732961233246
> >
> >I got similar results for allocating strings, very slightly faster after
> >the changes. I was happy with "not slower" and left it at that.
> >
> >Can anyone suggest a more suitable benchmark?
> >
> >Also, I'm running on AMD 64 and I was only guessing that integer shift and
> >test sign would be a good approach. It might be awful on some hardware, I
> >don't know.
> >
> >r.e. vmParameterAt:put: to modify max allocation request size -- good idea.
> >The changes that I made are strictly intended to protect against a VM crash
> >or object memory corruption, nothing more. But some mechanism to prevent
> >people from making unreasonable memory requests is clearly also needed.
> >
> >Dave
> >
> >