Thanks Henrik,
I took your suggestions and found the following:
- Using your suggested test: [1 to: 200 do: [:e | 1 to: 25185 do: [:t | Array new: e]]] timeToRun - [1 to: 200 do: [:e | 1 to: 25185 do: [:t | ]]] timeToRun. Unfortunately, I was not able to get any useful data from a TimeProfileBrowser on my system (though there was no indication that time was being spent in GC), but the overall time to run showed the updated VM (with allocation checks) performing about 12% better in primitives than the prior VM without the checks (!?!).
- Going back to my original test and looking at it with a TimeProfileBrowser, I saw that about 91-95% of the time was spent in primitives under Collection>>add:, so the time was evidently going largely to array allocation. That included garbage collection, but the test was nonetheless primarily exercising #primitiveNewWithArg.
- Comparing just the time spent in primitives, the VM with the new object allocation checks was 3.8% faster than the VM without them. I would not attribute much precision to this, but it is still consistent with my original smoke test, which showed the VM with the checks being slightly (< 1%) faster than the prior version without them.
I cannot explain why the updates seem to make the VM slightly faster, but that does seem to be the case on my machine (AMD, 64-bit Linux). My best guess (speculative and probably wrong) is that the variable declaration updates included in the change set may have had the unintended side effect of eliminating some inefficiencies somewhere.
I suspect that I am making a mistake somewhere. Really, there's just no way that the added checks should make things go *faster*. Can anyone else confirm or deny a performance difference between a VM built with VMMaker-dtl.143 (including the allocation checks) versus a VM built with VMMaker-dtl.142 or earlier?
Dave
On Thu, Oct 22, 2009 at 02:47:36PM +0200, Henrik Johansen wrote:
That's more of a GC test :) (93% GC, 5% OrderedCollection>>add: on my machine). I've found it's usually a good idea to first do a TimeProfileBrowser onBlock: testBlock, just to check that the time is actually being spent doing what you want to measure, before switching to millisecondsToRun to get the number without the tally overhead.
Measuring single primitives can be rather hard, though, since any overhead can be a big part of the total runtime... Also, do:, timesRepeat:, etc. should be avoided for looping when measuring performance until the Stack VM is out, since they create additional BlockContexts (and thus more time spent in GC) that weren't there before closures.
It's also good to avoid computations other than the one you're testing in the inner loop, so a better test might be something like:
[1 to: 200 do: [:e | 1 to: 25185 do: [:t | Array new: e]]] timeToRun - [1 to: 200 do: [:e | 1 to: 25185 do: [:t | ]]] timeToRun. Then open a TimeProfileBrowser on the first block and subtract the GC time listed there. (The 25185 came from the 1000000 and 27 in your test; I replaced 27 with 200 since the ms runtime with 27 was only in the double digits...)
If any of my assumptions are incorrect, I'd like to know :)
Cheers, Henry
On Oct 22, 2009, at 3:23:15 AM, David T. Lewis wrote:
Regarding performance associated with the changes, I was not able to measure any loss of performance. Actually, my crude test showed a slight improvement, which I can only attribute to random variation in the results.
Here is an example of one of the informal tests that I tried:
block := [oc := OrderedCollection new. (1 to: 1000000) do: [:e | oc add: (Array new: (e \\ 27) + 1)]].
"Stock VM:" Smalltalk garbageCollect. before := (1 to: 5) collect: [:e | Time millisecondsToRun: block] ==> #(21393 20582 21511 21101 20761)
"VM with my Array alloc changes:" Smalltalk garbageCollect. after := (1 to: 5) collect: [:e | Time millisecondsToRun: block] ==> #(21582 20737 20693 20691 20725)
slowdownDueToTheChanges := (after sum - before sum / before sum) asFloat ==> -0.008732961233246
I got similar results for allocating strings, very slightly faster after the changes. I was happy with "not slower" and left it at that.
Can anyone suggest a more suitable benchmark?
Also, I'm running on AMD64, and I was only guessing that an integer shift and a test of the sign bit would be a good approach. It might be awful on some hardware; I don't know.
Re: vmParameterAt:put: to modify the maximum allocation request size -- good idea. The changes that I made are strictly intended to protect against a VM crash or object memory corruption, nothing more. But some mechanism to prevent people from making unreasonable memory requests is clearly also needed.
Dave