[VM] HashBits, a lazy way
John M McIntosh
johnmci at mac.com
Sun Jul 20 06:00:25 UTC 2003
> From: "Andreas Raab" <andreas.raab at g...>
> Date: Sun Jul 20, 2003 12:19 am
> Subject:
>
> John,
>
> > gcc version is 2.95.2
> > -g -O2 -fomit-frame-pointer
>
> What about -O3 -mpentium and -funroll-loops? Those are included in my
> builds by default (though I'm not certain if it makes any big difference).
I just used the defaults that Ian used in his make; I didn't touch them.
-funroll-loops only unrolls for loops, but most loops in the Squeak VM
are while loops. You would need -funroll-all-loops for them, but I don't
think it makes a difference. In a few places in Squeak we move memory
about in different ways. I think we could change those to for loops and
have the compiler generate unrolled loops, which might be better.
-O3 causes inlining. I noticed that in the allocation routine, the
object initialization routine, which isn't inlined by Squeak, does get
inlined by -O3. This makes a difference on the PowerPC because we are
using working registers to hold values, and so avoid two register
store/load operation pairs.
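
To make the while-versus-for point concrete, here is a minimal sketch of
the same word copy written both ways. The names copy_words_while and
copy_words_for are made up for illustration and are not VM routines.
Whether a given gcc version actually unrolls either form under
-funroll-loops is an assumption to check in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pointer-walking while loop, the shape most
   memory-moving loops in the VM are said to take. */
static void copy_words_while(int *dst, const int *src, size_t n)
{
    const int *end;

    assert(n == 0 || (dst != NULL && src != NULL));
    end = src + n;
    while (src < end) {
        *dst++ = *src++;
    }
}

/* Hypothetical helper: the same copy as a counted for loop. The trip
   count n is known on entry to the loop, which is the kind of loop
   -funroll-loops is meant to handle. */
static void copy_words_for(int *dst, const int *src, size_t n)
{
    size_t i;

    assert(n == 0 || (dst != NULL && src != NULL));
    for (i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Both functions produce the same result, so comparing them with and
without the unroll flags would show whether the rewrite is worth doing.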
>
> > We are comparing AHC changes versus your localization
> > changes. Versus say an interp.c that has 20+ t1,t2,... in it...
> > Don't know if the 10% you talk about is AHC(sp)
> > CGeneratorEnhancements-ajh.1.cs versus yours?
> > or to a VM that didn't have the change...
>
> My comparison was based on a pure VMMaker package as you get it from
> SqueakMap. All of my comparisons are against this - I don't know if it
> includes the changes you are talking about.
For this test the 100% allocation difference really is just measuring
the changes to the allocation routine and hash table lookup. The other
changes happen to be along for the ride but don't really affect things.
>
> Which reminds me: The thing you said about "headerTypeBytes" or so
> having an off-by-one in the C indexing - is this bug in the VMMaker
> package?
It's not a bug; it's due to my originally using an Array versus a
CArrayAccessor in Interpreter for headerTypeBytes. (PS: I wonder if
there are other array indexing issues like that in the VM?)
The change I made there was not to do the +1. That of course was there
so the InterpreterSimulator won't choke on "headerTypeBytes at: 0"
(an error for an Array, OK for a CArrayAccessor).
For the PowerPC this makes no difference, because I think the integer
unit(s) consume the addition in step with the other arithmetic
instructions. Less capable CPUs (68K) will benefit by not having to do
the addition.
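
A small sketch of the two indexing schemes, seen from the C side. The
table names, the accessor functions and the table values are all
illustrative assumptions, not the generated interp.c code.

```c
#include <assert.h>

/* Illustrative values only: extra header bytes for each of four
   header types (0..3). */

/* Layout A mirrors a 1-based Smalltalk Array: slot 0 is unused
   padding, so every lookup has to add 1 to the header type. */
static const int header_bytes_one_based[5] = { -1, 8, 4, 0, 0 };

/* Layout B mirrors a 0-based CArrayAccessor: the header type indexes
   the table directly and no addition is needed. */
static const int header_bytes_zero_based[4] = { 8, 4, 0, 0 };

/* Hypothetical accessor for layout A (with the +1). */
static int extra_header_bytes_plus_one(int type)
{
    assert(type >= 0 && type <= 3);
    return header_bytes_one_based[type + 1];
}

/* Hypothetical accessor for layout B (no +1). */
static int extra_header_bytes_direct(int type)
{
    assert(type >= 0 && type <= 3);
    return header_bytes_zero_based[type];
}
```

Both accessors return the same value for every header type; the second
simply saves one addition per lookup.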
>
> > Sure declare JMMWhy float as a global, set to zero, then inspect
> > this below.
>
> Err ... I don't get it. You aren't measuring anything here. Shouldn't a
> benchmark look somewhere along the lines of:
> Time millisecondsToRun:[
> n timesRepeat: [
> 1 asFloat. 1 asFloat. 1 asFloat. 1 asFloat. 1 asFloat.
> 1 asFloat. 1 asFloat. 1 asFloat. 1 asFloat. 1 asFloat.
> ].
> ].
>
> What (and how) are you measuring with the forked process?
>
> Cheers,
> - Andreas
>
Ah, yes. I have a morph along the lines of the frame-rate morph that
grabs the JMMWhy value every second or so, then takes the current value
minus the old remembered counter and divides by the actual time
interval. It also remembers the peak. This allows me to watch the
allocations per second in real time and, after running long enough,
gather the peak allocation rate. You could of course use
Time millisecondsToRun: and a calculation to get the average...
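
The sampling arithmetic described above, as a minimal sketch in C. The
RateSampler struct and its function names are hypothetical; the actual
measurement is done by a morph in the image, and the counter is assumed
here to be a floating-point value like the JMMWhy global.

```c
#include <assert.h>

/* Hypothetical sampler state; the field names are illustrative. */
typedef struct {
    double last_count;    /* counter value at the previous sample */
    double last_time_ms;  /* timestamp of the previous sample, in ms */
    double peak_per_sec;  /* highest rate observed so far */
} RateSampler;

/* Record the starting counter value and time. */
static void sampler_init(RateSampler *s, double count, double now_ms)
{
    s->last_count = count;
    s->last_time_ms = now_ms;
    s->peak_per_sec = 0.0;
}

/* Compute allocations per second over the actual elapsed interval
   (not the nominal one second), remember the peak, and store the new
   counter value and time for the next sample. */
static double sampler_update(RateSampler *s, double count, double now_ms)
{
    double elapsed_ms = now_ms - s->last_time_ms;
    double rate;

    assert(elapsed_ms > 0.0);
    rate = (count - s->last_count) * 1000.0 / elapsed_ms;
    if (rate > s->peak_per_sec) {
        s->peak_per_sec = rate;
    }
    s->last_count = count;
    s->last_time_ms = now_ms;
    return rate;
}
```

Dividing by the measured interval rather than a fixed second keeps the
rate honest when the sampling step fires late.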
--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
===========================================================================