[VM] HashBits, a lazy way

John M McIntosh johnmci at mac.com
Sun Jul 20 06:00:25 UTC 2003


> From:  "Andreas Raab" <andreas.raab at g...>
>  Date:  Sun Jul 20, 2003  12:19 am
>  Subject: 
>
>  John,
>
>  > gcc version is 2.95.2
>  > -g -O2 -fomit-frame-pointer
>
>  What about -O3 -mpentium and -funroll-loops? Those are included in my builds
>  by default (though I'm not certain if it makes any big difference).

I just used the defaults that Ian used in his make; I didn't touch them.

-funroll-loops only unrolls loops whose iteration count the compiler can
work out (in practice, mostly for loops), but most loops in the Squeak VM
are while loops. You would need -funroll-all-loops for them, but I don't
think it makes a difference. In a few places in Squeak we move memory
about in different ways; I think we could change those to for loops and
have the compiler generate unrolled loops, which might be better.
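
To make the while-versus-for point concrete, here is a minimal sketch of
the same word copy written both ways. The function names are hypothetical
and are not taken from interp.c, and whether a given gcc version actually
unrolls either loop depends on the version and flags used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical word copy in the pointer-chasing while style. */
void copy_words_while(long *dst, const long *src, size_t n)
{
    const long *end = src + n;
    while (src < end) {
        *dst++ = *src++;
    }
}

/* The same copy as a counted for loop. The trip count n is known on
   entry to the loop, which is the shape -funroll-loops is meant for. */
void copy_words_for(long *dst, const long *src, size_t n)
{
    size_t i;
    assert(n == 0 || (dst != NULL && src != NULL));
    for (i = 0; i < n; i++) {
        dst[i] = src[i];
    }
}
```

Both functions copy the same data; only the loop shape the optimizer
sees is different.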

-O3 causes inlining. I noticed that in the allocation routine, the object
initialization routine, which isn't inlined by Squeak, does get inlined
at -O3. This makes a difference on the PowerPC because we are using
working registers to hold values, and we avoid two register store/load
operation pairs.

>
>  > We are comparing AHC changes versus your localization
>  > changes. Versus say a interp.c that has 20+ t1,t2,... in it...
>  > Don't know if the 10% you talk about is AHC(sp)
>  > CGeneratorEnhancements-ajh.1.cs versus yours?
>  > or to a VM that didn't have the change...
>
>  My comparison was based on a pure VMMaker package as you get it from
>  SqueakMap. All of my comparisons are against this - I don't know if it
>  includes the changes you are talking about.

For this test, the 100% allocation difference really is just measuring
the changes to the allocation routine and hash table lookup. The other
changes happen to be along for the ride but don't really affect things.
>
>  Which reminds me: The thing you said about "headerTypeBytes" or so having an
>  off-by-one in the C indexing - is this bug in the VMMaker package?

It's not a bug; it comes from my originally using an Array rather than a
CArrayAccessor in Interpreter for headerTypeBytes. (PS: I wonder if
there are other array indexing issues like that in the VM?)

The change I made there was to not do the +1. That was of course there
so the InterpreterSimulator won't choke on headerTypeBytes at: 0 (an
error for an Array, OK for a CArrayAccessor).

On the PowerPC this makes no difference because, I think, the integer
unit(s) absorb the addition in step with the other arithmetic
instructions. Less capable CPUs (68K) will benefit from not having to do
the addition.
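
As a rough illustration of the two indexing styles, the sketch below
shows one possible shape of the C. The table contents, array names and
function names are assumptions made for the example and are not copied
from the generated interp.c.

```c
#include <assert.h>

/* Assumed, illustrative values: extra header bytes per header type. */

/* 1-based style: slot 0 is padding, so every lookup pays for a +1. */
const int headerTypeBytesOneBased[5] = { 0, 8, 4, 0, 0 };

/* 0-based style: the header type indexes the table directly. */
const int headerTypeBytesZeroBased[4] = { 8, 4, 0, 0 };

int extraBytesOneBased(int headerType)
{
    assert(headerType >= 0 && headerType < 4);
    return headerTypeBytesOneBased[headerType + 1];
}

int extraBytesZeroBased(int headerType)
{
    assert(headerType >= 0 && headerType < 4);
    return headerTypeBytesZeroBased[headerType];
}
```

The two lookups return the same values; the 0-based one simply has one
less addition per access.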

>
>  > Sure declare JMMWhy float as a global, set to zero, then inspect
>  > this below.
>
>  Err ... I don't get it. You aren't measuring anything here. Shouldn't a
>  benchmark look somewhere along the lines of:
>  Time millisecondsToRun:[
>  n timesRepeat: [
>  1 asFloat. 1 asFloat. 1 asFloat. 1 asFloat. 1 asFloat.
>  1 asFloat. 1 asFloat. 1 asFloat. 1 asFloat. 1 asFloat.
>  ].
>  ].
>
>  What (and how) are you measuring with the forked process?
>
>  Cheers,
>  - Andreas
>

Ah, yes. I have a morph along the lines of the frame rate morph that
grabs the JMMWhy value every second or so, then takes (current value -
previously remembered value) divided by the actual time interval. It
also remembers the peak. This allows me to watch the allocations per
second in real time and, after running long enough, gather the peak
allocation rate. You could of course take a Time millisecond reading and
do a calculation to get the average...
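
The sampling arithmetic can be sketched as follows. This is a C
transcription for illustration only: the real thing is a morph in the
image, and the struct, field and function names here are hypothetical.

```c
#include <assert.h>

/* Hypothetical sampler state: the counter and clock values remembered
   from the previous sample, plus the highest rate seen so far. */
typedef struct {
    unsigned long lastCount;
    unsigned long lastMs;
    double peakPerSec;
} RateSampler;

/* Given the current counter value and the current millisecond clock,
   return events per second since the previous sample and update the
   remembered peak. */
double sampleRate(RateSampler *s, unsigned long count, unsigned long nowMs)
{
    unsigned long elapsedMs = nowMs - s->lastMs;
    double perSec;

    assert(elapsedMs > 0); /* use the actual interval, never zero */
    perSec = (double)(count - s->lastCount) * 1000.0 / (double)elapsedMs;
    if (perSec > s->peakPerSec) {
        s->peakPerSec = perSec;
    }
    s->lastCount = count;
    s->lastMs = nowMs;
    return perSec;
}
```

Dividing by the measured interval rather than a nominal one second
matters because the sampler is not guaranteed to run exactly on time.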


--
===========================================================================
John M. McIntosh <johnmci at smalltalkconsulting.com> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================

