[ENH][VM] Improved code generation (hopefully ;)

Andreas Raab andreas.raab at gmx.de
Sun Jul 13 20:51:56 UTC 2003


Hi Scott,

I actually went back to my list archive to review your mcache changes and
built a (3.6-based) VM to see what effect we get (based on messages sent
around Dec 2001 which were the latest versions I could find). I am sorry to
report that your mcache changes do not seem to improve anything, neither in
tiny nor in macro benchmarks. I ran the benchmarks using the same image
multiple times automatically (e.g., upon startup) in order to prevent any
other side effects. Since the individual results were very consistent I will
only report the averages here:

For the current mcache tinyBenchmarks reports (all of these are averages of
3 runs in each of 3 Squeak sessions, i.e., 9 runs total, with a spread
below 1%):

	128,514,056 bytecodes per sec
	  3,755,704 sends per sec

vs. your mcache

	127,617,148 bytecodes per sec
	  3,679,575 sends per sec

thus we have a 2% advantage for the current mcache in send speed (note that
the small difference in bytecode speed is probably caused by the sends
contained in the benchmark).

Macrobenchmarks were run 3 times (once per Squeak run); averaging, we get
for the current mcache:

	#(8272 53889 18588 9395 0 6105 3683)

vs. your mcache

	#(8736 62699 20565 10072 0 6725 3915)

which gives a good 5-10% advantage for the current mcache. The difference is
surprisingly large and I am not quite sure why; it might be related to cache
size. I have not tried reducing the 8k cache to see if that helps, but in
any case I got this result consistently for three runs, so I must take it
as the benchmark result.

I know that the results will disappoint you, but those are the numbers I get.
Most likely, the 30% advantage you reported in the macrobenchmarks was due
to the other changes you made, or perhaps you did not use the same images.

As for the other changes you made back then, they have either been integrated
or adopted in other forms:
* root table overflow - this has been integrated (much to my own surprise -
I didn't remember it)
* hash bits - this has been changed by adjusting identity sets /
dictionaries to spread out the limited number of hash bits (thus removing
the need to extend the hash bits in the VM)

Cheers,
  - Andreas


> -----Original Message-----
> From: squeak-dev-bounces at lists.squeakfoundation.org 
> [mailto:squeak-dev-bounces at lists.squeakfoundation.org] On 
> Behalf Of Scott A Crosby
> Sent: Wednesday, July 09, 2003 4:29 PM
> To: Andreas Raab
> Cc: 'The general-purpose Squeak developers list'
> Subject: Re: [ENH][VM] Improved code generation (hopefully ;)
> 
> 
> On Wed, 9 Jul 2003 00:20:54 +0200, "Andreas Raab" 
> <andreas.raab at gmx.de> writes:
> 
> > Hi Scott,
> > 
> > > I just checked to see if my new method cache from about a year and a
> > > half ago went in. It looks not. If I remember right it was a 5-15%
> > > gain on microbenchmarks at the time. You might want to reincarnate it.
> > 
> > We discussed these changes but nobody (including me) was able to
> > recreate the improvement you reported.
> 
> It should have been straightforward with the new methodcache. AFAIK,
> nobody else tested it. (Or at least, I saw no discussion other than
> plans to drop it because it was considered too invasive.)
> 
> Did the root-table-overflow patch go in? (That one was where if the
> root table was about to overflow, immediately microGC, then tenure.)
> 
> > > As is before, the current method cache is ALWAYS at most 2/3
> > > occupied. It also pessimizes the branch prediction. (What focused my
> > > effort was finding out that a single instruction in the code
> > > accounted for a huge amount of the runtime.)
> > 
> > Huh? I don't quite understand what you're saying here. Where was that
> > insn? What did it do? How was it related to the mcache?
> 
> It was in commonLookup, here's the code snippet out of interp.c
> 
>                 commonLookup:   /*  */;
>                         /* begin lookupInMethodCacheSel:class: */
>                         t5 = messageSelector ^ lkupClass;
>                         t6 = t5 & 4088;
>                         if (((methodCache[t6 + 1]) == messageSelector)
>                          && ((methodCache[t6 + 2]) == lkupClass)) {
>                                 newMethod = methodCache[t6 + 3];
>                                 primitiveIndex = methodCache[t6 + 4];
>                                 newNativeMethod = methodCache[t6 + 5];
>                                 t4 = 1;
>                                 goto l65;
> 
> The problem is the first branch in the conditional. It has a hit rate of
> only about 66%, which is horrible because it means you're constantly
> paying branch mispredictions. That focused me on the methodcache. If I
> remember right, something around 2-5% of the runtime was spent on that
> single instruction (during a macroBenchmarks run).
> 
> The current methodcache has two problems. First, the 'if all three
> probes are full, erase all three' logic means that the steady-state
> occupancy of the method cache is *at most* 2/3. This has been
> verified both analytically and in practice.
> 
> Secondly, the hit rate on that first branch is horrible. With both an
> increase in the method cache size and a new algorithm that gives it
> an occupancy rate of 100%, I got, I think, 5-10%. Looking at what I had
> below, I seem to have gotten 2x in some cases. (However, that might have
> included other fixes on the Squeak code side.)
> 
> Going back into my old-projects directory, here were my results:
> 
> ------
> 
> New cache results 32768 slots 4 probes:
>   Overall miss rate: 0.0022517
>   First probe miss rate: 0.062509
> 
> 
> New cache results 8192 slots 4 probes:
>   Overall miss rate: 0.0057631
>   First probe miss rate: 0.10140
> 
> 
> New cache results 512 slots 4 probes:
>   Overall miss rate: 0.056431
>   First probe miss rate: 0.64578
> 
> 
> Old cache:
>   Overall miss rate: 0.044352
>   First probe miss rate: 0.66
> 
> -- This comparison above is apples-oranges, but I can't easily go
> -- through my earlier posts to identify if I have a better one.
> 
> -- Here is where I compared the new MC and BC..
> 
> >       STOCK   MC      BC      BC+MC
> > 1      54908   27235      -       -
> > 2     294773  219893      -   54632[*]
> > 3     132697   56784  83456   41692
> > 4      54756   30076      -  207758
> > 5          -       -      -       -
> > 6      19057   14462  18267   11682
> > 7      14544   11520  15518    8367
> > 
> > These are all on a P2-450
> 
> [*] This number is wrong, it builds tiles for 9 methods, when the others
> are run with 35.
> 
> -- This above was probably done with my stock MC patch (8192 slots)
> 
> -- The main essence of the new cache is:
> 
> !Interpreter methodsFor: 'method lookup cache' stamp: 'sac 2/12/2002 13:56'!
> addNewMethodToCache
>         "Add the given entry to the method cache.
> 
>         The policy is as follows:
> 
>         We have a hash table, where each entry in the table contains 4
>         slots. We replace a random entry with the new cached method
>         lookup. We don't bother searching for an empty entry because
>         there will basically never be one except during startup or
>         after a GC."
>         | probe hash |
>         self inline: false.
>         self compilerTranslateMethodHook.
>         "newMethod x lkupClass -> newNativeMethod (may cause GC !!)"
>         hash _ self hashIndex: messageSelector with: lkupClass.
> 
>         "The cache is almost always going to be full, so don't bother
>         to find an empty slot, just do random replacement."
>         probe _ hash + ((cacheRandomState bitAnd: CacheProbeMax - 1)
>                                         * MethodCacheEntrySize).
>         "Overwrite an entry."
>         methodCache at: probe + MethodCacheSelector put: messageSelector.
>         methodCache at: probe + MethodCacheClass put: lkupClass.
>         methodCache at: probe + MethodCacheMethod put: newMethod.
>         methodCache at: probe + MethodCachePrim put: primitiveIndex.
>         methodCache at: probe + MethodCacheNative put: newNativeMethod.
>         "Construct a new random seed."
>         cacheRandomState _ cacheRandomState * 23 >> 4 + 1 bitAnd: 16777215! !
> 
> !Interpreter methodsFor: 'method lookup cache' stamp: 'sac 10/2/2001 20:02'!
> hash: aSelector with: aClass
>         "Compute the hash code for the selector and class."
>         ^(aSelector bitXor: aClass).
> ! !
> 
> !Interpreter methodsFor: 'method lookup cache' stamp: 'sac 10/3/2001 07:38'!
> hashIndex: aSelector with: aClass 
>         "Compute the slot index in the hash table for the given
>         selector and class."
>         ^ (self hash: aSelector with: aClass)
>                 bitAnd: ((MethodCacheMask << CacheProbeShift) bitAnd: MethodCacheMask).! !
> 
> !Interpreter methodsFor: 'method lookup cache' stamp: 'sac 2/12/2002 13:59'!
> lookupInMethodCacheSel: selector class: class 
>         "This method implements a simple method lookup cache.
> 
>         The cache is implemented as a set of entries where each entry
>         has several slots. When we probe, we probe the first slot. If
>         it's a hit, we're done. If one of the other slots matches, we
>         swap that slot with the previous one, trying to put the most
>         frequently used method&selector in the first slot.
> 
>         If an entry for the given selector and class is found in the
>         cache, set the values of 'newMethod' and 'primitiveIndex' and
>         return true. Otherwise, return false."
>         "About the re-probe scheme: The hash is the low bits of the XOR
>         of two large addresses, minus their useless lowest two bits."
>         "WARNING: Since the hash computation is based on the object
>         addresses of the class and selector, we must rehash or flush
>         when compacting storage. We've chosen to flush, since that also
>         saves the trouble of updating the addresses of the objects in
>         the cache."
>         | hash probe |
>         self inline: true.
>         hash _ self hashIndex: selector with: class.
> 
>         probe _ hash.
>         "first probe"
>         ((methodCache at: probe + MethodCacheSelector) = selector
>                         and: [(methodCache at: probe + MethodCacheClass) = class])
>                 ifTrue: [newMethod _ methodCache at: probe + MethodCacheMethod.
>                         primitiveIndex _ methodCache at: probe + MethodCachePrim.
>                         newNativeMethod _ methodCache at: probe + MethodCacheNative.
>                         ^ true].
>         "Damn, didn't find it first, so we do the long search among the rest."
>         1 to: CacheProbeMax - 1 do: [:i | 
>                 probe _ hash + (i * MethodCacheEntrySize).
>                 "All later probes."
>                 ((methodCache at: probe + MethodCacheSelector) = selector
>                                 and: [(methodCache at: probe + MethodCacheClass) = class])
>                         ifTrue: ["Get the feedback."
>                                 newMethod _ methodCache at: probe + MethodCacheMethod.
>                                 primitiveIndex _ methodCache at: probe + MethodCachePrim.
>                                 newNativeMethod _ methodCache at: probe + MethodCacheNative.
>                                 "And swap this entry with the previous one."
>                                 self swapCacheWithPrevEntry: probe.
>                                 ^ true]].
>         ^ false! !
> 
> !Interpreter methodsFor: 'method lookup cache' stamp: 'sac 10/2/2001 20:44'!
> swapCacheWithPrevEntry: probe
>         self inline: true.
>         self swapFieldWithPrevEntry: probe offset: MethodCacheSelector.
>         self swapFieldWithPrevEntry: probe offset: MethodCacheClass.
>         self swapFieldWithPrevEntry: probe offset: MethodCacheMethod.
>         self swapFieldWithPrevEntry: probe offset: MethodCachePrim.
>         self swapFieldWithPrevEntry: probe offset: MethodCacheNative.
>         ! !
> 
> !Interpreter methodsFor: 'method lookup cache' stamp: 'sac 10/2/2001 20:44'!
> swapFieldWithPrevEntry: probe offset: offset
>         | t u |
>         self inline: true.
>         t _ methodCache at: probe + offset.
>         u _ methodCache at: probe - MethodCacheEntrySize + offset.
>         methodCache at: probe + offset put: u.
>         methodCache at: probe - MethodCacheEntrySize + offset put: t.
>         ! !
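
For comparison with the generated interp.c shown earlier, the lookup/swap
scheme above can be sketched directly in C. All names, sizes, and the entry
layout here are illustrative, not the actual translated code: the first
probe is the fast path, and a hit in a later slot is swapped one slot toward
the front so hot entries migrate to the first probe and the first branch
becomes well predicted.

```c
#include <stdint.h>

#define CACHE_SETS 1024   /* number of hash buckets (hypothetical) */
#define PROBE_MAX  4      /* slots (ways) per bucket               */

typedef struct {
    uintptr_t selector;   /* 0 marks an empty slot */
    uintptr_t class_;
    uintptr_t method;
} Entry;

static Entry cache[CACHE_SETS][PROBE_MAX];
static unsigned randomState = 1;

static unsigned hashIndex(uintptr_t sel, uintptr_t cls) {
    return (unsigned)((sel ^ cls) % CACHE_SETS);
}

/* Returns the cached method, or 0 on a miss. */
uintptr_t lookup(uintptr_t sel, uintptr_t cls) {
    Entry *set = cache[hashIndex(sel, cls)];
    /* first probe: the common, well-predicted case */
    if (set[0].selector == sel && set[0].class_ == cls)
        return set[0].method;
    for (int i = 1; i < PROBE_MAX; i++) {
        if (set[i].selector == sel && set[i].class_ == cls) {
            /* swap the hit one slot toward the front */
            Entry t = set[i - 1];
            set[i - 1] = set[i];
            set[i] = t;
            return set[i - 1].method;
        }
    }
    return 0;   /* miss: caller does the full lookup, then insert() */
}

/* Random replacement: the cache is nearly always full, so don't
   bother searching for an empty slot. */
void insert(uintptr_t sel, uintptr_t cls, uintptr_t method) {
    Entry *e = &cache[hashIndex(sel, cls)][randomState % PROBE_MAX];
    e->selector = sel;
    e->class_ = cls;
    e->method = method;
    randomState = ((randomState * 23 >> 4) + 1) & 0xFFFFFFu;
}
```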
> 
> 
> 
> Scott
> 


