Eliot Miranda uploaded a new version of VMMaker to project VM Maker: http://source.squeak.org/VMMaker/VMMaker.oscog-eem.567.mcz
==================== Summary ====================
Name: VMMaker.oscog-eem.567
Author: eem
Time: 20 December 2013, 3:47:15.976 pm
UUID: 88799310-3943-4468-b8ee-4c007e7f98e7
Ancestors: VMMaker.oscog-eem.565
Commit the takeaways, which are that a) 4-byte entry-point alignment is as good as 8-byte alignment on Core i7, and b) the older backward-branching-for-immediates entry-point code is significantly faster for non-immediates; since we expect most SmallInteger code to be performed in-line, it is better to prefer non-immediate send performance.
N.B. None of this would be an issue with 30-bit immediates.
=============== Diff against VMMaker.oscog-eem.565 ===============
Item was changed:
  ----- Method: CogObjectRepresentationFor32BitSpur>>getInlineCacheClassTagFrom:into: (in category 'compile abstract instructions') -----
  getInlineCacheClassTagFrom: sourceReg into: destReg
  	"Extract the inline cache tag for the object in sourceReg into destReg.
  	 The inline cache tag for a given object is the value loaded in inline caches
  	 to distinguish objects of different classes.  In Spur this is either the tags
  	 for immediates (with 1 & 3 collapsed to 1 for SmallIntegers, and 2 collapsed
  	 to 0 for Characters), or the receiver's classIndex.
  	 Generate something like this:
+ 		Limm:
+ 			andl $0x1, rDest
+ 			j Lcmp
  		Lentry:
  			movl rSource, rDest
  			andl $0x3, rDest
+ 			jnz Limm
- 			jz LnotImm
- 			andl $1, rDest
- 			j Lcmp
- 		LnotImm:
  			movl 0(%edx), rDest
  			andl $0x3fffff, rDest
  		Lcmp:
+ 	 At least on a 2.2GHz Intel Core i7 the following is slightly faster than the above,
+ 	 136m sends/sec vs 130m sends/sec for nfib in tinyBenchmarks
- 	 At least on a 2.2GHz Intel Core i7 it is slightly faster,
- 	 136m sends/sec vs 130m sends/sec for nfib in tinyBenchmarks, than
- 		Limm:
- 			andl $0x1, rDest
- 			j Lcmp
  		Lentry:
  			movl rSource, rDest
  			andl $0x3, rDest
+ 			jz LnotImm
+ 			andl $1, rDest
+ 			j Lcmp
+ 		LnotImm:
- 			jnz Limm
  			movl 0(%edx), rDest
  			andl $0x3fffff, rDest
  		Lcmp:
+ 	 But we expect most SmallInteger arithmetic to be performed in-line and so prefer the
+ 	 version that is faster for non-immediates (because it branches for immediates only)."
- 	"
  	| immLabel jumpNotImm entryLabel jumpCompare |
  	<var: #immLabel type: #'AbstractInstruction *'>
  	<var: #jumpNotImm type: #'AbstractInstruction *'>
  	<var: #entryLabel type: #'AbstractInstruction *'>
  	<var: #jumpCompare type: #'AbstractInstruction *'>
+ 	false
- 	true
  		ifTrue:
+ 			[cogit AlignmentNops: BytesPerWord.
- 			[cogit AlignmentNops: (BytesPerWord max: 8).
  			 entryLabel := cogit Label.
  			 cogit MoveR: sourceReg R: destReg.
  			 cogit AndCq: objectMemory tagMask R: destReg.
  			 jumpNotImm := cogit JumpZero: 0.
  			 cogit AndCq: 1 R: destReg.
  			 jumpCompare := cogit Jump: 0.
  			 "Get least significant half of header word in destReg"
  			 self flag: #endianness.
  			 jumpNotImm jmpTarget: (cogit MoveMw: 0 r: sourceReg R: destReg).
  			 jumpCompare jmpTarget: (cogit AndCq: objectMemory classIndexMask R: destReg)]
  		ifFalse:
  			[cogit AlignmentNops: BytesPerWord.
  			 immLabel := cogit Label.
  			 cogit AndCq: 1 R: destReg.
  			 jumpCompare := cogit Jump: 0.
  			 cogit AlignmentNops: BytesPerWord.
  			 entryLabel := cogit Label.
  			 cogit MoveR: sourceReg R: destReg.
  			 cogit AndCq: objectMemory tagMask R: destReg.
  			 cogit JumpNonZero: immLabel.
  			 self flag: #endianness.
  			 "Get least significant half of header word in destReg"
  			 cogit MoveMw: 0 r: sourceReg R: destReg.
  			 cogit AndCq: objectMemory classIndexMask R: destReg.
  			 jumpCompare jmpTarget: cogit Label].
  	^entryLabel!
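For readers following along, here is a minimal C sketch of the value both instruction sequences compute: the inline cache class tag for a 32-bit Spur oop. It assumes the constants visible in the generated code (tagMask = 0x3, classIndexMask = 0x3fffff) and takes the object's first header word as a parameter rather than reading the heap; the names are illustrative, not VMMaker's own.

```c
#include <stdint.h>

#define TAG_MASK          0x3u       /* low two tag bits of an oop (assumed) */
#define CLASS_INDEX_MASK  0x3fffffu  /* low 22 bits of the header word (assumed) */

/* Compute the inline cache class tag for an oop.
   headerWord is the least-significant half of the object's first header
   word; it is ignored for immediates. */
static uint32_t
inlineCacheClassTag(uint32_t oop, uint32_t headerWord)
{
    if (oop & TAG_MASK)          /* immediate: SmallInteger (tags 1 & 3) or Character (tag 2) */
        return oop & 1u;         /* collapse 1 & 3 -> 1; 2 -> 0 */
    return headerWord & CLASS_INDEX_MASK;  /* pointer object: its classIndex */
}
```

The branch structure is the only difference between the two generated versions: the committed version falls through for pointer objects and branches (backwards, to Limm) only for immediates, which is why it wins when most sends are to non-immediates.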
vm-dev@lists.squeakfoundation.org