Eliot Miranda uploaded a new version of VMMaker to project VM Maker: http://source.squeak.org/VMMaker/VMMaker.oscog-eem.567.mcz
==================== Summary ====================
Name: VMMaker.oscog-eem.567
Author: eem
Time: 20 December 2013, 3:47:15.976 pm
UUID: 88799310-3943-4468-b8ee-4c007e7f98e7
Ancestors: VMMaker.oscog-eem.565
Commit the takeaways, which are that a) 4-byte entry-point alignment is as good as 8-byte alignment on Core i7, and b) the older backward-branching-for-immediates entry-point code is significantly faster for non-immediates; since we expect most SmallInteger code to be performed in-line, it is better to prefer non-immediate send performance.
N.B. None of this would be an issue with 30-bit immediates.
=============== Diff against VMMaker.oscog-eem.565 ===============
Item was changed:
  ----- Method: CogObjectRepresentationFor32BitSpur>>getInlineCacheClassTagFrom:into: (in category 'compile abstract instructions') -----
  getInlineCacheClassTagFrom: sourceReg into: destReg
  	"Extract the inline cache tag for the object in sourceReg into destReg.
  	 The inline cache tag for a given object is the value loaded in inline caches
  	 to distinguish objects of different classes.  In Spur this is either the tags
  	 for immediates (with 1 & 3 collapsed to 1 for SmallIntegers, and 2 collapsed
  	 to 0 for Characters), or the receiver's classIndex.
  	 Generate something like this:
+ 		Limm:
+ 			andl $0x1, rDest
+ 			j Lcmp
  		Lentry:
  			movl rSource, rDest
  			andl $0x3, rDest
+ 			jnz Limm
- 			jz LnotImm
- 			andl $1, rDest
- 			j Lcmp
- 		LnotImm:
  			movl 0(%edx), rDest
  			andl $0x3fffff, rDest
  		Lcmp:
+ 	 At least on a 2.2GHz Intel Core i7 the following is slightly faster than the above,
+ 	 136m sends/sec vs 130m sends/sec for nfib in tinyBenchmarks
- 	 At least on a 2.2GHz Intel Core i7 it is slightly faster,
- 	 136m sends/sec vs 130m sends/sec for nfib in tinyBenchmarks, than
- 		Limm:
- 			andl $0x1, rDest
- 			j Lcmp
  		Lentry:
  			movl rSource, rDest
  			andl $0x3, rDest
+ 			jz LnotImm
+ 			andl $1, rDest
+ 			j Lcmp
+ 		LnotImm:
- 			jnz Limm
  			movl 0(%edx), rDest
  			andl $0x3fffff, rDest
  		Lcmp:
+ 	 But we expect most SmallInteger arithmetic to be performed in-line and so prefer the
+ 	 version that is faster for non-immediates (because it branches for immediates only)."
- 	"
  	| immLabel jumpNotImm entryLabel jumpCompare |
  	<var: #immLabel type: #'AbstractInstruction *'>
  	<var: #jumpNotImm type: #'AbstractInstruction *'>
  	<var: #entryLabel type: #'AbstractInstruction *'>
  	<var: #jumpCompare type: #'AbstractInstruction *'>
+ 	false
- 	true
  		ifTrue:
+ 			[cogit AlignmentNops: BytesPerWord.
- 			[cogit AlignmentNops: (BytesPerWord max: 8).
  			 entryLabel := cogit Label.
  			 cogit MoveR: sourceReg R: destReg.
  			 cogit AndCq: objectMemory tagMask R: destReg.
  			 jumpNotImm := cogit JumpZero: 0.
  			 cogit AndCq: 1 R: destReg.
  			 jumpCompare := cogit Jump: 0.
  			 "Get least significant half of header word in destReg"
  			 self flag: #endianness.
  			 jumpNotImm jmpTarget: (cogit MoveMw: 0 r: sourceReg R: destReg).
  			 jumpCompare jmpTarget: (cogit AndCq: objectMemory classIndexMask R: destReg)]
  		ifFalse:
  			[cogit AlignmentNops: BytesPerWord.
  			 immLabel := cogit Label.
  			 cogit AndCq: 1 R: destReg.
  			 jumpCompare := cogit Jump: 0.
  			 cogit AlignmentNops: BytesPerWord.
  			 entryLabel := cogit Label.
  			 cogit MoveR: sourceReg R: destReg.
  			 cogit AndCq: objectMemory tagMask R: destReg.
  			 cogit JumpNonZero: immLabel.
  			 self flag: #endianness.
  			 "Get least significant half of header word in destReg"
  			 cogit MoveMw: 0 r: sourceReg R: destReg.
  			 cogit AndCq: objectMemory classIndexMask R: destReg.
  			 jumpCompare jmpTarget: cogit Label].
  	^entryLabel!
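For readers following along, here is a minimal C sketch of the value both instruction sequences compute: the inline cache class tag for a 32-bit Spur oop. It assumes the constants visible in the generated code (tagMask = 0x3, classIndexMask = 0x3fffff) and takes the object's first header word as a parameter rather than reading the heap; the names are illustrative, not VMMaker's own.

```c
#include <stdint.h>

#define TAG_MASK          0x3u       /* low two tag bits of an oop (assumed) */
#define CLASS_INDEX_MASK  0x3fffffu  /* low 22 bits of the header word (assumed) */

/* Compute the inline cache class tag for an oop.
   headerWord is the least-significant half of the object's first header
   word; it is ignored for immediates. */
static uint32_t
inlineCacheClassTag(uint32_t oop, uint32_t headerWord)
{
    if (oop & TAG_MASK)          /* immediate: SmallInteger (tags 1 & 3) or Character (tag 2) */
        return oop & 1u;         /* collapse 1 & 3 -> 1; 2 -> 0 */
    return headerWord & CLASS_INDEX_MASK;  /* pointer object: its classIndex */
}
```

The branch structure is the only difference between the two generated versions: the committed version falls through for pointer objects and branches (backwards, to Limm) only for immediates, which is why it wins when most sends are to non-immediates.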
vm-dev@lists.squeakfoundation.org