Eliot Miranda uploaded a new version of VMMaker to project VM Maker: http://source.squeak.org/VMMaker/VMMaker.oscog-eem.1332.mcz
==================== Summary ====================
Name: VMMaker.oscog-eem.1332
Author: eem
Time: 1 June 2015, 5:02:18.369 pm
UUID: 528336b3-25ab-41f7-b788-2b9155e5d980
Ancestors: VMMaker.oscog-eem.1331
Fix the regression in implicit receiver sends caused by VMMaker.oscog-eem.1317 (itself a fix for the performance regression caused by using XCHG on x86).
The implicit receiver cache uses SendNumArgsReg to refer to the cache object, so in that case genPushRegisterArgsForNumArgs: must not use SendNumArgsReg to hold the return pc; it must use TempReg instead. Hence refactor to genPushRegisterArgsForNumArgs:scratchReg:, passing either TempReg or SendNumArgsReg as appropriate.
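To make the register conflict concrete, here is a small Python model (not VMMaker code) of the non-XCHG sequence in CogIA32Compiler>>genPushRegisterArgsForNumArgs:scratchReg:. The function names `push_register_args` and `implicit_receiver_miss` and the placeholder register contents are hypothetical. Registers are modelled as a dictionary, the stack as a list whose last element is the top, and two register arguments (cogit numRegArgs = 2) are assumed, as the method's assert allows.

```python
from typing import Dict, List, Tuple


def push_register_args(regs: Dict[str, str], stack: List[str],
                       num_args: int, scratch_reg: str,
                       num_reg_args: int = 2) -> None:
    """Hypothetical model of the IA32 (non-XCHG) register-arg push.

    Converts  [..., retpc]  into  [..., receiver, (arg0), (arg1), retpc],
    where the last list element is the top of stack. scratch_reg is
    overwritten with the return pc, just as the generated code does.
    """
    if num_args > num_reg_args:
        return  # arguments are already on the stack; nothing is generated
    regs[scratch_reg] = stack[-1]          # MoveMw: 0 r: SPReg R: scratchReg "save retpc"
    stack[-1] = regs["ReceiverResultReg"]  # MoveR: ReceiverResultReg Mw: 0 r: SPReg
    if num_args > 0:
        stack.append(regs["Arg0Reg"])      # PushR: Arg0Reg
        if num_args > 1:
            stack.append(regs["Arg1Reg"])  # PushR: Arg1Reg
    stack.append(regs[scratch_reg])        # PushR: scratchReg "restore return address"


def implicit_receiver_miss(scratch_reg: str, num_args: int = 1) -> Tuple[str, List[str]]:
    """Hypothetical scenario: an NS send-cache miss where SendNumArgsReg
    holds the NSSendCache. Returns what SendNumArgsReg holds afterwards
    and the resulting stack."""
    regs = {
        "ReceiverResultReg": "rcvr",
        "Arg0Reg": "arg0",
        "Arg1Reg": "arg1",
        "SendNumArgsReg": "nsSendCache",
        "TempReg": "temp",
    }
    stack = ["retpc"]
    push_register_args(regs, stack, num_args, scratch_reg)
    return regs["SendNumArgsReg"], stack


if __name__ == "__main__":
    print(implicit_receiver_miss("SendNumArgsReg"))  # ('retpc', ['rcvr', 'arg0', 'retpc'])
    print(implicit_receiver_miss("TempReg"))         # ('nsSendCache', ['rcvr', 'arg0', 'retpc'])
```

In both runs the stack ends up in the same frame layout, but with SendNumArgsReg as the scratch register the NSSendCache that genNSSendTrampolineFor:numArgs:called: then passes to the trampoline has been replaced by the return pc; with TempReg it survives. The other callers keep SendNumArgsReg because, as the IA32 comment notes, TempReg is used in directed send marshalling. On ARM the scratch register is ignored, since the return address is in the link register.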
=============== Diff against VMMaker.oscog-eem.1331 ===============
Item was removed: - ----- Method: CogARMCompiler>>genPushRegisterArgsForNumArgs: (in category 'smalltalk calling convention') ----- - genPushRegisterArgsForNumArgs: numArgs - "Ensure that the register args are pushed before the retpc for arity <= self numRegArgs." - "This is easy on a RISC like ARM because the return address is in the link register. Putting - the receiver and args above the return address means the CoInterpreter has a single - machine-code frame format which saves us a lot of work - NOTA BENE: we do NOT push the return address here, which means it must be dealt with later." - numArgs <= cogit numRegArgs ifTrue: - [self assert: cogit numRegArgs <= 2. - cogit PushR: ReceiverResultReg. - numArgs > 0 ifTrue: - [cogit PushR: Arg0Reg. - numArgs > 1 ifTrue: - [cogit PushR: Arg1Reg]]]!
Item was added: + ----- Method: CogARMCompiler>>genPushRegisterArgsForNumArgs:scratchReg: (in category 'smalltalk calling convention') ----- + genPushRegisterArgsForNumArgs: numArgs scratchReg: ignored + "Ensure that the register args are pushed before the retpc for arity <= self numRegArgs." + "This is easy on a RISC like ARM because the return address is in the link register. Putting + the receiver and args above the return address means the CoInterpreter has a single + machine-code frame format which saves us a lot of work + NOTA BENE: we do NOT push the return address here, which means it must be dealt with later." + numArgs <= cogit numRegArgs ifTrue: + [self assert: cogit numRegArgs <= 2. + cogit PushR: ReceiverResultReg. + numArgs > 0 ifTrue: + [cogit PushR: Arg0Reg. + numArgs > 1 ifTrue: + [cogit PushR: Arg1Reg]]]!
Item was removed: - ----- Method: CogAbstractInstruction>>genPushRegisterArgsForNumArgs: (in category 'smalltalk calling convention') ----- - genPushRegisterArgsForNumArgs: numArgs - "Ensure that the register args are pushed before the retpc for arity <= self numRegArgs." - "This won't be as clumsy on a RISC. But putting the receiver and - args above the return address means the CoInterpreter has a - single machine-code frame format which saves us a lot of work." - - "This won't be as clumsy on a RISC. But putting the receiver and - args above the return address means the CoInterpreter has a - single machine-code frame format which saves us a lot of work." - - "Iff there are register args convert - sp -> retpc (send site retpc) - to - base -> receiver - (arg0) - (arg1) - sp -> retpc (send site retpc)" - self subclassResponsibility!
Item was added: + ----- Method: CogAbstractInstruction>>genPushRegisterArgsForNumArgs:scratchReg: (in category 'smalltalk calling convention') ----- + genPushRegisterArgsForNumArgs: numArgs scratchReg: scratchReg + "Ensure that the register args are pushed before the retpc for arity <= self numRegArgs. + This isn't as clumsy on a RISC. But putting the receiver and + args above the return address means the CoInterpreter has a + single machine-code frame format which saves us a lot of work." + + "Iff there are register args convert + sp -> retpc (send site retpc) + to + base -> receiver + (arg0) + (arg1) + sp -> retpc (send site retpc)" + self subclassResponsibility!
Item was removed: - ----- Method: CogIA32Compiler>>genPushRegisterArgsForNumArgs: (in category 'smalltalk calling convention') ----- - genPushRegisterArgsForNumArgs: numArgs - "Ensure that the register args are pushed before the retpc for arity <= self numRegArgs. This - won't be as clumsy on a RISC. But putting the receiver and args above the return address - means the CoInterpreter has a single machine-code frame format which saves us a lot of work. - N.B. Take great care to /not/ smash TempReg, which is used in directed send marshalling. - We could use XCHG to swap the ReceiverResultReg and top-of-stack return address, pushing the - the ret pc (now in ReceiverResultReg) later, but XCHG is very slow. We can use SendNumArgsReg - because it is only live in sends of arity >= (NumSendTrampolines - 1)." - self assert: cogit numRegArgs < (NumSendTrampolines - 1). - numArgs <= cogit numRegArgs ifTrue: - [self assert: cogit numRegArgs <= 2. - false "these two variants show the same performance on Intel Core i7, but the second one may be shorter." - ifTrue: - [cogit MoveMw: 0 r: SPReg R: SendNumArgsReg. "Save return pc" - numArgs > 0 ifTrue: - [cogit PushR: Arg0Reg. - numArgs > 1 ifTrue: - [cogit PushR: Arg1Reg]]. - cogit PushR: SendNumArgsReg. - cogit MoveR: ReceiverResultReg Mw: objectMemory wordSize * (1 + numArgs) r: SPReg] - ifFalse: - ["a.k.a. - cogit gen: XCHGMwrR operand: 0 operand: SPReg operand: ReceiverResultReg. - but XCHG is slow." - cogit MoveMw: 0 r: SPReg R: SendNumArgsReg. "Save return pc" - cogit MoveR: ReceiverResultReg Mw: 0 r: SPReg. - numArgs > 0 ifTrue: - [cogit PushR: Arg0Reg. - numArgs > 1 ifTrue: - [cogit PushR: Arg1Reg]]. - cogit PushR: SendNumArgsReg]] "Restore return address"!
Item was added: + ----- Method: CogIA32Compiler>>genPushRegisterArgsForNumArgs:scratchReg: (in category 'smalltalk calling convention') ----- + genPushRegisterArgsForNumArgs: numArgs scratchReg: scratchReg + "Ensure that the register args are pushed before the retpc for arity <= self numRegArgs. This + isn't as clumsy on a RISC. But putting the receiver and args above the return address + means the CoInterpreter has a single machine-code frame format which saves us a lot of work. + N.B. Take great care to /not/ smash TempReg, which is used in directed send marshalling. + We could use XCHG to swap the ReceiverResultReg and top-of-stack return address, pushing the + the ret pc (now in ReceiverResultReg) later, but XCHG is very slow. We can use SendNumArgsReg + because it is only live in sends of arity >= (NumSendTrampolines - 1)." + self assert: cogit numRegArgs < (NumSendTrampolines - 1). + numArgs <= cogit numRegArgs ifTrue: + [self assert: cogit numRegArgs <= 2. + false "these two variants show the same performance on Intel Core i7, but the second one may be shorter." + ifTrue: + [cogit MoveMw: 0 r: SPReg R: scratchReg. "Save return pc" + numArgs > 0 ifTrue: + [cogit PushR: Arg0Reg. + numArgs > 1 ifTrue: + [cogit PushR: Arg1Reg]]. + cogit PushR: scratchReg. + cogit MoveR: ReceiverResultReg Mw: objectMemory wordSize * (1 + numArgs) r: SPReg] + ifFalse: + ["a.k.a. + cogit gen: XCHGMwrR operand: 0 operand: SPReg operand: ReceiverResultReg. + but XCHG is slow." + cogit MoveMw: 0 r: SPReg R: scratchReg. "Save return pc" + cogit MoveR: ReceiverResultReg Mw: 0 r: SPReg. + numArgs > 0 ifTrue: + [cogit PushR: Arg0Reg. + numArgs > 1 ifTrue: + [cogit PushR: Arg1Reg]]. + cogit PushR: scratchReg]] "Restore return address"!
Item was changed: ----- Method: Cogit>>genNSSendTrampolineFor:numArgs:called: (in category 'initialization') ----- genNSSendTrampolineFor: aRoutine numArgs: numArgs called: aString "ReceiverResultReg: method receiver SendNumArgsReg: the NSSendCache cache" <option: #NewspeakVM> <var: #aRoutine type: #'void *'> <var: #aString type: #'char *'> | jumpMiss jumpItsTheReceiverStupid | <var: #jumpMiss type: #'AbstractInstruction *'> <var: #jumpItsTheReceiverStupid type: #'AbstractInstruction *'> opcodeIndex := 0. objectRepresentation genGetInlineCacheClassTagFrom: ReceiverResultReg into: ClassReg forEntry: false. self MoveMw: NSCClassTagIndex * objectMemory wordSize r: SendNumArgsReg R: TempReg. self CmpR: ClassReg R: TempReg. jumpMiss := self JumpNonZero: 0. self MoveMw: NSCEnclosingObjectIndex * objectMemory wordSize r: SendNumArgsReg R: TempReg. self CmpCq: 0 R: TempReg. jumpItsTheReceiverStupid := self JumpZero: 0. self MoveR: TempReg R: ReceiverResultReg. "We don't patch stack(-numArgs). See comment in ceImplicitReceiverSend:receiver:" jumpItsTheReceiverStupid jmpTarget: self Label. self MoveMw: NSCTargetIndex * objectMemory wordSize r: SendNumArgsReg R: TempReg. self JumpR: TempReg.
jumpMiss jmpTarget: self Label. objectRepresentation genEnsureObjInRegNotForwarded: ReceiverResultReg scratchReg: TempReg updatingMw: FoxMFReceiver r: FPReg.
+ self numRegArgs > 0 ifTrue:
+ [backEnd genPushRegisterArgsForNumArgs: numArgs scratchReg: TempReg].
- backEnd genPushRegisterArgsForNumArgs: numArgs.
^self genTrampolineFor: aRoutine called: aString numArgs: 2 arg: SendNumArgsReg "The NSSendCache" arg: ReceiverResultReg arg: nil arg: nil saveRegs: false pushLinkReg: true resultReg: ReceiverResultReg "Never happens?" appendOpcodes: true!
Item was changed: ----- Method: StackToRegisterMappingCogit>>compileOpenPIC:numArgs: (in category 'in-line cacheing') ----- compileOpenPIC: selector numArgs: numArgs "Compile the code for an open PIC. Perform a probe of the first-level method lookup cache followed by a call of ceSendFromInLineCacheMiss: if the probe fails. Override to push the register args when calling ceSendFromInLineCacheMiss:" | jumpSelectorMiss jumpClassMiss itsAHit jumpBCMethod | <var: #jumpSelectorMiss type: #'AbstractInstruction *'> <var: #jumpClassMiss type: #'AbstractInstruction *'> <var: #itsAHit type: #'AbstractInstruction *'> <var: #jumpBCMethod type: #'AbstractInstruction *'> self compilePICAbort: numArgs. entry := objectRepresentation genGetClassTagOf: ReceiverResultReg into: ClassReg scratchReg: TempReg.
"Do first of three probes. See CoInterpreter>>lookupInMethodCacheSel:classTag:" self flag: #lookupInMethodCacheSel:classTag:. "so this method shows up as a sender of lookupInMethodCacheSel:class:" self MoveR: ClassReg R: SendNumArgsReg. self annotate: (self XorCw: selector R: ClassReg) objRef: selector. self LogicalShiftLeftCq: objectMemory shiftForWord R: ClassReg. self AndCq: MethodCacheMask << objectMemory shiftForWord R: ClassReg. self MoveMw: coInterpreter methodCacheAddress asUnsignedInteger + (MethodCacheSelector << objectMemory shiftForWord) r: ClassReg R: TempReg. self annotate: (self CmpCw: selector R: TempReg) objRef: selector. jumpSelectorMiss := self JumpNonZero: 0. self MoveMw: coInterpreter methodCacheAddress asUnsignedInteger + (MethodCacheClass << objectMemory shiftForWord) r: ClassReg R: TempReg. self CmpR: SendNumArgsReg R: TempReg. jumpClassMiss := self JumpNonZero: 0.
itsAHit := self Label. "Fetch the method. The interpret trampoline requires the bytecoded method in SendNumArgsReg" self MoveMw: coInterpreter methodCacheAddress asUnsignedInteger + (MethodCacheMethod << objectMemory shiftForWord) r: ClassReg R: SendNumArgsReg. "If the method is compiled jump to its unchecked entry-point, otherwise interpret it." objectRepresentation genLoadSlot: HeaderIndex sourceReg: SendNumArgsReg destReg: TempReg. self MoveR: TempReg R: ClassReg. jumpBCMethod := objectRepresentation genJumpSmallIntegerInScratchReg: TempReg. jumpBCMethod jmpTarget: picInterpretAbort. self AddCq: cmNoCheckEntryOffset R: ClassReg. self JumpR: ClassReg.
"First probe missed. Do second of three probes. Shift hash right one and retry." jumpSelectorMiss jmpTarget: (jumpClassMiss jmpTarget: self Label). self MoveR: SendNumArgsReg R: ClassReg. self annotate: (self XorCw: selector R: ClassReg) objRef: selector. self LogicalShiftLeftCq: objectMemory shiftForWord - 1 R: ClassReg. self AndCq: MethodCacheMask << objectMemory shiftForWord R: ClassReg. self MoveMw: coInterpreter methodCacheAddress asUnsignedInteger + (MethodCacheSelector << objectMemory shiftForWord) r: ClassReg R: TempReg. self annotate: (self CmpCw: selector R: TempReg) objRef: selector. jumpSelectorMiss := self JumpNonZero: 0. self MoveMw: coInterpreter methodCacheAddress asUnsignedInteger + (MethodCacheClass << objectMemory shiftForWord) r: ClassReg R: TempReg. self CmpR: SendNumArgsReg R: TempReg. self JumpZero: itsAHit.
"Second probe missed. Do last probe. Shift hash right two and retry." jumpSelectorMiss jmpTarget: self Label. self MoveR: SendNumArgsReg R: ClassReg. self annotate: (self XorCw: selector R: ClassReg) objRef: selector. objectMemory shiftForWord > 2 ifTrue: [self LogicalShiftLeftCq: objectMemory shiftForWord - 1 R: ClassReg]. self AndCq: MethodCacheMask << objectMemory shiftForWord R: ClassReg. self MoveMw: coInterpreter methodCacheAddress asUnsignedInteger + (MethodCacheSelector << objectMemory shiftForWord) r: ClassReg R: TempReg. self annotate: (self CmpCw: selector R: TempReg) objRef: selector. jumpSelectorMiss := self JumpNonZero: 0. self MoveMw: coInterpreter methodCacheAddress asUnsignedInteger + (MethodCacheClass << objectMemory shiftForWord) r: ClassReg R: TempReg. self CmpR: SendNumArgsReg R: TempReg. self JumpZero: itsAHit.
"Last probe missed. Call ceSendFromInLineCacheMiss: to do the full lookup." jumpSelectorMiss jmpTarget: self Label. + backEnd genPushRegisterArgsForNumArgs: numArgs scratchReg: SendNumArgsReg. - backEnd genPushRegisterArgsForNumArgs: numArgs. self genSmalltalkToCStackSwitch: true. methodLabel addDependent: (self annotateAbsolutePCRef: (self MoveCw: methodLabel asInteger R: SendNumArgsReg)). self compileCallFor: #ceSendFromInLineCacheMiss: numArgs: 1 arg: SendNumArgsReg arg: nil arg: nil arg: nil resultReg: nil saveRegs: false "Note that this call does not return."!
Item was changed: ----- Method: StackToRegisterMappingCogit>>genDoubleArithmetic:preOpCheck: (in category 'primitive generators') ----- genDoubleArithmetic: arithmeticOperator preOpCheck: preOpCheckOrNil "Receiver and arg in registers. Stack looks like return address" <var: #preOpCheckOrNil declareC: 'AbstractInstruction *(*preOpCheckOrNil)(int rcvrReg, int argReg)'> | jumpFailClass jumpFailAlloc jumpFailCheck jumpImmediate jumpNonInt doOp | <var: #jumpFailClass type: #'AbstractInstruction *'> <var: #jumpFailAlloc type: #'AbstractInstruction *'> <var: #jumpImmediate type: #'AbstractInstruction *'> <var: #jumpNonInt type: #'AbstractInstruction *'> <var: #jumpFailCheck type: #'AbstractInstruction *'> <var: #doOp type: #'AbstractInstruction *'> self MoveR: Arg0Reg R: TempReg. objectRepresentation genGetDoubleValueOf: ReceiverResultReg into: DPFPReg0. self MoveR: Arg0Reg R: ClassReg. jumpImmediate := objectRepresentation genJumpImmediateInScratchReg: TempReg. objectRepresentation genGetCompactClassIndexNonImmOf: Arg0Reg into: SendNumArgsReg. objectRepresentation genCmpClassFloatCompactIndexR: SendNumArgsReg. jumpFailClass := self JumpNonZero: 0. objectRepresentation genGetDoubleValueOf: Arg0Reg into: DPFPReg1. doOp := self Label. preOpCheckOrNil ifNotNil: [jumpFailCheck := self perform: preOpCheckOrNil with: DPFPReg0 with: DPFPReg1]. self gen: arithmeticOperator operand: DPFPReg1 operand: DPFPReg0. jumpFailAlloc := objectRepresentation genAllocFloatValue: DPFPReg0 into: SendNumArgsReg scratchReg: ClassReg scratchReg: TempReg. self MoveR: SendNumArgsReg R: ReceiverResultReg. self RetN: 0. "We need to push the register args on two paths; this one and the interpreter primitive path. But the interpreter primitive path won't unless regArgsHaveBeenPushed is false." self assert: methodOrBlockNumArgs <= self numRegArgs. jumpFailClass jmpTarget: self Label. preOpCheckOrNil ifNotNil: [jumpFailCheck jmpTarget: jumpFailClass getJmpTarget]. 
+ backEnd genPushRegisterArgsForNumArgs: methodOrBlockNumArgs scratchReg: SendNumArgsReg.
- backEnd genPushRegisterArgsForNumArgs: methodOrBlockNumArgs.
jumpFailClass := self Jump: 0. jumpImmediate jmpTarget: self Label. objectRepresentation smallIntegerIsOnlyImmediateType ifFalse: [jumpNonInt := objectRepresentation genJumpNotSmallIntegerInScratchReg: TempReg]. objectRepresentation genConvertSmallIntegerToIntegerInReg: ClassReg. self ConvertR: ClassReg Rd: DPFPReg1. self Jump: doOp. jumpFailAlloc jmpTarget: self Label. self compileFallbackToInterpreterPrimitive. jumpFailClass jmpTarget: self Label. objectRepresentation smallIntegerIsOnlyImmediateType ifFalse: [jumpNonInt jmpTarget: jumpFailClass getJmpTarget]. ^0!
Item was changed: ----- Method: StackToRegisterMappingCogit>>genPICMissTrampolineFor: (in category 'initialization') ----- genPICMissTrampolineFor: numArgs <inline: false> | startAddress | startAddress := methodZoneBase. opcodeIndex := 0. "N.B. a closed PIC jumps to the miss routine, not calls it, so there is only one retpc on the stack."
+ backEnd genPushRegisterArgsForNumArgs: numArgs scratchReg: SendNumArgsReg.
- backEnd genPushRegisterArgsForNumArgs: numArgs.
self genTrampolineFor: #ceCPICMiss:receiver: called: (self trampolineName: 'cePICMiss' numArgs: (numArgs <= self numRegArgs ifTrue: [numArgs] ifFalse: [-1])) numArgs: 2 arg: ClassReg arg: ReceiverResultReg arg: nil arg: nil saveRegs: false pushLinkReg: true resultReg: nil appendOpcodes: true. ^startAddress!
Item was changed: ----- Method: StackToRegisterMappingCogit>>genPushRegisterArgs (in category 'compile abstract instructions') ----- genPushRegisterArgs "Ensure that the register args are pushed before the retpc for methods with arity <= self numRegArgs." "This won't be as clumsy on a RISC. But putting the receiver and args above the return address means the CoInterpreter has a single machine-code frame format which saves us a lot of work." (regArgsHaveBeenPushed or: [methodOrBlockNumArgs > self numRegArgs]) ifFalse:
+ [backEnd genPushRegisterArgsForNumArgs: methodOrBlockNumArgs scratchReg: SendNumArgsReg.
- [backEnd genPushRegisterArgsForNumArgs: methodOrBlockNumArgs.
regArgsHaveBeenPushed := true]!
Item was changed: ----- Method: StackToRegisterMappingCogit>>genSendTrampolineFor:numArgs:called:arg:arg:arg: (in category 'initialization') ----- genSendTrampolineFor: aRoutine numArgs: numArgs called: aString arg: regOrConst0 arg: regOrConst1 arg: regOrConst2 "Generate a trampoline with three arguments. Hack: a negative value indicates an abstract register, a non-negative value indicates a constant." <var: #aRoutine type: #'void *'> <var: #aString type: #'char *'> | startAddress | <inline: false> startAddress := methodZoneBase. opcodeIndex := 0.
+ backEnd genPushRegisterArgsForNumArgs: numArgs scratchReg: SendNumArgsReg.
- backEnd genPushRegisterArgsForNumArgs: numArgs.
self genTrampolineFor: aRoutine called: aString numArgs: 3 arg: regOrConst0 arg: regOrConst1 arg: regOrConst2 arg: nil saveRegs: false pushLinkReg: true resultReg: nil appendOpcodes: true. ^startAddress!
Item was changed: ----- Method: StackToRegisterMappingCogit>>genSendTrampolineFor:numArgs:called:arg:arg:arg:arg: (in category 'initialization') ----- genSendTrampolineFor: aRoutine numArgs: numArgs called: aString arg: regOrConst0 arg: regOrConst1 arg: regOrConst2 arg: regOrConst3 "Generate a trampoline with four arguments. Hack: a negative value indicates an abstract register, a non-negative value indicates a constant." <var: #aRoutine type: #'void *'> <var: #aString type: #'char *'> | startAddress | <inline: false> startAddress := methodZoneBase. opcodeIndex := 0.
+ backEnd genPushRegisterArgsForNumArgs: numArgs scratchReg: SendNumArgsReg.
- backEnd genPushRegisterArgsForNumArgs: numArgs.
self genTrampolineFor: aRoutine called: aString numArgs: 4 arg: regOrConst0 arg: regOrConst1 arg: regOrConst2 arg: regOrConst3 saveRegs: false pushLinkReg: true resultReg: nil appendOpcodes: true. ^startAddress!
vm-dev@lists.squeakfoundation.org