[Vm-dev] VM Maker: VMMaker.oscog-eem.746.mcz

Eliot Miranda eliot.miranda at gmail.com
Mon Jun 2 16:51:06 UTC 2014


Hi Clément,

On Mon, Jun 2, 2014 at 7:40 AM, Clément Bera <bera.clement at gmail.com> wrote:

>
> Eliot,
>
> Recent commits are very exciting. Context and closure creations are now
> inlined in machine code :-)
>

I'm glad you think so :-).  What was nice is that once I had inline
context, closure and temp vector creation it was motivating to try and do a
little peephole optimization, and that turned out to be really easy.  If
you look at e.g. Collection>>inject:into: it starts with

17 <8A 01> push: (Array new: 1)
19 <6A> popIntoTemp: 2
20 <10> pushTemp: 0
21 <8E 00 02> popIntoTemp: 0 inVectorAt: 2
24 <70> self
25 <11> pushTemp: 1
26 <12> pushTemp: 2
27 <8F 21 00 0A> closureNumCopied: 2 numArgs: 1 bytes 31 to 40

With the current memory manager this generates the following machine code
on x86

17: 1057: movl $0x00000001, %ebx : BB 01 00 00 00
105c: call .+0xfffff927 (0x00000988=ceCreateNewArrayTrampoline) : E8 27 F9
FF FF
20: 1061: movl 12(%ebp), %eax : 8B 45 0C
21: 1064: movl %eax, %ds:0x4(%edx) : 89 42 04
19: 1067: movl %edx, -16(%ebp) : 89 55 F0
24: 106a: movl -12(%ebp), %eax : 8B 45 F4
106d: pushl %eax : 50
25: 106e: movl 8(%ebp), %eax : 8B 45 08
 1071: pushl %eax : 50
26: 1072: movl -16(%ebp), %eax : 8B 45 F0
1075: pushl %eax : 50
27: 1076: movl $0x0001f081, %ebx : BB 81 F0 01 00
107b: call .+0xfffff9a0 (0x00000a20=ceClosureCopyTrampoline) : E8 A0 F9 FF
FF
 1080: addl $0x00000008, %esp : 83 C4 08

The reordering of bytecodes 20 & 21 before 19 is due to the Cogit's
stack-to-register-mapping code generating strategy which defers certain
actions until operands are used.

Both ceCreateNewArrayTrampoline and ceClosureCopyTrampoline switch stacks
to the C stack and call routines in the cointerpreter to create an array
and create a closure.  Hence, the array created by
ceCreateNewArrayTrampoline is initialized with nils before being returned,
and the closure's copied values are pushed to the stack so that the
cointerpreter can copy these values into the closure.

With Spur the machine code generated is quite different and even though its
longer it is faster because the calls into the cointerpreter are expensive,
and because the current memory manager's allocation routines are slow:

17: 1463: movl %ds:0x2001c=#freeStart, %edx : 8B 15 1C 00 02 00 inline temp
vector allocation
 1469: movl $0x02000033, %eax : B8 33 00 00 02   high 32-bits of header
 146e: movl %eax, %ds:(%edx) : 89 02
1470: movl $0x01000000, %eax : B8 00 00 00 01    low 32-bits of header
 1475: movl %eax, %ds:0x4(%edx) : 89 42 04
1478: movl $0x00100000=nil, %eax : B8 00 00 10 00    initialize slot with
nil
 147d: movl %eax, %ds:0x8(%edx) : 89 42 08
1480: movl %edx, %eax : 89 D0
1482: addl $0x00000010, %eax : 83 C0 10
 1485: movl %eax, %ds:0x2001c=#freeStart : A3 1C 00 02 00  write back
allocation pointer
 148a: cmpl $0x00024680, %eax : 3D 80 46 02 00
148f: jb .+0x00000005 (0x00001496) : 72 05    check if a scavenge should be
scheduled
 1491: call .+0xfffff632 (0x00000ac8=ceSheduleScavengeTrampoline) : E8 32
F6 FF FF
19: 1496: movl %edx, -16(%ebp) : 89 55 F0
20: 1499: movl 12(%ebp), %ecx : 8B 4D 0C
21: 149c: movl -16(%ebp), %edx : 8B 55 F0    alas an inline store check is
generated
 149f: movl %ecx, %ds:0x8(%edx) : 89 4A 08
14a2: movl %ecx, %eax : 89 C8
14a4: andl $0x00000003, %eax : 83 E0 03
 14a7: jnz .+0x0000001c (0x000014c5) : 75 1C
14a9: movl $0x12345678, %eax : B8 78 56 34 12
14ae: cmpl %eax, %edx : 39 C2
 14b0: jb .+0x00000013 (0x000014c5) : 72 13
14b2: cmpl %eax, %ecx : 39 C1
14b4: jnb .+0x0000000f (0x000014c5) : 73 0F
 14b6: movb %ds:0x3(%edx), %al : 8A 42 03
14b9: andl $0x00000020, %eax : 83 E0 20
14bc: jnz .+0x00000007 (0x000014c5) : 75 07
 14be: pushl %ecx : 51
14bf: call .+0xfffff5d4 (0x00000a98=ceStoreCheckTrampoline) : E8 D4 F5 FF
FF
14c4: popl %ecx : 59
27: 14c5: movl $0x00000002, %ebx : BB 02 00 00 00    set numArgs for
ceSmallMethodContext
 14ca: call .+0xfffff639 (0x00000b08=ceSmallMethodContext) : E8 39 F6
FF FF call
to machine code context creation
14cf: movl %edx, %ecx : 89 D1
 14d1: movl %ds:0x2001c=#freeStart, %edx : 8B 15 1C 00 02 00   inline
closure allocation
 14d7: movl $0x03000025, %eax : B8 25 00 00 03
14dc: movl %eax, %ds:(%edx) : 89 02
14de: movl $0x05000000, %eax : B8 00 00 00 05
 14e3: movl %eax, %ds:0x4(%edx) : 89 42 04
14e6: movl %edx, %eax : 89 D0
14e8: addl $0x00000020, %eax : 83 C0 20
 14eb: movl %eax, %ds:0x2001c=#freeStart : A3 1C 00 02 00
14f0: cmpl $0x00024680, %eax : 3D 80 46 02 00
 14f5: jb .+0x00000005 (0x000014fc) : 72 05
14f7: call .+0xfffff5cc (0x00000ac8=ceSheduleScavengeTrampoline) : E8 CC F5
FF FF
 14fc: movl %ecx, %ds:0x8(%edx) : 89 4A 08
14ff: movl $0x0000003f, %eax : B8 3F 00 00 00       set closure's startpc
 1504: movl %eax, %ds:0xc(%edx) : 89 42 0C
1507: movl $0x00000003, %eax : B8 03 00 00 00       set closure's numArgs
(1 as a SmallInteger)
 150c: movl %eax, %ds:0x10(%edx) : 89 42 10
150f: movl -16(%ebp), %eax : 8B 45 F0          set closure's outerContext
 1512: movl %eax, %ds:0x18(%edx) : 89 42 18
1515: movl 8(%ebp), %eax : 8B 45 08           set closure's copied value
 1518: movl %eax, %ds:0x14(%edx) : 89 42 14

What's cool is that the closure creation and initialization is all inlined,
pushing nothing to the stack at all.  However, the indirect temp vector
initialization is clumsy.  The store check is unnecessary and long.  And
we're still initializing the slot with nil before assigning the temp from
"pushTemp: 0; popIntoTemp: 0 inVectorAt: 2".

So I thought I would try to implement a peephole that would look for
push: (Array new: 1)
popIntoTemp: tempIndex
 pushConstant: const or pushTemp: n
popIntoTemp: 0 inVectorAt: tempIndex
and collapse this into
 tempAt: tempIndex put: {const or temp}

The peephole's checking for the sequence is a bit long-winded but it boils
down to

 objectRepresentation genNewArrayOfSize: 1 initialized: false.
self evaluate: pushValueDesc at: bytecodePC + pushArrayDesc numBytes +
storeArrayDesc numBytes.
 reg := self ssStorePop: true toPreferredReg: TempReg.
objectRepresentation
genStoreSourceReg: reg
 slotIndex: 0
intoNewObjectInDestReg: ReceiverResultReg.
self ssPushRegister: ReceiverResultReg.
 self evaluate: storeArrayDesc at: bytecodePC + pushArrayDesc numBytes.

and now the code generated is quite a bit shorter:

17: 1463: movl %ds:0x2001c=#freeStart, %edx : 8B 15 1C 00 02 00
1469: movl $0x02000033, %eax : B8 33 00 00 02
 146e: movl %eax, %ds:(%edx) : 89 02
1470: movl $0x01000000, %eax : B8 00 00 00 01
1475: movl %eax, %ds:0x4(%edx) : 89 42 04
 1478: movl %edx, %eax : 89 D0
147a: addl $0x00000010, %eax : 83 C0 10
147d: movl %eax, %ds:0x2001c=#freeStart : A3 1C 00 02 00
 1482: cmpl $0x00024680, %eax : 3D 80 46 02 00
1487: jb .+0x00000005 (0x0000148e) : 72 05
1489: call .+0xfffff63a (0x00000ac8=ceSheduleScavengeTrampoline) : E8 3A F6
FF FF
20: 148e: movl 12(%ebp), %eax : 8B 45 0C
21: 1491: movl %eax, %ds:0x8(%edx) : 89 42 08
19: 1494: movl %edx, -16(%ebp) : 89 55 F0
27: 1497: movl $0x00000002, %ebx : BB 02 00 00 00
149c: call .+0xfffff667 (0x00000b08=ceSmallMethodContext) : E8 67 F6 FF FF
 14a1: movl %edx, %ecx : 89 D1
14a3: movl %ds:0x2001c=#freeStart, %edx : 8B 15 1C 00 02 00
14a9: movl $0x03000025, %eax : B8 25 00 00 03
 14ae: movl %eax, %ds:(%edx) : 89 02
14b0: movl $0x05000000, %eax : B8 00 00 00 05
14b5: movl %eax, %ds:0x4(%edx) : 89 42 04
 14b8: movl %edx, %eax : 89 D0
14ba: addl $0x00000020, %eax : 83 C0 20
14bd: movl %eax, %ds:0x2001c=#freeStart : A3 1C 00 02 00
 14c2: cmpl $0x00024680, %eax : 3D 80 46 02 00
14c7: jb .+0x00000005 (0x000014ce) : 72 05
14c9: call .+0xfffff5fa (0x00000ac8=ceSheduleScavengeTrampoline) : E8 FA F5
FF FF
 14ce: movl %ecx, %ds:0x8(%edx) : 89 4A 08
14d1: movl $0x0000003f, %eax : B8 3F 00 00 00
14d6: movl %eax, %ds:0xc(%edx) : 89 42 0C
 14d9: movl $0x00000003, %eax : B8 03 00 00 00
14de: movl %eax, %ds:0x10(%edx) : 89 42 10
14e1: movl -16(%ebp), %eax : 8B 45 F0
 14e4: movl %eax, %ds:0x18(%edx) : 89 42 18
14e7: movl 8(%ebp), %eax : 8B 45 08
14ea: movl %eax, %ds:0x14(%edx) : 89 42 14

Very nice.  I could also move the allocations into trampolines.  There's an
advantage to this because the test for ceSheduleScavengeTrampoline ends up
only jumping when a scavenge is required, returning if not.  And it will
save even more space.  This at the cost of a leaf call to get to the
trampoline plus setting a register with either the number of slots in the
array or the number of copied values.

But what's most satisfying is the use of the entire VMMaker framework with
the simulator and Bochs plugin allowing me to test the code generation and
the generated code immediately with no long edit-compile cycles.  I was
able to implement the entire refactoring of temp vector and closure
creation (moving it from the Cogit to the object representations) and
implement machine code allocation over a weekend in which I still took my
son fishing, went shopping, played gran turismo and watched a couple of
movies.

This is the first time I've ever implemented context creation in machine
code in a Smalltalk VM.  Context creation is complex because the context
has to be allocated and married with its stack frame.  In the VW VM and in
Cog up until now that has been done in C code, requiring an expensive
switch from the Smalltalk machine code stack to the C stack, and using a
generic instantiation routine that marries any kind of context, large or
small x method or block.  Now I have four machine code routines, large or
small x method or block, that are extremely tight and take a single
register parameter which is the argument count, which is known at compile
time.  That's possible both because Spur makes allocation simple /and/
because the whole Cog VMMaker framework makes developing and debugging
machine code so much easier than in a C VM.

Have you already done at:put: and stringAt:put: or is it your next step ?
>

That's been there for quite a while.


>  Please tell us about the new bench results with these features.
>

So in the current Cog the following, which is mostly Interval>>#do: and
SmallInteger>>#+, but involves lots of closure creation where all but the
outer closure has copied values:

[(1 to: 10) do: [:i| (1 to: 10) do: [:j| (1 to: 10) do: [:k| (1 to: 10) do:
[:l| (1 to: 10) do: [:m| (1 to: 10) do: [:n| (1 to: 10) do: [:o| (1 to: 10)
do: [:p| i + j + k + l + m + n + o + p]]]]]]]]] timeToRun

evaluates to 9002, 9 seconds.

In the new Spur VM it just evaluated to 4829, 4.8 seconds, for a -46%
speedup.

Currently the Newspeak bootstrap, which loads a number of Monticello
packages, including the Newspeak compiler, and then compiles a number of
Newspeak packages, about 80k lines of Newspeak, now takes about -51% of the
time using Spur.

Alas I now need two computers.  I tend to keep lots of tabs open in my web
browser and unless I quit Chrome my benchmark figures slow down by about a
factor of two :-( [ ;-) ].

Of course, this:

[1 to: 10 do: [:i| 1 to: 10 do: [:j| 1 to: 10 do: [:k| 1 to: 10 do: [:l| 1
to: 10 do: [:m| 1 to: 10 do: [:n| 1 to: 10 do: [:o| 1 to: 10 do: [:p| i + j
+ k + l + m + n + o + p]]]]]]]]] timeToRun

is even faster, 2320, for a -52% speedup w.r.t. Spur and a -74% speedup
w.r.t. the current VM.  And that's what you're working on Clément.
 (Clément is working on adaptive optimization which will optimize the
Interval>>#do: code into the inlined to:do: code on the fly).  And thats
going to be really exciting!


Clément
>
>
> 2014-06-02 16:14 GMT+02:00 <commits at source.squeak.org>:
>
>>
>> Eliot Miranda uploaded a new version of VMMaker to project VM Maker:
>> http://source.squeak.org/VMMaker/VMMaker.oscog-eem.746.mcz
>>
>> ==================== Summary ====================
>>
>> Name: VMMaker.oscog-eem.746
>> Author: eem
>> Time: 1 June 2014, 6:05:30.694 pm
>> UUID: cc4961d3-e629-4e28-b308-88eab314a8c9
>> Ancestors: VMMaker.oscog-eem.745
>>
>> Implement a peephole in the Spur Cogit for an indirection
>> vector initialized with a single value  Avoid initializing the
>> slot in the array to nil and instead initialize it with the value.
>>
>> Refactor setting byte1, byte2 & byte3 into
>> loadSubsequentBytesForDescriptor:at: for the peephole
>> tryCollapseTempVectorInitializationOfSize:.
>>
>> No loner inline CoInterpreter>>pre/postGCAction: for VM profiling.
>>
>> Increase the number of trampoline table slots.
>>
>> Simulator:
>> Fix CurrentImageCoInterpreterFacade for the new Spur
>> inline instantiation code.
>>
>> =============== Diff against VMMaker.oscog-eem.745 ===============
>>
>> Item was changed:
>>   ----- Method: CoInterpreter>>postGCAction: (in category 'object memory
>> support') -----
>>   postGCAction: gcModeArg
>>         "Attempt to shrink free memory, signal the gc semaphore and let
>> the Cogit do its post GC thang"
>> +       <inline: false>
>>         self assert: gcModeArg = gcMode.
>>         super postGCAction: gcModeArg.
>>         cogit cogitPostGCAction: gcModeArg.
>>         lastCoggableInterpretedBlockMethod :=
>> lastUncoggableInterpretedBlockMethod := nil.
>>         gcMode := 0!
>>
>> Item was changed:
>>   ----- Method: CoInterpreter>>preGCAction: (in category 'object memory
>> support') -----
>>   preGCAction: gcModeArg
>> +       <inline: false>
>> -       <inline: true>
>>         "Need to write back the frame pointers unless all pages are free
>> (as in snapshot).
>>          Need to set gcMode var (to avoid passing the flag through a lot
>> of the updating code)"
>>         super preGCAction: gcModeArg.
>>
>>         gcMode := gcModeArg.
>>
>>         cogit recordEventTrace ifTrue:
>>                 [| traceType |
>>                 traceType := gcModeArg == GCModeFull ifTrue:
>> [TraceFullGC] ifFalse: [TraceIncrementalGC].
>>                 self recordTrace: traceType thing: traceType source: 0].
>>
>>         cogit recordPrimTrace ifTrue:
>>                 [| traceType |
>>                 traceType := gcModeArg == GCModeFull ifTrue:
>> [TraceFullGC] ifFalse: [TraceIncrementalGC].
>>                 self fastLogPrim: traceType]!
>>
>> Item was added:
>> + ----- Method: CogObjectRepresentation>>createsArraysInline (in category
>> 'bytecode generator support') -----
>> + createsArraysInline
>> +       "Answer if the object representation allocates arrays inline.  By
>> +        default answer false. Better code can be generated when creating
>> +        arrays inline if values are /not/ flushed to the stack."
>> +       ^false!
>>
>> Item was removed:
>> - ----- Method:
>> CogObjectRepresentationFor32BitSpur>>createsClosuresInline (in category
>> 'bytecode generator support') -----
>> - createsClosuresInline
>> -       "Answer if the object representation allocates closures inline.
>>  By
>> -        default answer false. Better code can be generated when creating
>> -        closures inline if copied values are /not/ flushed to the stack."
>> -       ^true!
>>
>> Item was added:
>> + ----- Method: CogObjectRepresentationForSpur>>createsArraysInline (in
>> category 'bytecode generator support') -----
>> + createsArraysInline
>> +       "Answer if the object representation allocates arrays inline.  By
>> +        default answer false. Better code can be generated when creating
>> +        arrays inline if values are /not/ flushed to the stack."
>> +       ^true!
>>
>> Item was added:
>> + ----- Method: CogObjectRepresentationForSpur>>createsClosuresInline (in
>> category 'bytecode generator support') -----
>> + createsClosuresInline
>> +       "Answer if the object representation allocates closures inline.
>>  By
>> +        default answer false. Better code can be generated when creating
>> +        closures inline if copied values are /not/ flushed to the stack."
>> +       ^true!
>>
>> Item was changed:
>>   ----- Method: Cogit>>compileAbstractInstructionsFrom:through: (in
>> category 'compile abstract instructions') -----
>>   compileAbstractInstructionsFrom: start through: end
>>         "Loop over bytecodes, dispatching to the generator for each
>> bytecode, handling fixups in due course."
>>         | nextOpcodeIndex descriptor fixup result nExts |
>>         <var: #descriptor type: #'BytecodeDescriptor *'>
>>         <var: #fixup type: #'BytecodeFixup *'>
>>         bytecodePC := start.
>>         nExts := 0.
>>         [byte0 := (objectMemory fetchByte: bytecodePC ofObject:
>> methodObj)  + bytecodeSetOffset.
>>          descriptor := self generatorAt: byte0.
>> +        self loadSubsequentBytesForDescriptor: descriptor at: bytecodePC.
>> -        descriptor numBytes > 1 ifTrue:
>> -               [byte1 := objectMemory fetchByte: bytecodePC + 1
>> ofObject: methodObj.
>> -                descriptor numBytes > 2 ifTrue:
>> -                       [byte2 := objectMemory fetchByte: bytecodePC + 2
>> ofObject: methodObj.
>> -                        descriptor numBytes > 3 ifTrue:
>> -                               [byte3 := objectMemory fetchByte:
>> bytecodePC + 3 ofObject: methodObj.
>> -                                descriptor numBytes > 4 ifTrue:
>> -                                       [self notYetImplemented]]]].
>>          nextOpcodeIndex := opcodeIndex.
>>          result := self perform: descriptor generator.
>>          descriptor isExtension ifFalse: "extended bytecodes must consume
>> their extensions"
>>                 [self assert: (extA = 0 and: [extB = 0])].
>>          fixup := self fixupAt: bytecodePC - initialPC.
>>          fixup targetInstruction ~= 0 ifTrue:
>>                 ["There is a fixup for this bytecode.  It must point to
>> the first generated
>>                    instruction for this bytecode.  If there isn't one we
>> need to add a label."
>>                  opcodeIndex = nextOpcodeIndex ifTrue:
>>                         [self Label].
>>                  fixup targetInstruction: (self abstractInstructionAt:
>> nextOpcodeIndex)].
>>          bytecodePC := self nextBytecodePCFor: descriptor at: bytecodePC
>> exts: nExts in: methodObj.
>>          result = 0 and: [bytecodePC <= end]]
>>                 whileTrue:
>>                         [nExts := descriptor isExtension ifTrue: [nExts +
>> 1] ifFalse: [0]].
>>         self checkEnoughOpcodes.
>>         ^result!
>>
>> Item was added:
>> + ----- Method: Cogit>>loadSubsequentBytesForDescriptor:at: (in category
>> 'compile abstract instructions') -----
>> + loadSubsequentBytesForDescriptor: descriptor at: pc
>> +       <var: #descriptor type: #'BytecodeDescriptor *'>
>> +       descriptor numBytes > 1 ifTrue:
>> +               [byte1 := objectMemory fetchByte: pc + 1 ofObject:
>> methodObj.
>> +                descriptor numBytes > 2 ifTrue:
>> +                       [byte2 := objectMemory fetchByte: pc + 2
>> ofObject: methodObj.
>> +                        descriptor numBytes > 3 ifTrue:
>> +                               [byte3 := objectMemory fetchByte: pc + 3
>> ofObject: methodObj.
>> +                                descriptor numBytes > 4 ifTrue:
>> +                                       [self notYetImplemented]]]]!
>>
>> Item was added:
>> + ----- Method: CurrentImageCoInterpreterFacade class>>objectMemoryClass
>> (in category 'accessing') -----
>> + objectMemoryClass
>> +       ^self subclassResponsibility!
>>
>> Item was changed:
>>   ----- Method: CurrentImageCoInterpreterFacade>>cogit: (in category
>> 'initialize-release') -----
>>   cogit: aCogit
>>         cogit := aCogit.
>>         coInterpreter cogit: aCogit.
>> +       (objectMemory respondsTo: #cogit:) ifTrue:
>> +               [objectMemory cogit: aCogit]!
>> -       objectMemory cogit: aCogit!
>>
>> Item was added:
>> + ----- Method: CurrentImageCoInterpreterFacade>>indexablePointersFormat
>> (in category 'accessing') -----
>> + indexablePointersFormat
>> +       ^objectMemory indexablePointersFormat!
>>
>> Item was changed:
>>   ----- Method: CurrentImageCoInterpreterFacade>>initialize (in category
>> 'initialize-release') -----
>>   initialize
>>         memory := ByteArray new: 262144.
>> +       objectMemory := self class objectMemoryClass new.
>> -       objectMemory := NewCoObjectMemory new.
>>         coInterpreter := CoInterpreter new.
>>         coInterpreter
>>                 instVarNamed: 'objectMemory'
>>                         put: objectMemory;
>>                 instVarNamed: 'primitiveTable'
>>                         put: (CArrayAccessor on: CoInterpreter
>> primitiveTable copy).
>>         variables := Dictionary new.
>>         #('stackLimit') do:
>>                 [:l| self addressForLabel: l].
>>         self initializeObjectMap!
>>
>> Item was added:
>> + ----- Method: CurrentImageCoInterpreterFacade>>methodNeedsLargeContext:
>> (in category 'accessing') -----
>> + methodNeedsLargeContext: aMethodOop
>> +       ^(self objectForOop: aMethodOop) frameSize > CompiledMethod
>> smallFrameSize!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSpurObjectRepresentation
>> class>>objectMemoryClass (in category 'accessing') -----
>> + objectMemoryClass
>> +       ^Spur32BitCoMemoryManager!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>arrayFormat (in
>> category 'accessing') -----
>> + arrayFormat
>> +       ^objectMemory arrayFormat!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>getScavengeThreshold
>> (in category 'accessing') -----
>> + getScavengeThreshold
>> +       ^objectMemory getScavengeThreshold ifNil: [16r24680]!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>headerForSlots:format:classIndex:
>> (in category 'accessing') -----
>> + headerForSlots: numSlots format: formatField classIndex: classIndex
>> +       ^objectMemory headerForSlots: numSlots format: formatField
>> classIndex: classIndex!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>numSlotsMask
>> (in category 'accessing') -----
>> + numSlotsMask
>> +       ^objectMemory numSlotsMask!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>rememberedBitShift
>> (in category 'accessing') -----
>> + rememberedBitShift
>> +       ^objectMemory rememberedBitShift!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>smallObjectBytesForSlots:
>> (in category 'accessing') -----
>> + smallObjectBytesForSlots: numSlots
>> +       ^objectMemory smallObjectBytesForSlots: numSlots!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSpurObjectRepresentation>>storeCheckBoundary
>> (in category 'accessing') -----
>> + storeCheckBoundary
>> +       ^objectMemory storeCheckBoundary ifNil: [16r12345678]!
>>
>> Item was added:
>> + ----- Method:
>> CurrentImageCoInterpreterFacadeForSqueakV3ObjectRepresentation
>> class>>objectMemoryClass (in category 'accessing') -----
>> + objectMemoryClass
>> +       ^NewObjectMemory!
>>
>> Item was changed:
>>   ----- Method: SimpleStackBasedCogit class>>initializeMiscConstants (in
>> category 'class initialization') -----
>>   initializeMiscConstants
>>         super initializeMiscConstants.
>>         MaxLiteralCountForCompile := initializationOptions at:
>> #MaxLiteralCountForCompile ifAbsent: [60].
>>         NumTrampolines := NewspeakVM
>> +                                                       ifTrue: [50]
>> +                                                       ifFalse: [42]!
>> -                                                       ifTrue: [46]
>> -                                                       ifFalse: [38]!
>>
>> Item was changed:
>>   ----- Method: StackToRegisterMappingCogit
>> class>>initializeMiscConstants (in category 'class initialization') -----
>>   initializeMiscConstants
>>         super initializeMiscConstants.
>>         NumTrampolines := NewspeakVM
>> +                                                       ifTrue: [60]
>> +                                                       ifFalse: [52]!
>> -                                                       ifTrue: [58]
>> -                                                       ifFalse: [50]!
>>
>> Item was changed:
>>   ----- Method:
>> StackToRegisterMappingCogit>>compileAbstractInstructionsFrom:through: (in
>> category 'compile abstract instructions') -----
>>   compileAbstractInstructionsFrom: start through: end
>>         "Loop over bytecodes, dispatching to the generator for each
>> bytecode, handling fixups in due course."
>>         | nextOpcodeIndex descriptor nExts fixup result |
>>         <var: #descriptor type: #'BytecodeDescriptor *'>
>>         <var: #fixup type: #'BytecodeFixup *'>
>>         self traceSimStack.
>>         bytecodePC := start.
>>         nExts := 0.
>>         descriptor := nil.
>>         deadCode := false.
>>         [self cCode: '' inSmalltalk:
>>                 [(debugBytecodePointers includes: bytecodePC) ifTrue:
>> [self halt]].
>>         fixup := self fixupAt: bytecodePC - initialPC.
>>         fixup targetInstruction asUnsignedInteger > 0
>>                 ifTrue:
>>                         [deadCode := false.
>>                          fixup targetInstruction asUnsignedInteger >= 2
>> ifTrue:
>>                                 [self merge: fixup
>>                                         afterContinuation: (descriptor
>> notNil
>>
>>       and: [descriptor isUnconditionalBranch
>>
>>               or: [descriptor isReturn]]) not]]
>>                 ifFalse: "If there's no fixup following a return there's
>> no jump to that code and it is dead."
>>                         [(descriptor notNil and: [descriptor isReturn])
>> ifTrue:
>>                                 [deadCode := true]].
>>          self cCode: '' inSmalltalk:
>>                 [deadCode ifFalse:
>>                         [self assert: simStackPtr + (needsFrame ifTrue:
>> [0] ifFalse: [1])
>>                                                 = (self
>> debugStackPointerFor: bytecodePC)]].
>>          byte0 := (objectMemory fetchByte: bytecodePC ofObject:
>> methodObj) + bytecodeSetOffset.
>>          descriptor := self generatorAt: byte0.
>> +        self loadSubsequentBytesForDescriptor: descriptor at: bytecodePC.
>> -        descriptor numBytes > 1 ifTrue:
>> -               [byte1 := objectMemory fetchByte: bytecodePC + 1
>> ofObject: methodObj.
>> -                descriptor numBytes > 2 ifTrue:
>> -                       [byte2 := objectMemory fetchByte: bytecodePC + 2
>> ofObject: methodObj.
>> -                        descriptor numBytes > 3 ifTrue:
>> -                               [byte3 := objectMemory fetchByte:
>> bytecodePC + 3 ofObject: methodObj.
>> -                                descriptor numBytes > 4 ifTrue:
>> -                                       [self notYetImplemented]]]].
>>          nextOpcodeIndex := opcodeIndex.
>>          result := deadCode
>>                                 ifTrue: "insert nops for dead code that
>> is mapped so that bc to mc mapping is not many to one"
>>                                         [(descriptor isMapped
>>                                           or: [inBlock and: [descriptor
>> isMappedInBlock]]) ifTrue:
>>                                                 [self annotateBytecode:
>> self Nop].
>>                                                 0]
>>                                 ifFalse:
>>                                         [self perform: descriptor
>> generator].
>>          descriptor isExtension ifFalse: "extended bytecodes must consume
>> their extensions"
>>                 [self assert: (extA = 0 and: [extB = 0])].
>>          self traceDescriptor: descriptor; traceSimStack.
>>          (fixup targetInstruction asUnsignedInteger between: 1 and: 2)
>> ifTrue:
>>                 ["There is a fixup for this bytecode.  It must point to
>> the first generated
>>                    instruction for this bytecode.  If there isn't one we
>> need to add a label."
>>                  opcodeIndex = nextOpcodeIndex ifTrue:
>>                         [self Label].
>>                  fixup targetInstruction: (self abstractInstructionAt:
>> nextOpcodeIndex)].
>>          bytecodePC := self nextBytecodePCFor: descriptor at: bytecodePC
>> exts: nExts in: methodObj.
>>          result = 0 and: [bytecodePC <= end]] whileTrue:
>>                 [nExts := descriptor isExtension ifTrue: [nExts + 1]
>> ifFalse: [0]].
>>         self checkEnoughOpcodes.
>>         ^result!
>>
>> Item was added:
>> + ----- Method: StackToRegisterMappingCogit>>evaluate:at: (in category
>> 'peephole optimizations') -----
>> + evaluate: descriptor at: pc
>> +       <var: #descriptor type: #'BytecodeDescriptor *'>
>> +       byte0 := objectMemory fetchByte: pc ofObject: methodObj.
>> +       self assert: descriptor = (self generatorAt: bytecodeSetOffset +
>> byte0).
>> +       self loadSubsequentBytesForDescriptor: descriptor at: pc.
>> +       self perform: descriptor generator!
>>
>> Item was changed:
>>   ----- Method: StackToRegisterMappingCogit>>genPushNewArrayBytecode (in
>> category 'bytecode generators') -----
>>   genPushNewArrayBytecode
>>         | size popValues |
>>         self assert: needsFrame.
>>         optStatus isReceiverResultRegLive: false.
>>         (popValues := byte1 > 127)
>>                 ifTrue: [self ssFlushTo: simStackPtr]
>>                 ifFalse: [self ssAllocateCallReg: SendNumArgsReg and:
>> ReceiverResultReg].
>>         size := byte1 bitAnd: 127.
>> +       popValues ifFalse:
>> +               [(self tryCollapseTempVectorInitializationOfSize: size)
>> ifTrue:
>> +                       [^0]].
>>         objectRepresentation genNewArrayOfSize: size initialized:
>> popValues not.
>>         popValues ifTrue:
>>                 [size - 1 to: 0 by: -1 do:
>>                         [:i|
>>                         self PopR: TempReg.
>>                         objectRepresentation
>>                                 genStoreSourceReg: TempReg
>>                                 slotIndex: i
>>                                 intoNewObjectInDestReg:
>> ReceiverResultReg].
>>                  self ssPop: size].
>>         ^self ssPushRegister: ReceiverResultReg!
>>
>> Item was added:
>> + ----- Method:
>> StackToRegisterMappingCogit>>tryCollapseTempVectorInitializationOfSize: (in
>> category 'peephole optimizations') -----
>> + tryCollapseTempVectorInitializationOfSize: slots
>> +       "Try and collapse
>> +               push: (Array new: 1)
>> +               popIntoTemp: tempIndex
>> +               pushConstant: const or pushTemp: n
>> +               popIntoTemp: 0 inVectorAt: tempIndex
>> +        into
>> +               tempAt: tempIndex put: {const}.
>> +        One might think that we should look for a sequence of more than
>> +        one pushes and pops but this is extremely rare."
>> +       | pushArrayDesc storeArrayDesc pushValueDesc storeValueDesc reg |
>> +       <var: #pushArrayDesc type: #'BytecodeDescriptor *'>
>> +       <var: #pushValueDesc type: #'BytecodeDescriptor *'>
>> +       <var: #storeArrayDesc type: #'BytecodeDescriptor *'>
>> +       <var: #storeValueDesc type: #'BytecodeDescriptor *'>
>> +       slots ~= 1 ifTrue:
>> +               [^false].
>> +       pushArrayDesc := self generatorAt: bytecodeSetOffset
>> +
>>       + (objectMemory
>> +
>>                       fetchByte: bytecodePC
>> +
>>                       ofObject: methodObj).
>> +       self assert: pushArrayDesc generator == #genPushNewArrayBytecode.
>> +       storeArrayDesc := self generatorAt: bytecodeSetOffset
>> +
>>       + (objectMemory
>> +
>>                       fetchByte: bytecodePC
>> +
>>                                       + pushArrayDesc numBytes
>> +
>>                       ofObject: methodObj).
>> +       storeArrayDesc generator ~~
>> #genStoreAndPopTemporaryVariableBytecode ifTrue:
>> +               [^false].
>> +       pushValueDesc := self generatorAt: bytecodeSetOffset
>> +
>>       + (objectMemory
>> +
>>                       fetchByte: bytecodePC
>> +
>>                                       + pushArrayDesc numBytes
>> +
>>                                       + storeArrayDesc numBytes
>> +
>>                       ofObject: methodObj).
>> +       (pushValueDesc generator ~~ #genPushLiteralConstantBytecode
>> +        and: [pushValueDesc generator ~~
>> #genPushQuickIntegerConstantBytecode
>> +        and: [pushValueDesc generator ~~
>> #genPushTemporaryVariableBytecode]]) ifTrue:
>> +               [^false].
>> +       storeValueDesc := self generatorAt: bytecodeSetOffset
>> +
>>       + (objectMemory
>> +
>>                       fetchByte: bytecodePC
>> +
>>                                       + pushArrayDesc numBytes
>> +
>>                                       + storeArrayDesc numBytes
>> +
>>                                       + pushValueDesc numBytes
>> +
>>                       ofObject: methodObj).
>> +       storeValueDesc generator ~~ #genStoreAndPopRemoteTempLongBytecode
>> ifTrue:
>> +               [^false].
>> +
>> +       objectRepresentation genNewArrayOfSize: 1 initialized: false.
>> +       self evaluate: pushValueDesc at: bytecodePC + pushArrayDesc
>> numBytes + storeArrayDesc numBytes.
>> +       reg := self ssStorePop: true toPreferredReg: TempReg.
>> +       objectRepresentation
>> +               genStoreSourceReg: reg
>> +               slotIndex: 0
>> +               intoNewObjectInDestReg: ReceiverResultReg.
>> +       self ssPushRegister: ReceiverResultReg.
>> +       self evaluate: storeArrayDesc at: bytecodePC + pushArrayDesc
>> numBytes.
>> +       bytecodePC := bytecodePC
>> +                                       "+ pushArrayDesc numBytes this
>> gets added by nextBytecodePCFor:at:exts:in:"
>> +                                       + storeArrayDesc numBytes
>> +                                       + pushValueDesc numBytes
>> +                                       + storeValueDesc numBytes.
>> +       ^true!
>>
>>
>
>


-- 
best,
Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20140602/79b10f92/attachment-0001.htm


More information about the Vm-dev mailing list