[Vm-dev] VM Maker Inbox: VMMaker.oscog-nice.3174.mcz

Marcel Taeumel marcel.taeumel at hpi.de
Thu Mar 10 06:53:32 UTC 2022


Hi Nicolas --

Thanks for this explanation.

> However, all this might as well be a false problem. How many literal
> characters > 16rFF do we have in trunk ?
> Since we had poor support for extended (not to say unicode) fonts until
> recently, probably not that many ! A $? in source code is not helpful !


Yes, the source code in Trunk is fine. I stumbled upon this issue while debugging things around Character, #leadingChar, glyph browsing, and TextConverter. I got used to putting a $ in front of some visible thing in a text field to then send #charCode, #asInteger, or #leadingChar to it. Being able to do-it/inspect-it/print-it in any text field is very convenient. I could have done this via    '...' first    as well. An exception would have been nice, though.

For the time being, it makes sense to just signal an error when the user tries to create literal characters between 16r8000 and 16rFFFF.

If you happen to know the exact place in the image, would you mind adding this check in Trunk to EncoderForSistaV1? :-)

Best,
Marcel
Am 09.03.2022 23:16:25 schrieb Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com>:
Hi Marcel, using extA instead of extB need synchronization between
- VM to use extA in SistaV1 interpreter (and jit)
- image side to use extA in compiled code generation (EncoderForSistaV1)

Being careful is maybe not necessary if we have all source code reachable.
I was thinking of the case when we have to decompile, that would mean
decompile with extB, recompile with extA.
One careful solution would be to
- restrict optimized Character value in range 0-16rFF
- recompile all
- switch to extA in VM and image side
- restore optimized Character value in range 0-16rFFFF
- recompile all

But mixing old extB VM and new extA image or vice versa would lead to bad
things (loose upper bits of literal character value).
Having a VM supporting both extB + extA would come at an extra cost (maybe
not that high, it could be a simple extB+extA bitAnd: 16rFF)
So I'd better ask for Eliot advice.
The main interest of extA is that it could be extended above 16rFFFF simply
by chaining several extended A (224) bytecodes.
This is not as easy with extB, we have to undo the signed transform taking
numExtB into account...

Alternatively, we could try to workaround at image side: test the VM at
image startup. Recompiling at startup is too heavy, better operate at a
lower level, that is detect and fix incompatible compiled method (find
which is using extB (225)+pushLiteralChar (233) and replace with
extA(224)+puchLiteralChar or vice versa).
Seems a bit overkill to me, and does not fix the case old image (without
workaround) + new VM.

However, all this might as well be a false problem. How many literal
characters > 16rFF do we have in trunk ?
Since we had poor support for extended (not to say unicode) fonts until
recently, probably not that many ! A $? in source code is not helpful !


Le mer. 9 mars 2022 à 13:16, Marcel Taeumel a
écrit :

>
> Hi Nicolas --
>
> Thanks for finding this workaround!
>
> Given some later point in time, how bad could it bite us back considering
> extA vs extB? Would one notice this very quickly? What does "careful
> recompilation of CompiledMethod at image side" mean here? What can go wrong?
>
> Best,
> Marcel
>
> Am 09.03.2022 13:05:56 schrieb commits at source.squeak.org <
> commits at source.squeak.org>:
>
> Nicolas Cellier uploaded a new version of VMMaker to project VM Maker
> Inbox:
> http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz
>
> ==================== Summary ====================
>
> Name: VMMaker.oscog-nice.3174
> Author: nice
> Time: 9 March 2022, 1:04:51.911686 pm
> UUID: af4f8704-18fa-564c-ac59-27c6f6364c59
> Ancestors: VMMaker.oscog-eem.3173
>
> Workaround : literal characters should have unsigned values
>
> Currently in SistaV1, characters in range 16r100 16rFFFF are encoded as
> push literal character (bytecode 233) with signed extended B.
>
> This lead to negative character values in the range 16r8000 to 16rFFFF.
>
> The workaround consist in bit-oring extB with 16rFF so as to re-interpret
> it as unsigned char. BEWARE: this assumes that a single extB will be used
> rather than multiple consecutive extB. It works by now, because it's
> currently the case that image side only use 1 extB for encoding literal
> character - see EncoderForSistaV1>>#genPushCharacter: - above 16rFFFF, a
> literal index is consumed.
>
> An alternative would be to use extA rather than extB. But it would require
> a careful recompilation of CompiledMethod at image side.
>
> Similar fix is required at image side in
> InstructionStream>>#interpretNext2ByteSistaV1Instruction:for:extA:extB:startPC:
>
> =============== Diff against VMMaker.oscog-eem.3173 ===============
>
> Item was changed:
> ----- Method: SimpleStackBasedCogit>>genExtPushCharacterBytecode (in
> category 'bytecode generators') -----
> genExtPushCharacterBytecode
> "SistaV1: 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B *
> 256)"
> | value |
> + value := byte1 + ((extB bitAnd: 16rFF) << 8).
> - value := byte1 + (extB << 8).
> extB := 0.
> numExtB := 0.
> ^self genPushLiteral: (objectMemory characterObjectOf: value)!
>
> Item was changed:
> ----- Method: StackInterpreter>>extPushCharacterBytecode (in category
> 'stack bytecodes') -----
> extPushCharacterBytecode
> "SistaV1: * 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B *
> 256)"
> | value |
> + value := self fetchByte + ((extB bitAnd: 16rFF) << 8).
> - value := self fetchByte + (extB << 8).
> self fetchNextBytecode.
> self internalPush: (objectMemory characterObjectOf: value).
> numExtB := extB := 0!
>
>

Hi Marcel, using extA instead of extB need synchronization between
- VM to use extA in SistaV1 interpreter (and jit)

- image side to use extA in compiled code generation (EncoderForSistaV1)

Being careful is maybe not necessary if we have all source code reachable.
I was thinking of the case when we have to decompile, that would mean decompile with extB, recompile with extA.
One careful solution would be to
- restrict optimized Character value in range 0-16rFF
- recompile all
- switch to extA in VM and image side

- restore optimized Character value in range 0-16rFFFF
- recompile all

But mixing old extB VM and new extA image or vice versa would lead to bad things (loose upper bits of literal character value).
Having a VM supporting both extB + extA would come at an extra cost (maybe not that high, it could be a simple extB+extA bitAnd: 16rFF)
So I'd better ask for Eliot advice.
The main interest of extA is that it could be extended above 16rFFFF simply by chaining several extended A (224) bytecodes.
This is not as easy with extB, we have to undo the signed transform taking numExtB into account...

Alternatively, we could try to workaround at image side: test the VM at image startup. Recompiling at startup is too heavy, better operate at a lower level, that is detect and fix incompatible compiled method (find which is using extB (225)+pushLiteralChar (233) and replace with extA(224)+puchLiteralChar or vice versa).
Seems a bit overkill to me, and does not fix the case old image (without workaround) + new VM.

However, all this might as well be a false problem. How many literal characters > 16rFF do we have in trunk ?
Since we had poor support for extended (not to say unicode) fonts until recently, probably not that many ! A $? in source code is not helpful !



Le mer. 9 mars 2022 à 13:16, Marcel Taeumel <marcel.taeumel at hpi.de [mailto:marcel.taeumel at hpi.de]> a écrit :

 

Hi Nicolas --

Thanks for finding this workaround!

Given some later point in time, how bad could it bite us back considering extA vs extB? Would one notice this very quickly? What does "careful recompilation of CompiledMethod at image side" mean here? What can go wrong?

Best,
Marcel

Am 09.03.2022 13:05:56 schrieb commits at source.squeak.org [mailto:commits at source.squeak.org] <commits at source.squeak.org [mailto:commits at source.squeak.org]>:

Nicolas Cellier uploaded a new version of VMMaker to project VM Maker Inbox:
http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz [http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz]

==================== Summary ====================

Name: VMMaker.oscog-nice.3174
Author: nice
Time: 9 March 2022, 1:04:51.911686 pm
UUID: af4f8704-18fa-564c-ac59-27c6f6364c59
Ancestors: VMMaker.oscog-eem.3173

Workaround : literal characters should have unsigned values

Currently in SistaV1, characters in range 16r100 16rFFFF are encoded as push literal character (bytecode 233) with signed extended B.

This lead to negative character values in the range 16r8000 to 16rFFFF.

The workaround consist in bit-oring extB with 16rFF so as to re-interpret it as unsigned char. BEWARE: this assumes that a single extB will be used rather than multiple consecutive extB. It works by now, because it's currently the case that image side only use 1 extB for encoding literal character - see EncoderForSistaV1>>#genPushCharacter: - above 16rFFFF, a literal index is consumed.

An alternative would be to use extA rather than extB. But it would require a careful recompilation of CompiledMethod at image side.

Similar fix is required at image side in InstructionStream>>#interpretNext2ByteSistaV1Instruction:for:extA:extB:startPC:

=============== Diff against VMMaker.oscog-eem.3173 ===============

Item was changed:
----- Method: SimpleStackBasedCogit>>genExtPushCharacterBytecode (in category 'bytecode generators') -----
genExtPushCharacterBytecode
"SistaV1: 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)"
| value |
+ value := byte1 + ((extB bitAnd: 16rFF) << 8).
- value := byte1 + (extB << 8).
extB := 0.
numExtB := 0.
^self genPushLiteral: (objectMemory characterObjectOf: value)!

Item was changed:
----- Method: StackInterpreter>>extPushCharacterBytecode (in category 'stack bytecodes') -----
extPushCharacterBytecode
"SistaV1: * 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)"
| value |
+ value := self fetchByte + ((extB bitAnd: 16rFF) << 8).
- value := self fetchByte + (extB << 8).
self fetchNextBytecode.
self internalPush: (objectMemory characterObjectOf: value).
numExtB := extB := 0!



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220310/39b8b99d/attachment-0001.html>


More information about the Vm-dev mailing list