Hi Marcel, using extA instead of extB need synchronization between
- VM to use extA in SistaV1 interpreter (and jit)
- image side to use extA in compiled code generation (EncoderForSistaV1)

Being careful is maybe not necessary if we have all source code reachable.
I was thinking of the case when we have to decompile, that would mean decompile with extB, recompile with extA.
One careful solution would be to
- restrict optimized Character value in range 0-16rFF
- recompile all
- switch to extA in VM and image side
- restore optimized Character value in range 0-16rFFFF
- recompile all

But mixing old extB VM and new extA image or vice versa would lead to bad things (loose upper bits of literal character value).
Having a VM supporting both extB + extA would come at an extra cost (maybe not that high, it could be a simple extB+extA bitAnd: 16rFF)
So I'd better ask for Eliot advice.
The main interest of extA is that it could be extended above 16rFFFF simply by chaining several extended A (224) bytecodes.
This is not as easy with extB, we have to undo the signed transform taking numExtB into account...

Alternatively, we could try to workaround at image side: test the VM at image startup. Recompiling at startup is too heavy, better operate at a lower level, that is detect and fix incompatible compiled method (find which is using extB (225)+pushLiteralChar (233) and replace with extA(224)+puchLiteralChar or vice versa).
Seems a bit overkill to me, and does not fix the case old image (without workaround) + new VM.

However, all this might as well be a false problem. How many literal characters > 16rFF do we have in trunk ?
Since we had poor support for extended (not to say unicode) fonts until recently, probably not that many ! A $? in source code is not helpful !


Le mer. 9 mars 2022 à 13:16, Marcel Taeumel <marcel.taeumel@hpi.de> a écrit :
 
Hi Nicolas --

Thanks for finding this workaround!

Given some later point in time, how bad could it bite us back considering extA vs extB? Would one notice this very quickly? What does "careful recompilation of CompiledMethod at image side" mean here? What can go wrong?

Best,
Marcel

Am 09.03.2022 13:05:56 schrieb commits@source.squeak.org <commits@source.squeak.org>:


Nicolas Cellier uploaded a new version of VMMaker to project VM Maker Inbox:
http://source.squeak.org/VMMakerInbox/VMMaker.oscog-nice.3174.mcz

==================== Summary ====================

Name: VMMaker.oscog-nice.3174
Author: nice
Time: 9 March 2022, 1:04:51.911686 pm
UUID: af4f8704-18fa-564c-ac59-27c6f6364c59
Ancestors: VMMaker.oscog-eem.3173

Workaround : literal characters should have unsigned values

Currently in SistaV1, characters in range 16r100 16rFFFF are encoded as push literal character (bytecode 233) with signed extended B.

This lead to negative character values in the range 16r8000 to 16rFFFF.

The workaround consist in bit-oring extB with 16rFF so as to re-interpret it as unsigned char. BEWARE: this assumes that a single extB will be used rather than multiple consecutive extB. It works by now, because it's currently the case that image side only use 1 extB for encoding literal character - see EncoderForSistaV1>>#genPushCharacter: - above 16rFFFF, a literal index is consumed.

An alternative would be to use extA rather than extB. But it would require a careful recompilation of CompiledMethod at image side.

Similar fix is required at image side in InstructionStream>>#interpretNext2ByteSistaV1Instruction:for:extA:extB:startPC:

=============== Diff against VMMaker.oscog-eem.3173 ===============

Item was changed:
----- Method: SimpleStackBasedCogit>>genExtPushCharacterBytecode (in category 'bytecode generators') -----
genExtPushCharacterBytecode
"SistaV1: 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)"
| value |
+ value := byte1 + ((extB bitAnd: 16rFF) << 8).
- value := byte1 + (extB << 8).
extB := 0.
numExtB := 0.
^self genPushLiteral: (objectMemory characterObjectOf: value)!

Item was changed:
----- Method: StackInterpreter>>extPushCharacterBytecode (in category 'stack bytecodes') -----
extPushCharacterBytecode
"SistaV1: * 233 11101001 iiiiiiii Push Character #iiiiiiii (+ Extend B * 256)"
| value |
+ value := self fetchByte + ((extB bitAnd: 16rFF) << 8).
- value := self fetchByte + (extB << 8).
self fetchNextBytecode.
self internalPush: (objectMemory characterObjectOf: value).
numExtB := extB := 0!