[Vm-dev] Sign-bit bug in character literals > 16r7FFF ... related to SistaV1?

Marcel Taeumel marcel.taeumel at hpi.de
Wed Mar 9 13:16:22 UTC 2022


Hi Nicolas --

Thanks! Also for the proposed workaround in VMMaker.oscog-nice.3174.

For what it's worth, one can always replace the character-literal syntax with string access:

$x.
'x' first.

Or store the code point if the visual appearance is not relevant:

Character value: 16r78. 
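A workspace sketch applying both workarounds to a code point above 16r7FFF. It assumes, without confirmation from this thread, that they avoid the problem because a string literal is stored in the method's literal frame and `Character value:` is an ordinary message send with an integer argument, so neither depends on how an immediate Character literal is encoded. The source text is built programmatically so U+8000 need not be typed; `wide`, `viaString` and `viaCodePoint` are illustrative names.

```smalltalk
"Sketch: the two workarounds applied to a code point above 16r7FFF."
| wide viaString viaCodePoint |
wide := Character value: 16r8000.
"Workaround 1, string access: compiles  '<U+8000>' first"
viaString := Compiler evaluate: (WideString with: $' with: wide with: $'), ' first'.
"Workaround 2, code point: compiles  Character value: 16r8000"
viaCodePoint := Compiler evaluate: 'Character value: 16r8000'.
"Both are expected to answer 16r8000 under either bytecode set."
{viaString asInteger hex. viaCodePoint asInteger hex}.
```

WideString is used for the source text so that no ByteString-to-WideString conversion happens behind the scenes during concatenation.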

Best,
Marcel
On 09.03.2022 10:02:46, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
Hi Marcel,
yes, I agree: the bug is in the bytecode encoding/decoding of immediate
Character values.
I stepped into (Compiler evaluate: (String with: $$ with: (Character value:
16r8000))), and if we step into executeMethod, we can inspect what is going
on.
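The workspace sketch below shows one way an encode/decode mismatch of this kind could produce exactly the values from the original report (16r3FFF8000 on a 32-bit VM, -16r8000 on a 64-bit VM). It rests on assumptions this thread does not confirm: that the encoder puts the code point's high byte into a SistaV1 Extend B prefix and its low byte into the Push Character operand, and that the decoder reads the first Extend B byte as signed, as it does for Push Integer. The 30-bit mask assumes the 32-bit Spur layout of immediate Characters (2 tag bits, 30 value bits). All variable names are hypothetical.

```smalltalk
"Hypothetical round trip of code point 16r8000 through a signed extension byte."
| codePoint highByte lowByte signedHigh decoded |
codePoint := 16r8000.
highByte := codePoint bitShift: -8.   "128: would travel in the Extend B prefix"
lowByte := codePoint bitAnd: 16rFF.   "0: would travel as the Push Character operand"
"Assumed decoder: an extension byte above 127 is read as negative (sbbbbbbb)."
signedHigh := highByte > 127
	ifTrue: [highByte - 256]
	ifFalse: [highByte].
decoded := signedHigh * 256 + lowByte.   "-32768, i.e. -16r8000, the 64-bit result"
"Keeping only 30 value bits turns the same bits into 16r3FFF8000, the 32-bit result."
{decoded. decoded bitAnd: 16r3FFFFFFF}.
```

This reading is consistent with "works fine up to 16r7FFF": for any code point up to 16r7FFF the high byte is at most 16r7F, so the sign never flips.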


On Wed, 9 Mar 2022 at 08:39, Marcel Taeumel wrote:

>
> Hi Nicolas --
>
> There is a bug in EncoderForSistaV1. The behavior is correct for
> EncoderForV3PlusClosures. We can discuss this on squeak-dev now, I suppose.
>
> CompiledCode preferredBytecodeSetEncoderClass: EncoderForSistaV1.
> CompiledCode preferredBytecodeSetEncoderClass: EncoderForV3PlusClosures.
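A sketch for comparing the two encoders on the same literal without leaving the image in a changed state. It assumes that a class-side getter `CompiledCode preferredBytecodeSetEncoderClass` exists next to the setter used above, and that `Compiler evaluate:` compiles with the preferred encoder. Presumably only SistaV1 is affected because V3PlusClosures fetches a Character literal from the method's literal frame instead of encoding it as an immediate operand.

```smalltalk
"Evaluate the same Character literal under each encoder, then restore the old setting."
| source previous results |
source := (WideString with: $$ with: (Character value: 16r8000)), ' asInteger'.
previous := CompiledCode preferredBytecodeSetEncoderClass.
results := [{EncoderForV3PlusClosures. EncoderForSistaV1} collect: [:encoderClass |
		CompiledCode preferredBytecodeSetEncoderClass: encoderClass.
		encoderClass name -> (Compiler evaluate: source)]]
	ensure: [CompiledCode preferredBytecodeSetEncoderClass: previous].
"Per this thread, V3PlusClosures answers 32768; SistaV1 does not on affected VMs."
results.
```

The `ensure:` block restores the previous encoder even if compilation or evaluation fails.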
>
> If you do send #halt instead of #asInteger, you get another interesting
> debugger when trying to start debugging:
>
> [screenshot scrubbed by the archive; the image.png attachments are listed at the end]
>
> Best,
> Marcel
>
> On 09.03.2022 08:34:11, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
> Ah OK, I see it on macOS too.
> It remains to determine exactly which operation is involved...
> The TextMorph holding the printed result is correct: a WideString whose
> last Character is (Character value: 32768).
>
> On Wed, 9 Mar 2022 at 08:08, Marcel Taeumel wrote:
>
> >
> > Hi Dave, hi Nicolas --
> >
> > I am working in Windows 10.
> >
> > > I cannot reproduce on Linux 64 bit either:
> > > (Character value: 16r8000) asInteger hex ==> '16r8000'
> >
> > That's not how you would reproduce it. The bug affects character literals,
> > not character objects/instances. You have to evaluate code on that
> > character literal.
> >
> > Maybe this picture helps:
> >
> > [screenshot scrubbed by the archive; the image.png attachments are listed at the end]
> >
> > Best,
> > Marcel
> >
> > On 08.03.2022 18:56:09, David T. Lewis <lewis at mail.msen.com> wrote:
> >
> > I cannot reproduce on Linux 64 bit either:
> >
> > (Character value: 16r8000) asInteger hex ==> '16r8000'
> >
> > Dave
> >
> >
> > On Tue, Mar 08, 2022 at 06:45:23PM +0100, Nicolas Cellier wrote:
> > >
> > > Hi Marcel,
> > > Which OS?
> > > I cannot reproduce on macOS 64-bit:
> > >
> > > Cog[Spur] VM [CoInterpreterPrimitives VMMaker.oscog-eem.3172]
> > > 5.20211023.2003
> > > Mac OS X built on Mar 6 2022 15:31:16 CET Compiler: 4.2.1 Compatible
> > > Apple LLVM 10.0.1 (clang-1001.0.46.4)
> > > platform sources revision VM: 202110232003
> > >
> > > On Tue, 8 Mar 2022 at 17:57, Marcel Taeumel wrote:
> > >
> > > >
> > > > Hi Eliot, hi all --
> > > >
> > > > I think we have a sign-bit bug for character literals with code
> > > > points > 16r7FFF.
> > > >
> > > > Steps to reproduce:
> > > >
> > > > 1. Print it: "Character value: 16r8000"
> > > > 2. Evaluate the printed character literal, either by inspecting it or by
> > > > sending #asInteger to it. The glyph will most likely not render in a
> > > > standard Squeak image, so the expression shows up like "$? asInteger".
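The two manual steps above can also be scripted in a workspace. This sketch assumes `Compiler evaluate:` is an adequate stand-in for Print it / Do it, and builds the source text with the same `String with: $$ with: ...` idea Nicolas uses in the thread (with WideString, to be explicit about the string class). The run-time instance is what Dave's check exercised; only the compiled literal is expected to misbehave.

```smalltalk
"Contrast a Character instance with the same character compiled as a literal."
| instance fromLiteral |
instance := Character value: 16r8000.
"Equivalent of typing  $<U+8000> asInteger  and evaluating it."
fromLiteral := Compiler evaluate: (WideString with: $$ with: instance), ' asInteger'.
{instance asInteger hex. fromLiteral hex}.
```

On an affected VM with EncoderForSistaV1 preferred, the two results are expected to differ.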
> > > >
> > > > In a 32-bit VM, I will get the (positive) integer value 16r3FFF8000.
> > > > In a 64-bit VM, I will get the (negative) integer value '-16r8000'.
> > > >
> > > > Somehow, counting from bit 0, bits 16 to 29 flip from 0 to 1. In 64-bit,
> > > > this means a negative number. Not sure about bits 30 and 31 here.
> > > >
> > > > Is there a bug in the upper tag bits of immediate characters?
> > > > Is this related to the 2-byte or 3-byte bytecodes in SistaV1?
> > > >
> > > > Works fine up to 16r7FFF. (This is unrelated to #leadingChar. Mine was 0
> > > > in this experiment.)
> > > >
> > > > VM: 202112201228 (VMMaker.oscog-eem.3116)
> > > >
> > > > Best,
> > > > Marcel
> > > >
> >
> >




-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 14513 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220309/24af8ca3/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 90427 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220309/24af8ca3/attachment-0003.png>

