[Vm-dev] Sign-bit bug in character literals > 16r7FFF ... related to SistaV1?

Clément Béra bera.clement at gmail.com
Wed Mar 9 19:23:33 UTC 2022


Seeing this, I believe that bit was used for something else in Sista, and
Eliot and I agreed that 32k literals were enough? I cannot remember exactly.
I think the bit meant that Cogit should not generate profiling counters for
the method, or something like that.
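
As a side note on the numbers reported further down the thread: both
results (16r3FFF8000 on 32-bit, -16r8000 on 64-bit) are what you would get
if a 16-bit operand were read as a signed value. The workspace sketch below
only illustrates that arithmetic. signExtend16 is a made-up name, and this is
one possible reading of the symptom, not the actual encoder or VM code.

```smalltalk
"Sketch only: models a 16-bit operand being read as a signed value.
 signExtend16 is a hypothetical helper, not part of Squeak or the VM."
| signExtend16 code extended |
signExtend16 := [:bits |
	"Treat bit 15 as the sign bit of a 16-bit two's-complement number."
	(bits bitAnd: 16r7FFF) - (bits bitAnd: 16r8000)].

code := 16r8000.
extended := signExtend16 value: code.

"Compare with the 64-bit report: a negative integer."
Transcript showln: extended printString.

"Compare with the 32-bit report: only the low 30 bits survive."
Transcript showln: (extended bitAnd: 16r3FFFFFFF) printString.
```

Code points up to 16r7FFF pass through signExtend16 unchanged, which would
match the "works fine up to 16r7FFF" observation.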

On Wed, Mar 9, 2022 at 3:41 PM Nicolas Cellier <
nicolas.cellier.aka.nice at gmail.com> wrote:

>
> Or just restrict EncoderForSistaV1>>#genPushCharacter:
>
> ...snip...
>      (code < 0 or: [code > 16r7FFF]) ifTrue:
>          [^self outOfRangeError: 'character' index: code range: 0 to: 16r7FFF].
> ...snip...
>
> On Wed, Mar 9, 2022 at 14:16, Marcel Taeumel <marcel.taeumel at hpi.de>
> wrote:
>
>>
>> Hi Nicolas --
>>
>> Thanks! Also for the proposed workaround in VMMaker.oscog-nice.3174.
>>
>> For what it's worth, one can always replace the character-literal syntax
>> with string access:
>>
>> $x.
>> 'x' first.
>>
>> Or store the code point if the optical appearance is not relevant:
>>
>> Character value: 16r78.
>>
>> Best,
>> Marcel
>>
>> On 09.03.2022 10:02:46, Nicolas Cellier <
>> nicolas.cellier.aka.nice at gmail.com> wrote:
>> Hi Marcel,
>> yes, I agree, the bug is in the bytecode encoding/decoding of the immediate
>> Character value.
>> I stepped into (Compiler evaluate: (String with: $$ with: (Character
>> value: 16r8000))), and if we step into executeMethod, we can inspect what
>> is going on.
>>
>>
>> On Wed, Mar 9, 2022 at 08:39, Marcel Taeumel <marcel.taeumel at hpi.de>
>> wrote:
>>
>> >
>> > Hi Nicolas --
>> >
>> > There is a bug in the EncoderForSistaV1. The behavior is okay for
>> > EncoderForV3PlusClosures. We can discuss this on squeak-dev now, I
>> > suppose.
>> >
>> > CompiledCode preferredBytecodeSetEncoderClass: EncoderForSistaV1.
>> > CompiledCode preferredBytecodeSetEncoderClass: EncoderForV3PlusClosures.
>> >
>> > If you send #halt instead of #asInteger, you get another interesting
>> > debugger when trying to start debugging:
>> >
>> >
>> >
>> > Best,
>> > Marcel
>> >
>> > On 09.03.2022 08:34:11, Nicolas Cellier <
>> > nicolas.cellier.aka.nice at gmail.com> wrote:
>> > Ah OK, I see it on macOS too.
>> > It remains to determine which operation exactly is involved...
>> > The TextMorph holding the printed result is correct - a WideString
>> > whose last Character is (Character value: 32768).
>> >
>> > On Wed, Mar 9, 2022 at 08:08, Marcel Taeumel <marcel.taeumel at hpi.de>
>> > wrote:
>> >
>> > >
>> > > Hi Dave, hi Nicolas --
>> > >
>> > > I am working in Windows 10.
>> > >
>> > > > I cannot reproduce on Linux 64 bit either:
>> > > > (Character value: 16r8000) asInteger hex ==> '16r8000'
>> > >
>> > > That's not how you would reproduce it. The bug affects character
>> > > literals, not character objects/instances. You have to evaluate code
>> > > on that character literal.
>> > >
>> > > Maybe this picture helps:
>> > >
>> > >
>> > >
>> > > Best,
>> > > Marcel
>> > >
>> > > On 08.03.2022 18:56:09, David T. Lewis <lewis at mail.msen.com> wrote:
>> > >
>> > > I cannot reproduce on Linux 64 bit either:
>> > >
>> > > (Character value: 16r8000) asInteger hex ==> '16r8000'
>> > >
>> > > Dave
>> > >
>> > >
>> > > On Tue, Mar 08, 2022 at 06:45:23PM +0100, Nicolas Cellier wrote:
>> > > >
>> > > > Hi Marcel,
>> > > > which OS ?
>> > > > I cannot reproduce on macos 64,
>> > > >
>> > > > Cog[Spur] VM [CoInterpreterPrimitives VMMaker.oscog-eem.3172]
>> > > > 5.20211023.2003
>> > > > Mac OS X built on Mar 6 2022 15:31:16 CET Compiler: 4.2.1 Compatible
>> > > > Apple LLVM 10.0.1 (clang-1001.0.46.4)
>> > > > platform sources revision VM: 202110232003
>> > > >
>> > > > On Tue, Mar 8, 2022 at 17:57, Marcel Taeumel wrote:
>> > > >
>> > > > >
>> > > > > Hi Eliot, hi all --
>> > > > >
>> > > > > I think we have a sign-bit bug for character literals with code
>> > > > > points > 16r7FFF.
>> > > > >
>> > > > > Steps to reproduce:
>> > > > >
>> > > > > 1. Print it: "Character value: 16r8000"
>> > > > > 2. Inspect the result by evaluating the character literal or sending
>> > > > > #asInteger to it. It will most likely not render in a standard
>> > > > > Squeak and show up like "$? asInteger".
>> > > > >
>> > > > > In a 32-bit VM, I will get the (positive) integer value 16r3FFF8000.
>> > > > > In a 64-bit VM, I will get the (negative) integer value '-16r8000'.
>> > > > >
>> > > > > Somehow, counting from bit 0, the bits 16 to 29 flip from 0 to 1. In
>> > > > > 64-bit, this means a negative number. Not sure about bits 30 and 31
>> > > > > here.
>> > > > >
>> > > > > Is there a bug in the upper tag bits of immediate characters?
>> > > > > Is this related to the 2-byte or 3-byte bytecodes in SistaV1?
>> > > > >
>> > > > > Works fine up to 16r7FFF. (This is unrelated to #leadingChar. Mine
>> > > > > was 0 in this experiment.)
>> > > > >
>> > > > > VM: 202112201228 (VMMaker.oscog-eem.3116)
>> > > > >
>> > > > > Best,
>> > > > > Marcel
>> > > > >
>> > >
>> > >

-- 
Clément Béra
https://clementbera.github.io/
https://clementbera.wordpress.com/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 14513 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220309/9b099c67/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 90427 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220309/9b099c67/attachment-0003.png>