[Vm-dev] Sign-bit bug in character literals > 16r7FFF ... related to SistaV1?
Eliot Miranda
eliot.miranda at gmail.com
Thu Mar 10 15:59:19 UTC 2022
> On Mar 9, 2022, at 11:24 AM, Clément Béra <bera.clement at gmail.com> wrote:
>
>
> Seeing this, I believe that bit was used for something else in Sista, and Eliot and I agreed that 32k literals were enough? I cannot remember.
> I think the bit meant that Cogit should not generate profiling counters for the method, or something like that.
Exactly
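
The two values reported further down the thread are what falls out if the 16-bit character operand is read as a signed quantity. A minimal workspace sketch of that arithmetic follows. It is an illustration only, not the VM's decoding code: the two's-complement reading of the operand and the 30-bit character value field on 32-bit images are assumptions, and the variable names are made up for the example.

```smalltalk
| operand signed |
operand := 16r8000.	"smallest code point reported to misbehave"

"Assumption: the operand is interpreted as a signed 16-bit value."
signed := operand >= 16r8000
	ifTrue: [operand - 16r10000]
	ifFalse: [operand].

signed.	"print it: -32768, matching the -16r8000 reported for the 64-bit VM"

"Assumption: a 32-bit image keeps only the low 30 bits of the character value."
signed bitAnd: 16r3FFFFFFF.	"print it: 1073709056, i.e. 16r3FFF8000 as reported for the 32-bit VM"
```

Any operand up to 16r7FFF takes the ifFalse: branch and comes back unchanged, which is consistent with the report that everything works up to that value.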
>
>> On Wed, Mar 9, 2022 at 3:41 PM Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>
>> Or just restrict EncoderForSistaV1>>#genPushCharacter:
>>
>> ...snip...
>> (code < 0 or: [code > 16r7FFF]) ifTrue:
>> [^self outOfRangeError: 'character' index: code range: 0 to: 16r7FFF].
>> ...snip...
>>
>>> On Wed, Mar 9, 2022 at 14:16, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
>>>
>>> Hi Nicolas --
>>>
>>> Thanks! Also for the proposed workaround in VMMaker.oscog-nice.3174.
>>>
>>> For what it's worth, one can always replace the character-literal syntax with string access:
>>>
>>> $x.
>>> 'x' first.
>>>
>>> Or store the code point if the optical appearance is not relevant:
>>>
>>> Character value: 16r78.
>>>
>>> Best,
>>> Marcel
>>>> On 09.03.2022 10:02:46, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>>>
>>>> Hi Marcel,
>>>> yes, I agree, the bug is in bytecode encoding/decoding of immediate Character value,
>>>> I stepped into (Compiler evaluate: (String with: $$ with: (Character value: 16r8000))), and if we step into executeMethod, we can inspect what is going on.
>>>>
>>>>
>>>>> On Wed, Mar 9, 2022 at 08:39, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
>>>>>
>>>>>
>>>>> Hi Nicolas --
>>>>>
>>>>>
>>>>> There is a bug in the EncoderForSistaV1. The behavior is okay for EncoderForV3PlusClosures. We can discuss this on squeak-dev now, I suppose.
>>>>>
>>>>> CompiledCode preferredBytecodeSetEncoderClass: EncoderForSistaV1.
>>>>> CompiledCode preferredBytecodeSetEncoderClass: EncoderForV3PlusClosures.
>>>>>
>>>>> If you do send #halt instead of #asInteger, you get another interesting debugger when trying to start debugging:
>>>>>
>>>>>
>>>>>
>>>>> Best,
>>>>> Marcel
>>>>>>
>>>>>> On 09.03.2022 08:34:11, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>>>>>
>>>>>> Ah OK, I see it on macos too
>>>>>> It remains to determine which operation exactly is involved...
>>>>>> The TextMorph holding the printed result is correct - a WideString, whose last Character is (Character value: 32768).
>>>>>>
>>>>>>> On Wed, Mar 9, 2022 at 08:08, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi Dave, hi Nicolas --
>>>>>>>
>>>>>>> I am working in Windows 10.
>>>>>>>
>>>>>>> > I cannot reproduce on Linux 64 bit either:
>>>>>>> > (Character value: 16r8000) asInteger hex ==> '16r8000'
>>>>>>>
>>>>>>> That's not how you would reproduce it. The bug affects character literals, not character objects/instances. You have to evaluate code on that character literal.
>>>>>>>
>>>>>>> Maybe this picture helps:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>> Marcel
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 08.03.2022 18:56:09, David T. Lewis <lewis at mail.msen.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> I cannot reproduce on Linux 64 bit either:
>>>>>>>>
>>>>>>>> (Character value: 16r8000) asInteger hex ==> '16r8000'
>>>>>>>>
>>>>>>>> Dave
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 08, 2022 at 06:45:23PM +0100, Nicolas Cellier wrote:
>>>>>>>> >
>>>>>>>> > Hi Marcel,
>>>>>>>> > which OS ?
>>>>>>>> > I cannot reproduce on macos 64,
>>>>>>>> >
>>>>>>>> > Cog[Spur] VM [CoInterpreterPrimitives VMMaker.oscog-eem.3172]
>>>>>>>> > 5.20211023.2003
>>>>>>>> > Mac OS X built on Mar 6 2022 15:31:16 CET Compiler: 4.2.1 Compatible Apple
>>>>>>>> > LLVM 10.0.1 (clang-1001.0.46.4)
>>>>>>>> > platform sources revision VM: 202110232003
>>>>>>>> >
>>>>>>>> > On Tue, Mar 8, 2022 at 17:57, Marcel Taeumel wrote:
>>>>>>>> >
>>>>>>>> > >
>>>>>>>> > > Hi Eliot, hi all --
>>>>>>>> > >
>>>>>>>> > > I think we have a sign-bit bug for character literals with code points >
>>>>>>>> > > 16r7FFF.
>>>>>>>> > >
>>>>>>>> > > Steps to reproduce:
>>>>>>>> > >
>>>>>>>> > > 1. Print it: "Character value: 16r8000"
>>>>>>>> > > 2. Inspect the result by evaluating the character literal or send
>>>>>>>> > > #asInteger to it. It will most likely not render in a standard Squeak and
>>>>>>>> > > show up like "$? asInteger".
>>>>>>>> > >
>>>>>>>> > > In a 32-bit VM, I will get the (positive) integer value 16r3FFF8000.
>>>>>>>> > > In a 64-bit VM, I will get the (negative) integer value '-16r8000'.
>>>>>>>> > >
>>>>>>>> > > Somehow, starting at bit 0, the bits 16 to 29 flip from 0 to 1. In 64-bit,
>>>>>>>> > > this means a negative number. Not sure about bits 30 and 31 here.
>>>>>>>> > >
>>>>>>>> > > Is there a bug in the upper tag bits of immediate characters?
>>>>>>>> > > Is this related to the 2-byte or 3-byte byte codes in SistaV1?
>>>>>>>> > >
>>>>>>>> > > Works fine up to 16r7FFF. (This is unrelated to #leadingChar. Mine was 0
>>>>>>>> > > in this experiment.)
>>>>>>>> > >
>>>>>>>> > > VM: 202112201228 (VMMaker.oscog-eem.3116)
>>>>>>>> > >
>>>>>>>> > > Best,
>>>>>>>> > > Marcel
>>>>>>>> > >
>>>>>>>>
>>>>>>
>>>>>>
>>>>
>
>
> --
> Clément Béra
> https://clementbera.github.io/
> https://clementbera.wordpress.com/
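
For anyone retracing the thread, the distinction Marcel draws between a character instance and a character literal can be tried in a workspace. A minimal sketch, assuming a Squeak image in which EncoderForSistaV1 is the preferred encoder; the literal is built with the String with:with: expression from Nicolas's message, and appending ' asInteger' to the evaluated source is an addition for this example. No result is stated for the second expression because it depends on the VM and on the encoder in use.

```smalltalk
"Character created at run time: reported in the thread as unaffected."
(Character value: 16r8000) asInteger hex.

"Character literal compiled from source text: reported as affected under
 EncoderForSistaV1 and fine under EncoderForV3PlusClosures."
(Compiler evaluate:
	(String with: $$ with: (Character value: 16r8000)), ' asInteger') hex.
```

Switching the encoder with CompiledCode preferredBytecodeSetEncoderClass: before evaluating the second expression, as shown earlier in the thread, is the way to compare the two bytecode sets.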
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 90427 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220310/24578c23/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 14513 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220310/24578c23/attachment-0003.png>