[Vm-dev] Sign-bit bug in character literals > 16r7FFF ... related to SistaV1?

Eliot Miranda eliot.miranda at gmail.com
Thu Mar 10 15:59:19 UTC 2022



> On Mar 9, 2022, at 11:24 AM, Clément Béra <bera.clement at gmail.com> wrote:
> 
> 
> Seeing this, I believe that bit was used for something else in Sista, and Eliot and I agreed that 32k literals were enough? I cannot remember.
> I think the bit meant that the Cogit should not generate profiling counters for the method, or something like that.

Exactly
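
The two values Marcel reports further down are consistent with the 16-bit character operand being read as a signed quantity. Below is a minimal workspace sketch of that arithmetic. It assumes plain 16-bit sign extension and a 30-bit field on 32-bit; both are assumptions for illustration, not the actual encoder or decoder code, and the temp names are made up.

```smalltalk
"Illustration only: 'operand' and 'signed' are hypothetical names, not VM code."
| operand signed |
operand := 16r8000.
"Assumed mechanism: treat the 16-bit operand as a signed value."
signed := operand > 16r7FFF
	ifTrue: [operand - 16r10000]
	ifFalse: [operand].
"Unbounded reading: shows -32768, i.e. the -16r8000 reported for 64-bit."
Transcript show: signed printString; cr.
"Reading truncated to 30 bits: shows 3FFF8000, as reported for 32-bit."
Transcript show: ((signed bitAnd: 16r3FFFFFFF) printStringBase: 16); cr.
```

With operand set to 16r7FFF instead, both lines show the original value, which fits the observation that everything works up to 16r7FFF.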

> 
>> On Wed, Mar 9, 2022 at 3:41 PM Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>  
>> Or just restrict EncoderForSistaV1>>#genPushCharacter:
>> 
>> ...snip...
>>      (code < 0 or: [code > 16r7FFF]) ifTrue:
>>          [^self outOfRangeError: 'character' index: code range: 0 to: 16r7FFF].
>> ...snip...
>> 
>>> On Wed, Mar 9, 2022 at 14:16, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
>>>  
>>> Hi Nicolas --
>>> 
>>> Thanks! Also for the proposed workaround in VMMaker.oscog-nice.3174.
>>> 
>>> For what it's worth, one can always replace the character-literal syntax with string access:
>>> 
>>> $x.
>>> 'x' first.
>>> 
>>> Or store the code point if the visual appearance is not relevant:
>>> 
>>> Character value: 16r78. 
>>> 
>>> Best,
>>> Marcel
>>>> On Mar 9, 2022 at 10:02:46, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>>> 
>>>> Hi Marcel,
>>>> Yes, I agree: the bug is in the bytecode encoding/decoding of the immediate Character value.
>>>> I stepped into (Compiler evaluate: (String with: $$ with: (Character value: 16r8000))), and if we step into executeMethod, we can inspect what is going on.
>>>> 
>>>> 
>>>>> On Wed, Mar 9, 2022 at 08:39, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
>>>>>  
>>>>> 
>>>>> Hi Nicolas --
>>>>> 
>>>>> 
>>>>> There is a bug in the EncoderForSistaV1. The behavior is okay for EncoderForV3PlusClosures. We can discuss this on squeak-dev now, I suppose.
>>>>> 
>>>>> CompiledCode preferredBytecodeSetEncoderClass: EncoderForSistaV1.
>>>>> CompiledCode preferredBytecodeSetEncoderClass: EncoderForV3PlusClosures.
>>>>> 
>>>>> If you send #halt instead of #asInteger, you get another interesting debugger when you try to start debugging:
>>>>> 
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Marcel
>>>>>> 
>>>>>> On Mar 9, 2022 at 08:34:11, Nicolas Cellier <nicolas.cellier.aka.nice at gmail.com> wrote:
>>>>>> 
>>>>>> Ah OK, I see it on macos too
>>>>>> It remains to determine which operation exactly is involved...
>>>>>> The TextMorph holding the printed result is correct - a WideString, whose last Character is (Character value: 32768).
>>>>>> 
>>>>>>> On Wed, Mar 9, 2022 at 08:08, Marcel Taeumel <marcel.taeumel at hpi.de> wrote:
>>>>>>>  
>>>>>>> 
>>>>>>> 
>>>>>>> Hi Dave, hi Nicolas --
>>>>>>> 
>>>>>>> I am working in Windows 10.
>>>>>>> 
>>>>>>> > I cannot reproduce on Linux 64 bit either:
>>>>>>> > (Character value: 16r8000) asInteger hex ==> '16r8000'
>>>>>>> 
>>>>>>> That's not how you would reproduce it. The bug affects character literals, not character objects/instances. You have to evaluate code on that character literal.
>>>>>>> 
>>>>>>> Maybe this picture helps:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Marcel
>>>>>>> 
>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Am 08.03.2022 18:56:09 schrieb David T. Lewis <lewis at mail.msen.com>:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> I cannot reproduce on Linux 64 bit either:
>>>>>>>> 
>>>>>>>> (Character value: 16r8000) asInteger hex ==> '16r8000' 
>>>>>>>> 
>>>>>>>> Dave
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Mar 08, 2022 at 06:45:23PM +0100, Nicolas Cellier wrote:
>>>>>>>> > 
>>>>>>>> > Hi Marcel,
>>>>>>>> > which OS ?
>>>>>>>> > I cannot reproduce on macos 64,
>>>>>>>> > 
>>>>>>>> > Cog[Spur] VM [CoInterpreterPrimitives VMMaker.oscog-eem.3172]
>>>>>>>> > 5.20211023.2003
>>>>>>>> > Mac OS X built on Mar 6 2022 15:31:16 CET Compiler: 4.2.1 Compatible Apple
>>>>>>>> > LLVM 10.0.1 (clang-1001.0.46.4)
>>>>>>>> > platform sources revision VM: 202110232003
>>>>>>>> > 
>>>>>>>> > On Tue, Mar 8, 2022 at 17:57, Marcel Taeumel wrote:
>>>>>>>> > 
>>>>>>>> > >
>>>>>>>> > > Hi Eliot, hi all --
>>>>>>>> > >
>>>>>>>> > > I think we have a sign-bit bug for character literals with code points >
>>>>>>>> > > 16r7FFF.
>>>>>>>> > >
>>>>>>>> > > Steps to reproduce:
>>>>>>>> > >
>>>>>>>> > > 1. Print it: "Character value: 16r8000"
>>>>>>>> > > 2. Evaluate the resulting character literal, or send #asInteger
>>>>>>>> > > to it. It will most likely not render in a standard Squeak image and
>>>>>>>> > > will show up like "$? asInteger".
>>>>>>>> > >
>>>>>>>> > > In a 32-bit VM, I will get the (positive) integer value 16r3FFF8000.
>>>>>>>> > > In a 64-bit VM, I will get the (negative) integer value '-16r8000'.
>>>>>>>> > >
>>>>>>>> > > Somehow, counting from bit 0, bits 16 to 29 flip from 0 to 1. In 64-bit,
>>>>>>>> > > this means a negative number. Not sure about bits 30 and 31 here.
>>>>>>>> > >
>>>>>>>> > > Is there a bug in the upper tag bits of immediate characters?
>>>>>>>> > > Is this related to the 2-byte or 3-byte bytecodes in SistaV1?
>>>>>>>> > >
>>>>>>>> > > Works fine up to 16r7FFF. (This is unrelated to #leadingChar. Mine was 0
>>>>>>>> > > in this experiment.)
>>>>>>>> > >
>>>>>>>> > > VM: 202112201228 (VMMaker.oscog-eem.3116)
>>>>>>>> > >
>>>>>>>> > > Best,
>>>>>>>> > > Marcel
>>>>>>>> > >
>>>>>>>> 
>>>>>> 
>>>>>> 
>>>> 
> 
> 
> -- 
> Clément Béra
> https://clementbera.github.io/
> https://clementbera.wordpress.com/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 90427 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220310/24578c23/attachment-0002.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image.png
Type: image/png
Size: 14513 bytes
Desc: not available
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20220310/24578c23/attachment-0003.png>

