[squeak-dev] Sign-bit bug in character literals > 16r7FFF ... related to SistaV1?

David T. Lewis lewis at mail.msen.com
Tue Mar 8 23:39:41 UTC 2022

On Tue, Mar 08, 2022 at 05:57:32PM +0100, Marcel Taeumel wrote:
> Hi Eliot, hi all --
> I think we have a sign-bit bug for character literals with code points > 16r7FFF.
> Steps to reproduce:
> 1. Print it: "Character value: 16r8000"
> 2. Inspect the result by evaluating the character literal or send #asInteger to it. It will most likely not render in a standard Squeak and show up like "$? asInteger".
> In a 32-bit VM, I will get the (positive) integer value 16r3FFF8000.
> In a 64-bit VM, I will get the (negative) integer value '-16r8000'.
> Somehow, starting at bit 0, the bits 16 to 29 flip from 0 to 1. In 64-bit, this means a negative number. Not sure about bits 30 and 31 here.
> Is there a bug in the upper tag bits of immediate characters?
> Is this related to the 2-byte or 3-byte byte codes in SistaV1?
> Works fine up to 16r7FFF. (This is unrelated to #leadingChar. Mine was 0 in this experiment.)
> VM: 202112201228 (VMMaker.oscog-eem.3116)
> Best,
> Marcel

Hi Marcel,

This integer type declaration stuff is enough to give anybody a headache,
so here is a tip to make it slightly less obscure without leaving the
comfort of the Squeak image.

First load package TwosComplement, either from SqueakMap or from

Then take a look at the two suspicious integer values that you get from
your 32-bit VM and 64-bit VM, rendering them as 32-bit twos complement
(the common case for C int on most platforms):

	{ 16r3FFF8000 asRegister: 32 . -16r8000 asRegister: 32} inspect.

or the simpler version (since 32 bits is the default):

	{ 16r3FFF8000 asRegister . -16r8000 asRegister} inspect.

This shows the low order 16 bits (the actual character value) as valid
in both cases, and the high order 16 bits as garbage related to integer
type declaration and/or sign extension in the VM. 
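
As a cross-check that needs no extra package, plain bit arithmetic in a
workspace shows the same split. This is only a sketch using the two values
from the report above; the masks are chosen here for illustration and are
not taken from the VM source:

	(16r3FFF8000 bitAnd: 16rFFFF) = 16r8000.	"true: low 16 bits hold the code point"
	(-16r8000 bitAnd: 16rFFFF) = 16r8000.	"true: negative integers behave as twos complement"
	(-16r8000 bitAnd: 16r3FFFFFFF) = 16r3FFF8000.	"true: the 64-bit result cut to 30 bits"

The last line suggests that both VMs start from the same sign-extended
value and differ only in how many bits survive. That is consistent with a
signed 16-bit interpretation somewhere, though it does not say where.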

Very likely this will turn out to be an issue in primitive 171,
InterpreterPrimitives>>primitiveImmediateAsInteger. And it may be an
issue related to integer data types in Windows versus unix-based systems.

The issue will not be related to the upper tag bits of immediate
characters, and it will not be related to the 2-byte or 3-byte byte
codes in SistaV1. It's just some sort of type declaration issue in
the VM code, that's all.

