[Vm-dev] Re: I need help building Cog on 64bit Linux (new Squeak server)

Nicolas Cellier nicolas.cellier.aka.nice at gmail.com
Mon Feb 4 12:52:07 UTC 2013


2013/2/4 Frank Shearar <frank.shearar at gmail.com>:
>
> On 4 February 2013 08:41, Jeremy Kajikawa <jeremy.kajikawa at gmail.com> wrote:
>>
>>
>> The behaviour in SmallTalk relies on the underlying framework which you state is C language and full of undefined behaviour...
>
> That's not quite true. Smalltalk's behaviour is perfectly well
> defined, in terms of a stack machine. This PARTICULAR VM is built
> using Slang, a stripped down dialect of Smalltalk that can be mapped
> quite easily to C.
>
> There is nothing in Smalltalk's definition about how you implement a
> virtual machine and, in fact, people have implemented a VM in PyPy,
> while others have run Smalltalk _directly_on_hardware_.
>
> frank
>
>> Doesn't that make the SmallTalk environment Undefined Behavior as well by inheriting the UB from the C layer it builds on?
>
> No.
>
>> Or is that just irrational F.U.D. in promotion of SmallTalk by detraction against C?
>
> Yes.
>

Irrational FUD?
No, it's just a question of using the right tool for the job.

You must keep in mind that when you write a VM, you have to deal with
low level data representation, and you want to be in control.
It's like using a kind of bit field (though not C's own bit fields,
which are unfortunately unportable).
Then the VM has to perform all kinds of bit ops to retrieve data
(shift, or, and, etc.).
Some of these ops can be problematic for signed values (arithmetic vs
logical shift, two's complement vs ones' complement, etc.).
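
To make the signed-shift problem concrete, here is a minimal sketch in C. The 1-bit tag layout and the names tag_small_int and untag_small_int are hypothetical, chosen for illustration only, and are not taken from the VM sources. In standard C, left-shifting a negative signed value is undefined and right-shifting one is implementation-defined, so the sketch does all the bit work on unsigned values:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only: a 31-bit signed value is
   stored in the upper 31 bits of a 32-bit word, with the low bit set to 1. */

/* Encode: shift in the unsigned domain, where << is always well defined. */
static uint32_t tag_small_int(int32_t value)
{
    return ((uint32_t)value << 1) | 1u;
}

/* Decode: emulate an arithmetic right shift by one using unsigned ops only. */
static int32_t untag_small_int(uint32_t word)
{
    assert((word & 1u) == 1u);                 /* must carry the tag bit */

    uint32_t shifted = word >> 1;              /* logical shift, well defined */
    uint32_t sign    = word & 0x80000000u;     /* sign bit of the payload */
    uint32_t bits    = shifted | sign;         /* copy the sign back into bit 31 */

    /* Convert the bit pattern to int32_t without relying on the
       implementation-defined out-of-range unsigned-to-signed conversion. */
    if (bits <= (uint32_t)INT32_MAX)
        return (int32_t)bits;
    return (int32_t)(bits - 0x80000000u) + INT32_MIN;
}
```

With this sketch, untag_small_int(tag_small_int(-3)) gives back -3 without ever shifting a negative signed value. The price is visible: what would be a single shift instruction in assembly takes several lines of C.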

The fact is that with traditional C compilers, some of these ops were
reliably well defined for two's-complement machines, maybe not by the
standard, but at least by the compiler, and this was the case with old
gcc (2.95 & 3.4).
So the C compiler served our goal very well: to have a kind of generic
assembly (which was, I think, its original purpose).

It's not the same for more recent versions of C compilers: our VM code
that used to just work now doesn't.
I never said it was the fault of C. Relying on non-standard,
unsupported features is entirely our fault.
But we really need these features to program the VM.
And these features are well defined at assembly level...
That just indicates that the gap between C and assembly has grown a bit.

Of course, it's still possible to use C and keep this fine-grained control.
But we have to evaluate the cost.
The well-defined constructs are somewhat technical and unintuitive, not
to say convoluted.
That's the only thing I said against C: that the signed arithmetic
model is particularly broken, in the sense that it has a plethora of
undefined behavior.
And I'd like to be proven wrong, but I doubt I'll be.
To avoid UB we are going to need plenty of casts to unsigned, even if
the underlying data is by nature two's-complement signed. That's far
from ideal.
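
As a rough sketch of what these casts look like in practice (the helper names below are hypothetical and not taken from the generated VM code), here is a 32-bit addition that wraps around the way the hardware does. Signed overflow in a + b is undefined, so the sum is computed on unsigned values, where wraparound is defined, and then converted back:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reinterpret a 32-bit pattern as two's-complement signed, without the
   implementation-defined out-of-range unsigned-to-signed conversion. */
static int32_t to_signed(uint32_t bits)
{
    if (bits <= (uint32_t)INT32_MAX)
        return (int32_t)bits;
    return (int32_t)(bits - 0x80000000u) + INT32_MIN;
}

/* Addition with wraparound: the sum is formed on unsigned operands,
   which is defined modulo 2^32. */
static int32_t wrapping_add(int32_t a, int32_t b)
{
    return to_signed((uint32_t)a + (uint32_t)b);
}

/* Overflow test that never performs the overflowing signed addition:
   overflow occurred iff both operands differ in sign from the sum. */
static bool add_overflows(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;
    return (((ua ^ sum) & (ub ^ sum)) >> 31) != 0;
}

/* Addition that traps on overflow in debug builds (assert enabled). */
static int32_t checked_add(int32_t a, int32_t b)
{
    assert(!add_overflows(a, b));
    return wrapping_add(a, b);
}
```

For example, wrapping_add(INT32_MAX, 1) yields INT32_MIN and add_overflows(INT32_MAX, 1) is true, where the plain signed expression INT32_MAX + 1 would be undefined.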

Tracking and correcting these problems has a cost that we did not want
to anticipate and that we now face.
It's more complex than just fixing the code: since the code is
generated, we must fix the generator...
But the generator is hackish and does not reflect the complexity of
the underlying C rules...
It just used to work, along with traditional compilers...
Entirely our fault again for sure, but what to do now?
Either we reflect the complex C arithmetic model in the generator, or
we patch it like mad to work around known defects in the generated code.
In the short term, we will probably hack and patch if we can.
Since we also target a JIT, and have made some progress on this front,
we can legitimately wonder if C will still be the right tool in the future.

Nicolas

> frank
>
>> *both* languages are capable and leave defining behavior to the authoring programmer/coder/software-developer/...
>>
>> Or at least that is my own understanding...
>>
>> Belxjander
>>
>> On Feb 4, 2013 5:32 PM, "Camillo Bruni" <camillobruni at gmail.com> wrote:
>>>
>>>
>>>
>>> On 2013-02-04, at 08:49, Igor Stasenko <siguctua at gmail.com> wrote:
>>>
>>> >
>>> > On 28 January 2013 23:07, Nicolas Cellier
>>> > <nicolas.cellier.aka.nice at gmail.com> wrote:
>>> >>
>>> >> 2013/1/28 Ken Causey <ken at kencausey.com>:
>>> >>>
>>> >>> Eliot said:
>>> >>>>
>>> >>>> Hmmm.  Sorry to put you to this but what happens when you run the r2669,
>>> >>>> r2672 and r2673 VMs from http://www.mirandabanda.org/files/Cog/VM/? If
>>> >>>> these don't crash then it might be something to do with gcc 4.4.x.  But
>>> >>>> I'd
>>> >>>> have to take a look, and time is tight right now...  But if any of them do
>>> >>>> work could you use them for the interim?
>>> >>>
>>> >>>
>>> >>> Not a problem and thanks for the reply.
>>> >>>
>>> >>> Well I started with 2673 and the tests are still running but it would have
>>> >>> crashed by now if the same problem exists so it's looking like the gcc
>>> >>> version is the issue.  I will try earlier gcc versions and report back.
>>> >>>
>>> >>> It's a little disheartening that in this day and age we are tickling gcc
>>> >>> issues when the same version of gcc is used to build the kernel and
>>> >>> thousands upon thousands of Debian binaries which (by and large anyway) seem
>>> >>> to be fine.
>>> >>>
>>> >>> Ken
>>> >>
>>> >> And the answer would be: don't rely on UB (Undefined Behavior)
>>> >> Modern interpretation of the standards is that a compiler has a
>>> >> license to ignore UB in order to perform optimizations... This is
>>> >> because no one should rely on UB.
>>> >> Unfortunately, the underlying C language is full of UB, and the signed
>>> >> arithmetic model is particularly broken...
>>> >>
>>> >> I doubt the thousands of packages have been working unchanged...
>>> >> They work thanks to an army of programmers maintaining the code and
>>> >> chasing the compiler warnings.
>>> >> As long as we ignore the warnings, we are in danger.
>>> >> As long as we have several hundred warnings, there is no easy way to
>>> >> analyze how dangerous they are...
>>> >>
>>> >
>>> > I cannot agree more.
>>>
>>> well you mute them, right?
>>
>>

