[Vm-dev] failing/errors Pharo Tests with CogVM

Fri Sep 17 08:00:21 UTC 2010

2010/9/17 Andreas Raab <andreas.raab at gmx.de>:
>
> On 9/16/2010 2:44 PM, Nicolas Cellier wrote:
>>
>> Of course, as Andreas and Igor, I also prefer a frank exception to
>> creeping NaNs...
>>
>> The main reason for having NaN is compatibility with the external world
>> indeed...
>> Some alien code will produce some NaN and we have to deal with it.
>
> Right. But that's not inconsistent from my perspective. I'm not saying
> disallow NaN altogether; I'm saying don't allow silently introducing it in
> arithmetic operations. Thus, if you have external code that produces NaN, you
> can't easily use that to produce further NaNs by means of just arithmetic.
>
> BTW, I'd be *really* curious to see what kind of code people would really
> argue to have propagating NaNs for. It seems we're all violently agreeing
> that none of us want silent NaN propagation so why exactly are we arguing
> again? ;-)
>
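
A minimal sketch of the propagation concern quoted above, written in Python
rather than Smalltalk. The thread itself contains no code, so the matrix and
all names here are illustrative assumptions:

```python
import math

# inf - inf is an invalid operation; Python answers NaN without raising.
bad = float("inf") - float("inf")

# Scaling an identity matrix by that value contaminates every entry.
identity = [[1.0, 0.0], [0.0, 1.0]]
scaled = [[x * bad for x in row] for row in identity]
all_nan = all(math.isnan(x) for row in scaled for x in row)

# Comparisons stop behaving: NaN is not even equal to itself.
reflexive = (bad == bad)

print(all_nan, reflexive)  # True False
```

Nothing in the arithmetic signals that anything went wrong; only an explicit
isnan test reveals it.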

It all depends on whether the arithmetic unit can easily deliver
exceptions and whether the programming language can easily handle them.
While the former is a requirement of IEEE 754, the latter was far from
obvious 30 years ago.
In this context, my understanding of the rationale is that it can be
more efficient to test for an exceptional result once after a batch of
computations, rather than at each arithmetic operation.
With the advent of decent exception handling in most languages, I can't
really agree with this rationale (or my understanding of it), but
that's easy in 2010 ;)
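
The two strategies can be sketched as follows, in Python for illustration.
This is only a rough model of the rationale, not VM code: both function names
are hypothetical, and the batch version assumes that a NaN, once produced,
survives to the final result (true for sums and products, not for every
operation).

```python
import math

def scale_checked(values, factor):
    """Test every product as it is computed; fail at the first NaN."""
    out = []
    for i, v in enumerate(values):
        r = v * factor
        if math.isnan(r):
            raise ArithmeticError(f"NaN produced at index {i}")
        out.append(r)
    return out

def sum_scaled_batch(values, factor):
    """Run the whole batch unchecked, then test the result once."""
    total = 0.0
    for v in values:
        total += v * factor  # a NaN here silently poisons the running sum
    if math.isnan(total):
        raise ArithmeticError("NaN somewhere in the batch")
    return total
```

The batch version does one test instead of one per element, but its error
cannot say which operation went wrong; the per-operation version pays for
every test and reports the exact index.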

Of course, my own rationale is only valid in a restricted set of
architectures...
Note that with the advent of parallel FPUs, a single exception is
signalled after a batch of computations too... This complicates
exception handling a bit, and other hardware choices could complicate
it further.
BTW, good luck to those trying to use a GPU as an FPU, as long as these
are not governed by a standard of the quality of IEEE 754.

Nicolas

> Cheers,
>  - Andreas
>
>>
>> Nicolas
>>
>> 2010/9/16 Eliot Miranda<eliot.miranda at gmail.com>:
>>>
>>>
>>>
>>> On Thu, Sep 16, 2010 at 1:25 PM, Andreas Raab<andreas.raab at gmx.de>
>>>  wrote:
>>>>
>>>> On 9/16/2010 12:32 PM, Eliot Miranda wrote:
>>>>>
>>>>>     I need to check these carefully.  One thing that does differ in
>>>>> current Cog is that the machine-code arithmetic float primitives don't
>>>>> fail if they produce a NaN result; they simply answer a NaN result.
>>>>> IMO what needs to be done is two-fold.
>>>>>
>>>>> a) we need a NaN mode flag in the VM, that persists across snapshots
>>>>> and e.g. is queryable/settable via vmParameterAt:put:, that puts the
>>>>> floating-point primitives into a state where NaNs are answered instead
>>>>> of primitives failing.
>>>>
>>>> FWIW, I don't think we need that flag. Failing the primitive instead of
>>>> producing something that is specifically declared not to be a number in an
>>>> arithmetic computation is *always* the right thing to do. The problem with
>>>> NaNs is that they propagate. So you start with an isolated NaN as the result
>>>> of a division by an underflow number and may be able to catch it. But you
>>>> don't because it's silent and then it propagates into a matrix because
>>>> you're scaling the matrix by that number. Now you've got a matrix full of
>>>> NaNs. As you push your geometry through that matrix everything becomes
>>>> complete and utter NaN garbage. And of course, NaNs break reflexivity,
>>>> symmetry and transitivity of comparisons.
>>>
>>> We went through a long series of discussions with customers taking
>>> exactly the position you are and finally capitulated because some customers
>>> required IEEE behavior. NaNs do all that you say, but for good reason.  If
>>> they appear in one's calculations then one's calculations are unsafe.  NaNs
>>> exist anyway; they creep in through the FFI even if the VM refuses to
>>> produce them.  So they're hard to sweep under the carpet.  One principled
>>> position is to allow them and deal with them correctly.
>>> FWIW, I'm with you.  I would rather the primitives always failed.  But
>>> I've lost that argument against people who knew what they were talking about
>>> (I'm no floating point expert).  So I like the flag because it keeps
>>> people's options open.
>>>>
>>>> As a consequence we should never allow them to be introduced silently by
>>>> the VM. If the error handling code for some arithmetic primitive decides
>>>> that against all reasoning you'd like to produce an NaN as the result
>>>> regardless, that's fine, you have been warned. But having the VM introduce
>>>> NaNs silently is wrong, wrong, wrong.
>>>
>>> The current situation in Cog is certainly wrong.  But as discussed above
>>> I don't think it's wrong for the VM to introduce them if it is explicitly in
>>> such a mode and I know from experience that users want and even need such a
>>> mode.
>>> best,
>>> Eliot
>>>>
>>>> Cheers,
>>>>  - Andreas
>>>>
>>>>> b) the Cog code generator needs to respect this flag and arrange that
>>>>> when in the default mode (current behavior) the machine-code arithmetic
>>>>> float primitives also fail if they produce a NaN result.
>>>>>
>>>>> We can then decide at a later date whether to change the primitive
>>>>> behavior to answer NaNs or not.  This is also what we did in
>>>>> VisualWorks; there's an IEEE arithmetic mode and in recent releases
>>>>> VW's
>>>>> floating-point arithmetic will produce NaNs.
>>>>>
>>>>> Anyone interested in taking a look at this is very welcome.  It's
>>>>> probably a week-long project at most.
>>>>>
>>>>> best,
>>>>> Eliot
>>>>>
>>>>> On Thu, Sep 16, 2010 at 12:19 PM, Nicolas Cellier
>>>>> <nicolas.cellier.aka.nice at gmail.com
>>>>> <mailto:nicolas.cellier.aka.nice at gmail.com>>  wrote:
>>>>>
>>>>>
>>>>>    I mean the M7260-primitiveSmallIntegerCompareNan-Patch-nice.1.cs
>>>>> part,
>>>>>    the rest has already been applied in COG.
>>>>>
>>>>>    Nicolas
>>>>>
>>>>>    2010/9/16 Nicolas Cellier<nicolas.cellier.aka.nice at gmail.com
>>>>>    <mailto:nicolas.cellier.aka.nice at gmail.com>>:
>>>>>     >  I see http://bugs.squeak.org/view.php?id=7260 was not integrated
>>>>> in
>>>>>     >  COG, which was the cause of most of the Floating point failures
>>>>>    in old
>>>>>     >  VM, but maybe it's now more complex than that ?
>>>>>     >
>>>>>     >  Nicolas
>>>>>     >
>>>>>     >  2010/9/16 Mariano Martinez Peck<marianopeck at gmail.com
>>>>>    <mailto:marianopeck at gmail.com>>:
>>>>>     >>
>>>>>     >>  Hi Eliot. I took a Pharo 1.1.1 image (which includes the
>>>>>    changes to run Cog) and I ran all the tests with the build r2219
>>>>>     >>
>>>>>     >>  And these are the results:
>>>>>     >>
>>>>>     >>  9768 run, 9698 passes, 53 expected failures, 15 failures, 2
>>>>>    errors, 0 unexpected passes
>>>>>     >>  Failures:
>>>>>     >>  FloatTest>>#testRaisedTo
>>>>>     >>  MCInitializationTest>>#testWorkingCopy
>>>>>     >>  FloatTest>>#testReciprocal
>>>>>     >>  ReleaseTest>>#testUndeclared
>>>>>     >>  FloatTest>>#testDivide
>>>>>     >>  MethodContextTest>>#testClosureRestart
>>>>>     >>  FloatTest>>#testCloseTo
>>>>>     >>  FloatTest>>#testHugeIntegerCloseTo
>>>>>     >>  FloatTest>>#testInfinityCloseTo
>>>>>     >>  WeakRegistryTest>>#testFinalization
>>>>>     >>  PCCByLiteralsTest>>#testSwitchPrimCallOffOn
>>>>>     >>  AllocationTest>>#testOneGigAllocation
>>>>>     >>  FloatTest>>#testNaNCompare
>>>>>     >>  FileStreamTest>>#testPositionPastEndIsAtEnd
>>>>>     >>  NumberTest>>#testRaisedToIntegerWithFloats
>>>>>     >>
>>>>>     >>  Errors:
>>>>>     >>  MessageTallyTest>>#testSampling1
>>>>>     >>  WeakSetInspectorTest>>#testSymbolTableM6812
>>>>>     >>
>>>>>     >>
>>>>>     >>
>>>>>     >>  I think that most of these problems were fixed in latest
>>>>>    official SqueakVM. I guess they were integrated in VMMaker in
>>>>>    versions later than the one you used for Cog. Maybe you can
>>>>>    integrate them and create a new version?
>>>>>     >>
>>>>>     >>  I am not a VM expert, so if you can help us with these
>>>>>    tests it would be cool.
>>>>>     >>
>>>>>     >>  Thanks
>>>>>     >>
>>>>>     >>  Mariano
>>>>>     >>
>>>>>     >>
>>>>>     >
>>>>>
>>>>>
>>>
>>>
>>>
>>
>