[Vm-dev] Removing UB to make SmallFloat handling gcc 4.9.2 -O2 compatible [WAS: first 64 bits windows VM Running]

Ben Coman btc at openinworld.com
Sat Jul 23 11:20:58 UTC 2016


On Wed, Jul 20, 2016 at 5:24 AM, Nicolas Cellier
<nicolas.cellier.aka.nice at gmail.com> wrote:
> I've also replaced pointer aliasing
>     doubleResult = ((double *) &rawBitsInteger)[0];
> by memcpy:
>     memcpy( &doubleResult  , &rawBitsInteger , sizeof(doubleResult) );
>
> Why is memcpy less evil than pointer aliasing?
> With pointer aliasing any other write into a long integer could modify doubleResult.
> This completely defeats optimization - the holy grail of C people, who can't bear that FORTRAN compilers are faster than theirs ;). With great wisdom they declared this construct undefined behavior, giving priority to optimization rather than backward compatibility or programmers' intentions...
> memcpy is less evil because it's localized (one shot).
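A minimal sketch of the memcpy approach Nicolas describes, assuming a platform where double is a 64-bit IEEE 754 value. The helper names bits_to_double and double_to_bits are hypothetical, not taken from the VM sources. GCC and Clang typically recognise a small fixed-size memcpy like this and compile it to a plain register move, so at -O2 the copy usually costs nothing.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is 64 bits wide (true on IEEE 754 platforms). */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* Reinterpret raw bits as a double without dereferencing a cast pointer. */
static double bits_to_double(uint64_t rawBits)
{
    double result;
    memcpy(&result, &rawBits, sizeof result);
    return result;
}

/* The inverse direction: extract the bit pattern of a double. */
static uint64_t double_to_bits(double value)
{
    uint64_t rawBits;
    memcpy(&rawBits, &value, sizeof rawBits);
    return rawBits;
}
```

For example, under IEEE 754 double_to_bits(1.0) yields 0x3FF0000000000000.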


On Thu, Jul 21, 2016 at 8:41 AM, Andres Valloud
<avalloud at smalltalk.comcastbiz.net> wrote:
> This is a good start for a default debugging procedure.
>
> The GCC manual warns of potential issues with memcpy(), and recommends using unions instead.  The union method worked on every HPS compilation environment, across multiple optimization levels.  Whenever I checked, the resulting assembly code was optimal.
>
> https://gcc.gnu.org/bugs/#casting_and_optimization
>
> http://mail-index.netbsd.org/tech-kern/2003/08/11/0001.html
>
> Andres.
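A sketch of the union approach Andres recommends, applied to the same double/64-bit-integer conversion. The DoubleBits union and the helper names are hypothetical. GCC's documentation of -fstrict-aliasing permits this kind of type punning as long as both the write and the read go through the union object itself, not through a pointer to one of its members.

```c
#include <stdint.h>

/* Assumption: double is 64 bits wide, so both members cover the same bytes. */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

/* A named union viewing the same 64 bits as two different types. */
typedef union {
    double   asDouble;
    uint64_t asBits;
} DoubleBits;

static double bits_to_double_u(uint64_t rawBits)
{
    DoubleBits u;
    u.asBits = rawBits;   /* write one member...                     */
    return u.asDouble;    /* ...read the other, directly via the union */
}

static uint64_t double_to_bits_u(double value)
{
    DoubleBits u;
    u.asDouble = value;
    return u.asBits;
}
```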

I found this intriguing and wanted to understand this a bit more, so I
went hunting.

It looks similar to the advice in the second example of [1]... "The
noncompliant code example has an array of two values of type short
treated as an integer and assigned to by a cast to an integer value.
The resulting values are indeterminate. The compliant solution uses a
union type that includes a type compatible with the effective type of
the object."

[1] https://www.securecoding.cert.org/confluence/display/c/EXP39-C.+Do+not+access+a+variable+through+a+pointer+of+an+incompatible+type
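A sketch in the spirit of that compliant solution: two 16-bit values and one 32-bit value share storage in a union, instead of casting the address of a short array to int *. The HalvesWord union, the helper names and the fixed-width types are assumptions made here for illustration. Which half lands first depends on the platform's byte order, so only the round trip is portable.

```c
#include <stdint.h>

/* Two 16-bit halves and one 32-bit word occupying the same storage. */
typedef union {
    int16_t halves[2];
    int32_t word;
} HalvesWord;

/* Store a 32-bit value, then read it back as two 16-bit halves.
   The order of the halves depends on the platform's endianness. */
static void split_word(int32_t value, int16_t out[2])
{
    HalvesWord u;
    u.word = value;
    out[0] = u.halves[0];
    out[1] = u.halves[1];
}

/* The reverse: rebuild the 32-bit word from halves in memory order. */
static int32_t join_halves(const int16_t in[2])
{
    HalvesWord u;
    u.halves[0] = in[0];
    u.halves[1] = in[1];
    return u.word;
}
```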


I learnt a new term "Type Punning"[2]... "A form of pointer aliasing
where two pointers refer to the same location in memory but
represent that location as different types. The compiler will treat
both 'puns' as unrelated pointers. Type punning has the potential to
cause dependency problems for any data accessed through both
pointers."
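To see why the compiler treats the two puns as unrelated, consider this hypothetical function. Under GCC's -fstrict-aliasing, which -O2 enables, a store through a float * is assumed not to modify an int, so the final load may be replaced by the constant 1. Called with two distinct objects the result is well defined; passing a float * that actually points at the int would be a pun, and that call is undefined behaviour.

```c
/* With strict aliasing the compiler may assume that an int * and a
   float * never point to the same object. */
static int store_then_reload(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;   /* assumed not to modify *ip...                 */
    return *ip;   /* ...so this load may be compiled as "return 1" */
}
```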

A macro for type casting is provided...
    #define UNION_CAST(x, destType) \
                   (((union {__typeof__(x) a; destType b;})x).b)

but maybe in most cases it would be better to have named unions.

[2] http://www.cocoawithlove.com/2008/04/using-pointers-to-recast-in-c-is-bad.html
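A sketch exercising the UNION_CAST macro next to a named-union equivalent. Both the cast-to-union syntax and __typeof__ are GCC extensions (Clang accepts them too), so the macro is not portable ISO C. The float/uint32_t pairing, the FloatBits union and the function names are illustrative assumptions.

```c
#include <stdint.h>

/* Assumption: float is 32 bits wide, so both views cover the same bytes. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* The macro quoted from the article (GCC extensions). */
#define UNION_CAST(x, destType) \
    (((union {__typeof__(x) a; destType b;})x).b)

/* A named union doing the same job in standard C. */
typedef union {
    float    asFloat;
    uint32_t asBits;
} FloatBits;

static uint32_t float_bits_macro(float value)
{
    return UNION_CAST(value, uint32_t);
}

static uint32_t float_bits_named(float value)
{
    FloatBits u;
    u.asFloat = value;
    return u.asBits;
}
```

The named version gives the pun a type that can be reused, documented and searched for, which is the advantage of named unions mentioned above.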


A very informative article was [3], which includes some discussion of
the effects on optimisation...
It said... "Strictly speaking, reading a member of a union different
from the one written to is undefined in ANSI/ISO C99 [...] However, it
is an extremely common idiom and is well-supported by all major
compilers. As a practical matter, reading and writing to any member of
a union, in any order, is acceptable practice!!"

[3] http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html


Referring to C11, [4] quotes the GCC documentation, which says
“type-punning is allowed, provided the memory is accessed through the
union type”, and gives this example, in which the function f() has
defined behaviour...
    union U { int x; float y; };
    int f() { union U t; t.y = 3.0; return t.x; }

but the function g() here exhibits undefined behaviour...
    int g() { union U t; int *p = &t.x; t.y = 3.0; return *p; }

(I didn't read past page 3...)
[4] http://robbertkrebbers.nl/research/articles/aliasing.pdf


I also found some negative opinions...
"There may be unused holes in structures. Suspect unions used for type
cheating. Specifically, a value should not be stored as one type and
retrieved as another"
[5] https://www.doc.ic.ac.uk/lab/cplus/cstyle.html


And...
"The riskiest form of packing is to use unions. If you know that
certain fields in your structure are never used in combination with
certain other fields, consider using a union to make them share
storage. But be extra careful and verify your work with regression
testing, because if your lifetime analysis is even slightly wrong you
will get bugs ranging from crashes to (much worse) subtle data
corruption."
[6] http://www.catb.org/esr/structure-packing/
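A minimal sketch of the storage-sharing idea, with a tag field recording which member is live, so that the lifetime analysis the quote warns about is checked at run time (unless NDEBUG is defined) rather than merely assumed. All names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { KIND_INT, KIND_DOUBLE } ValueKind;

/* The two payloads are never live at the same time, so they share storage. */
typedef struct {
    ValueKind kind;
    union {
        int64_t i;
        double  d;
    } as;
} Value;

static Value make_int(int64_t i)   { Value v; v.kind = KIND_INT;    v.as.i = i; return v; }
static Value make_double(double d) { Value v; v.kind = KIND_DOUBLE; v.as.d = d; return v; }

/* Accessors check the tag, catching a read of the wrong member. */
static int64_t value_int(const Value *v)
{
    assert(v->kind == KIND_INT);
    return v->as.i;
}

static double value_double(const Value *v)
{
    assert(v->kind == KIND_DOUBLE);
    return v->as.d;
}
```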

And...
"The problem with a structure inside a union, is that the compiler is
allowed to add padding bytes between members of a structure (or
class), except bit fields....[instead] suggest creating functions to
combine and extract pixels from a 32-bit quantity. You can declare it
inline too. This is a lot more reliable than a struct inside a union,
including one with bit fields."
[7] http://stackoverflow.com/questions/2876832/using-unions-to-simplify-casts
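A sketch of the alternative that answer suggests: inline functions that combine and extract the 8-bit channels of a 32-bit pixel using shifts and masks, so nothing depends on struct padding or byte order. The channel layout (red in the top byte, then green, blue, alpha) and the function names are assumptions for illustration.

```c
#include <stdint.h>

/* Pack four 8-bit channels into one 32-bit value laid out as 0xRRGGBBAA. */
static inline uint32_t pixel_pack(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) | ((uint32_t)b << 8) | (uint32_t)a;
}

/* Extract individual channels with shifts and masks. */
static inline uint8_t pixel_red(uint32_t p)   { return (uint8_t)(p >> 24); }
static inline uint8_t pixel_green(uint32_t p) { return (uint8_t)((p >> 16) & 0xFF); }
static inline uint8_t pixel_blue(uint32_t p)  { return (uint8_t)((p >> 8) & 0xFF); }
static inline uint8_t pixel_alpha(uint32_t p) { return (uint8_t)(p & 0xFF); }
```

Because shifts operate on values rather than on memory, pixel_pack(0x12, 0x34, 0x56, 0x78) is 0x12345678 on both little- and big-endian machines.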


But on the whole unions seem a good contender for type casting.

btw, I liked this joke about coding rules... "Somebody once told me
that in basketball you can’t hold the ball and run. I got a basketball
and tried it and it worked just fine. He obviously didn’t understand
basketball."
[8] http://blog.regehr.org/archives/213

cheers -ben

