[Vm-dev] [Pharo-dev] Debugging GCC code generation

Eliot Miranda eliot.miranda at gmail.com
Wed Dec 11 20:03:26 UTC 2019


Hi Pablo,

On Wed, Dec 11, 2019 at 11:33 AM tesonep at gmail.com <tesonep at gmail.com>
wrote:

> Hi Nicolas,
>   thanks for the comment, you are right the problem is the bad
> original code. But my opinion is that it just is not reporting the
> situation correctly, generating a warning or non-optimizing the code
> looks more like a expected behavior. Because as I have said, using a
> constant as index in the last statement generates a meaningful warning
> and the non-optimizated version of the function.
>
> And again as you said, the only thing to learn about all this is that
> we should not write crappy code.
>

Hang on.  "crappy code" is too strong.  Back in tre ANSI days it was not
undefined behavior, and in fact I think this only because undefined
behavior because C compilers were faced with lots of code where variables
were typed int but were being used on 64-bit machines.  Had the C world
used long as its default type then there would have been no need to make
casting the address of a long into the address of an int undefined
behavior.  So this isn't crappy code (if you think of what it means in
memory on a 64-bit little endnan machine it is perfectly sensible).  It
/is/ code that is inappropriate given C99.

So yes, we have to bow to the C gods and generate C99 compliant code, but
what we were doing here wasn't crappy, it was merely code that became
undefined behavior so that C compilers could generate more efficient code
for 64-bit systems in the presence of code that contains lots of integer
variables declared as int.  So let's just say that we have to generate C99
compliant code.  You'll notice that the Smalltalk code in the original does
exactly what you suggest Pablo.

fetchLong32: fieldIndex ofFloatObject: oop
"index by word size, and return a pointer as long as the word size"
| bits |
(self isImmediateFloat: oop) ifFalse:
[^self fetchLong32: fieldIndex ofObject: oop].
bits := self smallFloatBitsOf: oop.
^self
cCode: [(self cCoerceSimple: (self addressOf: bits) to: #'int *') at:
fieldIndex]
inSmalltalk:
[self flag: #endian.
fieldIndex = 0
ifTrue: [bits bitAnd: 16rFFFFFFFF]
ifFalse: [bits >> 32]]


> On Wed, Dec 11, 2019 at 7:11 PM Nicolas Cellier
> <nicolas.cellier.aka.nice at gmail.com> wrote:
> >
> > Of course, when I say "your" code, it's the code you have shown, and
> probably "our" (VMMaker) code ;)
> >
> > Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier <
> nicolas.cellier.aka.nice at gmail.com> a écrit :
> >>
> >> Hi Pablo (again),
> >> no, not a bug.
> >>
> >> The problem is in the source code. The compiler has the right to
> presume that your code is exempt of UB, because you cannot depend on UB
> (obviously).
> >> So it can eliminate all code which corresponds to UB.
> >>
> >> The compiler has the right to assume that a pointer to an int cannot
> point to a long (UB).
> >> So modifying a long cannot have any sort of impact on the content of
> the int pointer.
> >> So the compiler can decouple both path return int content and assign
> long.
> >> But assigning the long has no effect, so the code can be suppressed
> altogether.
> >>
> >> Le mer. 11 déc. 2019 à 18:54, tesonep at gmail.com <tesonep at gmail.com> a
> écrit :
> >>>
> >>> Hi Aliaksei,
> >>>       to me it looks like a bug of GCC optimization. Basically, it is
> >>> assuming that the x variable is used but never read or its value is
> >>> never used. Also it assumes the same of the i variable, as we are only
> >>> accessing indirectly to the memory where it locates (the code is even
> >>> assuming that the variable exists, but it can be optimize out as in
> >>> this scenario). Even though, the original C code is valid C code, we
> >>> are not helping the compiler writing code like that. So I have
> >>> rewritten the code in a way that does not use indirect memory access
> >>> to the stack space.
> >>>
> >>> One thing more that makes me think is a bug, if you use an int
> >>> constant as the index and not a parameter, the error does not occur
> >>> (the code is not badly optimized) and there is a warning about the
> >>> not-so-great access to the stack.
> >>>
> >>> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <alex.syrel at gmail.com>
> wrote:
> >>> >
> >>> > Hi Pablo,
> >>> >
> >>> > Wow! Thank you for the detective story :)
> >>> >
> >>> > Do I understand correctly that the original code causes undefined
> behavior and therefore can be changed (or even removed) by the compiler?
> >>> > (because it returns something that is referencing memory on the
> stack)
> >>> >
> >>> > Please keep posting similar things in future! It is very educative :)
> >>> >
> >>> > Cheers,
> >>> > Alex
> >>> >
> >>> >
> >>> > On Wed, 11 Dec 2019 at 17:35, tesonep at gmail.com <tesonep at gmail.com>
> wrote:
> >>> >>
> >>> >> Hi,
> >>> >>     this mail is related to Pharo because it is knowledge I found
> >>> >> debugging the build of the VM, but the rest is to document it and
> >>> >> perhaps someone will found it interesting (also I couldn't find it
> >>> >> easily using Google). Sorry for the long mail!
> >>> >>
> >>> >> The problem
> >>> >> ==========
> >>> >>
> >>> >> The following code does not produce good code in 8.3 when using
> optimizations:
> >>> >>
> >>> >> long __attribute__ ((noinline)) myFunc(long i, int index){
> >>> >>    long v;
> >>> >>    long x = i >> 3;
> >>> >>
> >>> >>    v = x;
> >>> >>    return ((int*)(&v))[index];
> >>> >> }
> >>> >>
> >>> >> #include <stdio.h>
> >>> >>
> >>> >> int main(){
> >>> >>
> >>> >>     long i;
> >>> >>     int x;
> >>> >>
> >>> >>     scanf("%ld", &i);
> >>> >>     scanf("%d", &x);
> >>> >>
> >>> >>     printf("%ld",myFunc(i,x));
> >>> >> }
> >>> >>
> >>> >> Basically, with -02, it generates the following code:
> >>> >>
> >>> >> myFunc:
> >>> >>      movslq %esi, %rsi
> >>> >>      movslq -8(%rsp,%rsi,4), %rax
> >>> >>      ret
> >>> >>
> >>> >> And with -01 it generates the following code:
> >>> >>
> >>> >> myFunc:
> >>> >>      sarq $3, %rdi
> >>> >>      movq %rdi, -8(%rsp)
> >>> >>      movslq %esi, %rsi
> >>> >>      movslq -8(%rsp,%rsi,4), %rax
> >>> >>      ret
> >>> >>
> >>> >> As you can see, the more optimized version is losing the bit shift
> (or
> >>> >> any other operation).
> >>> >> To detect the problem it was not easy, basically, I have used the
> >>> >> Pharo Tests to detect the failing function and then to understand
> the
> >>> >> generation of code in GCC, we need to debug its generation.
> >>> >>
> >>> >> The first snippet is generated using:
> >>> >>
> >>> >> gcc -O2 prb.c -S -o prb.s -Wall
> >>> >>
> >>> >> and the second using:
> >>> >>
> >>> >> gcc -O1 prb.c -S -o prb.s -Wall
> >>> >>
> >>> >> The GCC pipeline has different steps, I have centered my self in the
> >>> >> tree optimizations.
> >>> >> GCC dumps each of the intermediate states of the compilation,
> working
> >>> >> with C like trees. For example to get the optimized version of the
> >>> >> program, before generating the Assembler we have to do:
> >>> >>
> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout
> >>> >>
> >>> >> This will generate:
> >>> >>
> >>> >> __attribute__((noinline))
> >>> >> myFunc (long int i, int index)
> >>> >> {
> >>> >>   long int v;
> >>> >>   long unsigned int _1;
> >>> >>   long unsigned int _2;
> >>> >>   int * _3;
> >>> >>   int _4;
> >>> >>   long int _8;
> >>> >>
> >>> >>   <bb 2> [local count: 1073741825]:
> >>> >>   _1 = (long unsigned int) index_7(D);
> >>> >>   _2 = _1 * 4;
> >>> >>   _3 = &v + _2;
> >>> >>   _4 = *_3;
> >>> >>   _8 = (long int) _4;
> >>> >>   v ={v} {CLOBBER};
> >>> >>   return _8;
> >>> >>
> >>> >> }
> >>> >>
> >>> >> This is the optimized SSA (static single assign) version of the
> >>> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html)
> >>> >> As you can see in this version the code is already optimized, and
> our
> >>> >> operations not correctly generated. We expect to see something like:
> >>> >>
> >>> >> __attribute__((noinline))
> >>> >> myFunc (long int i, int index)
> >>> >> {
> >>> >>   long int x;
> >>> >>   long int v;
> >>> >>   long unsigned int _1;
> >>> >>   long unsigned int _2;
> >>> >>   int * _3;
> >>> >>   int _4;
> >>> >>   long int _10;
> >>> >>
> >>> >>   <bb 2> [local count: 1073741825]:
> >>> >>   x_6 = i_5(D) >> 3;
> >>> >>   v = x_6;
> >>> >>   _1 = (long unsigned int) index_9(D);
> >>> >>   _2 = _1 * 4;
> >>> >>   _3 = &v + _2;
> >>> >>   _4 = *_3;
> >>> >>   _10 = (long int) _4;
> >>> >>   v ={v} {CLOBBER};
> >>> >>   return _10;
> >>> >>
> >>> >> }
> >>> >>
> >>> >> In the first SSA version, we are lacking the shift operation:
> >>> >>
> >>> >>   x_6 = i_5(D) >> 3;
> >>> >>   v = x_6;
> >>> >>
> >>> >> So, we need to see in which of the optimizations and transformations
> >>> >> our code is lost:
> >>> >> To see all the steps we should execute:
> >>> >>
> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all
> >>> >>
> >>> >> This will generate a lot, really a lot, of files in the same
> directory.
> >>> >> They have the name: prb.c.xxx.yyyy
> >>> >> Where xxx is the number of the step, and yyyy is the name of the
> step.
> >>> >> Each of the files contains the result of applying the changes, so
> what
> >>> >> I have done is looking in binary search where my code was lost.
> >>> >>
> >>> >> I have found that the problem was in the first dead code elimination
> >>> >> step (prb.c.041t.cddce1)
> >>> >> GCC does not like the crap code that we are writing, as we are
> copying
> >>> >> a stack variable and then accessing directly to the memory:
> >>> >>
> >>> >> v = x;
> >>> >> return ((int*)(&v))[index];
> >>> >>
> >>> >> So, basically, it was assuming that we were only using v and not x,
> so
> >>> >> all the code modifying x is described (this optimization is called
> >>> >> "dead store code deletion"). Basically tries to remove all the code
> >>> >> that is affecting stack (temporary) variables that are never used.
> >>> >>
> >>> >> I am not sure if this is a bug in GCC, so I will report it. But I
> have
> >>> >> fixed the problem writing the code in a proper way.
> >>> >>
> >>> >> Sorry again for the long mail, I wanted to store the knowledge and
> >>> >> propagate it before I eventually will flush it. I like these things,
> >>> >> but I am not debugging the GCC optimization pipeline all the days.
> >>> >>
> >>> >> --
> >>> >> Pablo Tesone.
> >>> >> tesonep at gmail.com
> >>> >>
> >>>
> >>>
> >>> --
> >>> Pablo Tesone.
> >>> tesonep at gmail.com
> >>>
>
>
> --
> Pablo Tesone.
> tesonep at gmail.com
>
>

-- 
_,,,^..^,,,_
best, Eliot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.squeakfoundation.org/pipermail/vm-dev/attachments/20191211/2f7ceb9f/attachment-0001.html>


More information about the Vm-dev mailing list