[Vm-dev] [commit][3753] Do not use -O3 optimization, -O2 is safer and works well.

Andres Valloud avalloud at smalltalk.comcastbiz.net
Fri Dec 23 22:59:27 UTC 2016


Huh, I couldn't find GCC examples quickly.  However, here's something I 
actually ran into.  The manual for the IBM XL C 10.x compiler for AIX on 
POWER explicitly states -O3 causes comparatively common functions to 
stop setting errno.  Critically, however, this statement is *not* in 
every reference to the -O3 switch.  Rather, it was in a manual set that 
was much more detailed than these kinds of links:

http://www.ibm.com/support/knowledgecenter/en/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/proguide/levelthree.html

http://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/proguide/advancedoptimization.html

Notice how the above pages do not say anything about errno?  IIRC the 
man page didn't say anything about errno with -O3 either.  However, this 
one other page does:

http://www.ibm.com/support/knowledgecenter/en/SSGH2K_11.1.0/com.ibm.xlc111.aix.doc/compiler_ref/opt_optimize.html

And in there, there are other statements such as "with -O3 some 
instructions are placed in code paths where they always execute, while 
with -O2 that's not the case".  Looks like asking for undefined behavior 
in the general case to me.

This particular example was the breaking point for me, and from then on 
I started assuming -O3 is not behavior preserving, even if the manual 
doesn't immediately say there are problems.  Also, manuals are not 
necessarily complete.  That's not to say that -O3 isn't useful, provided 
the necessary verification steps are taken.  And while I couldn't quickly 
find similar examples for other compilers, I do remember reading about 
them.  From 
a VM engineering perspective however, and given all the gray area 
semi-undefined behavior things one is basically forced to do, my opinion 
is -O2 is the way to go.

Then again, even with -O2 one always has to read the manual and be aware 
of what's going on.  For instance, in the latest GCC with -O2, the 
optimization -fdelete-null-pointer-checks is enabled.  However, the 
manual also says:

========================
Assume that programs cannot safely dereference null pointers, and that 
no code or data element resides at address zero. This option enables 
simple constant folding optimizations at all optimization levels. In 
addition, other optimization passes in GCC use this flag to control 
global dataflow analyses that eliminate useless checks for null 
pointers; these assume that a memory access to address zero always 
results in a trap, so that if a pointer is checked after it has already 
been dereferenced, it cannot be null.

Note however that in some environments this assumption is not true.
========================

And yes, there are realistic places where NULL can be dereferenced. 
Again on the AIX versions I worked on, the memory at 0x0 was mapped. 
Due to some bugs at the time, HPS overwrote that memory.  And yet, the 
system continued to appear to work.  Thus, some null pointer dereferences 
that segfault on other platforms did not segfault on AIX, even though the 
code and the wrong behavior were exactly the same.
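
The check-after-dereference case described in the GCC manual excerpt can 
be sketched as follows.  The struct and function names are hypothetical, 
chosen only for illustration.  In value_fragile the pointer is 
dereferenced before it is tested, so a compiler applying 
-fdelete-null-pointer-checks may conclude the pointer cannot be null and 
remove the test.  On a platform where address zero is mapped, the load 
does not trap and the function then returns whatever happens to be stored 
there.

```c
#include <stddef.h>

struct node {
    int value;
};

/* Fragile (hypothetical example): the dereference comes first, so the
 * compiler is allowed to assume n != NULL and delete the check below.
 * Calling this with NULL is undefined behavior either way. */
int value_fragile(struct node *n)
{
    int v = n->value;
    if (n == NULL) {
        return -1;
    }
    return v;
}

/* Checked (hypothetical example): the test precedes any dereference,
 * so it cannot be removed on the basis of an earlier access. */
int value_checked(const struct node *n)
{
    if (n == NULL) {
        return -1;
    }
    return n->value;
}
```

GCC also accepts -fno-delete-null-pointer-checks to turn the optimization 
off for environments where the assumption does not hold.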

Andres.

On 12/23/16 13:52 , Holger Freyther wrote:
>
>> On 23 Dec 2016, at 22:30, Andres Valloud <avalloud at smalltalk.comcastbiz.net> wrote:
>>
>> Compiler manuals usually state -O3 and higher do not preserve language semantics.  It's going to be hard to prove "bug" in the non-default presence of a switch known to potentially produce undefined behavior.
>
> Where did you get that from? I tried reading it up but don't see anything.
>
>
> Clang:
>
> -O3 Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
>
> => so longer compilation time, bigger binary
>
>
>
> gcc:
>
> -O3
> Optimize yet more.  -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-loop-vectorize, -ftree-loop-distribute-patterns, -fsplit-paths -ftree-slp-vectorize, -fvect-cost-model, -ftree-partial-pre, -fpeel-loops and -fipa-cp-clone options.
>
> => None says that the language standard is broken
>
>
>
> Intel ICC
>
> -O3
>
> Performs O2 optimizations and enables more aggressive loop transformations such as Fusion, Block-Unroll-and-Jam, and collapsing IF statements. The O3 optimizations may not cause higher performance unless loop and memory access transformations take place. The optimizations may slow down code in some cases compared to O2 optimizations. The O3 option is recommended for applications that have loops that heavily use floating-point calculations and process large data sets.
>
>
>
> So both Intel ICC and GCC start doing more auto vectorization with -O3. None of that breaks the language semantics. So in most cases, if -O3 breaks things, the code has undefined behavior...
>
>
> holger
>
>
>

