[Solved] wildly different behaviour between O2 and O3 optimized FP code [closed]


From GCC manual:

-O3

Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload, -ftree-vectorize, -fvect-cost-model, -ftree-partial-pre and -fipa-cp-clone options.

None of these optimizations is particularly unsafe. The only one I can see changing the result is -ftree-vectorize. In some cases, using vector instructions can change the result compared to x87 FPU instructions. For example, the x87 FPU by default typically uses 80-bit internal precision for doubles, while SIMD vector instructions operate on 64-bit doubles. The implementation of some math functions (like sqrt) may also differ.
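To illustrate the precision point, here is a minimal sketch in C (the function names are made up for this example). It computes (big + small) - big once with the intermediate rounded to a 64-bit double and once in long double. The width of long double is platform-dependent: it is commonly the 80-bit x87 format on x86 Linux, but it can be the same as double elsewhere, so the second function is not guaranteed to differ from the first.

```c
/* Hypothetical helper: (big + small) - big with the intermediate sum
   rounded to a 64-bit double. The volatile store forces that rounding
   even if the compiler keeps intermediates in wider x87 registers. */
double roundtrip_double(double big, double small)
{
    volatile double t = big + small;
    return t - big;
}

/* Hypothetical helper: the same computation carried out in long double.
   Where long double is wider than double, the small addend can survive. */
long double roundtrip_extended(double big, double small)
{
    volatile long double t = (long double)big + (long double)small;
    return t - (long double)big;
}
```

With big = 1e16 and small = 1.0, the double version loses the 1.0 entirely, because neighbouring doubles near 1e16 are 2 apart. Where long double has a 64-bit mantissa, the extended version keeps it. The same source can therefore produce different numbers depending on which instructions the compiler picks.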

You would have a much better chance of getting help if you posted your code, the exact compiler flags, and information about your hardware (which SIMD instructions your CPU supports).

You can also directly compare the assembly code generated in the two cases (for example, the output of gcc -S at each optimization level).

PS. In my experience, though, the most likely cause is undefined behavior in the program. Typical culprits are an uninitialized variable, division by zero, and so on. Make sure you compile with a high warning level (-Wall -Wextra -Wpedantic), and use the UB Sanitizer (-fsanitize=undefined).
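As a rough sketch of the kind of bug meant here, the hypothetical function below shows two places where undefined behavior commonly hides, written in their corrected form. The name average_checked and the choice to return 0 for an empty input are assumptions made for illustration only.

```c
/* Hypothetical helper: integer average of n values, written to avoid two
   common sources of undefined behavior. */
int average_checked(const int *values, int n)
{
    /* Without the "= 0" initializer, the loop would read an indeterminate
       value, and the result could legitimately differ between -O2 and -O3. */
    long long sum = 0;

    for (int i = 0; i < n; ++i)
        sum += values[i];

    /* Integer division by zero is undefined, so handle the empty case
       explicitly (returning 0 here is an arbitrary choice). */
    if (n <= 0)
        return 0;

    return (int)(sum / n);
}
```

If the initializer is removed, GCC's warnings can often flag the uninitialized read at compile time. If the n <= 0 guard is removed, building with -fsanitize=undefined should report the integer division by zero at run time.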
