I think you’re really trying to ask about branchless vs. branching ways to do m = max(m, array[i])
. C compilers will already compile the if()
version to branchless code (using cmov
) depending on optimization settings. It can even auto-vectorize to a packed compare or packed-max function.
Your 0.5 * abs()
version is obviously terrible (a lot slower than a conditional-move) because it converts to double
and back. rather than dividing by two with a right shift.
See the asm on the Godbolt Compiler Explorer:
// auto-vectorizes to PMAXSD, or without SSE4.1, to pcmpgt / pand/por emulation of the same
int maxarray_if(int arr[], int n) {
int result = arr[0];
for (int i=0 ; i<n; ++i) {
int tmp = arr[i];
if (result < tmp)
result = tmp;
}
return result;
}
gcc 5.3 -O3 -march=haswell -mno-avx
auto-vectorized inner loop:
.L13:
add eax, 1
pmaxsd xmm0, XMMWORD PTR [rdx]
add rdx, 16
cmp r8d, eax
ja .L13
Vs. the FP version:
... whole bunch of integer crap
cvtsi2sd xmm0, eax
mulsd xmm0, xmm1
cvttsd2si eax, xmm0
So the FP version is obviously complete garbage.
You’ll get similar results for any target architecture. The conversion to/from double
isn’t going to go away. gcc keeps it even with -ffast-math
.
2
solved Comparing 2 different scenarios on 2 different architectures when finding the max element of an array