[Solved] Comparing 2 different scenarios on 2 different architectures when finding the max element of an array

Question

I think you’re really trying to ask about branchless vs. branching ways to do m = max(m, array[i]). Depending on optimization settings, C compilers will already compile the if() version to branchless code (using cmov). They can even auto-vectorize it to a packed compare or a packed-max instruction.

Your 0.5 * abs() version is obviously terrible (a lot slower than a conditional move) because it converts to double and back, rather than dividing by two with a right shift.

See the asm on the Godbolt Compiler Explorer:

// auto-vectorizes to PMAXSD, or without SSE4.1, to pcmpgt / pand/por emulation of the same
int maxarray_if(int arr[], int n) {
  int result = arr[0];
  for (int i=0 ; i<n; ++i) {
    int tmp = arr[i];
    if (result < tmp)
      result = tmp;
  }
  return result;
}

gcc 5.3 -O3 -march=haswell -mno-avx auto-vectorized inner loop:

.L13:
    add     eax, 1
    pmaxsd  xmm0, XMMWORD PTR [rdx]
    add     rdx, 16
    cmp     r8d, eax
    ja      .L13
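If the packed max isn't familiar, the following scalar model may help. It's a sketch of what a 4-lane pmaxsd loop computes, written in plain C: four independent running maxima, then a horizontal reduction and a scalar tail. The name maxarray_lanes is hypothetical, it assumes n >= 1, and it isn't gcc's actual output (the real code also has to deal with alignment, since the memory operand of a non-AVX pmaxsd must be 16-byte aligned).

```c
#include <stddef.h>

/* Scalar model of a 4-lane packed max. Assumes n >= 1. */
int maxarray_lanes(const int arr[], size_t n) {
    /* One running maximum per lane, all seeded with the first element. */
    int lane[4] = { arr[0], arr[0], arr[0], arr[0] };
    size_t i = 0;

    /* Main loop: each iteration models one pmaxsd over 4 ints. */
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; ++k) {
            if (lane[k] < arr[i + k])
                lane[k] = arr[i + k];
        }
    }

    /* Horizontal reduction: combine the four lane maxima. */
    int result = lane[0];
    for (int k = 1; k < 4; ++k) {
        if (result < lane[k])
            result = lane[k];
    }

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i) {
        if (result < arr[i])
            result = arr[i];
    }
    return result;
}
```

Because the four lanes don't depend on each other, the loop-carried dependency is one packed max per 16 bytes of input instead of one compare per element.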

Vs. the FP version:

    ... whole bunch of integer crap
    cvtsi2sd        xmm0, eax
    mulsd   xmm0, xmm1
    cvttsd2si       eax, xmm0

So the FP version is obviously complete garbage.

You’ll get similar results for any target architecture. The conversion to/from double isn’t going to go away. gcc keeps it even with -ffast-math.
