[Solved] if-else if ladder and Compiler Optimization


Much of the answer depends on what A, B and C really are (and on how the compiler optimises the code, as shown below). If they are simple types, it is definitely not worth worrying about. If they are some kind of “big number math” objects, or some complicated data type that needs 1000 instructions for each “is this true or not” check, then there can be a big difference depending on what code the compiler decides to generate.
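
As a rough sketch of that latter case (my own illustration, not something from the question), a type along these lines makes every condition test genuinely expensive:

#include <vector>

// Hypothetical "big number" type: the truth test has to look at the stored
// limbs, so every `if (x)` or `x && y` costs work proportional to the size
// of the number.
struct BigNumber {
    std::vector<unsigned> limbs;

    explicit operator bool() const {
        for (unsigned limb : limbs)   // short-circuits on the first non-zero limb
            if (limb != 0)
                return true;
        return false;                 // a zero value walks the whole vector
    }
};

With something like this, evaluating A && B twice (once per branch of the ladder) is real work that the compiler cannot always prove redundant.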

As always when it comes to performance: measure in your own code, use profiling to find where the code spends MOST of its time, and then measure the effect of changes to that code. Repeat until it runs fast enough [whatever that means] and/or your manager tells you to stop fiddling with the code. Typically, however, unless it is a REALLY high-traffic part of the code, rearranging the conditions in an if-statement makes little difference; in the general case it is the overall algorithm that has the biggest impact.
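
If you want a minimal starting point for the “measure” step, something along these lines works (a sketch only; codeUnderTest is a placeholder for whichever variant you are timing):

#include <chrono>
#include <cstdio>

extern void codeUnderTest();   // placeholder: the code you want to measure

void measure()
{
    using clock = std::chrono::steady_clock;
    const int iterations = 1000000;

    auto start = clock::now();
    for (int i = 0; i < iterations; ++i)
        codeUnderTest();       // extern, so the compiler cannot remove the calls
    auto stop = clock::now();

    std::chrono::duration<double, std::nano> elapsed = stop - start;
    std::printf("average: %.1f ns per call\n", elapsed.count() / iterations);
}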

If we assume A, B and C are simple types, such as int, we can write some code to investigate:

extern int A, B, C;
extern void UpdateData();
extern void ResetData();

void func1()
{
    if ( A && B && C ) {
        UpdateData();
    } else if ( A && B ) {
        ResetData();
    }
}


void func2()
{
    if ( A && B ) {
        if (C) {
            UpdateData();
        } else {
            ResetData();
        }
    }
}

Given this, gcc 4.8.2 with -O1 produces this code:

_Z5func1v:
    cmpl    $0, A(%rip)
    je  .L6
    cmpl    $0, B(%rip)
    je  .L6
    subq    $8, %rsp
    cmpl    $0, C(%rip)
    je  .L3
    call    _Z10UpdateDatav
    jmp .L1
.L3:
    call    _Z9ResetDatav
.L1:
    addq    $8, %rsp
.L6:
    rep ret

_Z5func2v:
.LFB1:
    cmpl    $0, A(%rip)
    je  .L12
    cmpl    $0, B(%rip)
    je  .L12
    subq    $8, %rsp
    cmpl    $0, C(%rip)
    je  .L9
    call    _Z10UpdateDatav
    jmp .L7
.L9:
    call    _Z9ResetDatav
.L7:
    addq    $8, %rsp
.L12:
    rep ret

In other words: No difference at all

Using clang++ 3.7 (as of about 3 weeks ago) with -O1 gives this:

_Z5func1v:                              # @_Z5func1v
    cmpl    $0, A(%rip)
    setne   %cl
    cmpl    $0, B(%rip)
    setne   %al
    andb    %cl, %al
    movzbl  %al, %ecx
    cmpl    $1, %ecx
    jne .LBB0_2
    movl    C(%rip), %ecx
    testl   %ecx, %ecx
    je  .LBB0_2
    jmp _Z10UpdateDatav         # TAILCALL
.LBB0_2:                                # %if.else
    testb   %al, %al
    je  .LBB0_3
    jmp _Z9ResetDatav           # TAILCALL
.LBB0_3:                                # %if.end8
    retq

_Z5func2v:                              # @_Z5func2v
    cmpl    $0, A(%rip)
    je  .LBB1_4
    movl    B(%rip), %eax
    testl   %eax, %eax
    je  .LBB1_4
    cmpl    $0, C(%rip)
    je  .LBB1_3
    jmp _Z10UpdateDatav         # TAILCALL
.LBB1_4:                                # %if.end4
    retq
.LBB1_3:                                # %if.else
    jmp _Z9ResetDatav           # TAILCALL
.Ltmp1:

The chaining of the and conditions in clang’s func1 MAY be of benefit, but it is probably such a small difference that you should concentrate on whichever form makes more sense from a logical perspective in your code.

In summary: Not worth it

At higher optimisation levels, g++ performs the same tail-call optimisation that clang does; otherwise there is no difference.

However, if we make A, B and C into external functions, which the compiler can’t “understand”, then we do get a difference.
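
For reference, here is roughly what that variant looks like (the exact declarations weren’t in the original post, so the signatures are an assumption):

extern int A();              // opaque: the bodies live in another translation unit
extern int B();
extern int C();
extern void UpdateData();
extern void ResetData();

void func1()
{
    if ( A() && B() && C() ) {       // ladder form: A() and B() appear again below
        UpdateData();
    } else if ( A() && B() ) {
        ResetData();
    }
}

void func2()
{
    if ( A() && B() ) {              // nested form: A() and B() are evaluated once
        if ( C() ) {
            UpdateData();
        } else {
            ResetData();
        }
    }
}

Compiled with clang as before, this gives: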

_Z5func1v:                              # @_Z5func1v
    pushq   %rax
.Ltmp0:
    .cfi_def_cfa_offset 16
    callq   _Z1Av
    testl   %eax, %eax
    je  .LBB0_3

    callq   _Z1Bv
    testl   %eax, %eax
    je  .LBB0_3

    callq   _Z1Cv
    testl   %eax, %eax
    je  .LBB0_3

    popq    %rax
    jmp _Z10UpdateDatav         # TAILCALL
.LBB0_3:                                # %if.else
    callq   _Z1Av
    testl   %eax, %eax
    je  .LBB0_5

    callq   _Z1Bv
    testl   %eax, %eax
    je  .LBB0_5

    popq    %rax
    jmp _Z9ResetDatav           # TAILCALL
.LBB0_5:                                # %if.end12
    popq    %rax
    retq

_Z5func2v:                              # @_Z5func2v
    pushq   %rax
.Ltmp2:
    .cfi_def_cfa_offset 16
    callq   _Z1Av
    testl   %eax, %eax
    je  .LBB1_4

    callq   _Z1Bv
    testl   %eax, %eax
    je  .LBB1_4

    callq   _Z1Cv
    testl   %eax, %eax
    je  .LBB1_3

    popq    %rax
    jmp _Z10UpdateDatav         # TAILCALL
.LBB1_4:                                # %if.end6
    popq    %rax
    retq
.LBB1_3:                                # %if.else
    popq    %rax
    jmp _Z9ResetDatav           # TAILCALL

Here we DO see a difference between func1 and func2: func1 calls A and B twice, since the compiler cannot assume that calling those functions ONCE has the same effect as calling them twice. [Consider that A and B may be reading data from a file, calling rand, or whatever; skipping one of those calls could make the program behave differently. A concrete sketch of that follows below.]

(In this case I only posted the clang code; g++ produces code with the same outcome, just with a slightly different ordering of the various lumps of code.)
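
To make that behavioural point concrete, here is a small, hypothetical set of definitions (mine, not from the question) in which A() returns a different value on each call:

#include <cstdio>

static int counter = 0;

int A() { return counter++ % 2; }   // returns 0, 1, 0, 1, ... on successive calls
int B() { return 1; }
int C() { return 0; }

void UpdateData() { std::puts("update"); }
void ResetData()  { std::puts("reset"); }

With these definitions, the ladder form in func1 calls A() twice: the first call returns 0, so the first condition fails, and the second call returns 1, so ResetData() runs. The nested form in func2 calls A() once, gets 0, and does nothing. The two versions are observably different, which is exactly why the compiler is not allowed to merge the repeated calls.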
