[Solved] Does (p+x)-x always result in p for pointer p and integer x in gcc linux x86-64 C++

Question

Yes, for gcc5.x and later specifically, that specific expression is optimized very early to just p, even with optimization disabled, regardless of any possible runtime UB.

This happens even with a static array and compile-time constant size. gcc -fsanitize=undefined doesn’t insert any instrumentation to look for it either. Also no warnings at -Wall -Wextra -Wpedantic

int *add(int *p, long long x) {
    return (p+x) - x;
}

int *visible_UB(void) {
    static int arr[100];
    return (arr+200) - 200;
}

Using gcc -dump-tree-original to dump its internal representation of program logic before any optimization passes shows that this optimization happened even before that in gcc5.x and newer. (And happens even at -O0).

;; Function int* add(int*, long long int) (null)
;; enabled by -tree-original

return <retval> = p;


;; Function int* visible_UB() (null)
;; enabled by -tree-original
{
  static int arr[100];

    static int arr[100];
  return <retval> = (int *) &arr;
}

That’s from the Godbolt compiler explorer with gcc8.3 with -O0.

The x86-64 asm output is just:

; g++8.3 -O0 
add(int*, long long):
    mov     QWORD PTR [rsp-8], rdi
    mov     QWORD PTR [rsp-16], rsi    # spill args
    mov     rax, QWORD PTR [rsp-8]     # reload only the pointer
    ret
visible_UB():
    mov     eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
    ret

-O3 output is of course just mov rax, rdi

gcc4.9 and earlier only do this optimization in a later pass, and not at -O0: the tree dump still includes the subtract, and the x86-64 asm is

# g++4.9.4 -O0
add(int*, long long):
    mov     QWORD PTR [rsp-8], rdi
    mov     QWORD PTR [rsp-16], rsi
    mov     rax, QWORD PTR [rsp-16]
    lea     rdx, [0+rax*4]            # RDX = x*4 = x*sizeof(int)
    mov     rax, QWORD PTR [rsp-16]
    sal     rax, 2
    neg     rax                       # RAX = -(x*4)
    add     rdx, rax                  # RDX = x*4 + (-(x*4)) = 0
    mov     rax, QWORD PTR [rsp-8]
    add     rax, rdx                  # p += x + (-x)
    ret

visible_UB():       # but constants still optimize away at -O0
    mov     eax, OFFSET FLAT:_ZZ10visible_UBvE3arr
    ret

This does line up with the -fdump-tree-original output:

return <retval> = p + ((sizetype) ((long unsigned int) x * 4) + -(sizetype) ((long unsigned int) x * 4));

If x*4 overflows, you’ll still get the right answer. In practice I can’t think of a way to write a function that would lead to the UB causing an observable change in behaviour.

As part of a larger function, a compiler would be allowed to infer some range info, like that p[x] is part of the same object as p[0], so reading memory in between / out that far is allowed and won’t segfault. e.g. allowing auto-vectorization of a search loop.

But I doubt that gcc even looks for that, let alone takes advantage of it.

(Note that your question title was specific to gcc targeting x86-64 on Linux, not about whether similar things are safe in gcc, e.g. if done in separate statements. I mean yes probably safe in practice, but won’t be optimized away almost immediately after parsing. And definitely not about C++ in general.)

I highly recommend not doing this. Use uintptr_t to hold pointer-like values that aren’t actual valid pointers. like you’re doing in the updates to your answer on C++ gcc extension for non-zero-based array pointer allocation?.

Accepted Answer