Head Scratching C(++) Bug

What's wrong with the following loop? I stared at it for quite some time before finding the bug in my code, all too ready to exclaim "it's a compiler bug":

/* n is a parameter of type size_t */
/* table is a pointer to int such that table[0] .. table[n-1] are all valid indices */
for(size_t i = 0; i < n; i++) {
    if(table[i] == needle) return i;
}

Instead of terminating after at most n iterations, the loop continued indefinitely, eventually returning some nonsense value where another memory location matched the needle value.

Give up? The problem's nothing to do with the loop, but with what came (didn't come) after it:

size_t find_in_table(int *table, size_t n, int needle) {
    for(size_t i = 0; i < n; i++) {
        if(table[i] == needle) return i;
    }
}

There's no return statement after the loop. If the loop terminates, it reaches the end of a non-void function without returning a value. This is undefined behavior. Therefore, the compiler-author logic goes, it must be the case that the loop can never terminate due to the loop condition; it must only terminate by the return statement inside the loop. (Stated in another way, the "undefined behavior" is to continue to run the loop.)

Without the final return (e.g., return (size_t)-1;), the comparison i < n is completely eliminated at typical optimization levels. Even weirder, you can get the program to print that (say) i==5, n==4, and that i<n is true!

I think gcc has an enabled-by-default warning for this, but either it's disabled by my project's CFLAGS or I overlooked it in a torrent of other compiler output.

Without looking beyond the loop or if you're thinking with the mindset that a missing return statement's undefined behavior is "returns an arbitrary value", you don't think about things like this. This is a very easy mistake to make, since it is also a pretty good model of what a modern compiler will do at "for better debugging" optimization levels, as well as what the "C is portable assembler" moto asks us to falsely believe. No, in reality, C(++) is a slippery language. While it's not a bet you could settle in the negative, I bet there's at least one security bug out there due to this undefined behavior and the way that gcc transforms a finite loop into an infinite one.

My recommendation?

Pay attention to compiler diagnostics. Enable more of them. Understand what the diagnostic is saying, then make an appropriate response. Get your editor's help to parse error lists and make sure you don't miss any.

Entry first conceived on 6 November 2020, 1:54 UTC, last modified on 1 January 2021, 22:49 UTC
Website Copyright © 2004-2024 Jeff Epler