Waldek Hebisch to highcrew
Re: On Undefined Behavior
02 Jan 26 05:53:13
From: antispam@fricas.org

highcrew wrote:
> Hello,
>
> While I consider myself reasonably good as C programmer, I still
> have difficulties in understanding undefined behavior.
> I wonder if anyone in this NG could help me.
>
> Let's take an example.  There's plenty here:
> https://en.cppreference.com/w/c/language/behavior.html
> So let's focus on https://godbolt.org/z/48bn19Tsb
>
> For the lazy, I report it here:
>
>   int table[4] = {0};
>   int exists_in_table(int v)
>   {
>       // return true in one of the first 4 iterations
>       // or UB due to out-of-bounds access
>       for (int i = 0; i <= 4; i++) {
>           if (table[i] == v) return 1;
>       }
>       return 0;
>   }
>
> This is compiled (with no warning whatsoever) into:
>
>   exists_in_table:
>           mov     eax, 1
>           ret
>   table:
>           .zero   16
>
>
> Well, this is *obviously* wrong. And sure, so is the original code,
> but I find it hard to think that the compiler isn't able to notice it,
> given that it is even "exploiting" it to produce very efficient code.
>
> I understand the formalism: the resulting assembly is formally
> "correct", in that UB implies that anything can happen.
> Yet I can't think of any situation where the resulting assembly
> could be considered sensible.  The compiled function will
> basically return 1 for any input, and the final program will be
> buggy.

You do not get the formalism: the compiler applies a lot of
transformations which are supposed to be correct for programs obeying
the C rules.  However, the compiler does not understand the program.
It may notice details that you missed, but it acts essentially blindly
on the information it has.  And most transformations have only limited
info (storing everything that the compiler infers would take a lot
of memory, and searching all that info would take a lot of time).

The code that you see is the result of many transformations, possibly
hundreds or more.  The result is a consequence of all the steps,
but it can be hard to isolate a single "silly" step.
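
To make this concrete, here is one plausible chain of inferences for
the example above, written out as C.  It is only a sketch of the
reasoning, not what gcc or clang literally do internally, and the
function names are made up for illustration.

```c
int table[4] = {0};

/* What the author presumably meant: the loop stops at i < 4. */
int exists_in_table_fixed(int v)
{
    for (int i = 0; i < 4; i++) {
        if (table[i] == v) return 1;
    }
    return 0;
}

/* What an optimizer may conclude for the original `i <= 4` loop:
 *   1. Reading table[4] is not allowed in a valid program.
 *   2. So iteration i == 4 is assumed never to happen.
 *   3. So the loop must have returned 1 in an earlier iteration.
 *   4. So `return 0` is unreachable and can be dropped.
 * Each step is locally reasonable; the sum is "always return 1". */
int exists_in_table_as_compiled(int v)
{
    (void)v;   /* the argument no longer matters */
    return 1;
}
```

With the all-zero table, exists_in_table_fixed(7) gives 0 while
exists_in_table_as_compiled(7) gives 1, which is the difference the
original poster observed.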

> Wouldn't it be more sensible to have a compilation error, or
> at least a warning?  The compiler will be happy even with -Wall -Wextra
> -Werror.

This case looks reasonably easy: when compiling 'exists_in_table'
the compiler had the declaration of 'table' and knows its size is 4.
The compiler probably generated its output after noticing that
the loop would produce an out-of-bounds reference.  So with some
extra effort it should be possible to generate a diagnostic.
But in general, instead of an array you may have a pointer without
bound information.  Or the upper bound may be variable.  As James
wrote, for such reasons the C standard does not require a diagnostic.
Also, in the past gcc and clang did not generate diagnostics
in such situations.  gcc is a very complex beast and adding
diagnostics now may require nontrivial effort.
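
As an illustration of the general case, consider the sketch below
(the function name and parameters are hypothetical, not from the
original example).  The loop has the same `i <= last` shape, but
whether it is valid depends entirely on what the caller passes, which
the compiler usually cannot see when compiling the function alone.

```c
/* Searches p[0] .. p[last], inclusive.
 * Valid only if the caller guarantees that p points to at least
 * last + 1 elements; nothing in this function can check that. */
int exists_up_to(const int *p, int last, int v)
{
    for (int i = 0; i <= last; i++) {
        if (p[i] == v) return 1;
    }
    return 0;
}
```

For an array `int a[4]`, the call exists_up_to(a, 3, v) is fine, while
exists_up_to(a, 4, v) has the same out-of-bounds problem as the
original code.  The function body is identical in both cases, so there
is nothing to diagnose at this point.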

BTW: I expect that eventually gcc will warn.  Conceptually,
uses of various string functions can overflow buffers in
similar ways.  In the past such buffer overflows just generated
some (possibly "working") code.  Now most such uses produce
warnings.  In fact, this problem looks like an outlier.
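
A small sketch of the string case, for comparison.  The helper name
is made up; the commented-out lines show the kind of call that, as far
as I know, current gcc reports (I believe under -Wstringop-overflow),
and the function shows a bounded alternative using snprintf.

```c
#include <stdio.h>

/* The kind of code that used to compile silently:
 *
 *     char buf[4];
 *     strcpy(buf, "too long");    // writes 9 bytes into 4
 *
 * A bounded copy instead: returns 1 if src fit into dst,
 * 0 if it had to be truncated (dst is still NUL-terminated,
 * provided dst_size > 0). */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    return n >= 0 && (size_t)n < dst_size;
}
```

With `char buf[4]`, copy_checked(buf, sizeof buf, "abc") returns 1 and
copy_checked(buf, sizeof buf, "too long") returns 0, leaving "too" in
buf.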

> There's plenty of documentation, articles and presentations that
> explain how this can make very efficient code... but nothing
> will answer this question: do I really want to be efficiently
> wrong?

By using C you implicitly gave "yes" as an answer.

> I mean, yes I would find the problem, thanks to my 100% coverage
> unit testing, but couldn't the compiler give me a hint?

Since it gave no hint, it probably could not.  In cases where it
can, it warns (at least when you activate warnings).

> Could someone drive me into this reasoning? I know there is a lot of
> thinking behind it, yet everything seems to me very incorrect!
> I'm in deep cognitive dissonance here! :) Help!
>

--
                              Waldek Hebisch
