You never stop learning: I just explored how I could force a compiler into producing the sbb instruction on x86. It's actually quite difficult. For the expression -(a > b), for example, most compilers produce cmp + seta + neg. In fact, the only form that deterministically gives an sbb is a rather special ternary expression, (a > b ? -1 : 0).
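Both spellings yield the same value, so choosing between them only nudges instruction selection. Below is a minimal C sketch that checks the equivalence over all byte pairs. It assumes unsigned char operands, and its function names are hypothetical. It says nothing about which instructions a particular compiler emits for each form; that part has to be read off the listing.

```c
#include <assert.h>

/* Form 1: negate the comparison result. (a > b) is an int 0 or 1,
   so the negation is 0 or -1, which converts to 0x00 or 0xFF. */
unsigned char mask_neg(unsigned char a, unsigned char b) {
    return (unsigned char)-(a > b);
}

/* Form 2: the ternary spelling from the text. */
unsigned char mask_ternary(unsigned char a, unsigned char b) {
    return (unsigned char)(a > b ? -1 : 0);
}

/* Asserts that both forms agree, and are 0xFF exactly when a > b,
   for all 65536 byte pairs. */
void check_masks(void) {
    for (int a = 0; a < 256; ++a) {
        for (int b = 0; b < 256; ++b) {
            unsigned char m1 = mask_neg((unsigned char)a, (unsigned char)b);
            unsigned char m2 = mask_ternary((unsigned char)a, (unsigned char)b);
            assert(m1 == m2);
            assert(m1 == (a > b ? 0xFF : 0x00));
        }
    }
}
```

Call check_masks() from any main to run the exhaustive comparison.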
So the expression I tried to get the optimizer to compile optimally was:
// max(l, t)
unsigned char L = (l - t);
t += (L & (L < l ? -1 : 0));
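The idea behind the expression is this. L = l - t wraps around exactly when t > l, and in that case L ends up larger than l. Comparing L with l therefore yields an all-ones or all-zero mask, which either adds l - t to t or adds nothing.

Below is a minimal C sketch, assuming l and t are unsigned char (the 8-bit registers in the listing suggest so). `max_as_written` mirrors the expression above. `max_branchless` is a hypothetical variant that compares with `<=`, because L <= l holds exactly when l >= t. The exhaustive check shows that the strict form differs from max(l, t) only when t == 0 and l > 0. In that case L == l, so the mask stays zero.

```c
#include <assert.h>

/* The expression from the text: t += L & mask, mask = 0xFF when L < l. */
unsigned char max_as_written(unsigned char l, unsigned char t) {
    unsigned char L = (unsigned char)(l - t);   /* wraps when t > l */
    t = (unsigned char)(t + (L & (L < l ? -1 : 0)));
    return t;
}

/* Hypothetical variant: L <= l holds exactly when l >= t. */
unsigned char max_branchless(unsigned char l, unsigned char t) {
    unsigned char L = (unsigned char)(l - t);
    t = (unsigned char)(t + (L & (L <= l ? -1 : 0)));
    return t;
}

/* Compares both functions with a plain max over all 65536 byte pairs. */
void check_max(void) {
    for (int l = 0; l < 256; ++l) {
        for (int t = 0; t < 256; ++t) {
            unsigned char want = (unsigned char)(l > t ? l : t);
            assert(max_branchless((unsigned char)l, (unsigned char)t) == want);
            if (t == 0 && l > 0) {
                /* L == l here, so the strict comparison gives a zero mask */
                assert(max_as_written((unsigned char)l, (unsigned char)t) == 0);
            } else {
                assert(max_as_written((unsigned char)l, (unsigned char)t) == want);
            }
        }
    }
}
```

Whether a given compiler turns the `<=` variant into an sbb is a separate question, and this sketch does not answer it.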
The generated code, though almost neat, was terrifying at the same time:
0002c 8a 54 24 16 mov dl, ...
00030 8a 44 24 04 mov al, ...
00034 8a ca mov cl, dl
00036 2a c8 sub cl, al
00038 3a ca cmp cl, dl
0003a 1a d2 sbb dl, dl
00047 81 e2 ff 00 00 00 and edx, 255
0004d 22 d1 and dl, cl
0004f 02 c2 add al, dl
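Below is a minimal C sketch that replays the listing instruction by instruction on 8-bit values. A hypothetical `cf` variable stands in for the carry flag, which x86 uses as the borrow flag for sub, cmp and sbb. The memory operands are elided above, so the sketch assumes, based on the sub and the final add, that dl is loaded with l and al with t. The exhaustive check confirms that the sequence computes exactly the C expression for every pair of bytes.

```c
#include <assert.h>

/* Replays the VC7 listing on bytes. Assumption: dl holds l and al holds t
   on entry (the memory operands are elided in the listing). */
unsigned char emulate_listing(unsigned char l, unsigned char t) {
    unsigned char dl = l, al = t, cl;
    int cf;

    cl = dl;                            /* mov cl, dl                      */
    cf = cl < al;                       /* sub cl, al: borrow if cl < al   */
    cl = (unsigned char)(cl - al);      /*   cl = L = l - t (mod 256)      */
    cf = cl < dl;                       /* cmp cl, dl: overwrites cf       */
    dl = (unsigned char)(dl - dl - cf); /* sbb dl, dl: 0x00 or 0xFF        */
                                        /* and edx, 255: clears bits 8..31
                                           only, dl itself is unchanged    */
    dl = (unsigned char)(dl & cl);      /* and dl, cl                      */
    al = (unsigned char)(al + dl);      /* add al, dl                      */
    return al;
}

/* Compares the replay with the C expression for all 65536 byte pairs. */
void check_listing(void) {
    for (int l = 0; l < 256; ++l) {
        for (int t = 0; t < 256; ++t) {
            unsigned char L = (unsigned char)(l - t);
            unsigned char expect = (unsigned char)(t + (L & (L < l ? -1 : 0)));
            assert(emulate_listing((unsigned char)l, (unsigned char)t) == expect);
        }
    }
}
```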
The minor redundancy is the and edx, 255. The really offending instructions are the mov cl, dl and the cmp cl, dl. Worst of all, I couldn't get the compiler to drop them. This was VC7, though, and maybe one shouldn't expect too much from that one, I thought, so I passed the ball to some others.
The result: VC8, VC9 and gcc all at least dropped the and, but not a single one removed the cmp + mov. In my view, the sub already produces the borrow that the sbb consumes, which makes the cmp a no-op. The cmp is also the only instruction that reads the backed-up copy, so the mov is a no-op too.
The correct minimal code would be:
0002c 8a 4c 24 16 mov cl, ...
00030 8a 44 24 04 mov al, ...
00036 2a c8 sub cl, al
0003a 1a d2 sbb dl, dl
0004d 22 d1 and dl, cl
0004f 02 c2 add al, dl
The moral: compilers still don't do flag-aware peephole optimization. That's a pity, now that computers have become fast enough to profit from almost 40 years of compiler development …