Fröhling

Niels Fröhling about [X]HTML, JS and technical tidbits

  • Home
  • About
14. July 2008

Blinn’s division by 0xFF revisited

Ethatron in Algorithms

While I was implementing several blending functions like the ‘over’-operator, I again was confronted with divisions by constants at range-boundaries. Like 0xFF for ‘normalizing’ unsigned char multiplications, and the equivalent 0xFFFF and 0xFFFFFFFF for unsigned short and unsigned long respectively. Most of us are familiar with Blinn’s approximation, which alone gave him a star in the Silicon-Valey walk-of-fame.

As I dived deep into the blending code, I considered various precision-related improvements like delaying the divide by 0xFF untill the end:

1) result = dst * dst-alpha / 0xFF + src * src-alpha / 0xFF - (src * src-alpha / 0xFF) * dst-alpha / 0xFF
2) result = (dst * dst-alpha + src * src-alpha - src * src-alpha * dst-alpha / 0xFF) / 0xFF
3) result = (dst * dst-alpha * 0xFF + src * src-alpha * 0xFF - src * src-alpha * dst-alpha) / (0xFF * 0xFF)

Equation 3) has no loss of precision at all but requires a 24bit accumulator (holding 255^3), which I would accept if it would be possible to find a Blinnish equation dividing by 65025 (255^2). So I started do find a general approach to create Blinnish functions automatically.

It became a little adventure as it turned out that my idea of Blinn’s function was actually wrong. First, I assumed that the function is an equivalent of truncating division:

1) fb(a) = a / 255

Which is wrong or better always wrong cited and used, actually Blinn’s function as mentioned on all sources I know of is rounding division:

1) fb(a) = (a + 127) / 255
2) fb(a) = (a + 128 + (a + 128) / 256) / 256
3) fb(a) = (acc = (a + 128) + acc / 256) / 256

I suppose in most contexts it’s irrelevant, if the expectancy is just to have a sort of approximation, and if you are happy with copy+paste=works_sufficiently [still it makes me wonder, how often a naive fallback C-implementation with truncation breaks determinism].

After then really trying to understand the function I came up with the following equivalence:

1) fb(a) = (a + (ra - 1)) / (ca - 1)
2) fb(a) = (a + ra + (a + ra) / ca) / ca
3) fb(a) = (acc = (a + ra) + acc / ca) / ca

Obviously it’s only exciting (as with the original function) if ca = 2^ea and ra {1, ca - 2}.

4) fb(a) = (acc = (a + ra) + acc / 2^ea) / 2^ea
5) fb(a) = (acc = (a + ra) + acc >> ea) >>ea

Well and it is indeed exciting, as this general expression allows now to choose any arbitrary ra and ea receiving results according the first equation, as long as the divident is smaller-equal than divisor^2. Which means Blinn’s function works with any division by 0×1, 0×3, 0×7, 0×15, … 0×7FFFFFFF, in combination with any offset (which doesn’t even need to be related to rounding). Cool.

Still (coming back to my initial quest, division by 65025) I struggled to find relaxation of the function for quadric constants:

1) fc(a) = (a + (ra - 1)) / (ca - 1)^2
2) fc(a) = (a + (ra - 1)) / (ca^2 - ca + 2 - 1)

The problematic term ca^2 - ca + 2 doesn’t suffice the crutial equivalence 2^ea, so even if there would be an exact equivalent function only involving shifts, it wouldn’t be just one shift. Well and that’s the moment you realize, that the function of Blinn is a special case of the division-approximation through geometric series, fitting exactly the slot that is exact, and not only an approximation. So I just did a brute-force search for a function with the least ulps for round-near and truncation, which are these:

1) fc(a) = (a + 0×7F00UL) / 0xFE01UL
2) fc(a) = (acc = (a + 0×0010) + acc >> 15 + acc >> 7 + 0×8000) >>16

1) fc(a) = (a + 0×0000UL) / 0xFE01UL
2) fc(a) = (acc = (a + 0×0000) + acc >> 14 + acc >> 7 + 0×0000) >>16

The first function has around 900 inexact results (out of 2.7 million distinct multiplications, 255^3 / !3), the second has around 850 inexact results. Well that is relative good, it’s 1 out of 3000, it’s always one too much, and it only occurs on a = a0 * a1 * a2 with a0 + a1 + a2 >= 0×100.

Kein Kommentar

Tags

  • Algorithms Assembler bash C++ Homemade Javascript MMX Optimization Rants TYPO3

Categories

    • Applications
      • TYPO3
    • TidBits
      • Algorithms
      • Approximations
      • Equivalence
      • Fixes
      • Javascript
      • Optimizations
      • Scripts
    • Uncategorized

Meta

    • Log in
    • Entries RSS
    • Comments RSS
    • WordPress.org

Recent Posts

    • jQuery’s animate is short-thought
    • movntq alignment
    • OP-equivalent series (pavgd)
    • OP-equivalent series (psubq)
    • OP-equivalent series (paddq)

 

  • September 2010
    M T W T F S S
    « Aug    
     12345
    6789101112
    13141516171819
    20212223242526
    27282930  

Archives

    • August 2010
    • April 2010
    • February 2010
    • January 2010
    • July 2009
    • February 2009
    • December 2008
    • November 2008
    • August 2008
    • July 2008
    • May 2008
© 2010 Wired by Fröhling
Design von Dezzain Studio
Übersetzt von Htwo
Nature Pictures | Bamboo Blinds