A few years ago, I posted on my personal web page a number of programming challenges for my friends. This week, I present one of the old challenges: computing luma from RGB triplets using integer arithmetic only.

You have to compute the function

unsigned char SlowBrightness(unsigned char r,
unsigned char g,
unsigned char b)
{
return 0.289f * r +
0.587f * g +
0.114f * b + 1e-5f;
}

but the processor you’re using is not capable of floating point number arithmetic. In other words, it uses a all-software library to emulate them and it’s really slow.

Re-implement this function using only integer operations.

Efficiency is not the only goal, you must have an exact solution as well: your integer-only implementation must give the same answer as the float version for all RGB triplets.

Bonus. Use only +, *, >> and minimize the size of the integer constants involved.

I’ll take a stab at it, but first I just want to verify something…

In SlowBrightness() as shown above, even if the sum of the products, is say, 240.99, the function will return 240, since float-to-integer simply chops off the decimal part. I assume that in an ideal world (e.g. shipping library, etc.), you’d actually want to return 241, correct? But for the challenge we should emulate SlowBrightness() as coded (i.e. return 240), not what I would think you would want in a commercial product?

Hope that question makes sense. Thanks for the clarification.

It doesn’t really matter for this exercise. You could do rounding, but that’s not the statement (and you’ll have plenty to do to get the bonus points). :p

Thanks for your reply. I might be missing something, hopefully you can straighten me out…

I’m an embedded guy, and I also work on life-critical systems and high-precision control systems (cases where even floating point “double” isn’t precise enough), so I’m pretty accustomed to doing scaled-integer math, fixed-point math, and scaled binary.

For example, the 0.289 coefficient is pretty close to 37/128 (first multiply by 37, then shift right by 7). If higher precision is needed, you can always increase the denominator and scale/tweak the numerator.

But here’s where I’m running into problems (on my Linux box). For 283 of the 16M RGB triplets, I’m being hobbled by the machine’s floating point “machine epsilon”. Just one example…

Take the triplet (3,111,184) – if you work out the math, the luminance is exactly 87.000, but the C code in your example, on my box, yields a value of 86.99999237 – and when truncated on function call return, that’s 86. My code calculates 87 with scaled integer, but the test bench expects 86, and my code is considered “wrong” per the requirement “your integer-only implementation must give the same answer as the float version for all RGB triplets”.

When I change the floating point suffix in your constants from “f” to “L”, everything works fine. So, I’m running into the “machine epsilon”, and I’m guessing that’s not the point of your challenge.

I validated what you’re saying. The first version of the code (for Windows) did not exhibit this behavior, but on linux, it does (even when casting the constants as long double).

I assume you have gcc/g++ on Linux, so I fixed the code (as minimally as possible) to accomodate the behavior of g++ 4.5.2 and that 86.99999… is truncated as 87, still using floats.

If you have 32-bit integers is trivial. Just pre-multiply every constant in the given implementation by 100000 (10^5), compute the sum, and divide: (28900*r+58700*g+11400*b+1)/100000

If the division is too expensive, use 2^17 (131072) and right-shift:
(37880*r+76939*g+14942*b+1) >> 17

Is there some other constraint I’m missing? Word sizes?

It passes the test, but only if casting 289, 587, and 114 to long (64 bits on my machine). 289*r only yields int (as 289 is int, r is int8_t, promoted to int), not long, which breaks the arithmetic on LP64.

I like that solution, the constants involved in the computation appear small, but you need 64 bits integers to make it work properly (since 26+8 > 32). I guess you still get bonus points (since the statement is about the constants, not the intermediate values).

BTW, I was lazy, just to get it done I used base-10 math for the scaling (hence the shift by 17 instead of 16, and the trailing / 5). I suppose I could/should re-work it for the last part of the bonus (no / ) but I don’t think it will be that hard.

Given the interest in this, I’m sure if I wait 5 minutes someone else will do it for me…

People that optimize code (while not always “optimizing”) should more often do what you just did: document how the hack came to be. I’m sure everyone that submitted a solution did pretty much the same thing, but I am also sure they all took slightly different approaches to deal with precision issues.

(Actually a reply to Steven – fighting the indentation)

Steven – you make a good point. One of the nice things about using version control – even for something small like this – is that I’m able to see the progression of the development. First, get it working, then optimize. It also makes it easy to “abort” if you go down a bum path & decide to trash it (something this small I don’t branch).

I do the same thing when working on Project Euler problems. First, start with whatever approach seems straightforward, get it working, and then go from there. Almost invariably, my final version will run 10-100 times faster, once I’ve identified invariants, pre-cached important values, and often utilize dynamic programming / memoization tactics. (I’m an EE, not a CS guy, so learning things like DP was really mind-expanding).

So… when can we expect the next programming challenge? ;-)

What I meant was it is necessary to explain where a magic number like 151519 comes from, and the source control history may, or mayn’t help (plus it’s a pain to examine a complete history to find the relevant changes).

I came up with (4848615r + 9848226g + 1912603 + 168) >> 24 by multiplying the constants by 2^24 (in bc) and rounding. Strangely, when I try that on the 19-bit approach, 1e-5f * 2^19 is 5.24288, so the last addition constant is 5, which results in a bunch of off-by-1 values. If I use 300, it works fine.

On the other hand, the 24-bit and the 19-bit versions generate the same assembly for me, so I don’t feel too bad.

I was bored, so here’s one in 16-bits. Takes a bit more ops to get the job done though (9 *, 8 +, and 3 >>). Well, using that many ops is really cheating, but maybe you’ll enjoy it anyway.

I used this sort of technique to compute something like 20 decimal digits of PI on a basic stamp a long time back (basic stamps have 24 bytes of general purpose memory. Not kilobytes. Bytes.

The technique is interesting… breaking down the multiplies like that isn’t something that would’ve occured to me naturally; I try to make the most of each instruction, not break them in lots of smaller instructions.

RT @freakonometrics: "you are a tenure-track professor and you’ve noticed that higher education might not be able to sustain you either…" h… 3 days ago

RT @thefranke: Single-image scene guess-timation and relighting: Inverse Rendering Techniques for Physically Grounded Image Editing http://…3 days ago

I love problems like this, short but non-trivial.

I’ll take a stab at it, but first I just want to verify something…

In SlowBrightness() as shown above, even if the sum of the products, is say, 240.99, the function will return 240, since float-to-integer simply chops off the decimal part. I assume that in an ideal world (e.g. shipping library, etc.), you’d actually want to return 241, correct? But for the challenge we should emulate SlowBrightness() as coded (i.e. return 240), not what I would think you would want in a commercial product?

Hope that question makes sense. Thanks for the clarification.

It doesn’t really matter for this exercise. You could do rounding, but that’s not the statement (and you’ll have plenty to do to get the

bonuspoints). :pHi Steven,

Thanks for your reply. I might be missing something, hopefully you can straighten me out…

I’m an embedded guy, and I also work on life-critical systems and high-precision control systems (cases where even floating point “double” isn’t precise enough), so I’m pretty accustomed to doing scaled-integer math, fixed-point math, and scaled binary.

For example, the 0.289 coefficient is pretty close to 37/128 (first multiply by 37, then shift right by 7). If higher precision is needed, you can always increase the denominator and scale/tweak the numerator.

But here’s where I’m running into problems (on my Linux box). For 283 of the 16M RGB triplets, I’m being hobbled by the machine’s floating point “machine epsilon”. Just one example…

Take the triplet (3,111,184) – if you work out the math, the luminance is exactly 87.000, but the C code in your example, on my box, yields a value of 86.99999237 – and when truncated on function call return, that’s 86. My code calculates 87 with scaled integer, but the test bench expects 86, and my code is considered “wrong” per the requirement “your integer-only implementation must give the same answer as the float version for all RGB triplets”.

When I change the floating point suffix in your constants from “f” to “L”, everything works fine. So, I’m running into the “machine epsilon”, and I’m guessing that’s not the point of your challenge.

I validated what you’re saying. The first version of the code (for Windows) did not exhibit this behavior, but on linux, it does (even when casting the constants as long double).

I assume you have gcc/g++ on Linux, so I fixed the code (as minimally as possible) to accomodate the behavior of g++ 4.5.2 and that 86.99999… is truncated as 87, still using floats.

Chances are that you now moved the problem to one of the other 16M RGB triples.

Exhaustive testing says the contrary.

[decided to reset the indentation level!]

@Jules – I was wondering the same thing, but I ran with the modification, and now all triplets match.

For the sake of completeness, my compiler details (yes, it’s a bit dated):

g++ (Debian 4.3.2-1.1) 4.3.2

Hope to have some time later today to work on this again (damn deadlines! ;-) )

If you have 32-bit integers is trivial. Just pre-multiply every constant in the given implementation by 100000 (10^5), compute the sum, and divide: (28900*r+58700*g+11400*b+1)/100000

If the division is too expensive, use 2^17 (131072) and right-shift:

(37880*r+76939*g+14942*b+1) >> 17

Is there some other constraint I’m missing? Word sizes?

You’re close.

However your method using

>> 17fails on (0,2,209) with 24 instead of 25.(and also with many others.)

Ahh!!! Ok, you want something that incurs the same error than the floating point variant… I’ll need to think about it.

My answer (using a 32-bit mantissa):

It passes the test, but no bonus points: the size of the integers are not minimized!

So there’s an solution using 32-bit integers? Mine works down to a 26-bit shift, but I need 34-bit of precision (ie a 64-bit intermediate)…

Yes. There’s one with at most 28 bits intermediate precision needed.

19 bit shift:

return (151519 * r + 307757 * g + 59768 * b + 300) >> 19

Passed!

…and it beats the solution I have by ~0.5 bits.

My solution: (289*r + 587*g + 114*b)*525 >> 19. This works with ordinary ints.

It fails on (0,255,248) yielding 178 instead of 177 (and on many others). It seems to be overshooting just a bit.

Ah, indeed I made a mistake (I didn’t have access to a compiler to test it). Can you test this one: (289*r + 587*g + 114*b)*67109 >> 23

Sorry wrong again, the 23 should be 26…

It passes the test, but only if casting 289, 587, and 114 to long (64 bits on my machine). 289*r only yields int (as 289 is int, r is int8_t, promoted to int), not long, which breaks the arithmetic on LP64.

I like that solution, the constants involved in the computation

appearsmall, but you need 64 bits integers to make it work properly (since 26+8 > 32). I guess you still get bonus points (since the statement is about the constants, not the intermediate values).uint8_t FastBrightness(uint8_t r, uint8_t g, uint8_t b)

{

uint32_t result;

result = ((((r * 189399) + (g * 384696) + (b * 74711) + 375) >> 17) / 5);

return(result);

}

I think this works for all cases and only uses 32-bit math everywhere.

It passes the test.

BTW, I was lazy, just to get it done I used base-10 math for the scaling (hence the shift by 17 instead of 16, and the trailing / 5). I suppose I could/should re-work it for the last part of the bonus (no / ) but I don’t think it will be that hard.

Given the interest in this, I’m sure if I wait 5 minutes someone else will do it for me…

result = (((r * 151519) + (g * 307757) + (b * 59769) + 300) >> 19);

D’oh! Now as I scroll up in the comments I see that pdq already did basically the same thing (although I ended up with 59769 and not 59768).

I would say “great minds think alike” but I won’t bring pdq down to my level.

Just to give a little credibility to me posting basically the same thing, I should probably post a brief explanation…

So, .289 is 18939.904/65536. But if you just multiply by 18940, or 18939, that little .904 starts to matter.

So the gist is to do this:

temp_r = ((r * 18939) + (r * 18940) * 9) / 10

That’s how I originally did it, hence my answer with >> 17 and / 5.

Re-scaled everything to be multiplied by 8 instead of 10, and basically came up with the same as pdq, although one digit was different ;-)

People that optimize code (while not always “optimizing”) should more often do what you just did: document how the hack came to be. I’m sure everyone that submitted a solution did pretty much the same thing, but I am also sure they all took slightly different approaches to deal with precision issues.

(Actually a reply to Steven – fighting the indentation)

Steven – you make a good point. One of the nice things about using version control – even for something small like this – is that I’m able to see the progression of the development. First, get it working, then optimize. It also makes it easy to “abort” if you go down a bum path & decide to trash it (something this small I don’t branch).

I do the same thing when working on Project Euler problems. First, start with whatever approach seems straightforward, get it working, and then go from there. Almost invariably, my final version will run 10-100 times faster, once I’ve identified invariants, pre-cached important values, and often utilize dynamic programming / memoization tactics. (I’m an EE, not a CS guy, so learning things like DP was really mind-expanding).

So… when can we expect the next programming challenge? ;-)

What I meant was it is necessary to explain where a magic number like 151519 comes from, and the source control history may, or mayn’t help (plus it’s a pain to examine a complete history to find the relevant changes).

As for the next challenge, why, just next week :p

My best solution is (151520r + 307758g + 59769b) >> 19, which is interesting in that it only needs two additions.

And it passes the test. Gratz.

I came up with (4848615r + 9848226g + 1912603 + 168) >> 24 by multiplying the constants by 2^24 (in bc) and rounding. Strangely, when I try that on the 19-bit approach, 1e-5f * 2^19 is 5.24288, so the last addition constant is 5, which results in a bunch of off-by-1 values. If I use 300, it works fine.

On the other hand, the 24-bit and the 19-bit versions generate the same assembly for me, so I don’t feel too bad.

It also passes the test. Bravo!

I was bored, so here’s one in 16-bits. Takes a bit more ops to get the job done though (9 *, 8 +, and 3 >>). Well, using that many ops is really cheating, but maybe you’ll enjoy it anyway.

http://codepad.org/xsW9SEcb

I used this sort of technique to compute something like 20 decimal digits of PI on a basic stamp a long time back (basic stamps have 24 bytes of general purpose memory. Not kilobytes. Bytes.

The technique is interesting… breaking down the multiplies like that isn’t something that would’ve occured to me naturally; I try to make the most of each instruction, not break them in lots of smaller instructions.

However… knowledge++ :p

[…] problem of computing luminance was rather of a bit-twiddling nature (and some of my readers came up with very creative […]