Programming Challenge: Luminance

A few years ago, I posted on my personal web page a number of programming challenges for my friends. This week, I present one of the old challenges: computing luma from RGB triplets using integer arithmetic only.

You have to compute the function

```unsigned char SlowBrightness(unsigned char r,
unsigned char g,
unsigned char b)
{
return 0.289f * r +
0.587f * g +
0.114f * b + 1e-5f;
}
```

but the processor you’re using is not capable of floating point number arithmetic. In other words, it uses a all-software library to emulate them and it’s really slow.

Re-implement this function using only integer operations.

Efficiency is not the only goal, you must have an exact solution as well: your integer-only implementation must give the same answer as the float version for all RGB triplets.

Bonus. Use only +, *, >> and minimize the size of the integer constants involved.

36 Responses to Programming Challenge: Luminance

1. Dan says:

I love problems like this, short but non-trivial.

I’ll take a stab at it, but first I just want to verify something…

In SlowBrightness() as shown above, even if the sum of the products, is say, 240.99, the function will return 240, since float-to-integer simply chops off the decimal part. I assume that in an ideal world (e.g. shipping library, etc.), you’d actually want to return 241, correct? But for the challenge we should emulate SlowBrightness() as coded (i.e. return 240), not what I would think you would want in a commercial product?

Hope that question makes sense. Thanks for the clarification.

• Steven Pigeon says:

It doesn’t really matter for this exercise. You could do rounding, but that’s not the statement (and you’ll have plenty to do to get the bonus points). :p

• Dan says:

Hi Steven,

Thanks for your reply. I might be missing something, hopefully you can straighten me out…

I’m an embedded guy, and I also work on life-critical systems and high-precision control systems (cases where even floating point “double” isn’t precise enough), so I’m pretty accustomed to doing scaled-integer math, fixed-point math, and scaled binary.

For example, the 0.289 coefficient is pretty close to 37/128 (first multiply by 37, then shift right by 7). If higher precision is needed, you can always increase the denominator and scale/tweak the numerator.

But here’s where I’m running into problems (on my Linux box). For 283 of the 16M RGB triplets, I’m being hobbled by the machine’s floating point “machine epsilon”. Just one example…

Take the triplet (3,111,184) – if you work out the math, the luminance is exactly 87.000, but the C code in your example, on my box, yields a value of 86.99999237 – and when truncated on function call return, that’s 86. My code calculates 87 with scaled integer, but the test bench expects 86, and my code is considered “wrong” per the requirement “your integer-only implementation must give the same answer as the float version for all RGB triplets”.

When I change the floating point suffix in your constants from “f” to “L”, everything works fine. So, I’m running into the “machine epsilon”, and I’m guessing that’s not the point of your challenge.

• Steven Pigeon says:

I validated what you’re saying. The first version of the code (for Windows) did not exhibit this behavior, but on linux, it does (even when casting the constants as long double).

I assume you have gcc/g++ on Linux, so I fixed the code (as minimally as possible) to accomodate the behavior of g++ 4.5.2 and that 86.99999… is truncated as 87, still using floats.

• Jules Jacobs says:

Chances are that you now moved the problem to one of the other 16M RGB triples.

• Steven Pigeon says:

Exhaustive testing says the contrary.

2. Dan says:

[decided to reset the indentation level!]

@Jules – I was wondering the same thing, but I ran with the modification, and now all triplets match.

For the sake of completeness, my compiler details (yes, it’s a bit dated):

g++ (Debian 4.3.2-1.1) 4.3.2

Hope to have some time later today to work on this again (damn deadlines! ;-) )

3. juancn says:

If you have 32-bit integers is trivial. Just pre-multiply every constant in the given implementation by 100000 (10^5), compute the sum, and divide: (28900*r+58700*g+11400*b+1)/100000

If the division is too expensive, use 2^17 (131072) and right-shift:
(37880*r+76939*g+14942*b+1) >> 17

Is there some other constraint I’m missing? Word sizes?

• Steven Pigeon says:

You’re close.

However your method using >> 17 fails on (0,2,209) with 24 instead of 25.
(and also with many others.)

• juancn says:

Ahh!!! Ok, you want something that incurs the same error than the floating point variant… I’ll need to think about it.

4. pdq says:

My answer (using a 32-bit mantissa):

```#define FIXED_SHIFT 32
#define FIXED(X) ((uint64_t) ((X) * (((uint64_t)1)<<FIXED_SHIFT)))

unsigned char FastBrightness(unsigned char r,
unsigned char g,
unsigned char b)
{
return (FIXED(0.289f) * r +
FIXED(0.587f) * g +
FIXED(0.114f) * b + FIXED(1e-5f)) >> FIXED_SHIFT;
}
```
• Steven Pigeon says:

It passes the test, but no bonus points: the size of the integers are not minimized!

• pdq says:

So there’s an solution using 32-bit integers? Mine works down to a 26-bit shift, but I need 34-bit of precision (ie a 64-bit intermediate)…

• Steven Pigeon says:

Yes. There’s one with at most 28 bits intermediate precision needed.

• pdq says:

19 bit shift:

return (151519 * r + 307757 * g + 59768 * b + 300) >> 19

• Steven Pigeon says:

Passed!

…and it beats the solution I have by ~0.5 bits.

5. Jules Jacobs says:

My solution: (289*r + 587*g + 114*b)*525 >> 19. This works with ordinary ints.

• Steven Pigeon says:

It fails on (0,255,248) yielding 178 instead of 177 (and on many others). It seems to be overshooting just a bit.

• Jules Jacobs says:

Ah, indeed I made a mistake (I didn’t have access to a compiler to test it). Can you test this one: (289*r + 587*g + 114*b)*67109 >> 23

• Jules Jacobs says:

Sorry wrong again, the 23 should be 26…

• Steven Pigeon says:

It passes the test, but only if casting 289, 587, and 114 to long (64 bits on my machine). 289*r only yields int (as 289 is int, r is int8_t, promoted to int), not long, which breaks the arithmetic on LP64.

I like that solution, the constants involved in the computation appear small, but you need 64 bits integers to make it work properly (since 26+8 > 32). I guess you still get bonus points (since the statement is about the constants, not the intermediate values).

6. Dan says:

uint8_t FastBrightness(uint8_t r, uint8_t g, uint8_t b)
{
uint32_t result;
result = ((((r * 189399) + (g * 384696) + (b * 74711) + 375) >> 17) / 5);
return(result);
}

I think this works for all cases and only uses 32-bit math everywhere.

• Steven Pigeon says:

It passes the test.

7. Dan says:

BTW, I was lazy, just to get it done I used base-10 math for the scaling (hence the shift by 17 instead of 16, and the trailing / 5). I suppose I could/should re-work it for the last part of the bonus (no / ) but I don’t think it will be that hard.

Given the interest in this, I’m sure if I wait 5 minutes someone else will do it for me…

• Dan says:

result = (((r * 151519) + (g * 307757) + (b * 59769) + 300) >> 19);

• Dan says:

D’oh! Now as I scroll up in the comments I see that pdq already did basically the same thing (although I ended up with 59769 and not 59768).

I would say “great minds think alike” but I won’t bring pdq down to my level.

Just to give a little credibility to me posting basically the same thing, I should probably post a brief explanation…

So, .289 is 18939.904/65536. But if you just multiply by 18940, or 18939, that little .904 starts to matter.

So the gist is to do this:

temp_r = ((r * 18939) + (r * 18940) * 9) / 10

That’s how I originally did it, hence my answer with >> 17 and / 5.

Re-scaled everything to be multiplied by 8 instead of 10, and basically came up with the same as pdq, although one digit was different ;-)

• Steven Pigeon says:

People that optimize code (while not always “optimizing”) should more often do what you just did: document how the hack came to be. I’m sure everyone that submitted a solution did pretty much the same thing, but I am also sure they all took slightly different approaches to deal with precision issues.

• Dan says:

(Actually a reply to Steven – fighting the indentation)

Steven – you make a good point. One of the nice things about using version control – even for something small like this – is that I’m able to see the progression of the development. First, get it working, then optimize. It also makes it easy to “abort” if you go down a bum path & decide to trash it (something this small I don’t branch).

I do the same thing when working on Project Euler problems. First, start with whatever approach seems straightforward, get it working, and then go from there. Almost invariably, my final version will run 10-100 times faster, once I’ve identified invariants, pre-cached important values, and often utilize dynamic programming / memoization tactics. (I’m an EE, not a CS guy, so learning things like DP was really mind-expanding).

So… when can we expect the next programming challenge? ;-)

• Steven Pigeon says:

What I meant was it is necessary to explain where a magic number like 151519 comes from, and the source control history may, or mayn’t help (plus it’s a pain to examine a complete history to find the relevant changes).

As for the next challenge, why, just next week :p

8. Evan P. says:

My best solution is (151520r + 307758g + 59769b) >> 19, which is interesting in that it only needs two additions.

• Steven Pigeon says:

And it passes the test. Gratz.

9. I came up with (4848615r + 9848226g + 1912603 + 168) >> 24 by multiplying the constants by 2^24 (in bc) and rounding. Strangely, when I try that on the 19-bit approach, 1e-5f * 2^19 is 5.24288, so the last addition constant is 5, which results in a bunch of off-by-1 values. If I use 300, it works fine.

On the other hand, the 24-bit and the 19-bit versions generate the same assembly for me, so I don’t feel too bad.

• Steven Pigeon says:

It also passes the test. Bravo!

10. Aaron says:

I was bored, so here’s one in 16-bits. Takes a bit more ops to get the job done though (9 *, 8 +, and 3 >>). Well, using that many ops is really cheating, but maybe you’ll enjoy it anyway.