## /* no comments (part II) */

In a previous installment, I discussed the quality of English in comments, arguing that the quality of comments influences the reader’s judgment on the quality of the code as well.

That’s not the only thing that can make code harder or easier to understand. Today (anyway, at the time of writing), I was working on something where arbitrary-looking constants would constantly come up. I mean constants whose origin you wouldn’t know unless there’s a comment. A clear comment. Let’s see some of those.

Can you guess the following constants?

| Constant | Expression |
|---|---|
| $3.141592653589793\ldots$ | $\pi$ |
| $2.718281828459045\ldots$ | $e$ |
| $1.732050808\ldots$ | ? |
| $0.693147181\ldots$ | ? |
| $1.772453851\ldots$ | ? |
| $1.539600718\ldots$ | ? |
| $1.584962501\ldots$ | ? |

Think a while and see if you can find a short expression for these constants. The answers are a bit further down. No peeking!

Some of those are easy to find, but even when you can find the original “natural expression” for the constant, you may still not know where it’s from. If you see something like:

```cpp
...
s=1.0/sqrt(n);
v += 2.199272222*s*u;
...
```


you’d be hard-pressed to figure out where the constant $2.199272222$ is from. However, if you read

```cpp
...
s=1.0/sqrt(n);
...
// solving 4pi/n = 3sqrt(3)/2 s^2
// for s (finding the side s of a regular
// hexagon with surface 4pi/n), we find that
//
// s= sqrt( (8 pi) / (3sqrt(3) n) )
//  = sqrt( (8 pi) / (3sqrt(3)) ) / sqrt(n)
//  ~ 2.199272222 / sqrt(n)
//
v += 2.199272222*s*u;
...
```


it’s already much better. Of course, the code now includes a rather lengthy comment on how this formula is derived. If we also said where it comes from, possibly with a reference to a paper where it was used, it would be complete. In this case, a quick search on Google or Wikipedia shows that the formula above gives the side length of a regular hexagon occupying $1/n$ of the surface of a unit sphere. Let us pretend that, in this context, it makes perfect sense to use this derivation.
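Written out as equations, with $4\pi$ the surface area of the unit sphere and $\frac{3\sqrt{3}}{2}s^2$ the area of a regular hexagon of side $s$, the derivation sketched in the comment reads:

$$
\frac{3\sqrt{3}}{2}\,s^2 = \frac{4\pi}{n}
\quad\Longrightarrow\quad
s = \sqrt{\frac{8\pi}{3\sqrt{3}\,n}}
  = \sqrt{\frac{8\pi}{3\sqrt{3}}}\,\frac{1}{\sqrt{n}}
  \approx \frac{2.199272222}{\sqrt{n}}
$$

so the mysterious constant is just $\sqrt{8\pi/(3\sqrt{3})}$, and the factor $1/\sqrt{n}$ is what `s=1.0/sqrt(n)` computes in the code.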

*
* *

Plugging an “efficient” constant like $2.1992722\ldots$ in the middle of the source code is efficient only for the machine, not in absolute terms. It greatly reduces maintainability: the next programmer to work on this program may, or may not, know how the constant was derived, and may just plug in some other “random” value. (And I have seen this. For real. “bah! some random number! let us use 2 instead. 2’s big enough.” Epic facepalm moment, I tell you.) It also reduces maintainability because if, say, a pentagon is needed for some reason, the constant must be replaced by another one, and the next programmer may, or may not, be able to figure out how to compute it. With the last comment, he will at least understand what to do.
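To make the pentagon case concrete, here is a minimal C++ sketch that computes the constant from its formula instead of hard-coding the digits. The function name `polygon_side_coefficient` is hypothetical, not from the original code. It generalizes the hexagon derivation to a regular $k$-gon, whose area with side $s$ is $\frac{k s^2}{4\tan(\pi/k)}$, so switching shapes means changing $k$ rather than re-deriving a magic number:

```cpp
#include <cmath>

// Side-length coefficient c(k): a regular k-gon (k >= 3) with side
// c(k) / sqrt(n) has area 4*pi/n, i.e. it covers 1/n of the unit sphere.
//
// Area of a regular k-gon with side s:  A = k * s^2 / (4 * tan(pi / k)).
// Setting A = 4*pi/n and solving for s gives
//   s = sqrt(16 * pi * tan(pi / k) / k) / sqrt(n).
// For k = 6 (hexagon) this is sqrt(8*pi / (3*sqrt(3))) ~= 2.199272222.
double polygon_side_coefficient(int k)
{
    const double pi = std::acos(-1.0);
    return std::sqrt(16.0 * pi * std::tan(pi / k) / k);
}
```

With $k=6$ the hexagon line could read `v += polygon_side_coefficient(6)*s*u;`, and a pentagon would use `polygon_side_coefficient(5)`. If this sits in a hot loop, storing the result once in a named constant keeps the machine efficiency as well.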

*
* *

I hope you tried hard. Anyway, here are the answers:

| Constant | Expression |
|---|---|
| $3.141592653589793\ldots$ | $\pi$ |
| $2.718281828459045\ldots$ | $e$ |
| $1.732050808\ldots$ | $\sqrt{3}$ |
| $0.693147181\ldots$ | $\ln 2$ |
| $1.772453851\ldots$ | $\sqrt{\pi}$ |
| $1.539600718\ldots$ | $\frac{8}{3\sqrt{3}}$ |
| $1.584962501\ldots$ | $\lg 3$ |

where $\lg$ is log-base-2.
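One way to avoid the guessing game altogether is to write such constants as the expressions they come from and let the math library produce the digits. A small C++ sketch; the names are illustrative only:

```cpp
#include <cmath>

// The puzzle constants, written as the expressions they come from
// rather than as opaque strings of digits.
const double kPi       = std::acos(-1.0);               // 3.14159265...
const double kE        = std::exp(1.0);                 // 2.71828182...
const double kSqrt3    = std::sqrt(3.0);                // 1.73205080...
const double kLn2      = std::log(2.0);                 // 0.69314718...
const double kSqrtPi   = std::sqrt(kPi);                // 1.77245385...
const double kHexRatio = 8.0 / (3.0 * std::sqrt(3.0));  // 1.53960071...
const double kLg3      = std::log2(3.0);                // 1.58496250...
```

Incidentally, the hexagon constant from earlier is $\sqrt{\pi \cdot \frac{8}{3\sqrt{3}}}$, so it would simply be `std::sqrt(kPi * kHexRatio)`.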

### 4 Responses to /* no comments (part II) */

1. Hey. Perhaps the lengthy comment about the identity and derivation of a constant is best placed outside the callable, where the constant is defined. Then the algorithm can just reference this value without comment.

• Steven Pigeon says:

Or placed in the auto-documentation tool you’re using; Doxygen, for example, might also be a good place to put that information. On the other hand, it should not be too far from where the constant is used. In any case, I think stuffing a (random-looking) constant in without a comment is bad.
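Following the suggestion above, a sketch of what this could look like in C++: the derivation lives in a Doxygen comment on the named constant, right next to the small function that uses it, and the algorithm then calls that function without further explanation. `kHexSideCoeff` and `hexagon_side` are hypothetical names:

```cpp
#include <cmath>

/**
 * \brief Side-length coefficient for hexagonal cells on the unit sphere.
 *
 * A regular hexagon of side s has area (3*sqrt(3)/2) * s^2. Requiring
 * it to cover 1/n of the sphere's surface (4*pi/n) and solving for s:
 *
 *     s = sqrt(8*pi / (3*sqrt(3))) / sqrt(n) ~= 2.199272222 / sqrt(n)
 */
const double kHexSideCoeff =
    std::sqrt(8.0 * std::acos(-1.0) / (3.0 * std::sqrt(3.0)));

/// Side length of one hexagonal cell when the sphere is split into n cells.
double hexagon_side(int n)
{
    return kHexSideCoeff / std::sqrt(static_cast<double>(n));
}
```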

Sorry, I still don’t have a clue what 2.199272222 is.

Anyway, I read some good advice about commenting once. It was about the language FORTH (unique and interesting) but most of the advice applies to other languages. Essentially, it’s about names of variables and subroutines.

This has a lot to do with comments, because names of variables and subroutines are part of commenting.

1. Short words are best (cf quote from Winston Churchill).
2. Don’t use cryptic abbreviations
3. Use English, not cryptic computerese
4. Don’t describe the parameters, describe the function
5. Name the WHAT, not the HOW
6. Comments should be imperative, not descriptive.

1. Short words are best (cf quote from Winston Churchill).

Should you name your variable any of these:
- `ModulusOfVector` (if you love camel case)
- `modulus.of.vector`
- `modulus_of_vector`

Puh-leeeease! Why did we invent mathematics in the first place? It makes convenient abbreviations of a lot of things. That’s a balance between “abbreviation” and “convenient”. The above three choices are so long to keep typing in, and so long to keep saying if you’re describing the code to someone else, that they should be taken out and shot.

Can’t we just call it modulus?

```
modulus = sqrt(x(i)^2+y(i)^2+z(i)^2)
```

…wasn’t that easy?

Suppose you’re adding the moduli of all the vectors…

```
sum=0.0
…loop mechanism…
modulus2 = x(i)^2+y(i)^2+z(i)^2
sum = sum + modulus2
```

(strictly, it’s the modulus squared, so the variable name has a 2 in it, purely to remind us)

…easy stuff. But suppose you think “this is a summation” … so you could call your variable “summation”. But why? The longer word doesn’t give you any more information. To use a longer word, there’s got to be a good reason, so “sum” is correct.

Short, tight code is good code.

2. Don’t use cryptic abbreviations

Why use “modulus”? Why not use “mdls”? Because you’ll look at your code again in six months and think “WTF is mdls???”
Rest easy: “modulus” is only 7 characters long, as opposed to the 4 characters of “mdls”. The computer has enough memory to store those extra 3 characters.

Another reason for using the full word: if the word has to be used outside the context in which it is defined. Inside the original defining routine, you might write

```
mod2 = x(i)^2+y(i)^2+z(i)^2
sum=sum + mod2
```

…and mod2 is never used again. You will (won’t you???) have a comment like “total error is sum of moduli squared of vectors” so it’s pretty clear that you’re adding the squares of the moduli. HOWEVER, if you want to use that variable somewhere else, because you want to keep its value and use it in another routine, it’s no longer obvious what mod2 means, so it’s probably better to spell out the whole word.
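A C++ rendering of the example in points 1 and 2 might look like the sketch below. `Vec3` and `total_error` are hypothetical names, the latter taken from the “total error” comment. The short names `sum` and `mod2` live for a few lines only, while the name that leaves the routine says what the value means:

```cpp
#include <vector>

// Hypothetical 3-D vector type.
struct Vec3 {
    double x, y, z;
};

// Total error: sum of the squared moduli of the vectors.
// `sum` and `mod2` never leave this function, so short names suffice.
double total_error(const std::vector<Vec3>& v)
{
    double sum = 0.0;
    for (const Vec3& p : v) {
        double mod2 = p.x * p.x + p.y * p.y + p.z * p.z;  // modulus squared
        sum += mod2;
    }
    return sum;
}
```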

3. Use English, not cryptic computerese

You want to put something on the screen, given a location in x and y. So you have a routine which takes parameters x and y, and a reference to a graphical object.
Possible names for the routine:

- `XYlocate`
- `LOCXY`
- `SysCall0888`

Well, the first two are bad enough, being awkward to pronounce. But the third one? Someone is referring to the system routine involved in doing the work, probably some internal graphics-library call. That means you need an extra line to say what it means. If you’re using it repeatedly in the code, you would have to comment it every time, or leave it sitting in the code being cryptic.

XYlocate and similar names are awkward to pronounce, but they’re also a symptom of something else: redundancy. You’re most likely going to use the routine in this kind of way:

```
XYlocate(monster,xm,ym)
```

So… your keen games programmer will recognise that you’re putting a monster on the screen at location xm, ym… BUT now notice that X and Y occur twice in that line. That’s an awkward waste of space. Much nicer to write:

```
Locate(monster,xm,ym)
```

I’m not sure that I would use “locate”, because that might mean “find”. Maybe something that more obviously says “place” or “put”. Aha! So you name your routine one of those words.

```
place(monster,xm,ym)
```

or

```
put(monster,xm,ym)
```

Now that’s not far from basic simple English which says what it means and no more.
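In C++, the same idea could be sketched as follows. `Sprite` and `place` are hypothetical, and the function only stores the position rather than drawing anything:

```cpp
// Hypothetical sprite type: only its screen position matters here.
struct Sprite {
    double x, y;
};

// Put a sprite on the screen at (x, y). The coordinates appear once,
// as arguments, instead of also being spelled out in the name.
void place(Sprite& s, double x, double y)
{
    s.x = x;
    s.y = y;
}
```

A call then reads `place(monster, xm, ym);`, with x and y mentioned only once.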

4. Don’t describe the parameters, describe the function

Suppose you have some measurements from an A-to-D converter and they’re a set of voltages. You have some coefficients in an equation which converts them to the thing you’re REALLY interested in, which might be wind speed (you’re programming a weather station). The equation and its coefficients happen to be accessible via a routine called VoltageConvert.

You could write:
```
…loop over i…
S(i)=VoltageConvert(meas(i))
```

True enough, you’re converting a measurement. As an aside, if that measurement is a voltage, it should be named something like v or voltage.
HOWEVER, if your routine computes the speed, it should be called “speed”.

```
…loop over i…
S(i)=speed(v(i))
```

So later on you’ll be taking the average wind speed by averaging S(i).
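A sketch of the weather-station example in C++. The calibration is assumed to be linear, with made-up coefficients `kA` and `kB`; a real anemometer would come with its own equation. The routine is named for what it returns:

```cpp
#include <vector>

// ASSUMED linear calibration: speed = kA + kB * voltage.
// These coefficients are made up for illustration.
const double kA = 0.0;   // m/s at 0 V (assumption)
const double kB = 10.0;  // m/s per volt (assumption)

// Wind speed (m/s) for a measured voltage v: named for what it returns.
double speed(double v)
{
    return kA + kB * v;
}

// Average wind speed over a series of voltage measurements.
double average_speed(const std::vector<double>& volts)
{
    if (volts.empty())
        return 0.0;  // no data: return 0 (a choice made for this sketch)
    double sum = 0.0;
    for (double v : volts)
        sum += speed(v);
    return sum / volts.size();
}
```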

5. Name the WHAT, not the HOW

The name of a subroutine should answer the question “What am I doing?”
This is closely related to the previous tip.

Suppose you’re computing areas of squares, and for each square you have the length of one side. You have a set of side lengths L(i) and you want to add up the corresponding areas. The whole point is to find the total area.

```
sum=0.0
…loop over i…
sum=sum+square(L(i))
```

Now you have a (trivial) routine to square L(i) and return the result, but the point should be what the output of that routine MEANS. What are you trying to achieve? What use is the routine to you? Why are you calling it?

You’re computing areas, so the routine should be called “area”.

```
sum=0.0
…loop over i…
sum=sum+area(L(i))
```

Then it’s blindingly obvious what your code is doing. The comment for the routine “area” should tell you HOW it’s doing it.
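The same loop in C++, as a minimal sketch with hypothetical names. The HOW (`side * side`) stays inside `area`, and the caller only sees the WHAT:

```cpp
#include <vector>

// Area of a square of side `side`. How it is computed stays in here.
double area(double side)
{
    return side * side;
}

// Total area of the squares whose side lengths are listed in L.
double total_area(const std::vector<double>& L)
{
    double sum = 0.0;
    for (double side : L)
        sum += area(side);
    return sum;
}
```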

6. Comments should be imperative, not descriptive.

Our monster-placing routine might be commented like this:
```
/* This routine calls system routine 0888 to display a graphical object, having converted its coordinates to hardware coordinates*/
```

Mmmm. For a start, the comment starts by telling me something I already know. It’s a routine. Yes, I know that!

I prefer to give commands – the command form mirrors what a program is. This makes the comments shorter and tighter.

I also notice that the comments are out of order, i.e. the phrases don’t describe things in the same order in which they occur as the routine is executed. The first phrase is not the overall function of the routine, but an internal component. Words like “having” let you know that you have to rearrange the phrases to put them in execution order, i.e. “do the second thing, having done the first thing.” This is OK up to a point, because our brains can usually cope with that sort of thing, but it’s too easy to let that approach grow into complicated sentences which are not obvious at first glance.

I would re-arrange the comments as follows:

```
/* Display a graphical object on the screen.

Convert location to hardware coordinates (use system routine 0862)
Place at those coordinates (system routine 0888)
*/
```

Now everything is in order. The purpose is obvious, and appears first.
The method (outline of algorithm) is below, in order, with supporting information in brackets.
(I know it’s only two lines, but it’s an algorithm!)
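Here is how the rearranged comment might sit on an actual routine. This is a C++ sketch under loud assumptions: `to_hardware` and `draw_at` are hypothetical stand-ins for system routines 0862 and 0888 (whose real interfaces are unknown), the screen maps logical coordinates in $[0,1]$ to pixels, and “drawing” only records what was drawn where:

```cpp
#include <cmath>

// Hypothetical screen: its size in pixels, plus a record of the last draw.
struct Screen {
    int width, height;
    int last_id, last_px, last_py;
};

// Hypothetical stand-in for system routine 0862:
// map logical coordinates in [0, 1] to pixel coordinates.
static void to_hardware(const Screen& scr, double x, double y, int& px, int& py)
{
    px = static_cast<int>(std::lround(x * (scr.width - 1)));
    py = static_cast<int>(std::lround(y * (scr.height - 1)));
}

// Hypothetical stand-in for system routine 0888:
// "draw" object `id` at pixel (px, py); here it only records the call.
static void draw_at(Screen& scr, int id, int px, int py)
{
    scr.last_id = id;
    scr.last_px = px;
    scr.last_py = py;
}

/* Display a graphical object on the screen.

   Convert location to hardware coordinates (to_hardware, cf. routine 0862)
   Place at those coordinates (draw_at, cf. routine 0888)
*/
void display(Screen& scr, int id, double x, double y)
{
    int px, py;
    to_hardware(scr, x, y, px, py);
    draw_at(scr, id, px, py);
}
```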

• The naming convention is inherently linked to the language used. To take your modulus example, if the language allows operator overloading, maybe you can find something suspiciously close to the mathematical notation, sparing you from using “modulus” at all. What if you could define |x| directly?

Also, the naming convention must adjust itself to the main look-and-feel of the language. In C++, one would use STL-like (and Boost-like) conventions, having, probably, some_vector.mod() in function-form, rather than modulus(some_vector). In Python you’d do something else, and so on for each language you will use. Also, some languages will make it difficult to have terse expressions anyway ( oh.noes.it.s.java.kill.me.already() ).

I also generally agree with points 4) to 6).
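In C++, a sketch of both styles mentioned above might look like this (`geo`, `Vec3`, `mod` and the `abs` overload are illustrative names). C++ has no user-definable `|x|` syntax, but a free `abs(v)` found by argument-dependent lookup gets fairly close to the notation:

```cpp
#include <cmath>

namespace geo {

struct Vec3 {
    double x, y, z;

    // STL-like member form: v.mod()
    double mod() const { return std::sqrt(x * x + y * y + z * z); }
};

// Free-function form close to the notation |v|. Called unqualified as
// abs(v), it is found by argument-dependent lookup because v is a geo::Vec3.
inline double abs(const Vec3& v) { return v.mod(); }

}  // namespace geo
```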