One of the good things about the peer review process is that if you publish, you’re eventually going to have to review papers for conferences or journals in your (perceived) area of expertise. Sometimes you get pearls such as “the resulting results of algorithm X are resulted” (true story), or “the dynamics of the attorney of yes no plasmodium” (also true), but sometimes bad science comes from the bad presentation of results.

This is also an (essentially true) story. So I’m reviewing a paper that proposes some kind of method for predicting the value of (some) parameter that minimizes some error function. The method is fast, but not analytic. The graph in the paper looks something like:

So it seems that the error is greatly minimized for values around 170. But the paper conveniently dismisses the real scale of the error in favor of some *relative* error. So the same graph with the (incomplete, but) absolute scale now looks like this:

The geometry is the same, but now we *know* the actual scale of the “minimization.” We now know it’s not that impressive, but visually, we still get the impression that the minimization of the function using values around 170 is awesome. Just changing the scale somewhat already makes it less impressive:

and finally, if we consider the *actual scale* of the change:

we see that… well… that was not a very interesting result after all. All of these graphs show the same series (I did not use the one from the paper, of course, but a random walk with similar properties), yet the last one conspicuously shows how inept the proposed result really is.
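To put a number on that visual trick, here is a minimal sketch in Python. It is not the paper's data or method: the series is a made-up random walk (the start value, step size, and seed are my assumptions), and `plotted_height_fraction` is a hypothetical helper that measures how much of the plot height the series occupies for a given choice of vertical axis limits.

```python
import random


def random_walk(n, start=100.0, step=0.05, seed=1):
    """Made-up error series: a bounded-step random walk (all parameters assumed)."""
    rng = random.Random(seed)
    values = [start]
    for _ in range(n - 1):
        values.append(values[-1] + rng.uniform(-step, step))
    return values


def plotted_height_fraction(series, y_min, y_max):
    """Fraction of the vertical axis covered by the series' own range."""
    return (max(series) - min(series)) / (y_max - y_min)


errors = random_walk(300)
lo, hi = min(errors), max(errors)

# Axis cropped to the data: the dip fills the whole plot, whatever its size.
cropped = plotted_height_fraction(errors, lo, hi)
# Axis starting at zero: the dip is shown in proportion to the actual error.
zero_based = plotted_height_fraction(errors, 0.0, hi)

print(f"cropped axis:    series fills {cropped:.0%} of the plot height")
print(f"zero-based axis: series fills {zero_based:.1%} of the plot height")
```

With the cropped axis the series always fills the full height, so the “minimum” looks dramatic regardless of how small it is. With the zero-based axis the same dip shrinks to the few percent of the error it actually represents.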

Guess who got an F

*

* *

So this week’s entry isn’t much of an entry, more a rant against people that can’t science very well, certainly a rant against bad graphs. I think one of the challenges (even once you do have good science/results, which was not the case for the authors) is to present your information correctly, that is, in a way that’s informative, easy to understand, and, more importantly, *not misleading*.

Huff’s wonderful *How to Lie with Statistics* comes to mind, and I do recommend it, but I would rather suggest you first read Tufte’s *The Visual Display of Quantitative Information*, a really nice (introductory) book on data visualization. I must say that I have been more careful in the design of figures and graphs since I read Tufte’s books. Some of the tips given are somewhat self-evident, but some are actually subtle; and one has to be exceedingly careful with subtleties. Reviewers aren’t always very subtle. If you mislead, or are not understood, you fail.

*

* *

Often details that first seem inconsequential make a big difference. Let us look at this graph:

This graph, in neutral colors, is meant to convey that the method (whatever it may be) reduces the selection of classes 1 and 2 and favors the “better” classes 3, 4, and 5, compared to some unshown previous method. First, this graph does a very poor job of showing that the sum of all bars is zero. Why zero? If you have a finite number of items and you move some up a class and some down, then the net variation in the number of items is zero. Here, the fact that the lower classes are negative and the higher ones positive is to be interpreted as an improvement. Out of context, it’s not much help. It’s a bad graph.
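The zero-sum argument is easy to check with a small sketch. The labels below are hypothetical (ten items classified by an imagined previous method and an imagined proposed one), and `class_changes` is an illustrative helper of mine, not anything from the paper.

```python
from collections import Counter

CLASSES = [1, 2, 3, 4, 5]


def class_changes(before, after):
    """Per-class change in item count between two labelings of the same items."""
    if len(before) != len(after):
        raise ValueError("both labelings must cover the same items")
    old, new = Counter(before), Counter(after)
    return {c: new[c] - old[c] for c in CLASSES}


# Hypothetical labels for the same ten items under each method.
before = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
after = [1, 2, 3, 3, 3, 4, 4, 4, 5, 5]

changes = class_changes(before, after)
print(changes)                # {1: -2, 2: -2, 3: 1, 4: 2, 5: 1}
print(sum(changes.values()))  # 0
```

Each item leaves exactly one class and lands in exactly one class, so the bars must cancel out. A graph of these differences carries no more information than the two underlying distributions, and it hides how many items there were to begin with.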

It’s even worse because the graph is entirely susceptible to color-based interpretation. In red, it now seems so much worse, for no real reason:

…but in green:

it now seems so much better! Why? Because we are so deeply conditioned to interpret red as bad, danger, etc., and green as good, fresh, healthy, that merely changing the color of the graph changes its interpretation! Consider:

This one has neutral color, and we cannot really guess if bigger is better or worse. Coloring so:

makes it clear that higher/bigger is worse. Reversing the color scheme:

we get the exact opposite!

*

* *

So to make a long story short, I do take care to use the right scale and the right colors to convey the correct information and the correct intent, and so should you.

Oh, the abuse of graphs!

Here in England, one stock-market investment magazine I used to read had amazingly bad graphs. For share prices they would use weird scales. If the price (over a certain timescale) varied in the range 80p to 180p, one might expect the vertical (price) scale (in pence) to go 80, 100, 120, 140, 160, 180. Maybe one would expect 75, 100, 125, 150, 175. However, they would often choose a scale going 80, 110, 140, 170, 200. Sometimes it would be 70, 110, 150, 190. WHAT THE …!?!?!?

Now remember that share prices can never be negative… I have seen at least one of their price axes labelled with the following values -20, 30, 80, 130, 180. Yes, that starts at MINUS TWENTY. Ha, ha, at least the increment (50) was sort-of sensible!
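For what it's worth, picking sensible tick values is not hard. Below is a minimal sketch of the common “nice numbers” approach: round the step up to 1, 2, 2.5 or 5 times a power of ten, then snap the axis ends to multiples of that step. The function names and the default of six ticks are my own assumptions; I have no idea what the magazine's software actually did.

```python
import math


def nice_step(span, max_ticks=6):
    """Round span / (max_ticks - 1) up to the next 1, 2, 2.5, 5 or 10 times a power of ten."""
    if span <= 0:
        raise ValueError("span must be positive")
    raw = span / (max_ticks - 1)
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiplier in (1, 2, 2.5, 5, 10):
        if multiplier * magnitude >= raw:
            return multiplier * magnitude
    return 10 * magnitude  # fallback in case of floating-point rounding


def axis_ticks(lo, hi, max_ticks=6):
    """Tick values covering [lo, hi], all multiples of a nice step."""
    step = nice_step(hi - lo, max_ticks)
    first = math.floor(lo / step) * step  # never negative when lo >= 0
    last = math.ceil(hi / step) * step
    count = round((last - first) / step) + 1
    return [first + i * step for i in range(count)]


print(axis_ticks(80, 180))  # [80, 100, 120, 140, 160, 180]
```

Because the first tick is the lower bound rounded down to a multiple of the step, a non-negative price range can never produce a negative tick, so an axis starting at minus twenty cannot come out of this scheme.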

More generally, people seem to refer to the X axis and Y axis when they DON’T EXIST. I know that when we’re initially introduced to graphs, as children, all the examples have two variables (X and Y) and X is always represented on the horizontal axis and Y on the vertical axis. Fine. So we get used to associating X with the horizontal axis and Y with the vertical axis. But then we meet real-world data, where variables can take any name under the sun, and we still have a fixation with X and Y, although these don’t usually exist. If we’re plotting a car’s velocity (let’s call it V) against time (T) we have a V axis and a T axis. So get this:

*** We do not have X and Y axes ***

Still, everybody refers to X and Y axes, even in peer-reviewed journals, where the authors ought to know better.

What about titles of graphs? I can’t stand titles along the lines of “Graph of the variables V and T”, because (just for starters):

1) I know it’s a graph. I can see that in half a second just by opening my eyes.

2) I know that variables are being plotted – what else is a graph for???

…so that leaves us with the title “V and T”. A little terse, maybe? Yes, because we’ve removed the redundant words, leaving us space for something useful, such as a title along the lines of “Velocity during initial braking”.

(Aha! I see the light going on… NOW I know what this paper is about!… and suchlike reactions.)

Phrases like “graph of” or “plot of” are like wrapping paper. If you give someone a present, nicely wrapped up in brown paper, the first thing they do is take off the paper and throw it away. Mentally, we do exactly this with the words “graph of the variables”. We are then left with the actual content, which is “V and T”.

If you avoid redundant words in graphs (and in writing more generally, but that’s a whole other topic) your readers won’t necessarily notice. It’s the wrapping-paper phenomenon – the receiver is concerned about the content, and soon forgets about the wrapping paper. However, despite not getting any thanks, you’ll be able to fit more information, nicely expressed, in the same space, possibly even in less space.