One of the good things about the peer review process is that if you publish, you’re eventually going to have to review papers for conferences or journals in your (perceived) area of expertise. Sometimes you get pearls such as “the resulting results of algorithm X are resulted” (true story), or “the dynamics of the attorney of yes no plasmodium” (also true), but sometimes bad science comes from the bad presentation of results.
This is also an (essentially true) story. So I’m reviewing a paper that proposes some kind of method for predicting the value of (some) parameter that minimizes some error function. The method is fast, but not analytic. The graph in the paper looks something like:
So it seems that the error is greatly minimized for values around 170. But the paper conveniently dismisses the real scale of the error in favor of some relative error. The same graph with the (incomplete, but) absolute scale now looks like this:
The geometry is the same, but now we know the actual scale of the “minimization.” We now know it’s not that impressive, but visually, we still get the impression that the minimization of the function using values around 170 is awesome. Already changing the scale somewhat makes it less impressive:
and finally, if we consider the actual scale of the change:
we see that… well… that was not a very interesting result after all. All the graphs show the same series (I did not take the one from the paper, of course, but some random walk with similar properties), but the last graph conspicuously shows the complete ineptitude of the proposed result.
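To put a number on how much the choice of axis inflates the picture, here is a minimal sketch. The error values are made up for illustration (they are not from the paper, nor from my random walk), and `plot_height_used` is a hypothetical helper: it reports what fraction of the plot’s height the series spans for a given y-axis range.

```python
def plot_height_used(values, y_min, y_max):
    """Fraction of the plot height spanned by the data
    when the y-axis runs from y_min to y_max."""
    return (max(values) - min(values)) / (y_max - y_min)

# Hypothetical error values: a small dip on top of a large baseline.
errors = [1000.0, 999.5, 999.0, 998.0, 999.2, 1000.1]

# Axis cropped to the data range: the dip fills the whole plot.
cropped = plot_height_used(errors, min(errors), max(errors))

# Axis starting at zero: the dip is a sliver at the top.
zero_based = plot_height_used(errors, 0.0, max(errors))

# The actual relative improvement between worst and best value.
relative_gain = (max(errors) - min(errors)) / max(errors)

print(f"cropped axis:    {cropped:.1%} of the plot height")
print(f"zero-based axis: {zero_based:.2%} of the plot height")
print(f"real gain:       {relative_gain:.2%}")
```

With a zero-based axis, the visual size of the dip equals the real relative gain, which is exactly why the last graph looks so unimpressive.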
Guess who got an F
So this week’s entry isn’t much of an entry, more a rant against people that can’t science very well, and certainly a rant against bad graphs. I think one of the challenges (even once you do have good science/results, which was not the case for the authors) is to present your information correctly, that is, in a way that’s informative, easy to understand, and, more importantly, not misleading.
Huff’s wonderful How to Lie with Statistics comes to mind, and I do recommend it, but I would suggest you first read Tufte’s The Visual Display of Quantitative Information, a really nice (introductory) book on data visualization. I must say that I have been more careful in designing figures and graphs since reading Tufte’s books. Some of the tips given are somewhat self-evident, but some are actually subtle, and one has to be exceedingly careful with subtleties. Reviewers aren’t always very subtle. If you mislead, or are not understood, you fail.
Often details that first seem inconsequential make a big difference. Let us look at this graph:
This graph, in neutral colors, is meant to convey that the method (whatever it may be) reduces the selection of classes 1 and 2 and favors the “better” classes 3, 4, and 5, compared to some previous method that is not shown. First, this graph does a poor job of conveying that the sum of all bars is zero. Why zero? If you have a finite number of items and you promote some to a higher class and demote others, the net variation in the number of items is zero. Here, the fact that the lower classes are negative and the higher ones positive is to be interpreted as an improvement. Out of context, it’s not much help. It’s a bad graph.
It’s even worse because the graph is entirely susceptible to color-based interpretation. In red, it now seems so much worse, for no real reason:
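The zero-sum property is easy to check numerically. Below is a small sketch with invented class assignments for ten items under a hypothetical “old” and “new” method; `class_deltas` is an illustrative helper, not anything from the paper.

```python
from collections import Counter

def class_deltas(before, after, classes):
    """Per-class change in item count between two assignments
    of the same items."""
    b, a = Counter(before), Counter(after)
    return {c: a[c] - b[c] for c in classes}

# Hypothetical assignments of the same 10 items to classes 1..5.
before = [1, 1, 1, 2, 2, 2, 3, 3, 4, 5]
after = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

deltas = class_deltas(before, after, range(1, 6))
print(deltas)                # per-class variation (the bars of the graph)
print(sum(deltas.values()))  # net variation over all classes
```

Since no item is created or destroyed, every item that leaves one class lands in another, so the bars must cancel out. A graph of these deltas alone hides that fact.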
…but in green:
it now seems so much better! Why? Because we are so deeply conditioned to interpret red as bad or dangerous, and green as good, fresh, or healthy, that merely changing the color of the graph changes its interpretation! Consider:
This one has neutral color, and we cannot really guess if bigger is better or worse. Coloring so:
makes it clear that higher/bigger is worse. Reversing the color scheme:
we get the exact opposite!
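One way to keep color from silently deciding the interpretation is to make the direction an explicit choice. The sketch below is only an illustration: `bar_colors` and its `higher_is_better` flag are hypothetical names, and the color strings are plain labels you would hand to whatever plotting tool you use.

```python
def bar_colors(values, higher_is_better=None):
    """Pick one color per bar from a declared direction.

    higher_is_better=None means the direction is unknown or
    irrelevant, so every bar stays neutral."""
    if higher_is_better is None:
        return ["gray"] * len(values)
    colors = []
    for v in values:
        if v == 0:
            colors.append("gray")   # no change, no judgment
        elif (v > 0) == higher_is_better:
            colors.append("green")  # change in the desired direction
        else:
            colors.append("red")    # change in the undesired direction
    return colors

changes = [-2, -1, 0, 1, 2]  # hypothetical values
print(bar_colors(changes))                          # neutral
print(bar_colors(changes, higher_is_better=True))   # positive is good
print(bar_colors(changes, higher_is_better=False))  # positive is bad
```

The same data gets opposite colorings depending on the flag, which is the point: the color carries a claim about the data, so it should be chosen on purpose.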
So, to make a long story short, I take care to use the right scale and the right colors to convey the correct information and the correct intent, and so should you.