## LaTeXify C/C++ code snippets

So I’m still writing lecture notes. This time, I need to include fairly large pieces of C or C++ code, and $\LaTeX$ environments do not really help me a lot. Some are better than others, but you still have to escape and fancify text yourself. This is laborious and error-prone, and is an obvious target for automation: a script of some sort. The task isn’t overly complicated: highlight keywords, and escape symbols like { } _ and & that make $\LaTeX$ unhappy. This looks like a job for sed.

sed is arguably one of the most perverse Unix commands there are. The documentation is bad, it is mostly counter-intuitive, and it interacts more or less nicely with the shell. So here’s the script:

#!/usr/bin/env bash

keyword_typeface=pmb

keywords=( bool break case catch char class const continue delete do double
else enum float for if int long new nullptr private protected public return
short sizeof static struct switch template typedef typeid typename union
unsigned using virtual void volatile while )

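# read the input file line by line, preserving leading blanks and backslashes
while IFS= read -r line
do
    # escape the special symbols that interfere with LaTeX: # _ { } &
    # (& is also special to sed: in the replacement it stands for the match)
    line=$(echo "$line" | sed 's/[#&_{}]/\\&/g')

    # wrap each keyword, matched as a whole word, in the typeface command
    for key in "${keywords[@]}"
    do
        line=$(echo "$line" | sed "s/\b${key}\b/\\\\$keyword_typeface{$key}/g")
    done
    echo "$line"
done < "$1"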


Let’s break down the script. First, we have a keyword_typeface that defines the $\LaTeX$ command used to render keywords. The list of keywords is visibly incomplete, but as I won’t be using every possible C/C++ keyword in my notes, I just put in those that I am likely to use.

The first sed command escapes any of the troublesome characters by prefixing them with a backslash. The & in the replacement part of the substitution command stands for “the match”, that is, the whole text matched by the pattern, while \1 would have stood for the first captured group, the text matched by the first \( \) pair. And, of course, “the match” and the first group are not the same thing.
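To make the difference concrete, here is a small sketch. The function names escape_with_match and escape_with_group are made up for this example and are not part of the script above. Both do the same escaping, the first with &, the second with an explicit \( \) group referenced as \1:

```shell
#!/usr/bin/env bash

# Hypothetical helper: escape LaTeX-hostile characters using & (the whole match).
escape_with_match() {
    printf '%s\n' "$1" | sed 's/[#&_{}]/\\&/g'
}

# Hypothetical helper: same result, using a captured group \( \) and \1.
escape_with_group() {
    printf '%s\n' "$1" | sed 's/\([#&_{}]\)/\\\1/g'
}

escape_with_match 'size_t & n'   # prints: size\_t \& n
escape_with_group 'size_t & n'   # prints: size\_t \& n
```

Here the group spans the whole pattern, so the two happen to coincide. As soon as the group covers only part of the pattern, \1 and & diverge.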

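The second sed, in a for loop, merely wraps each keyword from the list in the keyword typeface command. The \b on either side of the keyword is a word-boundary anchor (a GNU sed extension), so that int matches as a word but not inside interp.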
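As a sketch of that single substitution in isolation, assuming GNU sed for the \b anchors (the helper name wrap_keyword and its argument order are invented for this example):

```shell
#!/usr/bin/env bash

# Hypothetical helper: wrap one keyword in a LaTeX command.
# $1 = LaTeX command name, $2 = keyword, $3 = line of code.
# Assumes GNU sed, where \b matches at a word boundary.
wrap_keyword() {
    printf '%s\n' "$3" | sed "s/\b${2}\b/\\\\${1}{${2}}/g"
}

# int is wrapped as a word, but left alone inside print and interp.
wrap_keyword pmb int 'int value, print(interp);'
# prints: \pmb{int} value, print(interp);
```

The four backslashes are the shell’s fault: inside double quotes they collapse to two, which sed in turn reads as one literal backslash in the replacement.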

Let’s see how it works. Here’s a bit of C++ code:

//////////////////////////////
size_t binary_search( const std::vector<int> & v,
                      int value,
                      uint64_t & steps,
                      interpolator interp)
{
    steps=0;
    size_t l=0, h=v.size()-1;
    while (l<h)
    {
        steps++;
        size_t p=interp(l,v[l],h,v[h],value);
        if (v[p]<value)
            l=p+1;
        else
            if (v[p]==value)
                return p;
            else
                h=p;
    }

    return l;
}


Invoking the script with the file name as argument will produce:

\begin{code*}
//////////////////////////////
size\_t binary\_search( \pmb{const} std::vector<\pmb{int}> \& v,
                      \pmb{int} value,
                      uint64\_t \& steps,
                      interpolator interp)
\{
    steps=0;
    size\_t l=0, h=v.size()-1;
    \pmb{while} (l<h)
    \{
        steps++;
        size\_t p=interp(l,v[l],h,v[h],value);
        \pmb{if} (v[p]<value)
            l=p+1;
        \pmb{else}
            \pmb{if} (v[p]==value)
                \pmb{return} p;
            \pmb{else}
                h=p;
    \}

    \pmb{return} l;
\}
\end{code*}


Which is rendered as:

* * *

The script leans on the \b anchors to avoid one severe limitation: keywords that are part of other keywords, as const is part of constexpr. Without the anchors, if const came first in the list of keywords, the script would output something like \pmb{const}expr and would later fail to match constexpr. If, however, constexpr came first in the list, the script would generate \pmb{\pmb{const}expr}, because it would match the whole word constexpr and then, later on, match const within that word. While \pmb{\pmb{const}expr} is not very beautiful, it renders correctly. With the anchors, and a sed that understands them such as GNU sed, const does not match inside constexpr, and for this keyword list the order does not matter.
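A different way to make the order of the list irrelevant is to match all the keywords in a single sed pass, so that text already wrapped is never scanned again. This is only a sketch, not the script above: the helper name wrap_keywords_once and its argument order are made up, and it assumes GNU sed for both -E and \b.

```shell
#!/usr/bin/env bash

# Hypothetical helper: wrap every listed keyword in one sed pass.
# $1 = LaTeX command name, $2 = line of code, remaining args = keywords.
# Assumes GNU sed (-E for extended regexes, \b for word boundaries).
wrap_keywords_once() {
    local typeface=$1 line=$2
    shift 2
    local IFS='|'               # "$*" now joins the keywords as kw1|kw2|...
    local alternation="$*"
    printf '%s\n' "$line" | sed -E "s/\b(${alternation})\b/\\\\${typeface}{&}/g"
}

# const is deliberately listed before constexpr.
wrap_keywords_once pmb 'constexpr int n = 3; const int m = n;' const constexpr int
# prints: \pmb{constexpr} \pmb{int} n = 3; \pmb{const} \pmb{int} m = n;
```

It also calls sed once per line rather than once per keyword, which should help on long listings.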

### 2 Responses to LaTeXify C/C++ code snippets

1. justusc says:

When including code snippets in my papers, I have found minted to be a wonderful package. See the following link for an overview. https://www.sharelatex.com/learn/Code_Highlighting_with_minted

It avoids the need for scripts and other preprocessing steps, and has several nice syntax coloring schemes.