When I write papers or other things, I tend to create separate bib files, so that I don’t end up with a giant, unsearchable, and unmaintainable blob. Moreover, topics tend to be transient, and the bibliography may or may not be interesting in a few years’ time, so, if unused, it can safely sleep in a directory with the paper it’s attached to.
But once in a while, I need one of those old references, and since they’re scattered just about everywhere… it may take a while to find them again. Unless you have a script. Scripts are nice.
So basically, a script to find references again should be able to enumerate all the .bib files and search them, at least for author names and titles. A simple grep doesn’t quite cut it, because it prints only the matching lines, while bibtex files are structured in multi-line blocks, in a format that may be reminiscent of JSON:
@book{bell-PH-1990,
  author = "Timothy C. Bell and John G. Cleary and Ian H. Witten",
  publisher = "Prentice Hall",
  title = "{Text Compression}",
  year = "1990"
}
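To make the problem with grep concrete, here is a minimal sketch that writes the entry above to a throwaway file and searches it. The temporary directory, the file name sample.bib, and the variable grep_hit are only there for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: why a plain grep is not enough for BibTeX entries.
# sample.bib is a throwaway file created only for this demo.

tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.bib" <<'EOF'
@book{bell-PH-1990,
  author = "Timothy C. Bell and John G. Cleary and Ian H. Witten",
  publisher = "Prentice Hall",
  title = "{Text Compression}",
  year = "1990"
}
EOF

# grep reports only the line that matches, not the entry around it.
grep_hit=$(grep -a -i 'compression' "$tmpdir/sample.bib")
echo "$grep_hit"

rm -rf "$tmpdir"
```

Only the title line comes back. The key, the authors, and the year are lost, which is exactly what you don’t want when hunting for a reference.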
So to be useful, the script should at least print the whole block. Fortunately, there’s a tool for that, bib2bib. Unfortunately, it’s difficult to use, so it’s tricky to wrap in a script. In particular, it tends to output more than you want: it exports comments, strings, preambles, and other cr*p you don’t necessarily want. Some options, like --quiet or --no-comment, have no effect. Messages are printed to either stdout or stderr, and bib2bib returns 0 even if it terminated with an error. Some grep and sed magic will be needed.
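When the exit status can’t be trusted, one workaround is to merge stdout and stderr and look for known error text. The sketch below shows the pattern on a stand-in: noisy_tool and ran_cleanly are hypothetical names, and noisy_tool only imitates a program that complains yet still returns 0. It is not bib2bib, and its messages are made up except for the fragments “no matching” and “parse error”, which are the ones the script in this post searches for.

```bash
#!/usr/bin/env bash
# Sketch: detecting failure from a tool whose exit status is always 0.
# noisy_tool is a hypothetical stand-in for such a tool, NOT bib2bib itself.

noisy_tool() {
    # Pretend nothing matched: complain on stderr, yet report success.
    if [ "$1" = "nothing" ]; then
        echo "warning: no matching entry" >&2
    else
        echo "@book{example-key, title = \"$1\"}"
    fi
    return 0
}

# Returns 0 only if no known error text shows up on either stream.
ran_cleanly() {
    local messages
    messages=$("$@" 2>&1 | grep -a -i -e "no matching" -e "parse error")
    [ -z "$messages" ]
}

for query in nothing compression; do
    if ran_cleanly noisy_tool "$query"; then
        echo "$query: looks fine"
    else
        echo "$query: failed, despite exit status 0"
    fi
done
```

The price of this approach is that the tool has to run twice: once to look for complaints, once more to get the actual output.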
*
* *
Since bib2bib isn’t really meant to be used in a script, I had to run it twice. The first run only checks the output, standard error included, for error messages (remember, it doesn’t return an error status; it always returns “success”). If no error is printed, a second run produces the output, which I then parse to remove all the extraneous stuff.
#!/usr/bin/env bash

# The pattern is quoted so the shell doesn't expand it against
# .bib files in the current directory before locate sees it.
locate '*.bib' | (
    while IFS= read -r filename
    do
        # grep with -a forces interpretation as "ascii" since some
        # encodings (ex. Windows, iso-latin1) may be detected as
        # "binary" (and grep whines).

        # hack because bib2bib still returns 0 (success) even if no
        # match
        #
        nul=$(bib2bib -c 'author:"'"$1"'" or title:"'"$1"'"' \
                  < "$filename" 2>&1 \
                  | grep -i -e "no matching" \
                            -e "parse error" )

        if [ "$nul" == "" ]
        then
            echo "---- $filename"

            # hacky seds because --no-comment has no effect. It also
            # exports strings, preambles, etc.
            #
            bib2bib \
                -c 'author:"'"$1"'" or title:"'"$1"'"' \
                < "$filename" \
                | sed '/comment\|string\|preamble/{:1;N;s/{.*}//;T1}' \
                | grep -a -v '^@comment\|^@string\|^@preamble' \
                | sed '/^$/N;/^\n$/D'
        fi
    done
) 2> /dev/null
Some of the dark sed magic comes from here. The first sed replaces the contents of nested {curly {braces}}. The second compresses multiple empty lines into a single empty line. Invoked in a shell, the script produces the following output:
> find-bib.sh huffman
---- /home/steven/somewhere/part-ii.bib
@article{capocelli-TIT-1986,
  author = {R. M. Capocelli and R. Giancarlo and I. J. Taneja},
  journal = {IEEE Trans. Information Theory},
  month = nov,
  number = {6},
  pages = {854--857},
  title = {{Bounds on the Redundancy of Huffman Codes}},
  volume = {32},
  year = {1986}
}
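The last two filters of the pipeline are easy to try in isolation, without bib2bib installed. This is a small sketch on made-up input. The function names squeeze_blank_lines and drop_directive_lines are mine, and the grep pattern relies on the GNU `\|` alternation, as in the script.

```bash
#!/usr/bin/env bash
# Sketch: the last two filters of the pipeline, run on made-up input.

# Collapse every run of empty lines into a single empty line.
squeeze_blank_lines() {
    sed '/^$/N;/^\n$/D'
}

# Drop lines that begin with @comment, @string or @preamble (GNU grep syntax).
drop_directive_lines() {
    grep -a -v '^@comment\|^@string\|^@preamble'
}

printf '%s\n' '@comment{x}' '' '' '' '@article{key,' '  year = {1986}' '}' \
    | drop_directive_lines \
    | squeeze_blank_lines
```

The @comment line disappears, and the three empty lines that followed it shrink to one. Note that the grep only removes the line carrying the @ keyword, which is why the script first needs a sed pass to deal with whatever sits inside the braces.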
*
* *
The script isn’t bulletproof. For one thing, the sed regexp doesn’t quite deal with stuff like this:
@Preamble{"\input bibnames.sty "
 # "\input path.sty "
 # "\ifx \k \undefined \let \k = \c \immediate\write16{Ogonek accent unavailable: replaced by cedilla}\fi "
 # "\ifx \undefined \FEATPOST \def \FEATPOST {{\manfnt FEAT}\-{\manfnt POST}\spacefactor1000 }\fi"
 # "\ifx \undefined \MP \def \MP {{\manfnt META}\-{\manfnt POST}\spacefactor1000 } \fi"
 # "\ifx \undefined \Xy \def \Xy {{\sc Xy}} \fi"
 # "\ifx \undefined \manfnt \font\manfnt=logo10 \fi"
 # "\ifx \undefined \pdfTeX \def \pdfTeX {pdf\TeX}\fi"
 # "\def \toenglish #1\endtoenglish{[{\em English:} #1\unskip]} "
 # "\hyphenation{
       An-wen-der-ver-ei-ni-gung Bie-mes-der-fer Co-lo-phon
       Deutsch-spra-chi-ge Ge-leit-wort Hol-dys Katz-en-beiss-er
       Ko-lo-dziej-ska la-da-mi Lar-ra-bee Manu-scripts mark-up
       Rijks-uni-ver-si-teit South-all Stutt-gart
   }"
}
I have no idea why this kind of stuff is necessary in a bib file. Still, the script will strip everything except the hyphenation list. ¯\_(ツ)_/¯
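One possible way around that limitation is to stop matching braces with a regexp and count them instead. What follows is a rough sketch, not what the script does. The function strip_directive_blocks is a hypothetical name. It assumes the blocks are delimited by braces rather than parentheses, that the opening brace sits on the same line as the @ keyword, and that there are no unbalanced escaped braces such as \{ inside them.

```bash
#!/usr/bin/env bash
# Sketch: skip whole @comment/@string/@preamble blocks by counting braces.
# Assumes brace-delimited blocks (not parentheses) and no unbalanced
# escaped braces such as \{ inside them.

strip_directive_blocks() {
    awk '
    !skipping && tolower($0) ~ /^[ \t]*@(comment|string|preamble)[ \t]*[{]/ {
        skipping = 1
        depth = 0
    }
    skipping {
        # gsub returns the number of replacements, i.e. the brace count.
        depth += gsub(/[{]/, "&") - gsub(/[}]/, "&")
        if (depth <= 0) skipping = 0   # block closed on this line
        next
    }
    { print }
    '
}

strip_directive_blocks <<'EOF'
@Preamble{"\ifx \undefined \Xy \def \Xy {{\sc Xy}} \fi"
  # "\hyphenation{
      Co-lo-phon
      Stutt-gart }"
}
@article{capocelli-TIT-1986,
  year = {1986}
}
EOF
```

On this shortened preamble, only the @article entry survives, hyphenation list included in what gets dropped. It could replace the first sed and the grep in the pipeline, though I haven’t checked it against everything bib2bib can emit.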