Converting PDFs to Hard B+W

Nothing too complicated this week: how to convert a DjVu or PDF file to a hard black-and-white PDF, with no shades of gray. Why would you want to do that anyway? Well, you may, like me, have a printer that has no concept of color calibration and has dreadful half-toning algorithms, resulting in unreadable text and no contrast when you print a DjVu or a PDF of a scanned book.

I googled for a good while before putting together a solution that (mostly) works correctly. The conversion script is only a few lines of Bash:

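#!/usr/bin/env bash
# Usage: pass the input file as $1 and, optionally, a threshold such as 40% as $2.

# The threshold defaults to 25% when no second argument is given.
thresh=${2:-25%}

# Options are applied in order, so -density must come before the input file.
convert -verbose \
    -density 300 \
    "$1" \
    -type bilevel \
    -threshold "$thresh" \
    -despeckle \
    -quality 100 \
    black+white.pdf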

Here, convert is one of ImageMagick’s command-line tools. Note that argument order is quite important:

  • -density 300 makes convert read the source at 300 dpi. This gives the conversion a high-resolution image to work from.
  • "$1" is the file to be converted. It generally works well with PDFs, but I saw a couple of DjVu files that cause problems (convert turns a few pages into blank pages for no obvious reason).
  • -type bilevel sets the output to binary.
  • -threshold $thresh sets the black/white cutoff, as a percentage: pixels below the threshold become black, pixels above it become white. (I also had issues with DjVu here: sometimes the threshold behaves randomly.)
  • -despeckle filters the image to remove random and lone pixels.
  • -quality 100 sets the output to maximum quality (lossless compression?)
  • black+white.pdf is the output file (ImageMagick will add this name to the PDF’s metadata, so it will appear as the title of the document).
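
To make the threshold rule concrete, here is a minimal sketch in plain shell arithmetic of what happens to a single 8-bit gray value. The helper name threshold_pixel is hypothetical and not part of ImageMagick, and it assumes that a value exactly at the threshold goes to black; it only illustrates the rule, it does not touch any image.

```shell
# Sketch of the -threshold rule applied to one 8-bit gray value (0-255).
# Assumption: values strictly above the threshold become white, all others black.
# threshold_pixel is a hypothetical helper, not an ImageMagick command.
threshold_pixel() (
    value=$1            # gray level, 0 (black) to 255 (white)
    percent=${2:-25}    # threshold in percent, without the % sign
    # Compare value/255 > percent/100 using integers only.
    if [ $((value * 100)) -gt $((percent * 255)) ]; then
        echo white
    else
        echo black
    fi
)

threshold_pixel 40      # dark gray ink: prints "black"
threshold_pixel 200     # light gray paper: prints "white"
threshold_pixel 100 50  # mid-gray with a 50% threshold: prints "black"
```

With the default of 25%, the cutoff sits at about gray level 64 out of 255, so only fairly dark pixels survive as black; raising the threshold thickens faint text but also keeps more background noise.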

When I said it mostly works, I meant that there are three main issues:

  • The threshold isn’t image-local adaptive, it’s document-wide; if there are darker or lighter pages, it may give less than cromulent results.
  • DjVu support is flaky. Some documents convert with blank pages.
  • It uses lots of memory, in direct relation to the -density argument. 300 is often too large; you may need to use a coarser resolution, maybe 150 (it will use 4× less memory, since the pixel count grows with the square of the density).
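
The 4× figure follows from the pixel count alone, and a rough sketch can show the arithmetic. The helper page_pixels below is hypothetical, and the default page size (US Letter, 612 × 792 points) is an assumption; how many bytes ImageMagick actually spends per pixel depends on the build, so the sketch counts pixels only.

```shell
# page_pixels is a hypothetical helper: pixels in one rasterized page.
# Arguments: density in dpi, then page width and height in points (1 pt = 1/72 inch).
# Assumption: the defaults describe a US Letter page (612 x 792 pt).
page_pixels() (
    density=$1
    width_pt=${2:-612}
    height_pt=${3:-792}
    width_px=$((width_pt * density / 72))
    height_px=$((height_pt * density / 72))
    echo $((width_px * height_px))
)

hi=$(page_pixels 300)   # 2550 x 3300 pixels
lo=$(page_pixels 150)   # 1275 x 1650 pixels
echo "300 dpi: $hi pixels per page"
echo "150 dpi: $lo pixels per page"
echo "ratio: $((hi / lo))x"
```

Halving the density halves both the width and the height in pixels, hence a quarter of the pixels per page.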
