Scanning documents or books without expensive hardware and commercial software can be tricky. This week, I give you the script I use to clean up a scanned image (and eventually assemble many of them into a single PDF document).
We’re all looking for documentation, books, and papers. Sometimes we’re lucky, we find the pristine PDF, rendered fresh from a text processor or maybe LaTeX. Sometimes we’re not so lucky, the only thing we can find is a collection of JPEG images with high compression ratios.
Scans of text are not always easy to clean up, even when they’re well done to begin with, they may be compressed with JPEG using a (too) high compression ratio, leading to conspicuous artifacts. These artifacts must be cleaned-up before printing or binding together in a PDF.