Cleaning Scans

February 21, 2017

Scanning documents or books without expensive hardware and commercial software can be tricky. This week, I give you the script I use to clean up a scanned image (and eventually assemble many of them into a single PDF document).


Optimizing JPEG for bandwidth

September 1, 2015

Optimizing web content is always complicated. On one hand, you want your users to have the best possible user experience, but on the other hand, you don’t really want to spend much bandwidth delivering the bits.

This week, let’s have a look at how we can optimize images for perceptual quality while minimizing bandwidth. We could proceed by guesswork, fiddling with the parameters until it kind of looks OK, or we can take five minutes to write a script that searches the parameter space for the best solution given a constraint, say, perceptual quality.
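To make the idea of searching the parameter space concrete, here is a minimal sketch, not the script from the post. It assumes a single parameter (a JPEG-style quality setting from 1 to 100) and a perceptual score that never decreases as quality goes up, which lets a binary search find the lowest setting that still meets the constraint. The names `lowest_quality` and `toy_score` are hypothetical, and `toy_score` is only a stand-in curve: a real version would encode the image at the given quality and compare it to the original with a metric such as SSIM.

```python
import math
from typing import Callable, Optional


def lowest_quality(score: Callable[[int], float], threshold: float,
                   lo: int = 1, hi: int = 100) -> Optional[int]:
    """Return the smallest quality q in [lo, hi] with score(q) >= threshold.

    Assumes score(q) is non-decreasing in q. Returns None when even the
    highest quality setting does not reach the threshold.
    """
    if score(hi) < threshold:
        return None
    # Invariant: hi always satisfies the constraint.
    while lo < hi:
        mid = (lo + hi) // 2
        if score(mid) >= threshold:
            hi = mid          # mid is good enough, try lower settings
        else:
            lo = mid + 1      # mid is too degraded, go higher
    return lo


def toy_score(q: int) -> float:
    """Hypothetical stand-in for a perceptual metric (NOT a real one)."""
    return 1.0 - math.exp(-q / 20.0)


if __name__ == "__main__":
    print(lowest_quality(toy_score, 0.95))
```

Since a lower quality setting generally means a smaller file, the lowest setting that passes the constraint is also the cheapest one to serve. If the real metric turns out not to be monotone in the quality setting, a plain linear scan over all 100 values is still cheap and avoids the assumption.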


Getting Documents Back From JPEG Scans

July 6, 2010

We’re all looking for documentation, books, and papers. Sometimes we’re lucky and find the pristine PDF, rendered fresh from a text processor or maybe LaTeX. Sometimes we’re not so lucky, and the only thing we can find is a collection of JPEG images with high compression ratios.

Scans of text are not always easy to clean up. Even when they’re well done to begin with, they may be compressed with JPEG using a (too) high compression ratio, leading to conspicuous artifacts. These artifacts must be cleaned up before printing the pages or binding them together in a PDF.
