Ohta (Colorspaces III)

April 24, 2018

Let’s continue with lesser known color spaces. In 1980, Yu-Ichi Ohta [1] to segment images based on colors, and to do this, introduced a new colorspace—or more precisely, two variants of the same color space.

Ohta’s concern wasn’t image coding but region separation. He supposed (without much evidence) that a color space with a basis close to the principal components of the colors in the image should be maximally discriminant. He then proposed that the colorspace

Read the rest of this entry »

Semigraphics Compression

November 7, 2017

ANSI art and poor resolution may appeal to the nostalgic, those in want of the time when BBS were still it and the IBM PC’s programmable character set was the nec plus ultra of semigraphics, but they’re not really useful. At best, we can use them to dispense ourselves from using ncurse and still getting some colors and effects

However, “semigraphics” may have their use in lossy data compression, were we allow some data to be lost to gain some more compression. That may be especially true when we have very little computing power or if we want to have many simple CPUs in parallel doing the decoding.

Read the rest of this entry »

New Block Order

March 21, 2017

The idea of reorganizing data before compression isn’t new. Almost twenty five years ago, Burrows and Wheeler proposed a block sorting transform to reorder data to help compression. The idea here is to try to create repetitions that can be exploited by a second compression engine.

But the Burrows-Wheeler transform isn’t the only possible one. There are a few other techniques to generate (reversible) permutations of the input.

Read the rest of this entry »

Getting Documents Back From JPEG Scans

July 6, 2010

We’re all looking for documentation, books, and papers. Sometimes we’re lucky, we find the pristine PDF, rendered fresh from a text processor or maybe LaTeX. Sometimes we’re not so lucky, the only thing we can find is a collection of JPEG images with high compression ratios.

Scans of text are not always easy to clean up, even when they’re well done to begin with, they may be compressed with JPEG using a (too) high compression ratio, leading to conspicuous artifacts. These artifacts must be cleaned-up before printing or binding together in a PDF.

Read the rest of this entry »

Picking a Bit-Rate for MP3 Files?

April 6, 2010

I am currently contemplating the possibility of recoding all my CDs into new compressed formats. I currently use MP3 but since I started my collection in a time when a huge hard drive was 4GB, a lot of those are coded at 96 and 112 kilobits/s… which doesn’t sound all that great. Although MP3 isn’t the greatest format around, it does offer the advantage of being compatible with nearly all players around, something neither FLAC nor Ogg Vorbis can claim.

Some of the most recent devices support the addition of codecs, but not all. My Sansa player doesn’t, although I could probably upgrade the firmware or something but my car built-in player is another story. So for the time being, I considered recoding everything in MP3 with a higher bit-rate, or maybe use FLAC and write some kind of script that transcode the files to MP3 for exporting to a player. I’m not decided yet. But let say I decide to stick with MP3 all the way. What bit-rate should I use?

Read the rest of this entry »

Ad Hoc Compression Methods: Digram Coding

February 17, 2009

Ad hoc compression algorithms always sound somewhat flimsy, but they still have their usefulness whenever the data you have to compress exhibits very special properties or is extremely short. Universal compression algorithms need a somewhat lengthy input to adapt sufficiently to yield good compression. Moreover, you may not have the computing power (CPU speed, RAM and program space) to implement full-blown compressors. In this case also, an ad hoc algorithm may be useful.

Digram (bigram, digraph, bigraph) coding is one of those ad hoc algorithms. Digram coding will compress text (or any source, really) by finding pairs of symbols that appear often and assigning them codes corresponding to used symbols, if any.


Let us see in detail what digram compression is, and how effective it really is.

Read the rest of this entry »