
I recently discovered the QOI image format. In a world of complex image formats, with long specifications and complicated licensing, there is certainly a need for a reasonable image format to just exchange pixmaps – without being as horrible as the X PixMap format.
QOI reminded me a lot of the image formats from the 80s: the run-length encoding is similar to the PackBits compression scheme, and encoding a pixel as a modification of the previous one recalls the Hold & Modify video mode on the Amiga. More generally, having pixel data with no defined container, and an encoding scheme whipped up out of nowhere, is also something that was common in the 80s.
That said, I’m not entirely sold on QOI. First, I really think only the decoder should be the reference implementation: having both encoder and decoder be the reference means that the implementation takes precedence over the documentation, which is not nice. Also, while the decoder is lossless, a lossy encoder seems perfectly reasonable; the pngquant program is a typical example of a lossy encoder for a lossless decoder (PNG).
When I look at the format itself, I would have done things a bit differently.
- There are two instructions to express a pixel as a difference from the previous one, `QOI_OP_DIFF` and `QOI_OP_LUMA`. This feels redundant, and I wonder how many diffs expressible in one format are expressible in the other. I would either have made sure the overlap is small or just gotten rid of the less expressive one.
- A change in the value of the α channel forces the use of a `QOI_OP_RGBA` instruction. I would have split α channel changes into a separate instruction. This would simplify the implementation and be more efficient at the edges of the transparency mask. The assumption that there is one instruction per pixel does not hold anyway, because of the `QOI_OP_RUN` instruction.
- I’m not convinced by the `QOI_OP_INDEX` instruction. Why do we need a hash? We could just have a ring buffer containing the last 64 pixels and store the offset within that buffer: no need to compute a hash, or to worry about collisions.
- The way the run-length encoding works, long runs are expensive: a `QOI_OP_RUN` instruction can repeat the previous pixel at most 62 times. Consider the case where we have 256 times the same pixel. First it needs to appear once, say with a `QOI_OP_RGB`; encoding the remaining 255 repeats then takes 5 run instructions, so 6 instructions in total. PackBits can do this in two runs… I would have shifted the bias of the run size so that a minimum run repeats the previous pixel twice; a single repetition can already be encoded with `QOI_OP_DIFF`.
- If we are going to have a format from the 80s, I would replace or complement the array of past pixels with a 64-entry colour palette. This would allow common long runs to be started efficiently, and would allow for interesting lossy encoders, as you basically want to minimise the number of pixels that need a full `QOI_OP_RGB` instruction. Generally, the run-length encoding can repeat patterns, which to me screams palette dithering…
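Out of curiosity, the overlap between the two difference instructions can be measured. Taking the delta ranges from the QOI specification (`QOI_OP_DIFF` stores each channel difference in [-2, 1] in one byte; `QOI_OP_LUMA` stores the green difference in [-32, 31] and the red/blue differences relative to green in [-8, 7] in two bytes), a short script answers the question:

```python
# Count how many pixel deltas expressible as QOI_OP_DIFF are also
# expressible as QOI_OP_LUMA. Ranges are taken from the QOI spec:
#   QOI_OP_DIFF: dr, dg, db each in [-2, 1]
#   QOI_OP_LUMA: dg in [-32, 31], dr-dg and db-dg in [-8, 7]

def fits_luma(dr, dg, db):
    return -32 <= dg <= 31 and -8 <= dr - dg <= 7 and -8 <= db - dg <= 7

diff_deltas = [(dr, dg, db)
               for dr in range(-2, 2)   # the 64 deltas QOI_OP_DIFF can store
               for dg in range(-2, 2)
               for db in range(-2, 2)]

overlap = sum(fits_luma(*d) for d in diff_deltas)
print(f"{overlap} of {len(diff_deltas)} QOI_OP_DIFF deltas also fit QOI_OP_LUMA")
# prints: 64 of 64 QOI_OP_DIFF deltas also fit QOI_OP_LUMA
```

So the overlap is total: the green delta is well within [-32, 31], and dr-dg and db-dg stay within [-3, 3], well inside [-8, 7]. `QOI_OP_DIFF` only earns its keep as a one-byte shorthand for a subset of what `QOI_OP_LUMA` expresses in two bytes.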
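The ring-buffer alternative to `QOI_OP_INDEX` is easy to sketch. Here is a minimal illustration (the `PixelRing` class and its method names are made up for this post, not part of QOI): the decoder pushes every decoded pixel, and an index instruction would simply carry an offset back from the most recent one.

```python
# Hypothetical replacement for QOI's hash-indexed colour table:
# a ring buffer of the last 64 decoded pixels, addressed by offset.

class PixelRing:
    def __init__(self, size=64):
        self.buf = [(0, 0, 0, 255)] * size  # pixels as (r, g, b, a)
        self.pos = 0                        # next write position

    def push(self, pixel):
        self.buf[self.pos] = pixel
        self.pos = (self.pos + 1) % len(self.buf)

    def lookup(self, offset):
        # offset 0 is the most recently pushed pixel, size-1 the oldest
        return self.buf[(self.pos - 1 - offset) % len(self.buf)]

ring = PixelRing()
ring.push((10, 20, 30, 255))
ring.push((40, 50, 60, 255))
assert ring.lookup(0) == (40, 50, 60, 255)
assert ring.lookup(1) == (10, 20, 30, 255)
```

The cost moves to the encoder, which has to scan the ring for a match instead of computing a hash, but there are no collisions to worry about and the decoder stays trivial.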
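The run-length arithmetic is easy to check, assuming the cap of 62 repeats per `QOI_OP_RUN` from the QOI specification and PackBits repeat records of up to 128 identical bytes (the comparison is loose, since PackBits works on bytes rather than pixels):

```python
import math

def qoi_chunks(n):
    # one QOI_OP_RGB for the first pixel, then QOI_OP_RUN chunks of <= 62
    return 1 + math.ceil((n - 1) / 62)

def packbits_runs(n):
    # one PackBits repeat record covers up to 128 identical bytes
    return math.ceil(n / 128)

print(qoi_chunks(256), packbits_runs(256))  # prints: 6 2
```

With a run bias giving lengths 2..63 instead of 1..62, the single-repeat case would fall back to a zero-delta `QOI_OP_DIFF` and long runs would save a chunk slightly more often.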
That said, I fully understand the fun in fuzzing around with simple image formats, I certainly enjoyed writing my ILBM decoder.
Edit: There is a page with discussions/ideas for a second version.