The QOI Format

QOI

I recently discovered the . In a world of complex image formats, with long specifications and complicated licensing, there is certainly a need for an reasonable image format to just exchange pixmaps – without being as horrible as the X PixMap format.

QOI reminded me a lot of the image formats from the 80s, the run-length encoding is similar to the , and the video-mode on the Amiga. More generally, having pixel data with no defined container, and a encoding scheme whipped out of nowhere is also something that was common in the 80s.

That said, I’m not entirely sold on QOI. First I really think only the decoder should be the reference implementation, having both encoder and decoder being the reference means that the implementation takes precedence on the documentation, which is not nice. Also while the decoder is lossless, a lossy encoder seems perfectly reasonable. The pngquant program is a typical example of a lossy encoder for a lossless decoder (PNG).

When I look at the format itself, I would have done things a bit differently.

  • There are two instructions to express a pixel as a difference from the previous one, QOI_OP_DIFF and QOI_OP_LUMA, this feels redundant and I wonder how many diffs expressible in one format are expressible in the other. I would either have made sure the overlap is small or just gotten rid of the less expressive one.
  • A change in the value of the α channel forces the use of a QOI_OP_RGBA instructions. I would have split α channel changes into a separate instruction. This would simplify the implementation and be more efficient at the edges of the transparency mask. The assumption that there is one instruction per pixel does not hold anyway because of the QOI_OP_RUN instruction.
  • I’m not convinced by the QOI_OP_INDEX instruction. Why do we need a hash? We could just have a ring buffer containing the last 64 pixels, and have the offset within that buffer, no need to compute a hash, or worry about collisions.
  • The way the run-length encoding works, long runs can only be encoded incrementally: consider the case where we have 256 times the same pixel. First it needs to appear once, say with a QOI_OP_RGB. So we can repeat one previous pixels, then two, then four, etc. So we will need 8 instructions to encode the 256 pixels. Packbit can do this in two runs…
    I would have shifted the bias of the run size so that a minimum run is the 2 previous pixels. Repeating the previous pixel can be encoded with QOI_OP_DIFF.
  • If we are going to have a format from the 80s, I would replace or complement the array of past pixels with a 64 entry colour palette. This would allow to start common long runs efficiently, and allow for interesting lossy encoders as you basically want to minimise the number of pixels that need a full QOI_OP_RGB instruction. Generally, the run-length encoding can repeat patterns, which to me screams palette dithering…

That said, I fully understand the fun in fuzzing around with simple image formats, I certainly enjoyed writing my ILBM decoder.

Edit: There is a page with discussions/ideas for a second version.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.