# Steinar H. Gunderson: rANS encoding of signed coefficients

I’m currently trying to make sense of some still image coding (more details to come at a much later stage!), and for a variety of reasons, I’ve chosen to use rANS as the entropy coder. However, there’s an interesting little detail that I haven’t actually seen covered anywhere; maybe it’s just because I’ve missed something, or maybe because it’s too blindingly obvious, but I thought I would document what I ended up with anyway. (I had hoped for something even more elegant, but I guess the obvious would have to do.)

For those that don’t know rANS coding, let me try to handwave it as much as possible. Your state is typically a single word (in my case, a 32-bit word), which is refilled from the input stream as needed. The encoder and decoder work in reverse order; let’s just talk about the decoder. Basically, it works by looking at the lowest 12 (or whatever) bits of the decoder state, mapping each of those 2^12 slots to a decoded symbol. More common symbols are given more slots, proportionally to their frequency. Let me just write a tiny, tiny example with 2 bits and three symbols instead, giving four slots:

| Lowest bits | Symbol |
|---|---|
| 00 | 0 |
| 01 | 0 |
| 10 | 1 |
| 11 | 2 |
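As a concrete illustration of the decode step just described, here is a minimal sketch for the toy 2-bit table. The names (`slot_to_symbol`, `freq`, `cum_freq`) are my own, not from any particular rANS implementation, and a real decoder also needs the renormalization/refill step that is glossed over here:

```cpp
#include <cstdint>

// Toy parameters matching the table above: 2 bits of precision, four slots.
constexpr uint32_t kScaleBits = 2;
constexpr uint32_t kScale = 1u << kScaleBits;

// Symbol 0 owns slots 00 and 01; symbol 1 owns 10; symbol 2 owns 11.
const int slot_to_symbol[kScale] = {0, 0, 1, 2};
const uint32_t freq[3] = {2, 1, 1};      // number of slots per symbol
const uint32_t cum_freq[3] = {0, 2, 3};  // first slot owned by each symbol

// One decode step: look at the lowest bits, map the slot to a symbol,
// then fold the slot's offset back into the state to recover the
// previous state. (Refilling from the input stream is omitted.)
int rans_decode_step(uint32_t &state) {
  uint32_t slot = state & (kScale - 1);
  int sym = slot_to_symbol[slot];
  state = freq[sym] * (state >> kScaleBits) + slot - cum_freq[sym];
  return sym;
}
```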

Note that the zero coefficient here maps to one out of two slots (ie., a range); you don’t choose which one yourself, the encoder stashes some information in there (which is used to recover the next control word once you know which symbol is there).
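To see how the encoder “stashes” that information, here is the matching encode step for the same toy table, again a sketch with made-up names: the term `state % freq[sym]` picks which of the symbol’s slots the new state lands in, and the decoder reads it back out.

```cpp
#include <cstdint>

constexpr uint32_t kScaleBits = 2;       // 2 bits of precision, four slots
const uint32_t freq[3] = {2, 1, 1};      // slots per symbol 0, 1, 2
const uint32_t cum_freq[3] = {0, 2, 3};  // first slot of each symbol

// One encode step (renormalization omitted): the quotient goes into the
// upper bits, and the remainder selects a slot within the symbol's range.
uint32_t rans_encode_step(uint32_t state, int sym) {
  return (state / freq[sym] << kScaleBits) + cum_freq[sym] + state % freq[sym];
}
```

Encoding symbol 0 from state 4 versus state 5 lands in slot 00 versus 01 respectively, which is exactly the choice you don’t get to make yourself.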

Now for the actual problem: When storing DCT coefficients, we typically want to also store a sign (ie., not just 1 or 2, but also -1/+1 and -2/+2). The statistical distribution is symmetrical, so the sign bit is incompressible (except that of course there’s no sign bit needed for 0). We could have done this by introducing new symbols -1 and -2 in addition to our three other ones, but this means we’d need more bits of precision, and accordingly larger look-up tables (which is bad for performance). So let’s find something better.

We could also simply store it separately somehow; if the coefficient is non-zero, store the bits in some separate repository. Perhaps more elegantly, you can encode a second symbol in the rANS stream with probability 1/2, but this is more expensive computationally. Both of these, however, have the problem that they’re divergent in terms of control flow; nonzero coefficients potentially need to do a lot of extra computation and even extra loads. This isn’t nice for SIMD, and it’s not nice for GPUs. It’s generally not really nice.

The solution I ended up with was simulating a larger table with a smaller one. Simply rotate the table so that the zero symbol has the top slots instead of the bottom slots, and then replicate the rest of the table. For instance, take this new table:

| Lowest bits | Symbol |
|---|---|
| 000 | 1 |
| 001 | 2 |
| 010 | 0 |
| 011 | 0 |
| 100 | 0 |
| 101 | 0 |
| 110 | -1 |
| 111 | -2 |

(The observant reader will note that this doesn’t describe the exact same distribution as last time—zero has twice the relative frequency as in the other table—but ignore that for the time being.)

In this case, the bottom half of the table doesn’t actually need to be stored! We know that if the three bottom bits are >= 110 (6 in decimal), we have a negative value, can subtract 6, and then continue decoding. If we go past the end of our 2-bit table despite that, we know we are decoding a zero coefficient (which doesn’t have a sign), so we can just clamp the read; on a GPU, out-of-bounds reads on a texture will typically return 0 anyway. So it all works nicely, and the divergent I/O is gone.
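The branch-light lookup can be sketched like this, storing only a four-entry table for the 8-slot rotated example; the names and structure are mine, and on a GPU the explicit clamp could instead rely on out-of-bounds texture fetches returning 0:

```cpp
#include <cstdint>

// Only the first four entries of the 8-slot rotated table are stored:
// slots 000..011 map to 1, 2, 0, 0. Slots 100/101 fall off the end of
// the small table and are clamped to the zero entry; slots 110/111
// mirror the first two entries with the sign flipped.
const int small_table[4] = {1, 2, 0, 0};

int decode_signed_symbol(uint32_t slot) {  // slot in 0..7
  int sign = 1;
  if (slot >= 6) {  // bottom half of the big table: negative values
    slot -= 6;
    sign = -1;
  }
  if (slot > 3) slot = 3;  // clamp: zero coefficients past the table end
  return sign * small_table[slot];
}
```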

If this piqued your interest, you probably want to read up on rANS in general; Fabian Giesen (aka ryg) has some notes that work as a good starting point, but beware: some of this is pretty confusing. 🙂

Source: Planet Debian