Home » Linux » Debian » Steinar H. Gunderson: rANS encoding of signed coefficients

Steinar H. Gunderson: rANS encoding of signed coefficients

I’m currently trying to make sense of some still image coding
(more details to come at a much later stage!), and for a variety
of reasons, I’ve chosen to use rANS as the entropy coder. However,
there’s an interesting little detail that I haven’t actually seen
covered anywhere; maybe it’s just because I’ve missed something,
or maybe because it’s too blindingly obvious, but I thought I would
document what I ended up with anyway. (I had hoped for something
even more elegant, but I guess the obvious would have to do.)

For those that don’t know rANS coding, let me try to handwave it
as much as possible. Your state is typically a single word
(in my case, a 32-bit word), which is refilled from the input stream
as needed. The encoder and decoder works in reverse order;
let’s just talk about the decoder. Basically it works by looking
at the lowest 12 (or whatever) bits of the decoder state, mapping
each of those 2^12 slots to a decoded symbol. More common symbols
are given more slots, proportionally to the frequency. Let me just
write a tiny, tiny example with 2 bits and three symbols instead,
giving four slots:

Lowest bits Symbol
00 0
01 0
10 1
11 2

Note that the zero coefficient here maps to one out of two slots
(ie., a range); you don’t choose which one yourself, the encoder
stashes some information in there (which is used to recover the
next control word once you know which symbol there is).

Now for the actual problem: When storing DCT coefficients, we
typically want to also store a sign (ie., not just 1 or 2, but
also -1/+1 and -2/+2). The statistical distribution is symmetrical,
so the sign bit is incompressible (except that of course there’s
no sign bit needed for 0). We could have done this by introducing new symbols
-1 and -2 in addition to our three other ones, but this means we’ll
need more bits of precision, and accordingly larger look-up tables
(which is negative for performance). So let’s find something better.

We could also simply store it separately somehow; if the coefficient
is non-zero, store the bits in some separate repository. Perhaps
more elegantly, you can encode a second symbol in the rANS stream
with probability 1/2, but this is more expensive computationally.
But both of these have the problem that they’re divergent in terms
of control flow; nonzero coefficients potentially need to do a lot
of extra computation and even loads. This isn’t nice for SIMD, and
it’s not nice for GPU. It’s generally not really nice.

The solution I ended up with was simulating a larger table with
a smaller one. Simply rotate the table so that the zero symbol
has the top slots instead of the bottom slots, and then replicate
the rest of the table. For instance, take this new table:

Lowest bits Symbol
000 1
001 2
010 0
011 0
100 0
101 0
110 -1
111 -2

(The observant reader will note that this doesn’t describe the
exact same distribution as last time—zero has twice the relative
frequency as in the other table—but ignore that for the time
being.)

In this case, the bottom half of the table doesn’t actually need
to be stored! We know that if the three bottom bits are >= 110
(6 in decimal), we have a negative value, can subtract 6, and then
continue decoding. If we are go past the end of our 2-bit table
despite that, we know we are decoding a zero coefficient (which
doesn’t have a sign), so we can just clamp the read; or for a GPU,
reads out-of-bounds on a texture will typically return 0 anyway.
So it all works nicely, and the divergent I/O is gone.

If this pickled your interest, you probably want to read up on
rANS in general; Fabian Giesen (aka ryg) has some notes
that work as a good starting point, but beware; some of this is
pretty confusing. 🙂

Source: Debian Planet

Facebook Comments