H.264 CABAC overview

For further information please see

Context-Based Adaptive Binary Arithmetic Coding in the H.264/AVC Video Compression Standard (Detlev Marpe, Heiko Schwarz and Thomas Wiegand)

H.264/MPEG-4 AVC defines several different profiles, which specify which coding methods and parameters are allowed in a stream. The Main and High profiles allow the use of context adaptive binary arithmetic coding (CABAC), which offers improved compression performance (around 10% bit saving) compared to the context adaptive variable length coding (CAVLC) method which is available as an alternative, though it is more expensive to implement.

Overview of the entropy coding process

The entropy encoder takes as input a sequence of symbols representing samples and control information and maps this onto a binary bitstream which is output into the container. In contrast with earlier compression stages, the entropy coding is lossless; the decoder will reproduce the exact sequence of symbols which was input to the entropy encoder during compression.

H.264's implementation of CABAC creates the bitstream in three stages.

In binarization, each symbol to be output is uniquely mapped onto a binary string, called a bin string. Each bit position in the bin string is called a bin. Each bin is then passed to one of two coding modes: in regular coding mode, the next step, context modelling, is applied and the resulting context model and bin value are passed to the binary arithmetic coding engine; in bypass mode, context modelling is skipped and the bin is passed directly to a bypass coding engine, skipping the context modelling stage.
In context modelling (only used for regular coding mode) a bin is categorised for coding under a particular probability model. Each probability model has its state represented by a context variable which is a pair (most probable symbol in {0, 1}, probability of less probable symbol). Arithmetic coding is applied using the chosen context model and updates its context variable.
In binary arithmetic coding the value of the bin is used to update the context variable if applicable, and bits are output into the bitstream.

Binarization

The input to this process is a symbol (syntax element) to be coded, such as a quantization transform coefficient, macroblock type specifier or a motion vector component. The mapping onto bin strings should be close to a minimum redundancy code.

Main types of binarization

Four main types of binarization are defined:

Unary code – the value x ≥ 0 is mapped onto x 1 bits followed by a 0 bit.
Truncated unary (TU) code – the value 0 ≤ x ≤ S is coded with a unary code if x < S, or x 1 bits otherwise. If the condition 0 ≤ x ≤ S holds, the truncated unary encoding of value x is given by
```
def tu(s, x):
  for i in range(0, min(s, x)):
    put(1)
  if x < s:
    put(0)
```

kth order Exp-Golomb (EGk) code – the value x is mapped onto three sequential bit strings: a prefix, suffix and sign bit. The construction of a kth order Exp-Golomb code for value x is given by

def egk(k, x):
  while True:
    if x >= (1 << k):
      put(1) # bit of the prefix
      x = x - (1 << k)
      k = k + 1
    else:
      put(0) # end of the prefix
      while k > 0:
        k = k - 1
        put((x >> k) & 0x01) # bit of the suffix
      break

Fixed-length (FL) code – the value x < S is mapped onto its binary representation, using ceil(log₂S) bits.

There are also five unstructed binary trees defined manually for coding macroblock and submacroblock types.

Concatenated binarizations

The codes can also be concatenated. There are three situations where concatenations of the four basic types are used:

coded_block_pattern is encoded using a 4-bit FL prefix (for luma) and a TU suffix with S = 2 for chroma.
Motion vector differences are encoded with a concatenation of a unary prefix and a 3rd order Exp-Golomb code suffix: for a value mvd, the prefix is a TU coding with S = 9 of the value min(|mvd|,9), or, if mvd = 0, just the bit 0. If |mvd| ≥ 9, a suffix is output with the value |mvd| - 9 using the EG3 code. A sign bit is then output if |mvd| > 0: 0 if mvd is positive and 1 otherwise. The following code performs this coding, referencing the coding procedures for the main types of binarization above.
```
def uegk(s, k, x):
  absx = abs(x)
  tu(s, absx)
  absx = absx - 9
  if absx >= 0:
    egk(k, absx)
  sgn(x)
  
def sgn(x):
  if x == 0:
    return
  if x < 0:
    put(1)
  else:
    put(0)
```
Absolute values of transform coefficient levels (coeff_abs_value_minus1 = abs_level - 1 is coded, as the positions of zero-valued coefficients are specified in a map) are coded using a TU prefix with S = 14 and an EG0 suffix.

Context modelling

The context modelling stage associates a context model with each bin output by the binarization stage.

There are four basic types of context modelling, which associate a probability with each bin based on previously coded values or other symbols in the neighbourhood:

Up to two neighbouring syntax elements are chosen based on the syntax element being coded, and the context model for the bin being coded is chosen based on the context model of the related bin in the neighbour syntax elements. For example, the context model of the related bin in the syntax elements in the above and left bins may be selected for the current bin.
For the mb_type and sub_mb_type syntax elements, the model for a bin b_i with prior coded bins (b₀,b₁, …, b_i-1) is chosen based on those prior bin values.
On residual data only, based on position in the scanning path
On residual data only, based on number of encoded levels with a particular value prior to the current level bin being coded.

The context modelling process only ever references past values within the same slice.

Each syntax element may use one of a range of models, each of which is denoted by a context index. The possible models for each syntax element are given in Table 9-11 of the standard, which specifies the allowable values for the context index γ for each element. The range of allowed values of context index for mb_type, sub_mb_type and mb_skip_flag depends on the slice type being coded (SI/I, SP/P or B).

Each probability model (uniquely associated with a context index) consists of a pair of two values: a 6-bit probability state index σ_γ and a single bit which is the most probable symbol (MPS). Each model is therefore represented by a 7-bit value.

Macroblock type, submacroblock type, spatial and temporal prediction modes, slice- and macroblock-based control information syntax elements all use context indices between 0 and 72. The context index is calculated as γ = Γ_S + χ_S where Γ_S is the context index offset, which is the lowest value in the allowable range for the syntax element's context index, and χ_S is a context index increment, which specifies the offset within the range. χ_S may either depend only on the bin index (giving a fixed assignment of probability model to each bin), or it may specify one of the first two context modelling types above.

Context indices in the range 73 to 398 are used for coding residual data (except for γ = 276 which is associated with the end of slice flag).

significant_coeff_flag and last_significant_coeff_flag use different models depending on whether they are in frame or field mode. Not all context models are used in frame-only/field-only pictures.

The model for coded_block_pattern is specified using γ = Γ_S + χ_S. All other syntax elements of residual data use the relation γ = Γ_S + Δ_S(ctx_cat) + χ_S, where the context category dependent offset Δ_S. Table 9-40 in the standard specifies the value of this offset, in terms of the context category which is given for each block type in Table 9-42.

Binary arithmetic coding

Arithmetic coding works by representing an interval within [0, 1] by two values: a lower bound L and a range R, and recursively subdividing this interval using the probability and value of each input bit: on reception of a more probably symbol (MPS), with probability p_MPS, the interval is updated to have width R_MPS := R ⋅ p_MPS (the corresponding operation for a less probable symbol would update the interval to have width R_LPS := R ⋅ p_LPS then update the lower bound L := L + R - R_LPS).

H.264/MPEG-4 AVC CABAC uses a modulo-coder (M coder) as its binary arithmetic coding implementation. It avoids the multiplication above by quantizing the value of the interval width R onto a small set of values Q = {Q₀, Q₁, …, Q_{K - 1}}, and the probability range of the the less probable symbol p_LPS (0, 0.5] onto another set of values P = {p₀, p₁, …, p_{N - 1}}. The tradeoff chosen for H.264 was K = 4 quantized range values and N = 64 probability vaulues.

For the bypass-mode coding engine, the probability estimation stage is omitted.