/*!\page encoder_guide AV1 ENCODER GUIDE

\tableofcontents

\section architecture_introduction Introduction

This document provides an architectural overview of the libaom AV1 encoder.

It is intended as a high-level starting point for anyone wishing to contribute
to the project, to help them more quickly understand the structure of the
encoder and find their way around the codebase.

It sits above, and where necessary links to, more detailed function-level
documentation.

\subsection  architecture_gencodecs Generic Block Transform Based Codecs

Most modern video encoders, including VP8, H.264, VP9, HEVC and AV1
(in increasing order of complexity), share a common basic paradigm. This
comprises separating a stream of raw video frames into a series of discrete
blocks (of one or more sizes), then computing a prediction signal and a
quantized, transform coded, residual error signal. The prediction and residual
error signal, along with any side information needed by the decoder, are then
entropy coded and packed to form the encoded bitstream. See Figure 1 below,
where the blue blocks are, to all intents and purposes, the lossless parts of
the encoder and the red block is the lossy part.

This is of course a gross oversimplification, even in regard to the simplest
of the above codecs. For example, all of them allow for block based
prediction at multiple different scales (i.e. different block sizes) and may
use previously coded pixels in the current frame for prediction or pixels from
one or more previously encoded frames. Further, they may support multiple
different transforms and transform sizes and quality optimization tools like
loop filtering.

\image html genericcodecflow.png "" width=70%

\subsection architecture_av1_structure AV1 Structure and Complexity

As previously stated, AV1 adopts the same underlying paradigm as other block
transform based codecs. However, it is much more complicated than previous
generation codecs and supports many more block partitioning, prediction and
transform options.

AV1 supports block partitions of various sizes from 128x128 pixels down to 4x4
pixels using a multi-layer recursive tree structure, as illustrated in Figure 2
below.

\image html av1partitions.png "" width=70%

AV1 also provides 71 basic intra prediction modes, 56 single frame inter prediction
modes (7 reference frames x 4 modes x 2 for OBMC (overlapped block motion
compensation)), 12768 compound inter prediction modes (that combine inter
predictors from two reference frames) and 36708 compound inter / intra
prediction modes. Furthermore, in addition to simple inter motion estimation,
AV1 also supports warped motion prediction using affine transforms.

In terms of transform coding, it has 16 separable 2-D transform kernels
\f$(DCT, ADST, fADST, IDTX)^2\f$ that can be applied at up to 19 different
scales from 64x64 down to 4x4 pixels.
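
The two counts above can be checked with a small sketch. It assumes, for illustration, that the 19 transform sizes are exactly the rectangles with sides in {4, 8, 16, 32, 64} and an aspect ratio of at most 4:1, and that a 2-D kernel is any pairing of a vertical and a horizontal 1-D type. The enum and function names are hypothetical, not libaom identifiers. AV1 restricts the kernels available at the larger transform sizes, so 16 x 19 is an upper bound rather than the number of legal size/kernel combinations.

```c
#include <stddef.h>

// Hypothetical 1-D transform types named in the text; a 2-D kernel pairs
// one vertical and one horizontal 1-D type.
enum { DCT_1D, ADST_1D, FLIPADST_1D, IDTX_1D, NUM_1D_TYPES };

// Counts the (vertical, horizontal) pairs: 4 x 4 = 16.
static int count_2d_kernels(void) {
  int n = 0;
  for (int v = 0; v < NUM_1D_TYPES; ++v) {
    for (int h = 0; h < NUM_1D_TYPES; ++h) ++n;
  }
  return n;
}

// Counts transform sizes with sides in {4, 8, 16, 32, 64} whose aspect
// ratio is at most 4:1 (the assumed constraint). Square sizes are included.
static int count_tx_sizes(void) {
  static const int sides[] = { 4, 8, 16, 32, 64 };
  const size_t n = sizeof(sides) / sizeof(sides[0]);
  int count = 0;
  for (size_t i = 0; i < n; ++i) {
    for (size_t j = 0; j < n; ++j) {
      const int w = sides[i];
      const int h = sides[j];
      const int longer = w > h ? w : h;
      const int shorter = w > h ? h : w;
      if (longer <= 4 * shorter) ++count;
    }
  }
  return count;
}
```

Of the 25 width/height pairs, only 4x32, 4x64, 8x64 and their transposes exceed 4:1, which leaves the 19 sizes quoted above.
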

When combined, this means that for any one 8x8 pixel block in a
source frame, there are approximately 45,000,000 different ways that it can
be encoded.

Consequently, AV1 requires complex control processes. While not necessarily
a normative part of the bitstream, these are the algorithms that turn a set
of compression tools and a bitstream format specification into a coherent
and useful codec implementation. These may include, but are not limited to,
things like:

- Rate distortion optimization (the process of trying to choose the most
  efficient combination of block size, prediction mode, transform type
  etc.)
- Rate control (regulation of the output bitrate)
- Encoder speed vs quality trade-offs.
- Features such as two pass encoding or optimization for low delay
  encoding.

For a more detailed overview of AV1's encoding tools and a discussion of some
of the design considerations and hardware constraints that had to be
accommodated, please refer to <a href="https://arxiv.org/abs/2008.06091">
A Technical Overview of AV1</a>.

Figure 3 provides a slightly expanded but still simplistic view of the
AV1 encoder architecture with blocks that relate to some of the subsequent
sections of this document. In this diagram, the raw uncompressed frame buffers
are shown in dark green and the reconstructed frame buffers used for
prediction in light green. Red indicates those parts of the codec that are
(or may be) lossy, where fidelity can be traded off against compression
efficiency, whilst light blue shows algorithms or coding tools that are
lossless. The yellow blocks represent configuration and control algorithms
that are not a normative part of the bitstream.

\image html av1encoderflow.png "" width=70%

\section architecture_command_line The Libaom Command Line Interface

 Add details or links here: TODO ? elliotk@

\section architecture_enc_data_structures Main Encoder Data Structures

The following are the main high level data structures used by the libaom AV1
encoder and referenced elsewhere in this overview document:

- \ref AV1_PRIMARY
    - \ref AV1_PRIMARY.gf_group (\ref GF_GROUP)
    - \ref AV1_PRIMARY.lap_enabled
    - \ref AV1_PRIMARY.twopass (\ref TWO_PASS)
    - \ref AV1_PRIMARY.p_rc (\ref PRIMARY_RATE_CONTROL)
    - \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)

- \ref AV1_COMP
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
    - \ref AV1_COMP.rc (\ref RATE_CONTROL)
    - \ref AV1_COMP.speed
    - \ref AV1_COMP.sf (\ref SPEED_FEATURES)

- \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.pass
    - \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
    - \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)
    - \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)

- \ref AlgoCfg (Algorithm related configuration parameters)
    - \ref AlgoCfg.arnr_max_frames
    - \ref AlgoCfg.arnr_strength

- \ref KeyFrameCfg (Keyframe coding configuration parameters)
    - \ref KeyFrameCfg.enable_keyframe_filtering

- \ref RateControlCfg (Rate control configuration)
    - \ref RateControlCfg.mode
    - \ref RateControlCfg.target_bandwidth
    - \ref RateControlCfg.best_allowed_q
    - \ref RateControlCfg.worst_allowed_q
    - \ref RateControlCfg.cq_level
    - \ref RateControlCfg.under_shoot_pct
    - \ref RateControlCfg.over_shoot_pct
    - \ref RateControlCfg.maximum_buffer_size_ms
    - \ref RateControlCfg.starting_buffer_level_ms
    - \ref RateControlCfg.optimal_buffer_level_ms
    - \ref RateControlCfg.vbrbias
    - \ref RateControlCfg.vbrmin_section
    - \ref RateControlCfg.vbrmax_section

- \ref PRIMARY_RATE_CONTROL (Primary Rate control status)
    - \ref PRIMARY_RATE_CONTROL.gf_intervals[]
    - \ref PRIMARY_RATE_CONTROL.cur_gf_index

- \ref RATE_CONTROL (Rate control status)
    - \ref RATE_CONTROL.intervals_till_gf_calculate_due
    - \ref RATE_CONTROL.frames_till_gf_update_due
    - \ref RATE_CONTROL.frames_to_key

- \ref TWO_PASS (Two pass status and control data)

- \ref GF_GROUP (Data related to the current GF/ARF group)

- \ref FIRSTPASS_STATS (Defines entries in the first pass stats buffer)
    - \ref FIRSTPASS_STATS.coded_error

- \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
    - \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)

- \ref HIGH_LEVEL_SPEED_FEATURES
    - \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop
    - \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance

- \ref TplParams

\section architecture_enc_use_cases Encoder Use Cases

The libaom AV1 encoder is configurable to support a number of different use
cases and rate control strategies.

The principal use cases for which it is optimised are as follows:

 - <b>Video on Demand / Streaming</b>
 - <b>Low Delay or Live Streaming</b>
 - <b>Video Conferencing / Real Time Coding (RTC)</b>
 - <b>Fixed Quality / Testing</b>

Other examples of use cases for which the encoder could be configured but for
which there is less by way of specific optimizations include:

 - <b>Download and Play</b>
 - <b>Disk Playback</b>
 - <b>Storage</b>
 - <b>Editing</b>
 - <b>Broadcast video</b>

Specific use cases may have particular requirements or constraints. For
example:

<b>Video Conferencing:</b> In a video conference we need to encode the video
in real time and to avoid any coding tools that could increase latency, such
as frame look ahead.

<b>Live Streams:</b> In cases such as live streaming of games or events, it
may be possible to allow some limited buffering of the video and use of
lookahead coding tools to improve encoding quality. However, whilst a lag of
a second or two may be fine given the one way nature of this type of video,
it is clearly not possible to use tools such as two pass coding.

<b>Broadcast:</b> Broadcast video (e.g. digital TV over satellite) may have
specific requirements such as frequent and regular key frames (e.g. once per
second or more often), as these provide entry points for users switching
channels. There may also be strict upper limits on bandwidth over a short
window of time.

<b>Download and Play:</b> Download and play applications may have less strict
requirements in terms of local frame by frame rate control but there may be a
requirement to accurately hit a file size target for the video clip as a
whole. Similar considerations may apply to playback from mass storage devices
such as DVD or disk drives.

<b>Editing:</b> In certain special use cases such as offline editing, it may
be desirable to have very high quality and data rate but also very frequent
key frames or indeed to encode the video exclusively as key frames. Lossless
video encoding may also be required in this use case.

<b>VOD / Streaming:</b> One of the most important and common use cases for AV1
is video on demand or streaming, for services such as YouTube and Netflix. In
this use case it is possible to do two or even multi-pass encoding to improve
compression efficiency. Streaming services will often store many encoded
copies of a video at different resolutions and data rates to support users
with different types of playback device and bandwidth limitations.
Furthermore, these services support dynamic switching between multiple
streams, so that they can respond to changing network conditions.

Exact rate control when encoding for a specific format (e.g. 360p or 1080p on
YouTube) may not be critical, provided that the video bandwidth remains within
allowed limits. Whilst a format may have a nominal target data rate, this can
be considered more as the desired average egress rate over the video corpus
rather than a strict requirement for any individual clip. Indeed, in order
to maintain optimal quality of experience for the end user, it may be
desirable to encode some easier videos or sections of video at a lower data
rate and harder videos or sections at a higher rate.

VOD / streaming does not usually require very frequent key frames (as in the
broadcast case) but key frames are important in trick play (scanning back and
forth to different points in a video) and for adaptive stream switching. As
such, in a use case like YouTube, there is normally an upper limit on the
maximum time between key frames of a few seconds, but within certain limits
the encoder can try to align key frames with real scene cuts.

Whilst encoder speed may not seem to be as critical in this use case, for
services such as YouTube, where millions of new videos have to be encoded
every day, encoder speed is still important, so libaom allows command line
control of the encode speed vs quality trade-off.

<b>Fixed Quality / Testing Mode:</b> Libaom also has a fixed quality encoder
pathway designed for testing under highly constrained conditions.

\section architecture_enc_speed_quality Speed vs Quality Trade Off

In any modern video encoder there are trade-offs that can be made in regard to
the amount of time spent encoding a video or video frame vs the quality of the
final encode.

These trade-offs typically limit the scope of the search for an optimal
prediction / transform combination, with faster encode modes doing fewer
partition, reference frame, prediction mode and transform searches at the cost
of some reduction in coding efficiency.

The pruning of the size of the search tree is typically based on assumptions
about the likelihood of different search modes being selected, based on what
has gone before and features such as the dimensions of the video frames and
the Q value selected for encoding the frame. For example, certain intra modes
are less likely to be chosen at high Q but may be more likely if similar
modes were used for the previously coded blocks above and to the left of the
current block.
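
As an illustration only, a pruning rule of the kind described above might look like the following sketch. The function name, the notion of a "rare" mode and the threshold of 200 on the 0 to 255 qindex scale are all hypothetical; in libaom the real heuristics are selected through the speed features described below.

```c
#include <stdbool.h>

// Hypothetical qindex (0..255) above which "rare" intra modes are pruned.
#define SKETCH_HIGH_Q_THRESHOLD 200

// Returns true if an intra mode candidate should be evaluated for a block.
static bool sketch_should_search_intra_mode(int mode, bool is_rare_mode,
                                            int qindex, int above_mode,
                                            int left_mode) {
  // Common modes are always evaluated.
  if (!is_rare_mode) return true;
  // At low Q the search is not pruned.
  if (qindex <= SKETCH_HIGH_Q_THRESHOLD) return true;
  // At high Q, keep a rare mode only if the block above or to the left
  // was coded with it.
  return mode == above_mode || mode == left_mode;
}
```

Because the rule only consults the current qindex and already coded neighbours, it costs almost nothing compared with the mode evaluations it avoids.
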

The speed settings depend both on the use case (e.g. Real Time encoding) and
an explicit speed control passed in on the command line as <b>--cpu-used</b>
and stored in the \ref AV1_COMP.speed field of the main compressor instance
data structure (<b>cpi</b>).

The control flags for the speed trade-off are stored in the \ref AV1_COMP.sf
field of the compressor instance and are set in the following functions:

- \ref av1_set_speed_features_framesize_independent()
- \ref av1_set_speed_features_framesize_dependent()
- \ref av1_set_speed_features_qindex_dependent()
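
The function names suggest a layered set-up: settings chosen from the speed level alone, then refined by the frame size, then by the quantizer index. The sketch below mimics that layering with a hypothetical three-field struct. The field names, thresholds and the way each stage adjusts the previous one are assumptions made for illustration and do not reflect the actual contents of SPEED_FEATURES.

```c
#include <stdbool.h>

// Hypothetical subset of speed features (not the libaom layout).
typedef struct {
  int max_partition_depth;      // larger = deeper partition search
  bool prune_rare_intra_modes;  // skip intra modes unlikely to win
  int max_ref_frames_searched;  // reference frames tried per block
} SketchSpeedFeatures;

// Stage 1: depends only on the speed level (cf. --cpu-used).
static void sketch_set_sf_framesize_independent(SketchSpeedFeatures *sf,
                                                int speed) {
  sf->max_partition_depth = speed >= 4 ? 3 : 4;
  sf->prune_rare_intra_modes = speed >= 2;
  sf->max_ref_frames_searched = speed >= 6 ? 3 : 7;
}

// Stage 2: refines the stage 1 choices using the frame dimensions.
static void sketch_set_sf_framesize_dependent(SketchSpeedFeatures *sf,
                                              int speed, int width,
                                              int height) {
  const bool is_large = width * height >= 1920 * 1080;
  if (is_large && speed >= 1) sf->max_partition_depth -= 1;
}

// Stage 3: refines again once the frame's quantizer index is known.
static void sketch_set_sf_qindex_dependent(SketchSpeedFeatures *sf,
                                           int speed, int qindex) {
  if (qindex > 200 && speed >= 1) sf->prune_rare_intra_modes = true;
}
```

Keeping the stages separate means only the cheap, late stages need rerunning when the frame size or quantizer changes.
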

A second factor impacting the speed of encode is rate distortion optimisation
(<b>rd vs non-rd</b> encoding).

When rate distortion optimization is enabled, each candidate combination of
a prediction mode and transform coding strategy is fully encoded and the
resulting error (or distortion) as compared to the original source and the
number of bits used are passed to a rate distortion function. This function
converts the distortion and cost in bits to a single <b>RD</b> value (where
lower is better). This <b>RD</b> value is used to decide between different
encoding strategies for the current block where, for example, one candidate
may result in lower distortion but use a larger number of bits.

The calculation of this <b>RD</b> value is broadly speaking as follows:

\f[
  RD = (\lambda \times Rate) + Distortion
\f]

This assumes a linear relationship between the number of bits used and
distortion (represented by the rate multiplier value <b>&lambda;</b>), which is
not actually valid across a broad range of rate and distortion values.
Typically, where distortion is high, expending a small number of extra bits
will result in a large change in distortion. However, at lower values of
distortion the cost in bits of each incremental improvement is large.

To deal with this we scale the value of <b>&lambda;</b> based on the quantizer
value chosen for the frame. This is assumed to be a proxy for our approximate
position on the true rate distortion curve and it is further assumed that over
a limited range of distortion values, a linear relationship between distortion
and rate is a valid approximation.
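
A minimal sketch of an RD decision, assuming that &lambda; is proportional to the square of the quantizer step size. That is a common modelling choice, but libaom derives its integer rate multiplier from the q index in a more involved way. The struct, the constant 0.1 and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// One candidate prediction / transform combination after a trial encode.
typedef struct {
  double rate_bits;   // bits needed to code the block with this candidate
  double distortion;  // e.g. sum of squared error vs the source block
} RdCandidate;

// Hypothetical lambda model: grows with the square of the quantizer step.
static double sketch_lambda_from_q(double q_step) {
  return 0.1 * q_step * q_step;
}

// RD = (lambda x Rate) + Distortion; lower is better.
static double rd_cost(double lambda, const RdCandidate *c) {
  return lambda * c->rate_bits + c->distortion;
}

// Returns the index of the candidate with the lowest RD cost.
static size_t pick_best_candidate(const RdCandidate *cands, size_t n,
                                  double lambda) {
  assert(n > 0);
  size_t best = 0;
  double best_cost = rd_cost(lambda, &cands[0]);
  for (size_t i = 1; i < n; ++i) {
    const double cost = rd_cost(lambda, &cands[i]);
    if (cost < best_cost) {
      best_cost = cost;
      best = i;
    }
  }
  return best;
}
```

For example, with candidates {rate 100, distortion 5000} and {rate 400, distortion 1000}, a step size of 10 gives &lambda; = 10 and selects the second candidate (cost 5000 vs 6000), whereas a step size of 40 gives &lambda; = 160 and selects the first (cost 21000 vs 65000). Coarser quantization therefore shifts the decision towards spending fewer bits.
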

Doing a rate distortion test on each candidate prediction / transform
combination is expensive in terms of CPU cycles. Hence, for cases where encode
speed is critical, libaom implements a non-rd pathway where the <b>RD</b>
value is estimated based on the prediction error and quantizer setting.
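
In a non-rd pathway the trial transform, quantization and entropy coding steps are skipped. The sketch below shows the idea under stated assumptions: distortion is approximated by the SSE of the prediction itself, the rate by a rough precomputed estimate of the bits needed to signal the mode, and &lambda; is assumed to have already been derived from the quantizer. The function names are hypothetical and the model is far simpler than libaom's.

```c
#include <stddef.h>
#include <stdint.h>

// Sum of squared differences between the source block and a prediction.
static uint64_t block_sse(const uint8_t *src, const uint8_t *pred, size_t n) {
  uint64_t sse = 0;
  for (size_t i = 0; i < n; ++i) {
    const int d = (int)src[i] - (int)pred[i];
    sse += (uint64_t)(d * d);
  }
  return sse;
}

// Hypothetical non-rd estimate: no transform, quantization or entropy
// coding trial. The prediction SSE stands in for the final distortion and
// est_mode_bits is a rough estimate of the signalling cost. lambda is
// assumed to have been derived from the frame's quantizer.
static double sketch_estimated_rd(uint64_t pred_sse, double est_mode_bits,
                                  double lambda) {
  return lambda * est_mode_bits + (double)pred_sse;
}
```
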

\section architecture_enc_src_proc Source Frame Processing

\subsection architecture_enc_frame_proc_data Main Data Structures

The following are the main data structures referenced in this section
(see also \ref architecture_enc_data_structures):

- \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
    - \ref AV1_PRIMARY.tf_info (\ref TEMPORAL_FILTER_INFO)

- \ref AV1_COMP cpi (the main compressor instance data structure)
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)

- \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.algo_cfg (\ref AlgoCfg)
    - \ref AV1EncoderConfig.kf_cfg (\ref KeyFrameCfg)

- \ref AlgoCfg (Algorithm related configuration parameters)
    - \ref AlgoCfg.arnr_max_frames
    - \ref AlgoCfg.arnr_strength

- \ref KeyFrameCfg (Keyframe coding configuration parameters)
    - \ref KeyFrameCfg.enable_keyframe_filtering

\subsection architecture_enc_frame_proc_ingest Frame Ingest / Coding Pipeline

 To encode a frame, first call \ref av1_receive_raw_frame() to pass the raw
 frame data to the encoder. Then call \ref av1_get_compressed_data() to encode
 the raw frame data into compressed frame data. The main body of
 \ref av1_get_compressed_data() is \ref av1_encode_strategy(), which determines
 the high-level encode strategy (frame type, frame placement, etc.) and then
 encodes the frame by calling \ref av1_encode(). In \ref av1_encode(),
 \ref av1_first_pass() will execute the first pass of two-pass encoding, while
 \ref encode_frame_to_data_rate() will perform the final pass for either
 one-pass or two-pass encoding.

 The main body of \ref encode_frame_to_data_rate() is
 \ref encode_with_recode_loop_and_filter(), which handles encoding before
 in-loop filters (with recode loops \ref encode_with_recode_loop(), or
 without any recode loop \ref encode_without_recode()), followed by in-loop
 filters (deblocking filters \ref loopfilter_frame(), CDEF filters and
 restoration filters \ref cdef_restoration_frame()).

 Apart from rate/quality control, both \ref encode_with_recode_loop() and
 \ref encode_without_recode() call \ref av1_encode_frame() to manage the
 reference frame buffers and \ref encode_frame_internal() to perform the
 rest of encoding that does not require access to external frames.
 \ref encode_frame_internal() is the starting point for the partition search
 (see \ref architecture_enc_partitions).
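
The call sequence above can be summarised by a sketch in which each stage merely records its name in a trace. Every function here is a hypothetical stand-in named after the stage it represents, not a libaom function, and the real encoder does much more at each stage (for example, the recode loop may encode the same frame several times).

```c
#include <string.h>

// Hypothetical call trace; each stub only appends its stage name.
#define TRACE_MAX 512
static char g_trace[TRACE_MAX];

static void trace(const char *step) {
  strncat(g_trace, step, TRACE_MAX - strlen(g_trace) - 1);
  strncat(g_trace, ";", TRACE_MAX - strlen(g_trace) - 1);
}

// Stand-in for the starting point of the partition search.
static void sketch_encode_frame_internal(void) { trace("partition_search"); }

// Stand-in for reference buffer management followed by the frame encode.
static void sketch_encode_frame(void) {
  trace("manage_ref_buffers");
  sketch_encode_frame_internal();
}

static void sketch_encode_with_recode_loop_and_filter(int use_recode_loop) {
  // Encoding before the in-loop filters, with or without a recode loop.
  trace(use_recode_loop ? "recode_loop" : "no_recode");
  sketch_encode_frame();
  // In-loop filters: deblocking, then CDEF and loop restoration.
  trace("deblock");
  trace("cdef_restoration");
}

// pass == 1 models the first pass; any other value takes the final-pass
// route used by both one-pass and two-pass encoding.
static void sketch_encode(int pass, int use_recode_loop) {
  if (pass == 1) {
    trace("first_pass");
  } else {
    sketch_encode_with_recode_loop_and_filter(use_recode_loop);
  }
}
```
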

\subsection architecture_enc_frame_proc_tf Temporal Filtering

\subsubsection architecture_enc_frame_proc_tf_overview Overview

Video codecs exploit the spatial and temporal correlations in video signals to
achieve compression efficiency. Noise in the source signal weakens these
correlations and impedes codec performance, so denoising the video signal is
potentially a promising solution.

One strategy for denoising a source is motion compensated temporal filtering.
Unlike image denoising, where only the spatial information is available,
video denoising can leverage a combination of the spatial and temporal
information. Specifically, in the temporal domain, similar pixels can often be
tracked along the motion trajectory of moving objects. Motion estimation is
applied to neighboring frames to find similar patches or blocks of pixels that
can be combined to create a temporally filtered output.

AV1, in common with VP8 and VP9, uses an in-loop motion compensated temporal
filter to generate what are referred to as alternate reference frames (or ARF
frames). These can be encoded in the bitstream and stored as frame buffers for
use in the prediction of subsequent frames, but are not usually directly
displayed (hence they are sometimes referred to as non-display frames).

The following command line parameters set the strength of the filter, the
number of frames used and determine whether filtering is allowed for key
frames.

- <b>--arnr-strength</b> (\ref AlgoCfg.arnr_strength)
- <b>--arnr-maxframes</b> (\ref AlgoCfg.arnr_max_frames)
- <b>--enable-keyframe-filtering</b>
  (\ref KeyFrameCfg.enable_keyframe_filtering)

Note that in AV1, the temporal filtering scheme is designed around the
hierarchical ARF based pyramid coding structure. We typically apply denoising
only to key frames and to ARF frames at the highest (and sometimes the second
highest) layer in the hierarchical coding structure.

\subsubsection architecture_enc_frame_proc_tf_algo Temporal Filtering Algorithm

Our method divides the current frame into "MxM" blocks. For each block, a
motion search is applied on frames before and after the current frame. Only
the best matching patch with the smallest mean square error (MSE) is kept as a
candidate patch for each neighbouring frame. The current block is also a
candidate patch. A total of N candidate patches are combined to generate the
filtered output.

Let \f$f(i)\f$ represent the filtered sample value and \f$p_{j}(i)\f$ the sample
value of the j-th patch. The filtering process is:

\f[
  f(i) = \frac{p_{0}(i) + \sum_{j=1}^{N} \omega_{j}(i) \cdot p_{j}(i)}
              {1 + \sum_{j=1}^{N} \omega_{j}(i)}
\f]

where \f$ \omega_{j}(i) \f$ is the weight of the j-th patch from a total of
N patches. The weight is determined by the patch difference as:

\f[
  \omega_{j}(i) = \exp\left(-\frac{D_{j}(i)}{h^2}\right)
\f]

where \f$ D_{j}(i) \f$ is the sum of squared differences between the current
block and the j-th candidate patch:

\f[
  D_{j}(i) = \sum_{k \in \Omega_{i}} \left\| p_{0}(k) - p_{j}(k) \right\|_{2}^{2}
\f]

where:
- \f$p_{0}\f$ refers to the current frame.
- \f$\Omega_{i}\f$ is the patch window, an "LxL" pixel square.
- h is a critical parameter that controls how quickly the weights decay as the
  (Euclidean) distance between patches grows. It is derived from an estimate
  of noise amplitude in the source. This allows the filter coefficients to
  adapt for videos with different noise characteristics.
- Usually, M = 32, N = 7, and L = 5, but they can be adjusted.

The reader is referred to the code for more details.

\subsubsection architecture_enc_frame_proc_tf_funcs Temporal Filter Functions

The main entry point for temporal filtering is \ref av1_temporal_filter().
This function returns 1 if temporal filtering is successful, otherwise 0.
When temporal filtering is applied, the filtered frame is held in
output_frame, which is the frame that is then encoded in the subsequent
encoding process.

Almost all temporal filter related code is in av1/encoder/temporal_filter.c
and av1/encoder/temporal_filter.h.

Inside \ref av1_temporal_filter(), the reader's attention is directed to
\ref tf_setup_filtering_buffer() and \ref tf_do_filtering().

- \ref tf_setup_filtering_buffer(): sets up the frame buffer for
  temporal filtering, determines the number of frames to be used, and
  calculates the noise level of each frame.

- \ref tf_do_filtering(): the main function for the temporal
  filtering algorithm. It breaks each frame into "MxM" blocks. For each
  block a motion search \ref tf_motion_search() is applied to find
  the motion vector from one neighboring frame. tf_build_predictor() is then
  called to build the matching patch and \ref av1_apply_temporal_filter_c() (see
  also optimised SIMD versions) to apply temporal filtering. The weighted
  average over each pixel is accumulated and finally normalized in
  \ref tf_normalize_filtered_frame() to generate the final filtered frame.

- \ref av1_apply_temporal_filter_c(): the core function of our temporal
  filtering algorithm (see also optimised SIMD versions).
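
The accumulate-then-normalise pattern described for tf_do_filtering() can be sketched as follows: each predictor block adds weight x sample and the weight itself into two per-pixel buffers, and a final pass divides one by the other with rounding. The struct, the function names and the use of small integer weights are assumptions for illustration; they are not the actual libaom buffers or signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Per-pixel accumulators, filled block by block and frame by frame.
typedef struct {
  uint32_t *accum;  // running sum of weight x sample
  uint32_t *count;  // running sum of weights
  size_t num_pixels;
} TfAccumulator;

// Adds one predictor block's contribution starting at pixel `offset`.
static void tf_accumulate(TfAccumulator *acc, size_t offset,
                          const uint8_t *pred, const uint16_t *weights,
                          size_t n) {
  assert(offset + n <= acc->num_pixels);
  for (size_t i = 0; i < n; ++i) {
    acc->accum[offset + i] += (uint32_t)weights[i] * pred[i];
    acc->count[offset + i] += weights[i];
  }
}

// Final normalisation with round-to-nearest integer division. Pixels
// that received no contribution are set to 0.
static void tf_normalize(const TfAccumulator *acc, uint8_t *out) {
  for (size_t i = 0; i < acc->num_pixels; ++i) {
    const uint32_t c = acc->count[i];
    out[i] = c ? (uint8_t)((acc->accum[i] + c / 2) / c) : 0;
  }
}
```

Deferring the division to a single final pass keeps the per-block work to additions and multiplications.
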

\subsection architecture_enc_frame_proc_film Film Grain Modelling

 Add details here.

\section architecture_enc_rate_ctrl Rate Control

\subsection architecture_enc_rate_ctrl_data Main Data Structures

The following are the main data structures referenced in this section
(see also \ref architecture_enc_data_structures):

 - \ref AV1_PRIMARY ppi (the primary compressor instance data structure)
    - \ref AV1_PRIMARY.twopass (\ref TWO_PASS)

 - \ref AV1_COMP cpi (the main compressor instance data structure)
    - \ref AV1_COMP.oxcf (\ref AV1EncoderConfig)
    - \ref AV1_COMP.rc (\ref RATE_CONTROL)
    - \ref AV1_COMP.sf (\ref SPEED_FEATURES)

 - \ref AV1EncoderConfig (Encoder configuration parameters)
    - \ref AV1EncoderConfig.rc_cfg (\ref RateControlCfg)

 - \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first
   pass stats)

 - \ref SPEED_FEATURES (Encode speed vs quality tradeoff parameters)
    - \ref SPEED_FEATURES.hl_sf (\ref HIGH_LEVEL_SPEED_FEATURES)

\subsection architecture_enc_rate_ctrl_options Supported Rate Control Options

Different use cases (\ref architecture_enc_use_cases) may have different
requirements in terms of data rate control.

The broad rate control strategy is selected using the <b>--end-usage</b>
parameter on the command line, which maps onto the field
\ref aom_codec_enc_cfg_t.rc_end_usage in \ref aom_encoder.h.

The four supported options are:

- <b>VBR</b> (Variable Bitrate)
- <b>CBR</b> (Constant Bitrate)
- <b>CQ</b> (Constrained Quality mode; a constrained variant of VBR)
- <b>Fixed Q</b> (Constant quality or Q mode)

The value of \ref aom_codec_enc_cfg_t.rc_end_usage is in turn copied over
into the encoder rate control configuration data structure as
\ref RateControlCfg.mode.

With regard to the most important use cases above, Video on demand uses either
VBR or CQ mode. CBR is the preferred rate control model for RTC and Live
streaming and Fixed Q is only used in testing.

The behaviour of each of these modes is regulated by a series of secondary
command line rate control options but also depends somewhat on the selected
use case, whether 2-pass coding is enabled and the selected encode speed vs
quality trade-offs (\ref AV1_COMP.speed and \ref AV1_COMP.sf).

The list below gives the names of the main rate control command line
options together with the names of the corresponding fields in the rate
control configuration data structures.

- <b>--target-bitrate</b> (\ref RateControlCfg.target_bandwidth)
- <b>--min-q</b> (\ref RateControlCfg.best_allowed_q)
- <b>--max-q</b> (\ref RateControlCfg.worst_allowed_q)
- <b>--cq-level</b> (\ref RateControlCfg.cq_level)
- <b>--undershoot-pct</b> (\ref RateControlCfg.under_shoot_pct)
- <b>--overshoot-pct</b> (\ref RateControlCfg.over_shoot_pct)

The following options control aspects of VBR encoding:

- <b>--bias-pct</b> (\ref RateControlCfg.vbrbias)
- <b>--minsection-pct</b> (\ref RateControlCfg.vbrmin_section)
- <b>--maxsection-pct</b> (\ref RateControlCfg.vbrmax_section)

The following relate to buffer and delay management in one pass low delay and
real time coding:

- <b>--buf-sz</b> (\ref RateControlCfg.maximum_buffer_size_ms)
- <b>--buf-initial-sz</b> (\ref RateControlCfg.starting_buffer_level_ms)
- <b>--buf-optimal-sz</b> (\ref RateControlCfg.optimal_buffer_level_ms)
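
The three buffer options are expressed in milliseconds of playback. A natural way to use them, sketched below under that assumption, is to convert each to bits at the target bitrate and then track the buffer level as a leaky bucket: the channel refills it by the average per-frame budget and each coded frame drains its actual size. The function names are hypothetical, and libaom's one pass rate control tracks considerably more state than this.

```c
#include <stdint.h>

// Converts a buffer size in milliseconds of playback into bits at the
// target bitrate (in bits per second).
static int64_t buffer_ms_to_bits(int64_t ms, int64_t target_bps) {
  return ms * target_bps / 1000;
}

// Leaky-bucket update after coding one frame: add the average per-frame
// budget, subtract the bits actually spent, and cap at the buffer size.
// A negative result would indicate a buffer underrun.
static int64_t update_buffer_level(int64_t level, int64_t max_level,
                                   int64_t target_bps, double fps,
                                   int64_t frame_bits) {
  const int64_t per_frame = (int64_t)(target_bps / fps);
  level += per_frame - frame_bits;
  return level > max_level ? max_level : level;
}
```

For example, at a 1000 kbps target and 25 fps the per-frame budget is 40000 bits, so coding a 60000-bit frame lowers the buffer level by 20000 bits.
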

\subsection architecture_enc_vbr Variable Bitrate (VBR) Encoding

For streamed VOD content the most common rate control strategy is Variable
Bitrate (VBR) encoding. The CQ mode mentioned above is a variant of this
where additional quantizer and quality constraints are applied. VBR
encoding may in theory be used in conjunction with either 1-pass or 2-pass
encoding.

VBR encoding varies the number of bits given to each frame or group of frames
according to the difficulty of that frame or group of frames, such that easier
frames are allocated fewer bits and harder frames are allocated more bits. The
intent here is to even out the quality between frames. This contrasts with
Constant Bitrate (CBR) encoding, where the number of bits allocated to each
frame is kept roughly constant.

Whilst for any given frame or group of frames the data rate may vary, the VBR
algorithm attempts to deliver a given average bitrate over a wider time
interval. In standard VBR encoding, the time interval over which the data rate
is averaged is usually the duration of the video clip. An alternative
approach is to target an average VBR bitrate over the entire video corpus for
a particular video format (corpus VBR).
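
A minimal sketch of the core VBR idea, assuming a per-frame difficulty score is already available (in 2-pass mode it would come from the first pass): the clip's bit budget is shared out in proportion to difficulty. The function name is hypothetical. libaom's real allocation works on groups of frames and also applies the <b>--bias-pct</b>, <b>--minsection-pct</b> and <b>--maxsection-pct</b> controls, none of which are modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Splits total_bits across n frames in proportion to difficulty[], so
// that harder frames receive more bits. Rounding remainders go to the
// last frame so that the allocations sum exactly to the budget.
static void vbr_allocate(const double *difficulty, size_t n,
                         int64_t total_bits, int64_t *alloc) {
  assert(n > 0);
  double sum = 0.0;
  for (size_t i = 0; i < n; ++i) sum += difficulty[i];
  assert(sum > 0.0);
  int64_t given = 0;
  for (size_t i = 0; i + 1 < n; ++i) {
    alloc[i] = (int64_t)(total_bits * (difficulty[i] / sum));
    given += alloc[i];
  }
  alloc[n - 1] = total_bits - given;
}
```

For difficulties {1, 2, 1} and a 4000-bit budget the allocation is {1000, 2000, 1000}: the average rate is met exactly while the hardest frame gets twice the bits of the others.
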
    583 
    584 \subsubsection architecture_enc_1pass_vbr 1 Pass VBR Encoding
    585 
    586 The command line for libaom does allow 1 Pass VBR, but this has not been
    587 properly optimised and behaves much like 1 pass CBR in most regards, with bits
    588 allocated to frames by the following functions:
    589 
    590 - \ref av1_calc_iframe_target_size_one_pass_vbr(
    591            const struct AV1_COMP *const cpi)
    592        "av1_calc_iframe_target_size_one_pass_vbr()"
    593 - \ref av1_calc_pframe_target_size_one_pass_vbr(
    594            const struct AV1_COMP *const cpi,
    595            FRAME_UPDATE_TYPE frame_update_type)
    596        "av1_calc_pframe_target_size_one_pass_vbr()"
    597 
    598 \subsubsection architecture_enc_2pass_vbr 2 Pass VBR Encoding
    599 
    600 The main focus here will be on 2-pass VBR encoding (and the related CQ mode)
    601 as these are the modes most commonly used for VOD content.
    602 
    603 2-pass encoding is selected on the command line by setting --passes=2
    604 (or -p 2).
    605 
    606 Generally speaking, in 2-pass encoding, an encoder will first encode a video
    607 using a default set of parameters and assumptions. Depending on the outcome
    608 of that first encode, the baseline assumptions and parameters will be adjusted
    609 to optimize the output during the second pass.  In essence the first pass is a
    610 fact finding mission to establish the complexity and variability of the video,
    611 in order to allow a better allocation of bits in the second pass.
    612 
    613 The libaom 2-pass algorithm is unusual in that the first pass is not a full
    614 encode of the video. Rather it uses a limited set of prediction and transform
options and a fixed quantizer to generate statistics about each frame. No
    616 output bitstream is created and the per frame first pass statistics are stored
    617 entirely in volatile memory. This has some disadvantages when compared to a
    618 full first pass encode, but avoids the need for file I/O and improves speed.
    619 
    620 For two pass encoding, the function \ref av1_encode() will first be called
    621 for each frame in the video with the value \ref AV1EncoderConfig.pass = 1.
    622 This will result in calls to \ref av1_first_pass().
    623 
    624 Statistics for each frame are stored in \ref FIRSTPASS_STATS frame_stats_buf.
    625 
    626 After completion of the first pass, \ref av1_encode() will be called again for
    627 each frame with \ref AV1EncoderConfig.pass = 2.  The frames are then encoded in
accordance with the statistics gathered during the first pass by calls to
\ref encode_frame_to_data_rate(), which in turn calls
\ref av1_get_second_pass_params().
    631 
In summary, the second pass code:
    633 
    634 - Searches for scene cuts (if auto key frame detection is enabled).
    635 - Defines the length of and hierarchical structure to be used in each
    636   ARF/GF group.
    637 - Allocates bits based on the relative complexity of each frame, the quality
    638   of frame to frame prediction and the type of frame (e.g. key frame, ARF
    639   frame, golden frame or normal leaf frame).
    640 - Suggests a maximum Q (quantizer value) for each ARF/GF group, based on
    641   estimated complexity and recent rate control compliance
  (\ref RATE_CONTROL.active_worst_quality).
    643 - Tracks adherence to the overall rate control objectives and adjusts
    644   heuristics.
    645 
The main two pass functions involved in the above include:
    647 
    648 - \ref find_next_key_frame()
    649 - \ref define_gf_group()
    650 - \ref calculate_total_gf_group_bits()
    651 - \ref get_twopass_worst_quality()
    652 - \ref av1_gop_setup_structure()
    653 - \ref av1_gop_bit_allocation()
    654 - \ref av1_twopass_postencode_update()
    655 
    656 For each frame, the two pass algorithm defines a target number of bits
    657 \ref RATE_CONTROL.base_frame_target,  which is then adjusted if necessary to
    658 reflect any undershoot or overshoot on previous frames to give
    659 \ref RATE_CONTROL.this_frame_target.
    660 
    661 As well as \ref RATE_CONTROL.active_worst_quality, the two pass code also
    662 maintains a record of the actual Q value used to encode previous frames
    663 at each level in the current pyramid hierarchy
    664 (\ref PRIMARY_RATE_CONTROL.active_best_quality). The function
    665 \ref rc_pick_q_and_bounds(), uses these values to set a permitted Q range
    666 for each frame.
    667 
    668 \subsubsection architecture_enc_1pass_lagged 1 Pass Lagged VBR Encoding
    669 
    670 1 pass lagged encode falls between simple 1 pass encoding and full two pass
    671 encoding and is used for cases where it is not possible to do a full first
    672 pass through the entire video clip, but where some delay is permissible. For
    673 example near live streaming where there is a delay of up to a few seconds. In
    674 this case the first pass and second pass are in effect combined such that the
    675 first pass starts encoding the clip and the second pass lags behind it by a
    676 few frames.  When using this method, full sequence level statistics are not
    677 available, but it is possible to collect and use frame or group of frame level
    678 data to help in the allocation of bits and in defining ARF/GF coding
    679 hierarchies.  The reader is referred to the \ref AV1_PRIMARY.lap_enabled field
    680 in the main compressor instance (where <b>lap</b> stands for
    681 <b>look ahead processing</b>). This encoding mode for the most part uses the
    682 same rate control pathways as two pass VBR encoding.
    683 
    684 \subsection architecture_enc_rc_loop The Main Rate Control Loop
    685 
    686 Having established a target rate for a given frame and an allowed range of Q
    687 values, the encoder then tries to encode the frame at a rate that is as close
    688 as possible to the target value, given the Q range constraints.
    689 
    690 There are two main mechanisms by which this is achieved.
    691 
    692 The first selects a frame level Q, using an adaptive estimate of the number of
    693 bits that will be generated when the frame is encoded at any given Q.
    694 Fundamentally this mechanism is common to VBR, CBR and to use cases such as
    695 RTC with small adjustments.
    696 
    697 As the Q value mainly adjusts the precision of the residual signal, it is not
    698 actually a reliable basis for accurately predicting the number of bits that
    699 will be generated across all clips. A well predicted clip, for example, may
    700 have a much smaller error residual after prediction.  The algorithm copes with
    701 this by adapting its predictions on the fly using a feedback loop based on how
    702 well it did the previous time around.
    703 
    704 The main functions responsible for the prediction of Q and the adaptation over
    705 time, for the two pass encoding pipeline are:
    706 
    707 - \ref rc_pick_q_and_bounds()
    708     - \ref get_q()
    709         - \ref av1_rc_regulate_q(
    710                    const struct AV1_COMP *cpi, int target_bits_per_frame,
    711                    int active_best_quality, int active_worst_quality,
    712                    int width, int height) "av1_rc_regulate_q()"
    713         - \ref get_rate_correction_factor()
    714         - \ref set_rate_correction_factor()
    715         - \ref find_closest_qindex_by_rate()
    716 - \ref av1_twopass_postencode_update()
    717     - \ref av1_rc_update_rate_correction_factors()
    718 
    719 A second mechanism for control comes into play if there is a large rate miss
    720 for the current frame (much too big or too small). This is a recode mechanism
    721 which allows the current frame to be re-encoded one or more times with a
    722 revised Q value. This obviously has significant implications for encode speed
    723 and in the case of RTC latency (hence it is not used for the RTC pathway).
    724 
    725 Whether or not a recode is allowed for a given frame depends on the selected
    726 encode speed vs quality trade off. This is set on the command line using the
    727 --cpu-used parameter which maps onto the \ref AV1_COMP.speed field in the main
    728 compressor instance data structure.
    729 
    730 The value of \ref AV1_COMP.speed, combined with the use case, is used to
    731 populate the speed features data structure AV1_COMP.sf. In particular
    732 \ref HIGH_LEVEL_SPEED_FEATURES.recode_loop determines the types of frames that
    733 may be recoded and \ref HIGH_LEVEL_SPEED_FEATURES.recode_tolerance is a rate
    734 error trigger threshold.
    735 
    736 For more information the reader is directed to the following functions:
    737 
    738 - \ref encode_with_recode_loop()
    739 - \ref encode_without_recode()
    740 - \ref recode_loop_update_q()
    741 - \ref recode_loop_test()
    742 - \ref av1_set_speed_features_framesize_independent()
    743 - \ref av1_set_speed_features_framesize_dependent()
    744 
    745 \subsection architecture_enc_fixed_q Fixed Q Mode
    746 
    747 There are two main fixed Q cases:
    748 -# Fixed Q with adaptive qp offsets: same qp offset for each pyramid level
    749    in a given video, but these offsets are adaptive based on video content.
    750 -# Fixed Q with fixed qp offsets: content-independent fixed qp offsets for
    751    each pyramid level.
    752 
    753 The reader is also refered to the following functions:
    754 - \ref av1_rc_pick_q_and_bounds()
    755 - \ref rc_pick_q_and_bounds_no_stats_cbr()
    756 - \ref rc_pick_q_and_bounds_no_stats()
    757 - \ref rc_pick_q_and_bounds()
    758 
    759 \section architecture_enc_frame_groups GF/ ARF Frame Groups & Hierarchical Coding
    760 
    761 \subsection architecture_enc_frame_groups_data Main Data Structures
    762 
    763 The following are the main data structures referenced in this section
    764 (see also \ref architecture_enc_data_structures):
    765 
    766 - \ref AV1_COMP cpi (the main compressor instance data structure)
    767     - \ref AV1_COMP.rc (\ref RATE_CONTROL)
    768 
    769 - \ref FIRSTPASS_STATS *frame_stats_buf (used to store per frame first pass
    770 stats)
    771 
    772 \subsection architecture_enc_frame_groups_groups Frame Groups
    773 
    774 To process a sequence/stream of video frames, the encoder divides the frames
    775 into groups and encodes them sequentially (possibly dependent on previous
    776 groups). In AV1 such a group is usually referred to as a golden frame group
    777 (GF group) or sometimes an Alt-Ref (ARF) group or a group of pictures (GOP).
    778 A GF group determines and stores the coding structure of the frames (for
    779 example, frame type, usage of the hierarchical structure, usage of overlay
    780 frames, etc.) and can be considered as the base unit to process the frames,
    781 therefore playing an important role in the encoder.
    782 
    783 The length of a specific GF group is arguably the most important aspect when
    784 determining a GF group. This is because most GF group level decisions are
    785 based on the frame characteristics, if not on the length itself directly.
    786 Note that the GF group is always a group of consecutive frames, which means
    787 the start and end of the group (so again, the length of it) determines which
    788 frames are included in it and hence determines the characteristics of the GF
    789 group. Therefore, in this document we will first discuss the GF group length
    790 decision in Libaom, followed by frame structure decisions when defining a GF
    791 group with a certain length.
    792 
    793 \subsection architecture_enc_gf_length GF / ARF Group Length Determination
    794 
    795 The basic intuition of determining the GF group length is that it is usually
    796 desirable to group together frames that are similar. Hence, we may choose
    797 longer groups when consecutive frames are very alike and shorter ones when
    798 they are very different.
    799 
    800 The determination of the GF group length is done in function \ref
    801 calculate_gf_length(). The following encoder use cases are supported:
    802 
    803 <ul>
    804   <li><b>Single pass with look-ahead disabled(\ref has_no_stats_stage()):
    805   </b> in this case there is no information available on the following stream
    806   of frames, therefore the function will set the GF group length for the
    807   current and the following GF groups (a total number of MAX_NUM_GF_INTERVALS
    808   groups) to be the maximum value allowed.</li>
    809 
    810   <li><b>Single pass with look-ahead enabled (\ref AV1_PRIMARY.lap_enabled):</b>
    811   look-ahead processing is enabled for single pass, therefore there is a
    812   limited amount of information available regarding future frames. In this
    813   case the function will determine the length based on \ref FIRSTPASS_STATS
    814   (which is generated when processing the look-ahead buffer) for only the
    815   current GF group.</li>
    816 
    817   <li><b>Two pass:</b> the first pass in two-pass encoding collects the stats
    818   and will not call the function. In the second pass, the function tries to
    819   determine the GF group length of the current and the following GF groups (a
    820   total number of MAX_NUM_GF_INTERVALS groups) based on the first-pass
    821   statistics. Note that as we will be discussing later, such decisions may not
    822   be accurate and can be changed later.</li>
    823 </ul>
    824 
    825 Except for the first trivial case where there is no prior knowledge of the
    826 following frames, the function \ref calculate_gf_length() tries to determine the
    827 GF group length based on the first pass statistics. The determination is divided
    828 into two parts:
    829 
    830 <ol>
    831    <li>Baseline decision based on accumulated statistics: this part of the function
    832    iterates through the firstpass statistics of the following frames and
    833    accumulates the statistics with function accumulate_next_frame_stats.
    834    The accumulated statistics are then used to determine whether the
    835    correlation in the GF group has dropped too much in function detect_gf_cut.
    836    If detect_gf_cut returns non-zero, or if we've reached the end of
    837    first-pass statistics, the baseline decision is set at the current point.</li>
    838 
    839    <li>If we are not at the end of the first-pass statistics, the next part will
    840    try to refine the baseline decision. This algorithm is based on the analysis
    841    of firstpass stats. It tries to cut the groups in stable regions or
    842    relatively stable points. Also it tries to avoid cutting in a blending
    843    region.</li>
    844 </ol>
    845 
    846 As mentioned, for two-pass encoding, the function \ref
    847 calculate_gf_length() tries to determine the length of as many as
    848 MAX_NUM_GF_INTERVALS groups. The decisions are stored in
    849 \ref PRIMARY_RATE_CONTROL.gf_intervals[]. The variables
    850 \ref RATE_CONTROL.intervals_till_gf_calculate_due and
    851 \ref PRIMARY_RATE_CONTROL.gf_intervals[] help with managing and updating the stored
    852 decisions. In the function \ref define_gf_group(), the corresponding
    853 stored length decision will be used to define the current GF group.
    854 
    855 When the maximum GF group length is larger or equal to 32, the encoder will
    856 enforce an extra layer to determine whether to use maximum GF length of 32
    857 or 16 for every GF group. In such a case, \ref calculate_gf_length() is
    858 first called with the original maximum length (>=32). Afterwards,
    859 \ref av1_tpl_setup_stats() is called to analyze the determined GF group
    860 and compare the reference to the last frame and the middle frame. If it is
    861 decided that we should use a maximum GF length of 16, the function
    862 \ref calculate_gf_length() is called again with the updated maximum
    863 length, and it only sets the length for a single GF group
    864 (\ref RATE_CONTROL.intervals_till_gf_calculate_due is set to 1). This process
    865 is shown below.
    866 
    867 \image html tplgfgroupdiagram.png "" width=40%
    868 
    869 Before encoding each frame, the encoder checks
    870 \ref RATE_CONTROL.frames_till_gf_update_due. If it is zero, indicating
    871 processing of the current GF group is done, the encoder will check whether
    872 \ref RATE_CONTROL.intervals_till_gf_calculate_due is zero. If it is, as
    873 discussed above, \ref calculate_gf_length() is called with original
    874 maximum length. If it is not zero, then the GF group length value stored
    875 in \ref PRIMARY_RATE_CONTROL.gf_intervals[\ref PRIMARY_RATE_CONTROL.cur_gf_index] is used
    876 (subject to change as discussed above).
    877 
    878 \subsection architecture_enc_gf_structure Defining a GF Group's Structure
    879 
    880 The function \ref define_gf_group() defines the frame structure as well
    881 as other GF group level parameters (e.g. bit allocation) once the length of
    882 the current GF group is determined.
    883 
    884 The function first iterates through the first pass statistics in the GF group to
    885 accumulate various stats, using accumulate_this_frame_stats() and
    886 accumulate_next_frame_stats(). The accumulated statistics are then used to
    887 determine the use of the use of ALTREF frame along with other properties of the
    888 GF group. The values of \ref PRIMARY_RATE_CONTROL.cur_gf_index, \ref
    889 RATE_CONTROL.intervals_till_gf_calculate_due and \ref
    890 RATE_CONTROL.frames_till_gf_update_due are also updated accordingly.
    891 
    892 The function \ref av1_gop_setup_structure() is called at the end to determine
    893 the frame layers and reference maps in the GF group, where the
    894 construct_multi_layer_gf_structure() function sets the frame update types for
    895 each frame and the group structure.
    896 
    897 - If ALTREF frames are allowed for the GF group: the first frame is set to
    898   KF_UPDATE, GF_UPDATE or ARF_UPDATE. The last frames of the GF group is set to
    899   OVERLAY_UPDATE.  Then in set_multi_layer_params(), frame update
    900   types are determined recursively in a binary tree fashion, and assigned to
    901   give the final IBBB structure for the group.  - If the current branch has more
    902   than 2 frames and we have not reached maximum layer depth, then the middle
    903   frame is set as INTNL_ARF_UPDATE, and the left and right branches are
    904   processed recursively.  - If the current branch has less than 3 frames, or we
    905   have reached maximum layer depth, then every frame in the branch is set to
    906   LF_UPDATE.
    907 
    908 - If ALTREF frame is not allowed for the GF group: the frames are set
    909   as LF_UPDATE. This basically forms an IPPP GF group structure.
    910 
    911 As mentioned, the encoder may use Temporal dependancy modelling (TPL - see \ref
    912 architecture_enc_tpl) to determine whether we should use a maximum length of 32
    913 or 16 for the current GF group. This requires calls to \ref define_gf_group()
    914 but should not change other settings (since it is in essence a trial). This
    915 special case is indicated by the setting parameter <b>is_final_pass</b> for to
    916 zero.
    917 
    918 For single pass encodes where look-ahead processing is disabled
    919 (\ref AV1_PRIMARY.lap_enabled = 0), \ref define_gf_group_pass0() is used
    920 instead of \ref define_gf_group().
    921 
    922 \subsection architecture_enc_kf_groups Key Frame Groups
    923 
    924 A special constraint for GF group length is the location of the next keyframe
    925 (KF). The frames between two KFs are referred to as a KF group. Each KF group
    926 can be encoded and decoded independently. Because of this, a GF group cannot
    927 span beyond a KF and the location of the next KF is set as a hard boundary
    928 for GF group length.
    929 
    930 <ul>
    931    <li>For two-pass encoding \ref RATE_CONTROL.frames_to_key controls when to
    932    encode a key frame. When it is zero, the current frame is a keyframe and
    933    the function \ref find_next_key_frame() is called. This in turn calls
    934    \ref define_kf_interval() to work out where the next key frame should
    935    be placed.</li>
    936 
    937    <li>For single-pass with look-ahead enabled, \ref define_kf_interval()
    938    is called whenever a GF group update is needed (when
    939    \ref RATE_CONTROL.frames_till_gf_update_due is zero). This is because
    940    generally KFs are more widely spaced and the look-ahead buffer is usually
    941    not long enough.</li>
    942 
    943    <li>For single-pass with look-ahead disabled, the KFs are placed according
    944    to the command line parameter <b>--kf-max-dist</b> (The above two cases are
    945    also subject to this constraint).</li>
    946 </ul>
    947 
    948 The function \ref define_kf_interval() tries to detect a scenecut.
    949 If a scenecut within kf-max-dist is detected, then it is set as the next
    950 keyframe. Otherwise the given maximum value is used.
    951 
    952 \section architecture_enc_tpl Temporal Dependency Modelling
    953 
    954 The temporal dependency model runs at the beginning of each GOP. It builds the
    955 motion trajectory within the GOP in units of 16x16 blocks. The temporal
    956 dependency of a 16x16 block is evaluated as the predictive coding gains it
    957 contributes to its trailing motion trajectory. This temporal dependency model
    958 reflects how important a coding block is for the coding efficiency of the
    959 overall GOP. It is hence used to scale the Lagrangian multiplier used in the
    960 rate-distortion optimization framework.
    961 
    962 \subsection architecture_enc_tpl_config Configurations
    963 
    964 The temporal dependency model and its applications are by default turned on in
    965 libaom encoder for the VoD use case. To disable it, use --tpl-model=0 in the
    966 aomenc configuration.
    967 
    968 \subsection architecture_enc_tpl_algoritms Algorithms
    969 
    970 The scheme works in the reverse frame processing order over the source frames,
    971 propagating information from future frames back to the current frame. For each
    972 frame, a propagation step is run for each MB. it operates as follows:
    973 
    974 <ul>
    975    <li> Estimate the intra prediction cost in terms of sum of absolute Hadamard
    976    transform difference (SATD) noted as intra_cost. It also loads the motion
    977    information available from the first-pass encode and estimates the inter
    978    prediction cost as inter_cost. Due to the use of hybrid inter/intra
    979    prediction mode, the inter_cost value is further upper bounded by
    980    intra_cost. A propagation cost variable is used to collect all the
    981    information flowed back from future processing frames. It is initialized as
    982    0 for all the blocks in the last processing frame in a group of pictures
    983    (GOP).</li>
    984 
    985    <li> The fraction of information from a current block to be propagated towards
    986    its reference block is estimated as:
    987 \f[
    988    propagation\_fraction = (1 - inter\_cost/intra\_cost)
    989 \f]
    990    It reflects how much the motion compensated reference would reduce the
    991    prediction error in percentage.</li>
    992 
    993    <li> The total amount of information the current block contributes to the GOP
    994    is estimated as intra_cost + propagation_cost. The information that it
    995    propagates towards its reference block is captured by:
    996 
    997 \f[
    998    propagation\_amount =
    999    (intra\_cost + propagation\_cost) * propagation\_fraction
   1000 \f]</li>
   1001 
   1002    <li> Note that the reference block may not necessarily sit on the grid of
   1003    16x16 blocks. The propagation amount is hence dispensed to all the blocks
   1004    that overlap with the reference block. The corresponding block in the
   1005    reference frame accumulates its own propagation cost as it receives back
   1006    propagation.
   1007 
   1008 \f[
   1009    propagation\_cost = propagation\_cost +
   1010                        (\frac{overlap\_area}{(16*16)} * propagation\_amount)
   1011 \f]</li>
   1012 
   1013    <li> In the final encoding stage, the distortion propagation factor of a block
   1014    is evaluated as \f$(1 + \frac{propagation\_cost}{intra\_cost})\f$, where the second term
   1015    captures its impact on later frames in a GOP.</li>
   1016 
   1017    <li> The Lagrangian multiplier is adapted at the 64x64 block level. For every
   1018    64x64 block in a frame, we have a distortion propagation factor:
   1019 
   1020 \f[
   1021   dist\_prop[i] = 1 + \frac{propagation\_cost[i]}{intra\_cost[i]}
   1022 \f]
   1023 
   1024    where i denotes the block index in the frame. We also have the frame level
   1025    distortion propagation factor:
   1026 
   1027 \f[
   1028   dist\_prop = 1 +
   1029   \frac{\sum_{i}propagation\_cost[i]}{\sum_{i}intra\_cost[i]}
   1030 \f]
   1031 
   1032    which is used to normalize the propagation factor at the 64x64 block level. The
   1033    Lagrangian multiplier is hence adapted as:
   1034 
   1035 \f[
   1036   &lambda;[i] = &lambda;[0] * \frac{dist\_prop}{dist\_prop[i]}
   1037 \f]
   1038 
   1039    where &lambda;0 is the multiplier associated with the frame level QP. The
   1040    64x64 block level QP is scaled according to the Lagrangian multiplier.
   1041 </ul>
   1042 
   1043 \subsection architecture_enc_tpl_keyfun Key Functions and data structures
   1044 
   1045 The reader is also refered to the following functions and data structures:
   1046 
   1047 - \ref TplParams
   1048 - \ref av1_tpl_setup_stats() builds the TPL model.
   1049 - \ref setup_delta_q() Assign different quantization parameters to each super
   1050   block based on its TPL weight.
   1051 
   1052 \section architecture_enc_partitions Block Partition Search
   1053 
   1054  A frame is first split into tiles in \ref encode_tiles(), with each tile
   1055  compressed by av1_encode_tile(). Then a tile is processed in superblock rows
   1056  via \ref av1_encode_sb_row() and then \ref encode_sb_row().
   1057 
   1058  The partition search processes superblocks sequentially in \ref
   1059  encode_sb_row(). Two search modes are supported, depending upon the encoding
   1060  configuration, \ref encode_nonrd_sb() is for 1-pass and real-time modes,
   1061  while \ref encode_rd_sb() performs more exhaustive rate distortion based
   1062  searches.
   1063 
   1064  Partition search over the recursive quad-tree space is implemented by
   1065  recursive calls to \ref av1_nonrd_use_partition(),
   1066  \ref av1_rd_use_partition(), or av1_rd_pick_partition() and returning best
   1067  options for sub-trees to their parent partitions.
   1068 
   1069  In libaom, the partition search lays on top of the mode search (predictor,
   1070  transform, etc.), instead of being a separate module. The interface of mode
   1071  search is \ref pick_sb_modes(), which connects the partition_search with
   1072  \ref architecture_enc_inter_modes and \ref architecture_enc_intra_modes. To
   1073  make good decisions, reconstruction is also required in order to build
   1074  references and contexts. This is implemented by \ref encode_sb() at the
   1075  sub-tree level and \ref encode_b() at coding block level.
   1076 
   1077  See also \ref partition_search
   1078 
   1079 \section architecture_enc_intra_modes Intra Mode Search
   1080 
   1081 AV1 also provides 71 different intra prediction modes, i.e. modes that predict
   1082 only based upon information in the current frame with no dependency on
   1083 previous or future frames. For key frames, where this independence from any
   1084 other frame is a defining requirement and for other cases where intra only
   1085 frames are required, the encoder need only considers these modes in the rate
   1086 distortion loop.
   1087 
   1088 Even so, in most use cases, searching all possible intra prediction modes for
   1089 every block and partition size is not practical and some pruning of the search
   1090 tree is necessary.
   1091 
   1092 For the Rate distortion optimized case, the main top level function
   1093 responsible for selecting the intra prediction mode for a given block is
   1094 \ref av1_rd_pick_intra_mode_sb(). The readers attention is also drawn to the
   1095 functions \ref hybrid_intra_mode_search() and \ref av1_nonrd_pick_intra_mode()
   1096 which may be used where encode speed is critical. The choice between the
   1097 rd path and the non rd or hybrid paths depends on the encoder use case and the
   1098 \ref AV1_COMP.speed parameter. Further fine control of the speed vs quality
   1099 trade off is provided by means of fields in \ref AV1_COMP.sf (which has type
   1100 \ref SPEED_FEATURES).
   1101 
   1102 Note that some intra modes are only considered for specific use cases or
   1103 types of video. For example the palette based prediction modes are often
   1104 valueable for graphics or screen share content but not for natural video.
   1105 (See \ref av1_search_palette_mode())
   1106 
   1107 See also \ref intra_mode_search for more details.
   1108 
   1109 \section architecture_enc_inter_modes Inter Prediction Mode Search
   1110 
   1111 For inter frames, where we also allow prediction using one or more previously
   1112 coded frames (which may chronologically speaking be past or future frames or
   1113 non-display reference buffers such as ARF frames), the size of the search tree
   1114 that needs to be traversed, to select a prediction mode, is considerably more
   1115 massive.
   1116 
   1117 In addition to the 71 possible intra modes we also need to consider 56 single
   1118 frame inter prediction modes (7 reference frames x 4 modes x 2 for OBMC
   1119 (overlapped block motion compensation)), 12768 compound inter prediction modes
   1120 (these are modes that combine inter predictors from two reference frames) and
   1121 36708 compound inter / intra prediction modes.
   1122 
   1123 As with the intra mode search, libaom supports an RD based pathway and a non
   1124 rd pathway for speed critical use cases.  The entry points for these two cases
   1125 are \ref av1_rd_pick_inter_mode() and \ref av1_nonrd_pick_inter_mode_sb()
   1126 respectively.
   1127 
   1128 Various heuristics and predictive strategies are used to prune the search tree
   1129 with fine control provided through the speed features parameter in the main
   1130 compressor instance data structure \ref AV1_COMP.sf.
   1131 
   1132 It is worth noting, that some prediction modes incurr a much larger rate cost
   1133 than others (ignoring for now the cost of coding the error residual). For
   1134 example, a compound mode that requires the encoder to specify two reference
   1135 frames and two new motion vectors will almost inevitable have a higher rate
   1136 cost than a simple inter prediction mode that uses a predicted or 0,0 motion
   1137 vector. As such, if we have already found a mode for the current block that
   1138 has a low RD cost, we can skip a large number of the possible modes on the
   1139 basis that even if the error residual is 0 the inherent rate cost of the
   1140 mode itself will garauntee that it is not chosen.
   1141 
   1142 See also \ref inter_mode_search for more details.
   1143 
   1144 \section architecture_enc_tx_search Transform Search
   1145 
   1146 AV1 implements the transform stage using 4 seperable 1-d transforms (DCT,
   1147 ADST, FLIPADST and IDTX, where FLIPADST is the reversed version of ADST
   1148 and IDTX is the identity transform) which can be combined to give 16 2-d
   1149 combinations.
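
Since the 2-D transforms are separable, each of the 16 combinations is simply a
choice of one 1-D kernel for the columns (vertical) and one for the rows
(horizontal), giving \f$4 \times 4 = 16\f$. The sketch below enumerates those
pairs; the type and function names are hypothetical, and the
VERTICAL_HORIZONTAL naming is only a printing convention chosen for this
example.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; the enum, struct and function names are
// hypothetical.
#include <assert.h>
#include <stdio.h>

// The four 1-D kernels named in the text.
typedef enum { K_DCT, K_ADST, K_FLIPADST, K_IDTX, NUM_1D_KERNELS } Kernel1D;

static const char *const kKernelNames[NUM_1D_KERNELS] = {"DCT", "ADST",
                                                         "FLIPADST", "IDTX"};

// A separable 2-D transform: one kernel applied along columns (vertical)
// and one applied along rows (horizontal).
typedef struct {
  Kernel1D vert;
  Kernel1D horz;
} Transform2D;

#define NUM_2D_TRANSFORMS (NUM_1D_KERNELS * NUM_1D_KERNELS)

// Fills out[] with every (vertical, horizontal) pair and returns the count.
static int enumerate_2d_transforms(Transform2D out[NUM_2D_TRANSFORMS]) {
  int n = 0;
  for (int v = 0; v < NUM_1D_KERNELS; ++v) {
    for (int h = 0; h < NUM_1D_KERNELS; ++h) {
      out[n].vert = (Kernel1D)v;
      out[n].horz = (Kernel1D)h;
      ++n;
    }
  }
  return n;
}

// Prints each pair as VERTICAL_HORIZONTAL, e.g. "ADST_DCT".
static void print_2d_transforms(void) {
  Transform2D all[NUM_2D_TRANSFORMS];
  const int n = enumerate_2d_transforms(all);
  for (int i = 0; i < n; ++i) {
    printf("%s_%s\n", kKernelNames[all[i].vert], kKernelNames[all[i].horz]);
  }
}
~~~~~~~~~~~~~~~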

These combinations can be applied at 19 different transform sizes, from 64x64
pixels down to 4x4 pixels, including rectangular sizes such as 16x8 and 4x16.
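
The count of 19 can be reproduced by taking every width and height in
{4, 8, 16, 32, 64} and keeping only aspect ratios of at most 4:1. This
aspect-ratio rule is our own compact description of the AV1 transform size set
rather than code taken from libaom, and the names in the sketch are
hypothetical.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; the names are hypothetical.
#include <assert.h>
#include <stdlib.h>

// Transform side lengths in pixels: 4, 8, 16, 32 and 64.
#define NUM_SIDES 5
static const int kSides[NUM_SIDES] = {4, 8, 16, 32, 64};

typedef struct {
  int width;
  int height;
} TxSize;

// Collects every width x height pair whose aspect ratio is at most 4:1.
// Adjacent entries of kSides differ by a factor of 2, so an index distance
// of at most 2 means the two sides differ by a factor of at most 4.
static int enumerate_tx_sizes(TxSize out[NUM_SIDES * NUM_SIDES]) {
  int n = 0;
  for (int w = 0; w < NUM_SIDES; ++w) {
    for (int h = 0; h < NUM_SIDES; ++h) {
      if (abs(w - h) > 2) continue;
      out[n].width = kSides[w];
      out[n].height = kSides[h];
      ++n;
    }
  }
  assert(n <= NUM_SIDES * NUM_SIDES);
  return n;
}
~~~~~~~~~~~~~~~

Of the 25 possible pairs, the six excluded ones are 4x32, 4x64, 8x64 and their
transposes, which leaves 5 square and 14 rectangular sizes.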

This gives rise to a large number of possible candidate transform options
for coding the residual error after prediction. An exhaustive rate-distortion
based evaluation of all candidates would not be practical from a speed
perspective in a production encoder implementation. Hence libaom adopts a
number of strategies to prune the selection of both the transform size and
transform type.

There are a number of strategies that have been tested and implemented in
libaom, including:

- A statistics based approach that looks at the frequency with which certain
  combinations are used in a given context and prunes out very unlikely
  candidates. It is worth noting here that some size candidates can be pruned
  out immediately based on the size of the prediction partition. For example,
  it does not make sense to use a transform size that is larger than the
  prediction partition size, and conversely a very large prediction partition
  is unlikely to be optimally paired with small transforms.

- A machine learning based model.

- A method that initially tests candidates using a fast algorithm that skips
  entropy encoding and uses an estimated cost model to choose a reduced subset
  for full RD analysis. This subject is covered more fully in a paper authored
  by Bohan Li, Jingning Han, and Yaowu Xu titled: <b>Fast Transform Type
  Selection Using Conditional Laplace Distribution Based Rate Estimation</b>.
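
The two size rules in the first bullet can be expressed as a simple candidate
filter. This is a sketch under stated assumptions, not libaom's pruning logic:
`BlockDim`, `keep_tx_candidate()` and the `max_area_ratio` threshold are
hypothetical, and a real encoder combines rules like these with the statistical
and model-based pruning listed above.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; BlockDim, keep_tx_candidate() and the
// max_area_ratio threshold are hypothetical.
#include <assert.h>
#include <stdbool.h>

typedef struct {
  int width;
  int height;
} BlockDim;

// Returns true if transform size tx is worth evaluating for a prediction
// block of size blk.
static bool keep_tx_candidate(BlockDim blk, BlockDim tx, int max_area_ratio) {
  assert(max_area_ratio >= 1);
  // Rule 1: never larger than the prediction block in either dimension.
  if (tx.width > blk.width || tx.height > blk.height) return false;
  // Rule 2: drop transforms more than max_area_ratio times smaller (by
  // area) than the block, as large blocks rarely pair well with tiny ones.
  const int blk_area = blk.width * blk.height;
  const int tx_area = tx.width * tx.height;
  return tx_area * max_area_ratio >= blk_area;
}
~~~~~~~~~~~~~~~

With a 32x32 block and a ratio of 16, a 64x32 transform is rejected by the
first rule, 8x8 is just kept, and 4x4 is rejected by the second rule.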

<b>TODO Add link to paper when available</b>

See also \ref transform_search for more details.

\section architecture_post_enc_filt Post Encode Loop Filtering

AV1 supports three types of post-encode <b>in-loop</b> filtering to improve
the quality of the reconstructed video.

- <b>Deblocking Filter</b> The first of these is a fairly traditional boundary
  deblocking filter that attempts to smooth discontinuities that may occur at
  the boundaries between blocks. See also \ref in_loop_filter.

- <b>CDEF Filter</b> The constrained directional enhancement filter (CDEF)
  allows the codec to apply a non-linear deringing filter along certain
  (potentially oblique) directions. A primary filter is applied along the
  selected direction, whilst a secondary filter is applied at 45 degrees to
  the primary direction. (See also \ref in_loop_cdef and
  <a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)

- <b>Loop Restoration Filter</b> The loop restoration filter is applied after
  any prior post filtering stages. It acts on units of either 64 x 64,
  128 x 128, or 256 x 256 pixel blocks, referred to as loop restoration units.
  Each unit can independently select either to bypass filtering, use a Wiener
  filter, or use a self-guided filter. (See also \ref in_loop_restoration and
  <a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.)
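
Two aspects of the loop restoration description lend themselves to a small
sketch: how many restoration units cover a plane, and the per-unit choice
between the three options. The counting rule below (round to the nearest
number of units, with a minimum of one) is our understanding of how AV1 lets
the last unit in a row or column absorb a partial remainder; treat it as an
assumption and check the specification before relying on it. The per-unit
choice is reduced to picking the lowest of three precomputed RD costs, and all
names are hypothetical.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; the names are hypothetical, and the unit-count
// rounding is an assumption about how partial units are handled.
#include <assert.h>

// Number of restoration units along one dimension, rounding to the nearest
// whole unit (minimum one), so a leftover strip smaller than half a unit
// is absorbed by the last unit.
static int lr_units_1d(int unit_size, int plane_size) {
  assert(unit_size == 64 || unit_size == 128 || unit_size == 256);
  assert(plane_size > 0);
  const int n = (plane_size + (unit_size >> 1)) / unit_size;
  return n > 0 ? n : 1;
}

static int lr_units_2d(int unit_size, int width, int height) {
  return lr_units_1d(unit_size, width) * lr_units_1d(unit_size, height);
}

// The three per-unit choices described in the text.
typedef enum { LR_BYPASS, LR_WIENER, LR_SELF_GUIDED, LR_NUM_CHOICES } LrChoice;

// Picks the option with the lowest RD cost for one unit.
static LrChoice pick_lr_choice(const double rd_cost[LR_NUM_CHOICES]) {
  LrChoice best = LR_BYPASS;
  for (int i = LR_WIENER; i < LR_NUM_CHOICES; ++i) {
    if (rd_cost[i] < rd_cost[best]) best = (LrChoice)i;
  }
  return best;
}
~~~~~~~~~~~~~~~

Under this rule a 1920x1080 plane is covered by 30 x 17 = 510 units of 64x64,
but only 8 x 4 = 32 units of 256x256, which shows how the unit size trades
signalling overhead against spatial adaptivity.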

\section architecture_entropy Entropy Coding

\subsection architecture_entropy_aritmetic Arithmetic Coder

VP9 used a binary arithmetic coder to encode symbols, where the probability
of a 1 or 0 at each decision node was based on a context model that took
into account recently coded values (for example previously coded coefficients
in the current block). A mechanism existed to update the context model each
frame, either explicitly in the bitstream, or implicitly at both the encoder
and decoder based on the observed frequency of different outcomes in the
previous frame. VP9 also supported separate context models for different types
of frame (e.g. inter coded frames and key frames).

In contrast, AV1 uses an M-ary symbol arithmetic coder to compress the syntax
elements, where integer \f$M\in[2, 14]\f$. This approach is based upon the entropy
coding strategy used in the Daala video codec and allows for some bit-level
parallelism in its implementation. AV1 also has an extended context model and
allows for updates to the probabilities on a per-symbol basis, as opposed to
the per-frame strategy in VP9.
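
Per-symbol adaptation can be illustrated with a simple adaptive cumulative
distribution function (CDF) kept at 15-bit precision. After each coded symbol,
every cumulative probability is moved a fixed fraction of the way towards the
outcome just observed, so frequently coded symbols quickly gain probability.
This is a sketch of the idea only: AV1's actual update rule differs in detail
(as we understand it, the adaptation rate also depends on the alphabet size and
on how many symbols have been coded so far), and all names here are
hypothetical.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only: a simplified adaptive CDF with a fixed rate.
// AdaptiveCdf and the function names are hypothetical; this is not AV1's
// normative adaptation rule.
#include <assert.h>
#include <stdint.h>

#define PROB_BITS 15
#define PROB_ONE (1 << PROB_BITS)  // 32768 represents a probability of 1.0.
#define MAX_SYMBOLS 16

// cdf[i] holds P(X <= i) scaled to 15 bits, so cdf[m - 1] == PROB_ONE.
typedef struct {
  int m;  // Alphabet size M.
  uint16_t cdf[MAX_SYMBOLS];
} AdaptiveCdf;

static void cdf_init_uniform(AdaptiveCdf *c, int m) {
  assert(m >= 2 && m <= MAX_SYMBOLS);
  c->m = m;
  for (int i = 0; i < m; ++i) {
    c->cdf[i] = (uint16_t)(PROB_ONE * (i + 1) / m);
  }
}

// After coding `symbol`, moves each cumulative probability a fraction
// 1 / 2^rate of the way towards the observed outcome: entries at or above
// the symbol move towards 1.0 and entries below it move towards 0.
static void cdf_update(AdaptiveCdf *c, int symbol, int rate) {
  assert(symbol >= 0 && symbol < c->m && rate > 0);
  for (int i = 0; i < c->m - 1; ++i) {  // cdf[m - 1] stays at PROB_ONE.
    if (i >= symbol) {
      c->cdf[i] += (uint16_t)((PROB_ONE - c->cdf[i]) >> rate);
    } else {
      c->cdf[i] -= (uint16_t)(c->cdf[i] >> rate);
    }
  }
}

// Probability of `symbol`, at 15-bit precision.
static int cdf_prob(const AdaptiveCdf *c, int symbol) {
  const int below = symbol > 0 ? c->cdf[symbol - 1] : 0;
  return c->cdf[symbol] - below;
}
~~~~~~~~~~~~~~~

A smaller `rate` adapts faster but reacts more strongly to noise; a larger one
adapts more slowly but gives more stable estimates.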

To improve the performance / throughput of the arithmetic encoder, especially
in hardware implementations, the probability model is updated and maintained
at 15-bit precision, but the arithmetic encoder only uses the most significant
9 bits when encoding a symbol. A more detailed discussion of the algorithm
and design constraints can be found in
<a href="https://arxiv.org/abs/2008.06091"> A Technical Overview of AV1</a>.
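
The effect of keeping only the top 9 of the 15 bits can be seen with integer
arithmetic: the truncation is a right shift by 6, and multiplying a 9-bit
probability by the top 8 bits of a 16-bit coder range (assumed here to be
normalized to [32768, 65535]) needs at most a 17-bit product, which keeps the
multiplier small. The interval computation `((range >> 8) * p9) >> 1` is
modelled on our reading of libaom's entropy encoder but leaves out its
minimum-probability adjustment, so treat it as an approximation; the names are
hypothetical.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; the names are hypothetical and the interval
// scaling is a simplification (no minimum-probability adjustment).
#include <assert.h>
#include <stdint.h>

#define MODEL_BITS 15  // Precision at which probabilities are maintained.
#define CODER_BITS 9   // Precision used by the arithmetic coder.

// Keeps the 9 most significant bits of a 15-bit probability in [0, 32768).
static uint32_t to_coder_precision(uint32_t p15) {
  assert(p15 < (1u << MODEL_BITS));
  return p15 >> (MODEL_BITS - CODER_BITS);
}

// Approximates range * p15 / 32768 for a 16-bit coder range in
// [32768, 65535] using only the top 8 bits of the range and the 9-bit
// probability, so the intermediate product needs at most 17 bits.
static uint32_t scaled_interval(uint32_t range, uint32_t p15) {
  assert(range >= 32768 && range <= 65535);
  return ((range >> 8) * to_coder_precision(p15)) >> 1;
}
~~~~~~~~~~~~~~~

For a probability of one half, `scaled_interval(65535, 16384)` gives 32640
rather than the exact 32767.5: a small loss of precision in exchange for a
much narrower multiplication.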

TODO add references to key functions / files.

As with VP9, a mechanism exists in AV1 to encode some elements into the
bitstream as uncompressed bits or literal values, without using the arithmetic
coder. This is used, for example, for some frame and sequence header values,
where it is beneficial to be able to read the values directly.
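
Literal values bypass the probability model entirely: the bits of each value
are simply appended to the bitstream. The minimal bit writer below shows this,
assuming most-significant-bit-first order (the order AV1 uses for fixed-width
header fields) and a zero-initialized buffer that is large enough; `BitWriter`
and its functions are hypothetical names, not libaom's API.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; BitWriter and its functions are hypothetical.
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Appends raw bits, most significant bit first, to a zero-initialized
// buffer that the caller guarantees is large enough.
typedef struct {
  uint8_t *buf;
  size_t bit_pos;  // Number of bits written so far.
} BitWriter;

static void bw_write_bit(BitWriter *w, int bit) {
  const size_t byte = w->bit_pos >> 3;
  const int shift = 7 - (int)(w->bit_pos & 7);
  if (bit) w->buf[byte] |= (uint8_t)(1u << shift);
  ++w->bit_pos;
}

// Writes the low `bits` bits of `value`, most significant bit first,
// without any probability model.
static void bw_write_literal(BitWriter *w, uint32_t value, int bits) {
  assert(bits > 0 && bits <= 32);
  for (int b = bits - 1; b >= 0; --b) {
    bw_write_bit(w, (int)((value >> b) & 1u));
  }
}
~~~~~~~~~~~~~~~

Because every field occupies exactly its bit width, a reader can extract such
values directly without running the arithmetic decoder.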

TODO add references to key functions / files.

\subsection architecture_entropy_coef Transform Coefficient Coding and Optimization
\image html coeff_coding.png "" width=70%

\subsubsection architecture_entropy_coef_what Transform coefficient coding
Transform coefficient coding is where the encoder compresses a quantized version
of the prediction residue into the bitstream.

\paragraph architecture_entropy_coef_prepare Preparation - transform and quantize
Before the entropy coding stage, the encoder reduces the pixel-to-pixel
correlation of the prediction residue by transforming the residue from the
spatial domain to the frequency domain. The encoder then quantizes the transform
coefficients to make them ready for entropy coding.
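
The quantization step can be illustrated with a uniform scalar quantizer: each
coefficient is divided by a step size and rounded, and reconstruction
multiplies the level back by the step. This is a simplification with
hypothetical names; libaom's quantizers additionally use zero-bin (dead-zone)
thresholds and other refinements not shown here.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only: a plain uniform scalar quantizer with
// round-to-nearest. The function names are hypothetical.
#include <assert.h>
#include <stdlib.h>

// Maps each coefficient to a quantized level, preserving its sign.
static void quantize(const int *coeff, int *level, int n, int step) {
  assert(step > 0);
  for (int i = 0; i < n; ++i) {
    const int mag = (abs(coeff[i]) + step / 2) / step;
    level[i] = coeff[i] < 0 ? -mag : mag;
  }
}

// Reconstructs approximate coefficients from the quantized levels.
static void dequantize(const int *level, int *recon, int n, int step) {
  assert(step > 0);
  for (int i = 0; i < n; ++i) recon[i] = level[i] * step;
}
~~~~~~~~~~~~~~~

With a step of 20, coefficients {100, -37, 9, -4, 0} become levels
{5, -2, 0, 0, 0}: the small coefficients collapse to zero, which is what makes
the quantized residual cheap to entropy code.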

\paragraph architecture_entropy_coef_coding The coding process
The encoder uses \ref av1_write_coeffs_txb() to write the coefficients of
a transform block into the bitstream.
The coding process has three stages.
1. First, the encoder codes the transform block skip flag (txb_skip). If the
skip flag is off, the encoder then codes the end of block position (eob),
which is the scan index of the last non-zero coefficient plus one.
2. Second, the encoder codes the lower magnitude levels of each coefficient in
reverse scan order.
3. Finally, the encoder codes the sign and, where needed, the higher magnitude
levels of each coefficient.

Related functions:
- \ref av1_write_coeffs_txb()
- write_inter_txb_coeff()
- \ref av1_write_intra_coeffs_mb()
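
The three stages can be traced on a small example. The sketch below takes
quantized levels already arranged in scan order, computes the eob as defined
above, and prints what each stage would signal. `low_max`, the cap on the
magnitude coded in the second stage, is a hypothetical parameter, and the order
used in the third stage is chosen for illustration; neither is taken from
\ref av1_write_coeffs_txb().

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; the names and the low_max cap are hypothetical.
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// End-of-block position: scan index of the last non-zero level plus one,
// or 0 when every level is zero.
static int compute_eob(const int *q, int n) {
  for (int i = n - 1; i >= 0; --i) {
    if (q[i] != 0) return i + 1;
  }
  return 0;
}

// Prints what each coding stage would signal for levels q[] in scan order.
static void trace_txb_coding(const int *q, int n, int low_max) {
  assert(low_max > 0);
  const int eob = compute_eob(q, n);
  // Stage 1: skip flag, then eob if the block has any non-zero level.
  printf("txb_skip=%d\n", eob == 0);
  if (eob == 0) return;
  printf("eob=%d\n", eob);
  // Stage 2: lower magnitude levels (capped at low_max), reverse scan order.
  for (int i = eob - 1; i >= 0; --i) {
    const int mag = abs(q[i]);
    printf("low_level[%d]=%d\n", i, mag < low_max ? mag : low_max);
  }
  // Stage 3: sign and any remaining magnitude of each non-zero level.
  for (int i = 0; i < eob; ++i) {
    if (q[i] == 0) continue;
    printf("sign[%d]=%c\n", i, q[i] < 0 ? '-' : '+');
    if (abs(q[i]) > low_max) {
      printf("high_level[%d]=%d\n", i, abs(q[i]) - low_max);
    }
  }
}
~~~~~~~~~~~~~~~

For levels {7, 0, -2, 0, 0} and a `low_max` of 3, the trace reports an eob of
3, the low levels 2, 0 and 3 in reverse scan order, and then the signs, with a
remaining magnitude of 4 for the first coefficient.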

\paragraph architecture_entropy_coef_context Context information
To improve the compression efficiency, the encoder uses several context models
tailored for transform coefficients to capture the correlations between coding
symbols. Most of the context models are built to capture the correlations
between the coefficients within the same transform block. However, the
transform block skip flag (txb_skip) and the sign of the DC coefficient
(dc_sign) require context information from neighboring transform blocks.

Context information propagates between transform blocks as follows. Before
coding a transform block, the encoder uses get_txb_ctx() to collect the context
information from neighboring transform blocks. This context information is
then used for coding the transform block skip flag (txb_skip) and the sign of
the DC coefficient (dc_sign). After the transform block is coded, the encoder
extracts the context information from the current block using
\ref av1_get_txb_entropy_context(), then stores it in a byte (uint8_t) using
av1_set_entropy_contexts(), from where it is used when coding other transform
blocks.

Related functions:
- \ref av1_get_txb_entropy_context()
- av1_set_entropy_contexts()
- get_txb_ctx()
- \ref av1_update_intra_mb_txb_context()
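
The hand-off of context between transform blocks amounts to summarizing each
coded block in a small value that later blocks can read. The sketch below packs
a clipped sum of absolute levels and a three-way DC sign category into one
byte, and derives a toy skip-flag context from the above and left neighbours.
The bit layout, the clipping value and all names are our own illustrative
choices; they are not necessarily what \ref av1_get_txb_entropy_context() or
get_txb_ctx() compute.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; the bit layout and all names are hypothetical.
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LEVEL_BITS 3
#define LEVEL_MASK ((1 << LEVEL_BITS) - 1)  // Sums are clipped to 7.

enum { DC_SIGN_ZERO = 0, DC_SIGN_NEG = 1, DC_SIGN_POS = 2 };

// Summarizes a coded block: the clipped sum of absolute levels in the low
// bits and the sign category of the DC (first) coefficient above them.
static uint8_t pack_txb_context(const int *qcoeff, int n) {
  assert(n > 0);
  int sum = 0;
  for (int i = 0; i < n && sum < LEVEL_MASK; ++i) sum += abs(qcoeff[i]);
  if (sum > LEVEL_MASK) sum = LEVEL_MASK;
  int dc_sign = DC_SIGN_ZERO;
  if (qcoeff[0] < 0) dc_sign = DC_SIGN_NEG;
  if (qcoeff[0] > 0) dc_sign = DC_SIGN_POS;
  return (uint8_t)(sum | (dc_sign << LEVEL_BITS));
}

static int ctx_level(uint8_t ctx) { return ctx & LEVEL_MASK; }
static int ctx_dc_sign(uint8_t ctx) { return ctx >> LEVEL_BITS; }

// Toy skip-flag context: how many of the above and left neighbours had
// any non-zero level (0, 1 or 2).
static int toy_skip_context(uint8_t above, uint8_t left) {
  return (ctx_level(above) != 0) + (ctx_level(left) != 0);
}
~~~~~~~~~~~~~~~

Storing only a clipped summary keeps the per-block state to a single byte while
still telling the next block whether its neighbours were busy or empty.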

\subsubsection architecture_entropy_coef_rd RD optimization
Besides the actual entropy coding, the encoder uses several utility functions
to make RD-optimal decisions.

\paragraph architecture_entropy_coef_cost Entropy cost
The encoder uses \ref av1_cost_coeffs_txb() or \ref av1_cost_coeffs_txb_laplacian()
to estimate the entropy cost of a transform block. Note that
\ref av1_cost_coeffs_txb() is slower but more accurate, whereas
\ref av1_cost_coeffs_txb_laplacian() is faster but less accurate.

Related functions:
- \ref av1_cost_coeffs_txb()
- \ref av1_cost_coeffs_txb_laplacian()
- av1_cost_coeffs_txb_estimate() (see av1/encoder/txb_rdopt.c)
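
To see why a Laplacian model can be much cheaper than exact context-based
costing, consider a two-sided geometric (discretized Laplacian) distribution
for each quantized level \f$q\f$:
\f$P(q) = \frac{1-\theta}{1+\theta}\,\theta^{|q|}\f$ with \f$0 < \theta < 1\f$.
Its ideal cost, \f$-\log_2 P(q)\f$ bits, needs no context modelling and is
linear in \f$|q|\f$. The sketch below sums that cost over a block; it
illustrates the general idea only, does not reproduce
\ref av1_cost_coeffs_txb_laplacian(), and its names and choice of \f$\theta\f$
are hypothetical.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; the names and the choice of theta are
// hypothetical, and this is not libaom's Laplacian cost estimator.
#include <assert.h>
#include <stdlib.h>

// Per-level cost model derived from a two-sided geometric distribution
// P(q) = (1 - theta) / (1 + theta) * theta^|q|, which gives
// bits(q) = zero_bits + |q| * bits_per_step, where
// zero_bits = -log2((1 - theta) / (1 + theta)), bits_per_step = -log2(theta).
typedef struct {
  double zero_bits;
  double bits_per_step;
} LaplacianCostModel;

// Precomputed for theta = 1/2: zero_bits = log2(3), bits_per_step = 1.
static const LaplacianCostModel kHalfTheta = {1.5849625007211562, 1.0};

// Estimated cost in bits of one quantized level; no context is needed.
static double level_bits(const LaplacianCostModel *m, int q) {
  return m->zero_bits + abs(q) * m->bits_per_step;
}

// Estimated cost of a block: a plain sum of per-level costs.
static double block_bits(const LaplacianCostModel *m, const int *q, int n) {
  assert(n >= 0);
  double bits = 0.0;
  for (int i = 0; i < n; ++i) bits += level_bits(m, q[i]);
  return bits;
}
~~~~~~~~~~~~~~~

Each level costs one multiply-add, whereas an exact cost has to track the
context of every coded symbol; that is the kind of speed versus accuracy
trade-off between the two costing functions described above.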

\paragraph architecture_entropy_coef_opt Quantized level optimization
Besides computing the entropy cost, the encoder also uses \ref av1_optimize_txb()
to adjust the quantized coefficient levels to achieve an optimal RD trade-off.
In \ref av1_optimize_txb(), the encoder goes through each quantized
coefficient and lowers its quantized level by one if doing so yields a better
RD score.

Related functions:
- \ref av1_optimize_txb()
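
The greedy pass described above can be sketched as follows. Distortion is the
squared error between the original coefficient and its reconstruction, and rate
comes from a caller-supplied model; the toy rate model, the callback type and
the function names are hypothetical, and a real implementation would use far
more accurate, context-dependent rate estimates.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; the names and the toy rate model are
// hypothetical.
#include <assert.h>
#include <stdlib.h>

// Rate model: estimated bits to code a level of the given magnitude.
typedef double (*level_rate_fn)(int magnitude);

// Toy rate model: 1 bit for a zero, otherwise 4 bits plus the magnitude.
static double toy_rate(int magnitude) {
  return magnitude == 0 ? 1.0 : 4.0 + magnitude;
}

// RD cost of coding `coeff` as `level` with quantizer step `step`, using
// squared reconstruction error as the distortion.
static double level_rd(double coeff, int level, int step, double lambda,
                       level_rate_fn rate) {
  const double err = coeff - (double)level * step;
  return err * err + lambda * rate(abs(level));
}

// Single greedy pass: for each non-zero level, keep "magnitude minus one"
// if it lowers the RD cost. Returns the number of levels changed.
static int greedy_optimize_levels(const double *coeff, int *level, int n,
                                  int step, double lambda,
                                  level_rate_fn rate) {
  assert(step > 0);
  int changed = 0;
  for (int i = 0; i < n; ++i) {
    if (level[i] == 0) continue;
    const int lowered = level[i] > 0 ? level[i] - 1 : level[i] + 1;
    if (level_rd(coeff[i], lowered, step, lambda, rate) <
        level_rd(coeff[i], level[i], step, lambda, rate)) {
      level[i] = lowered;
      ++changed;
    }
  }
  return changed;
}
~~~~~~~~~~~~~~~

With a step of 20 and a lambda of 100, the level 1 chosen for a coefficient of
12.0 is dropped to zero because the rate saving outweighs the extra
distortion, while the level 3 for 55.0 is kept.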

All the related functions are listed in \ref coefficient_coding.

\section architecture_simd SIMD usage

In order to efficiently encode video on modern platforms, it is necessary to
implement optimized versions of many core encoding and decoding functions using
architecture-specific SIMD instructions.

Functions which have optimized implementations will have multiple variants
in the code, each suffixed with the name of the appropriate instruction set.
There will additionally be an `_c` version, which acts as a reference
implementation against which the SIMD variants can be tested.

As different machines with the same nominal architecture may support different
subsets of SIMD instructions, libaom uses run-time CPU detection (RTCD) to
choose the appropriate function variants. This process is handled by
`build/cmake/rtcd.pl`, with function definitions in the files
`*_rtcd_defs.pl` elsewhere in the codebase.
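
The dispatch mechanism can be sketched as a function pointer that is bound
once, after CPU detection, to either the `_c` reference implementation or an
optimized variant, with the reference used to check the variant's output. Here
`sad_unrolled4()` is a plain C stand-in for a real SIMD variant,
`cpu_has_fast_path()` is a hypothetical detection stub, and the binding is
written by hand; in libaom this glue is generated from the `*_rtcd_defs.pl`
definitions by `build/cmake/rtcd.pl`.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; all names are hypothetical, and sad_unrolled4()
// merely stands in for a real SIMD implementation.
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef unsigned (*sad_fn)(const uint8_t *a, const uint8_t *b, int n);

// Reference implementation: sum of absolute differences.
static unsigned sad_c(const uint8_t *a, const uint8_t *b, int n) {
  unsigned sum = 0;
  for (int i = 0; i < n; ++i) sum += (unsigned)abs(a[i] - b[i]);
  return sum;
}

// Stand-in "optimized" variant: four independent accumulators, then a
// scalar tail. It must return exactly what sad_c() returns.
static unsigned sad_unrolled4(const uint8_t *a, const uint8_t *b, int n) {
  unsigned s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  int i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += (unsigned)abs(a[i] - b[i]);
    s1 += (unsigned)abs(a[i + 1] - b[i + 1]);
    s2 += (unsigned)abs(a[i + 2] - b[i + 2]);
    s3 += (unsigned)abs(a[i + 3] - b[i + 3]);
  }
  for (; i < n; ++i) s0 += (unsigned)abs(a[i] - b[i]);
  return s0 + s1 + s2 + s3;
}

// Hypothetical CPU feature probe; a real build would query the CPU.
static int cpu_has_fast_path(void) { return 1; }

// Pointer used by the rest of the program, bound once at start-up.
static sad_fn sad = sad_c;

static void setup_dispatch(void) {
  sad = cpu_has_fast_path() ? sad_unrolled4 : sad_c;
}
~~~~~~~~~~~~~~~

Binding the pointer once keeps the per-call cost to a single indirect call,
and keeping the `_c` version around means every variant can be checked against
it bit-exactly.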

Currently SIMD is supported on the following platforms:

- x86: Requires SSE4.1 or above

- Arm: Requires Neon (Armv7-A and above)

We aim to provide implementations of all performance-critical functions which
are compatible with the instruction sets listed above. Additional SIMD
extensions (e.g. AVX on x86, SVE on Arm) are also used to provide even
greater performance where available.

*/

/*!\defgroup encoder_algo Encoder Algorithm
 *
 * The encoder algorithm describes how a sequence is encoded, including
 * high-level decisions as well as the algorithms used at every encoding stage.
 */

/*!\defgroup high_level_algo High-level Algorithm
 * \ingroup encoder_algo
 * This module describes sequence-level and frame-level algorithms in AV1.
 * More details will be added.
 * @{
 */

/*!\defgroup speed_features Speed vs Quality Trade Off
 * \ingroup high_level_algo
 * This module describes the encode speed vs quality trade-off.
 * @{
 */
/*! @} - end defgroup speed_features */

/*!\defgroup src_frame_proc Source Frame Processing
 * \ingroup high_level_algo
 * This module describes algorithms in AV1 associated with the
 * pre-processing of source frames. See also \ref architecture_enc_src_proc
 *
 * @{
 */
/*! @} - end defgroup src_frame_proc */

/*!\defgroup rate_control Rate Control
 * \ingroup high_level_algo
 * This module describes the rate control algorithm in AV1.
 *  See also \ref architecture_enc_rate_ctrl
 * @{
 */
/*! @} - end defgroup rate_control */

/*!\defgroup tpl_modelling Temporal Dependency Modelling
 * \ingroup high_level_algo
 * This module includes algorithms to implement temporal dependency modelling.
 *  See also \ref architecture_enc_tpl
 * @{
 */
/*! @} - end defgroup tpl_modelling */

/*!\defgroup two_pass_algo Two Pass Mode
   \ingroup high_level_algo

 In two-pass mode, the input file is passed into the encoder for a quick
 first pass, where statistics are gathered. These statistics and the input
 file are then passed back into the encoder for a second pass. The statistics
 help the encoder reach the desired bitrate with less overshooting or
 undershooting.

 During the first pass, the codec will return "stats" packets that contain
 information useful for the second pass. The caller should concatenate these
 packets as they are received. In the second pass, the concatenated packets
 are passed in, along with the frames to encode. During the second pass,
 "frame" packets are returned that represent the compressed video.

 A complete example can be found in `examples/twopass_encoder.c`. Pseudocode
 is provided below to illustrate the core parts.

 During the first pass, the uncompressed frames are passed in and stats
 information is appended to a byte array.

~~~~~~~~~~~~~~~{.c}
#include <stdbool.h>
#include <string.h>

#include "aom/aom_encoder.h"

// For simplicity, assume that there is enough memory in the stats buffer.
// Actual code will want to use a resizable array. stats_len represents
// the length of data already present in the buffer. got_data is set to
// true if the encoder returned any packet at all.
void get_stats_data(aom_codec_ctx_t *encoder, char *stats,
                    size_t *stats_len, bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_STATS_PKT) continue;
    memcpy(stats + *stats_len, pkt->data.twopass_stats.buf,
           pkt->data.twopass_stats.sz);
    *stats_len += pkt->data.twopass_stats.sz;
  }
}

void first_pass(char *stats, size_t *stats_len) {
  struct aom_codec_enc_cfg first_pass_cfg;
  ... // Initialize the config as needed.
  first_pass_cfg.g_pass = AOM_RC_FIRST_PASS;
  aom_codec_ctx_t first_pass_encoder;
  ... // Initialize the encoder.

  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&first_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    got_data = false;
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  }
  // After all frames have been processed, call aom_codec_encode with
  // a NULL ptr repeatedly, until no more data is returned. The NULL
  // ptr tells the encoder that no more frames are available.
  do {
    got_data = false;
    aom_codec_encode(&first_pass_encoder, NULL, pts, duration, flags);
    get_stats_data(&first_pass_encoder, stats, stats_len, &got_data);
  } while (got_data);

  aom_codec_destroy(&first_pass_encoder);
}
~~~~~~~~~~~~~~~

 During the second pass, the uncompressed frames and the stats are
 passed into the encoder.

~~~~~~~~~~~~~~~{.c}
#include <stdbool.h>
#include <stdio.h>

#include "aom/aom_encoder.h"

// Write out each encoded frame to the file. got_data is set to true if
// the encoder returned any packet at all.
void get_cx_data(aom_codec_ctx_t *encoder, FILE *file,
                 bool *got_data) {
  const aom_codec_cx_pkt_t *pkt;
  aom_codec_iter_t iter = NULL;
  while ((pkt = aom_codec_get_cx_data(encoder, &iter))) {
    *got_data = true;
    if (pkt->kind != AOM_CODEC_CX_FRAME_PKT) continue;
    fwrite(pkt->data.frame.buf, 1, pkt->data.frame.sz, file);
  }
}

void second_pass(char *stats, size_t stats_len) {
  struct aom_codec_enc_cfg second_pass_cfg;
  ... // Initialize the config as needed.
  second_pass_cfg.g_pass = AOM_RC_LAST_PASS;
  second_pass_cfg.rc_twopass_stats_in.buf = stats;
  second_pass_cfg.rc_twopass_stats_in.sz = stats_len;
  aom_codec_ctx_t second_pass_encoder;
  ... // Initialize the encoder from the config.

  FILE *output = fopen("output.obu", "wb");
  bool got_data;
  while (frame_available) {
    // Read in the uncompressed frame, update frame_available
    aom_image_t *frame_to_encode = ...;
    aom_codec_encode(&second_pass_encoder, frame_to_encode, pts, duration,
                     flags);
    got_data = false;
    get_cx_data(&second_pass_encoder, output, &got_data);
  }
  // Pass in NULL to flush the encoder.
  do {
    got_data = false;
    aom_codec_encode(&second_pass_encoder, NULL, pts, duration, flags);
    get_cx_data(&second_pass_encoder, output, &got_data);
  } while (got_data);

  fclose(output);
  aom_codec_destroy(&second_pass_encoder);
}
~~~~~~~~~~~~~~~
 */

 /*!\defgroup look_ahead_buffer The Look-Ahead Buffer
    \ingroup high_level_algo

 A program should call \ref aom_codec_encode() for each frame that needs
 processing. These frames are internally copied and stored in a fixed-size
 circular buffer, known as the look-ahead buffer. Other parts of the code
 use future frame information to inform current frame decisions; examples
 include the first-pass algorithm, the TPL model, and the temporal filter.
 Note that this buffer also keeps a reference to the last source frame.

 The look-ahead buffer is defined in \ref av1/encoder/lookahead.h. It acts as an
 opaque structure, with an interface to create and free memory associated with
 it. It supports pushing and popping frames onto the structure in a FIFO
 fashion. It also allows look-ahead when using the \ref av1_lookahead_peek()
 function with a non-negative number, and look-behind when -1 is passed in (for
 the last source frame; e.g., the first pass uses this for motion estimation).
 The \ref av1_lookahead_depth() function returns the current number of frames
 stored in it. Note that \ref av1_lookahead_pop() is a bit of a misnomer: it
 only pops if either the "flush" variable is set, or the buffer is at maximum
 capacity.
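
 The FIFO semantics described above can be modelled with a small ring buffer.
 The sketch below is not libaom's implementation: `Lookahead`, `la_push()`,
 `la_peek()` and `la_pop()` are hypothetical, and frames are represented by
 non-negative integer ids. It reproduces only the behaviour from the text: a
 non-negative peek index looks ahead, an index of -1 looks back at the last
 popped source frame, and a pop only succeeds when the buffer is full or
 being flushed.

~~~~~~~~~~~~~~~{.c}
// Illustrative sketch only; all names are hypothetical and frames are
// represented by non-negative integer ids.
#include <assert.h>
#include <stdbool.h>

#define LA_MAX_DEPTH 35  // Mirrors the maximum lag given in the text.

typedef struct {
  int frames[LA_MAX_DEPTH];  // Frame ids stored in a ring.
  int read_idx;              // Position of the oldest queued frame.
  int depth;                 // Number of queued frames.
  int max_depth;             // Configured capacity.
  int last_source;           // Most recently popped frame, or -1 if none.
} Lookahead;

static void la_init(Lookahead *la, int max_depth) {
  assert(max_depth >= 1 && max_depth <= LA_MAX_DEPTH);
  la->read_idx = 0;
  la->depth = 0;
  la->max_depth = max_depth;
  la->last_source = -1;
}

static bool la_push(Lookahead *la, int frame_id) {
  if (la->depth == la->max_depth) return false;  // Caller must pop first.
  la->frames[(la->read_idx + la->depth) % la->max_depth] = frame_id;
  ++la->depth;
  return true;
}

// index >= 0 looks ahead from the oldest queued frame; index == -1 looks
// back at the last popped frame. Returns -1 if nothing is available.
static int la_peek(const Lookahead *la, int index) {
  if (index == -1) return la->last_source;
  if (index < 0 || index >= la->depth) return -1;
  return la->frames[(la->read_idx + index) % la->max_depth];
}

// Pops the oldest frame only when flushing or when the buffer is full;
// otherwise returns -1 and leaves the buffer untouched.
static int la_pop(Lookahead *la, bool flush) {
  if (la->depth == 0 || (!flush && la->depth < la->max_depth)) return -1;
  const int frame_id = la->frames[la->read_idx];
  la->read_idx = (la->read_idx + 1) % la->max_depth;
  --la->depth;
  la->last_source = frame_id;
  return frame_id;
}
~~~~~~~~~~~~~~~

 Because a non-flushing pop fails until the buffer is full, this model runs a
 full buffer's worth of frames behind its input, which is what gives later
 stages access to future frames.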

 The buffer is stored in the \ref AV1_PRIMARY::lookahead field.
 It is initialized in the first call to \ref aom_codec_encode(), in the
 \ref av1_receive_raw_frame() sub-routine. The buffer size is defined by
 the g_lag_in_frames parameter, set in the
 \ref aom_codec_enc_cfg_t::g_lag_in_frames field.
 This can be modified manually but should only be set once. On the command
 line, the flag "--lag-in-frames" controls it. The default size is 19 for
 non-realtime usage and 1 for realtime. Note that a maximum value of 35 is
 enforced.

 A frame will stay in the buffer as long as possible. As mentioned above,
 \ref av1_lookahead_pop() only removes a frame when either flush is set,
 or the buffer is full. Note that each call to \ref aom_codec_encode() inserts
 another frame into the buffer, and pop is called by the sub-function
 \ref av1_encode_strategy(). The buffer is told to flush when
 \ref aom_codec_encode() is passed a NULL image pointer. The caller
 must repeatedly call \ref aom_codec_encode() with a NULL image pointer, until
 no more packets are available, in order to fully flush the buffer.

 */

/*! @} - end defgroup high_level_algo */

/*!\defgroup partition_search Partition Search
 * \ingroup encoder_algo
 * For an overview of the partition search see \ref architecture_enc_partitions
 * @{
 */

/*! @} - end defgroup partition_search */

/*!\defgroup intra_mode_search Intra Mode Search
 * \ingroup encoder_algo
 * This module describes the intra mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup intra_mode_search */

/*!\defgroup inter_mode_search Inter Mode Search
 * \ingroup encoder_algo
 * This module describes the inter mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup inter_mode_search */

/*!\defgroup palette_mode_search Palette Mode Search
 * \ingroup intra_mode_search
 * This module describes the palette mode search algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup palette_mode_search */

/*!\defgroup transform_search Transform Search
 * \ingroup encoder_algo
 * This module describes the transform search algorithm in AV1.
 * @{
 */
/*! @} - end defgroup transform_search */

/*!\defgroup coefficient_coding Transform Coefficient Coding and Optimization
 * \ingroup encoder_algo
 * This module describes the algorithms of transform coefficient coding and optimization in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup coefficient_coding */

/*!\defgroup in_loop_filter In-loop Filter
 * \ingroup encoder_algo
 * This module describes the in-loop filter algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_filter */

/*!\defgroup in_loop_cdef CDEF
 * \ingroup encoder_algo
 * This module describes the CDEF parameter search algorithm
 * in AV1. More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_cdef */

/*!\defgroup in_loop_restoration Loop Restoration
 * \ingroup encoder_algo
 * This module describes the loop restoration search
 * and estimation algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup in_loop_restoration */

/*!\defgroup cyclic_refresh Cyclic Refresh
 * \ingroup encoder_algo
 * This module describes cyclic refresh (aq-mode=3) in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup cyclic_refresh */

/*!\defgroup SVC Scalable Video Coding
 * \ingroup encoder_algo
 * This module describes the scalable video coding algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup SVC */
/*!\defgroup variance_partition Variance Partition
 * \ingroup encoder_algo
 * This module describes the variance partition algorithm in AV1.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup variance_partition */
/*!\defgroup nonrd_mode_search NonRD Optimized Mode Search
 * \ingroup encoder_algo
 * This module describes the NonRD optimized mode search used in real-time mode.
 * More details will be added.
 * @{
 */
/*! @} - end defgroup nonrd_mode_search */