Changes for 1.5.2 'Sonic':
--------------------------

1.5.2 is a minor release of dav1d, focused on maintenance:
 - minor speed improvement in recon
 - improvements to loongarch symbol visibility and asm
 - mark C globals with the small code model
 - reduce the code size of the frame header parsing (OBU)
 - minor fixes to tools and CI
 - fix compilation with nasm 3.00


Changes for 1.5.1 'Sonic':
--------------------------

1.5.1 is a minor release of dav1d, focusing on optimizations and stack reduction:

 - Rewrite of the looprestoration (SGR, wiener) to reduce stack usage
 - Rewrite of the {put,prep}_scaled functions

The required stack space for dav1d should now be 62 KB on x86_64 and
58 KB on arm and aarch64.

 - Improvements to the SSSE3 SGR
 - Improvements to ARM32/ARM64 looprestoration optimizations
 - RISC-V: blend optimizations for high bitdepth
 - Power9: blend optimizations for 8bpc
 - Port RISC-V to POSIX/non-Linux OS
 - AArch64: Add Neon implementation of load_tmvs
 - Fix a rare but possible deadlock in flush()


Changes for 1.5.0 'Sonic':
--------------------------

1.5.0 is a major release of dav1d:
 - WARNING: we removed some of the SSE2 optimizations, so if you care about
            systems without SSSE3, you should be careful when updating!
 - Add Arm OpenBSD run-time CPU feature detection
 - Optimize index offset calculations for decode_coefs
 - picture: copy HDR10+ and T35 metadata only to visible frames
 - New SSSE3 optimizations for 6-tap (8bit and hbd)
 - AArch64/SVE: Add HBD subpel filters using 128-bit SVE2
 - AArch64: Add USMMLA implementation for 6-tap H/HV
 - AArch64: Optimize Armv8.0 NEON for HBD horizontal filters and 6-tap filters
 - Power9: Optimized ITX up to 16x4
 - Loongarch: numerous optimizations
 - RISC-V optimizations for pal, cdef_filter, ipred, mc_blend, mc_bdir, itx
 - Allow playing videos in full-screen mode in dav1dplay


Changes for 1.4.3 'Road Runner':
--------------------------------

1.4.3 is a small release focused on security issues:
 - AArch64: Fix potential out-of-bounds access in DotProd H/HV filters
 - cli: Prevent buffer over-read


Changes for 1.4.2 'Road Runner':
--------------------------------

1.4.2 is a small release of dav1d, notably improving ARM, AVX-512 and PowerPC:
 - AVX2 optimizations for 8-tap and new variants for 6-tap
 - AVX-512 optimizations for 8-tap and new variants for 6-tap
 - Improve entropy decoding on ARM64
 - New ARM64 optimizations for convolutions based on the DotProd extension
 - New ARM64 optimizations for convolutions based on the i8mm extension
 - New ARM64 optimizations for subpel and prep filters for i8mm
 - Misc improvements to existing ARM64 optimizations, notably for put/prep
 - New PowerPC9 optimizations for loopfilter
 - Support for the macOS kperf API for benchmarking


Changes for 1.4.1 'Road Runner':
--------------------------------

1.4.1 is a small release of dav1d, notably improving ARM and RISC-V speed:

- Optimizations for 6tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- MSAC optimizations


Changes for 1.4.0 'Road Runner':
--------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations:

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bitdepth
- New architecture supported: loongarch
- Loongarch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes (CVE-2024-1580)


Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- New API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function


Changes for 1.2.1 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.1 is a small release of dav1d, adding more SIMD and fixes:

- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix a desynced luma/chroma planes issue with Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc build system, CI and header fixes


Changes for 1.2.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.0 is a small release of dav1d, adding more SIMD and fixes:

- Improvements to the attachment of props and T.35 entries to output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx


Changes for 1.1.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.1.0 is an important release of dav1d, fixing numerous bugs and adding SIMD:

- New function dav1d_get_frame_delay to query the decoder frame delay
- Numerous fixes for strict conformance to the specs and samples
- NEON and AVX-512 misc fixes and improvements
- Partial AVX2 12bpc transform implementations
- AVX-512 high bit-depth cdef_filter, loopfilter, itx
- NEON z1/z3 optimization for 8bpc
- SSSE3 z1 optimization for 8bpc

 "From VideoLAN with love"


Changes for 1.0.0 'Peregrine Falcon':
-------------------------------------

1.0.0 is a major release of dav1d, adding important features and bug fixes.

It significantly changes the way threading works by adding automatic thread
management.

It also adds support for AVX-512 acceleration, and speeds up existing x86
code (from SSE2 to AVX2).

1.0.0 adds a new grain API to ease acceleration on the GPU, and an API call
to get information about which frame failed to decode, in error cases.

Finally, 1.0.0 fixes numerous small bugs that had been reported since the
beginning of the project, in order to have a proper release.

                                     .''.
         .''.      .        *''*    :_\/_:     .
        :_\/_:   _\(/_  .:.*_\/_*   : /\ :  .'.:.'.
    .''.: /\ :   ./)\   ':'* /\ * :  '..'.  -=:o:=-
   :_\/_:'.:::.    ' *''*    * '.\'/.' _\(/_'.':'.'
   : /\ : :::::     *_\/_*     -= o =-  /)\    '  *
    '..'  ':::'     * /\ *     .'/.\'.   '
        *            *..*         :
          *                       :
          *         1.0.0



Changes for 0.9.2 'Golden Eagle':
---------------------------------

0.9.2 is a small update of dav1d on the 0.9.x branch:
 - x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
 - x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
 - x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
 - ARM NEON optimizations for FilmGrain Gen_grain functions
 - Optimizations for splat_mv in SSE2/AVX2 and NEON
 - x86: SGR improvements for SSSE3 CPUs
 - x86: AVX2 optimizations for cfl_ac


Changes for 0.9.1 'Golden Eagle':
---------------------------------

0.9.1 is a middle-size revision of dav1d, notably adding 10b acceleration for SSSE3:
 - 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge),
   prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener,
   sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
 - Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12 arm32
 - Fixes for filmgrain on ARM
 - itx 10bit optimizations for 4x4/x8/x16, 8x4/x8/x16 for SSE4
 - Misc improvements on SSE2, SSE4


Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, notably adding 10b acceleration on x64.

Details:
 - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
   a large boost for high-bitdepth decoding on modern x86 computers and servers.
 - ARM64 NEON implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
 - New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian Hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for CDEF_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking and pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add an xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian Hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian Hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve the performance by using a picture buffer pool;
   the improvements can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
--------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue on some specific Haswell CPUs with the ipred_z AVX2 functions
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
--------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl 4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - mc_w_mask
    - mc_blend
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef, mc_blend_v and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve the speed by reducing the L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU-T T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match the reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - The initial PPC SIMD code, cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crash fixes, improvements and build changes


Changes for 0.2.1 'Antelope':
-----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
-----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32- and 64-bit
 - More AVX2 assembly, nearing completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bitdepths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64-bit processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors