Changes for 1.5.2 'Sonic':
--------------------------

1.5.2 is a minor release of dav1d, focused on maintenance:
- minor speed improvement in recon
- improvements on loongarch symbol visibility and asm
- mark C globals with small code model
- reduce the code size of the frame header parsing (OBU)
- minor fixes on tools and CI
- fix compilation with nasm 3.00


Changes for 1.5.1 'Sonic':
--------------------------

1.5.1 is a minor release of dav1d, focusing on optimizations and stack reduction:

- Rewrite of the looprestoration (SGR, wiener) to reduce stack usage
- Rewrite of {put,prep}_scaled functions

Now, the required stack space for dav1d should be: 62 KB on x86_64 and
58 KB on arm and aarch64.

- Improvements on the SSSE3 SGR
- Improvements on ARM32/ARM64 looprestoration optimizations
- RISC-V: blend optimizations for high bitdepth
- Power9: blend optimizations for 8bpc
- Port RISC-V to POSIX/non-Linux OS
- AArch64: Add Neon implementation of load_tmvs
- Fix a rare but possible deadlock in flush()


Changes for 1.5.0 'Sonic':
--------------------------

1.5.0 is a major release of dav1d:
- WARNING: we removed some of the SSE2 optimizations, so if you care about
  systems without SSSE3, you should be careful when updating!
- Add Arm OpenBSD run-time CPU feature detection
- Optimize index offset calculations for decode_coefs
- picture: copy HDR10+ and T35 metadata only to visible frames
- SSSE3 new optimizations for 6-tap (8bit and hbd)
- AArch64/SVE: Add HBD subpel filters using 128-bit SVE2
- AArch64: Add USMMLA implementation for 6-tap H/HV
- AArch64: Optimize Armv8.0 NEON for HBD horizontal filters and 6-tap filters
- Power9: Optimized ITX up to 16x4
- Loongarch: numerous optimizations
- RISC-V optimizations for pal, cdef_filter, ipred, mc_blend, mc_bdir, itx
- Allow playing videos in full-screen mode in dav1dplay


Changes for 1.4.3 'Road Runner':
--------------------------------

1.4.3 is a small release focused on security issues
- AArch64: Fix potential out-of-bounds access in DotProd H/HV filters
- cli: Prevent buffer over-read


Changes for 1.4.2 'Road Runner':
--------------------------------

1.4.2 is a small release of dav1d, improving notably ARM, AVX-512 and PowerPC
- AVX2 optimizations for 8-tap and new variants for 6-tap
- AVX-512 optimizations for 8-tap and new variants for 6-tap
- Improve entropy decoding on ARM64
- New ARM64 optimizations for convolutions based on DotProd extension
- New ARM64 optimizations for convolutions based on i8mm extension
- New ARM64 optimizations for subpel and prep filters for i8mm
- Misc improvements on existing ARM64 optimizations, notably for put/prep
- New PowerPC9 optimizations for loopfilter
- Support for macOS kperf API for benchmarking


Changes for 1.4.1 'Road Runner':
--------------------------------

1.4.1 is a small release of dav1d, improving notably ARM and RISC-V speed

- Optimizations for 6tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- Msac optimizations


Changes for 1.4.0 'Road Runner':
--------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bitdepth
- New architecture supported: loongarch
- Loongarch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes (CVE-2024-1580)


Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- New API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function


Changes for 1.2.1 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.1 is a small release of dav1d, adding more SIMD and fixes

- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix a desynced luma/chroma planes issue with Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc build system, CI and header fixes


Changes for 1.2.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.0 is a small release of dav1d, adding more SIMD and fixes

- Improvements on attachments of props and T.35 entries on output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx


Changes for 1.1.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.1.0 is an important release of dav1d, fixing numerous bugs, and adding SIMD

- New function dav1d_get_frame_delay to query the decoder frame delay
- Numerous fixes for strict conformity to the specs and samples
- NEON and AVX-512 misc fixes and improvements
- Partial AVX2 12bpc transform implementations
- AVX-512 high bit-depth cdef_filter, loopfilter, itx
- NEON z1/z3 optimization for 8bpc
- SSSE3 z1 optimization for 8bpc

"From VideoLAN with love"


Changes for 1.0.0 'Peregrine Falcon':
-------------------------------------

1.0.0 is a major release of dav1d, adding important features and bug fixes.

It notably changes, in an important way, the way threading works, by adding
automatic thread management.

It also adds support for AVX-512 acceleration, and adds speedups to existing x86
code (from SSE2 to AVX2).

1.0.0 adds a new grain API to ease acceleration on the GPU, and adds an API call
to get information on which frame failed to decode, in error cases.

Finally, 1.0.0 fixes numerous small bugs that were reported since the beginning
of the project to have a proper release.

                                   .''.
       .''.      .        *''*    :_\/_:     .
      :_\/_:   _\(/_  .:.*_\/_*   : /\ :  .'.:.'.
  .''.: /\ :   ./)\   ':'* /\ * :  '..'.  -=:o:=-
 :_\/_:'.:::.    ' *''*    * '.\'/.' _\(/_'.':'.'
 : /\ : :::::     *_\/_*     -= o =-  /)\    '  *
  '..'  ':::'     * /\ *     .'/.\'.   '
      *            *..*         :
        *                       :
        *         1.0.0


Changes for 0.9.2 'Golden Eagle':
---------------------------------

0.9.2 is a small update of dav1d on the 0.9.x branch:
- x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
- x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
- x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
- ARM NEON optimizations for FilmGrain Gen_grain functions
- Optimizations for splat_mv in SSE2/AVX2 and NEON
- x86: SGR improvements for SSSE3 CPUs
- x86: AVX2 optimizations for cfl_ac


Changes for 0.9.1 'Golden Eagle':
---------------------------------

0.9.1 is a middle-size revision of dav1d, adding notably 10b acceleration for SSSE3:
- 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge),
  prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener,
  sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
- Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12 arm32
- Fixes for filmgrain on ARM
- itx 10bit optimizations for 4x4/x8/x16, 8x4/x8/x16 for SSE4
- Misc improvements on SSE2, SSE4


Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, adding notably 10b acceleration on x64.

Details:
- x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
  a large boost for high-bitdepth decoding on modern x86 computers and servers.
- ARM64 NEON implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
- New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian Hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
- ARM32 optimizations for ipred and itx in 10/12bits,
  completing the 10b/12b work on ARM64 and ARM32
- Give the post-filters their own threads
- ARM64: rewrite the wiener functions
- Speed up coefficient decoding, 0.5%-3% global decoding gain
- x86 optimizations for cdef_filter and wiener in 10/12bit
- x86: rewrite the SGR AVX2 asm
- x86: improve msac speed on SSE2+ machines
- ARM32: improve speed of ipred and warp
- ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
- ARM32/64: improve speed of looprestoration
- Add seeking, pausing to the player
- Update the player for rendering of 10b/12b
- Misc speed improvements and fixes on all platforms
- Add a xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian Hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
- Keep references to buffers valid after dav1d_close(). Fixes a regression
  caused by the picture buffer pool added in 0.8.0.
- ARM32 optimizations for 10bit bitdepth for SGR
- ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
- ARM64 optimizations for 10bit bitdepth for SGR
- x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian Hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
- Improve the performance by using a picture buffer pool;
  the improvements can reach 10% in some cases on Windows.
- Support for Apple ARM Silicon
- ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
- ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
  put/prep 8tap/bilin, wiener and CDEF filters
- ARM64 optimizations for cfl_ac 444 for all bitdepths
- x86 optimizations for MC 8-tap, mc_scaled in AVX2
- x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
--------------------------------

0.7.1 is a minor update on 0.7.0:
- ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
- SSE2 optimizations for prep_bilin and prep_8tap
- AVX2 optimizations for MC scaled
- Fix a clamping issue in motion vector projection
- Fix an issue on some specific Haswell CPUs on ipred_z AVX2 functions
- Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
--------------------------------

0.7.0 is a major release for dav1d:
- Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
- 10b/12b ARM64 optimizations are mostly complete:
  - ipred (paeth, smooth, dc, pal, filter, cfl)
  - itxfm (only 10b)
- AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
- AVX2 for cfl 4:4:4
- AVX-512 CDEF filter
- ARM64 8b improvements for cfl_ac and itxfm
- ARM64 implementation for emu_edge in 8b/10b/12b
- ARM32 implementation for emu_edge in 8b
- Improvements on the dav1dplay utility player to support 10 bit,
  non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
- New ARM64 optimizations for the 10/12bit depth:
  - mc_avg, mc_w_avg, mc_mask
  - mc_put/mc_prep 8tap/bilin
  - mc_warp_8x8
  - mc_w_mask
  - mc_blend
  - wiener
  - SGR
  - loopfilter
  - cdef
- New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
- New SSSE3 optimizations for film grain
- New AVX2 optimizations for msac_adapt16
- Fix rare mismatches against the reference decoder, notably because of clipping
- Improvements on ARM64 on msac, cdef, mc_blend_v and looprestoration optimizations
- Improvements on AVX2 optimizations for cdef_filter
- Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
- ARM32 optimizations for loopfilter, ipred_dc|h|v
- Add section-5 raw OBU demuxer
- Improve the speed by reducing the L2 cache collisions
- Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speed and fixing minor issues
compared to 0.5.0:
- SSE2 optimizations for CDEF, wiener and warp_affine
- NEON optimizations for SGR on ARM32
- Fix mismatch issue in x86 asm in inverse identity transforms
- Fix build issue in ARM64 assembly if debug info was enabled
- Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
- Export ITU T.35 metadata
- Speed improvements on blend_ on ARM
- Speed improvements on decode_coef and MSAC
- NEON optimizations for blend*, w_mask_, ipred functions for ARM64
- NEON optimizations for CDEF and warp on ARM32
- SSE2 optimizations for MSAC hi_tok decoding
- SSSE3 optimizations for deblocking loopfilters and warp_affine
- AVX2 optimizations for film grain and ipred_z2
- SSE4 optimizations for warp_affine
- VSX optimizations for wiener
- Fix inverse transform overflows in x86 and NEON asm
- Fix integer overflows with large frames
- Improve film grain generation to match reference code
- Improve compatibility with older binutils for ARM
- More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

- Fix playback with unknown OBUs
- Add an option to limit the maximum frame size
- SSE2 and ARM64 optimizations for MSAC
- Improve speed on 32bit systems
- Optimization in obmc blend
- Reduce RAM usage significantly
- The initial PPC SIMD code, cdef_filter
- NEON optimizations for blend functions on ARM
- NEON optimizations for w_mask functions on ARM
- NEON optimizations for inverse transforms on ARM64
- VSX optimizations for CDEF filter
- Improve handling of malloc failures
- Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

- Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
- Reduce binary size, notably on Windows
- SSSE3 optimizations for ipred_filter
- ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
- Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

- Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
  The impact is important on SSSE3, SSE4 and AVX2 CPUs
- SSSE3 optimizations for all block sizes in itx
- SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
- Speed improvements on CDEF for SSE4 CPUs
- NEON optimizations for SGR and loop filter
- Minor crash fixes, improvements and build changes


Changes for 0.2.1 'Antelope':
-----------------------------

- SSSE3 optimization for cdef_dir
- AVX2 improvements of the existing CDEF optimizations
- NEON improvements of the existing CDEF and wiener optimizations
- Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
-----------------------------

- ARM64 and ARM optimizations using NEON instructions
- SSSE3 optimizations for both 32 and 64bits
- More AVX2 assembly, reaching almost completion
- Fix installation of includes
- Rewrite inverse transforms to avoid overflows
- Snap packaging for Linux
- Updated API (ABI and API break)
- Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
- Support for all features of the AV1 bitstream
- Support for all bitdepths: 8, 10 and 12bits
- Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
- Full acceleration for AVX2 64bits processors, making it the fastest decoder
- Partial acceleration for SSSE3 processors
- Partial acceleration for NEON processors