highway (1.3.0-1) UNRELEASED; urgency=medium

  Add:
  - AddLower, PairwiseAdd/Sub, MaskedAbsOr, BitsFromMask
  - AVX10_2 and Loongson LASX/LSX targets
  - AVX3_SPR F16, WASM_EMU256 F64 types
  - CeilInt/FloorInt, DemoteToNearestInt and F16/F64 NearestInt
  - Complex number operations, F16/BF16 assignment operators
  - emulated bf16/f16 Load/StoreInterleaved
  - hwy::Warn/HWY_WARN, use instead of fprintf
  - HWY_UNREACHABLE, HWY_VISIT_TARGETS
  - i16 Dot, AverageRound, RoundingShiftRight/RoundingShr
  - InterleaveEvenBlocks/InterleaveOddBlocks, MinMagnitude/MaxMagnitude
  - masked comparisons, promote, round, GetBiasedExponent
  - MulByPow2/MulByFloorPow2, MulRound, MulLower/MulAddLower
  - PositiveInfOrHighestValue/NegativeInfOrLowestValue
  - RVV groundwork for runtime dispatch, enable tuples
  - spin wait, NanoSleep, Counter2/4 barrier, Divisor64, perf_counters

  Improvements:
  - dpbf16 WidenMulPairwiseAdd Exp2, AVX10.2 float->int, AVX3 GetExponent
  - header-only abort.h/cc, tests runnable with Bazel8
  - HWY_BROKEN_*: allow individual override
  - Lanes: 'optional constexpr', AllBits1
  - MaskedEq/Ne, NEON SumOfMulQuadAccumulate, MaskedReduceMin/Max, MulEven
  - RVV various ops via superoptimizer
  - SetThreadName: support more systems
  - SVE2 SatWidenMulPairwiseAccumulate, SSE2/SSSE3 U16 Min/Max
  - TargetName: no longer returns unknown for other arch
  - ThreadPool autotune, avoid WakeAll
  - topology: add NUMA node, support Windows/Apple

  Fixes:
  - avoid wraparound for -ftrapv, topology for offline CPUs/RVV
  - warnings from -Wmissing-declarations/prototypes
  - AdvSIMD_HPFPCvt on OSX
  - f32->bf16 rounding: avoid unspecified built-in cast
  - MSAN, PPC InvariantTicksPerSecond on QEMU, HWY_RCAST_ALIGNED, IsNaN
  - vqsort for ascending order, add 8-bit test

  Thanks to all contributors, especially johnplatts and eustas!

 -- Jan Wassenberg <janwas@google.com>  Fri, 11 Jun 2025 18:00:00 +0200

highway (1.2.0-1) UNRELEASED; urgency=medium

  * Add InterleaveEven/InterleaveOdd, BitShuffle, GatherIndexNOr
  * Add IsNegative, IfNegativeThenElseZero, IfNegativeThenZeroElse
  * Add NEON_BF16 target, HWY_VERSION_GE/LT, HWY_EXPORT_T/HWY_DYNAMIC_DISPATCH_T
  * Add PromoteInRangeTo/ConvertInRangeTo/DemoteInRangeTo
  * Add Rol/Ror, RotateLeft/RotateLeftSame/RotateRightSame
  * Add SatWidenMulPairwiseAccumulate, SatWidenMulAccumFixedPoint
  * Add stats.h, bit_set.h, IsEitherNaN
  * Add UI8/UI32/UI64 MulHigh, I64 MulEven/MulOdd/Mul128
  * Add WidenMulAccumulate, MulEvenAdd, MulOddAdd
  * contrib/bit_pack: support 32/64-bit lanes
  * contrib/math: Add Exp2, Hypot
  * contrib/matvec: Add MatVecAdd
  * contrib/sort: Add VQ/HeapSelect, partial sort
  * contrib/topology: add affinity, detect topology/cache size/CPU name
  * Enable runtime dispatch for NEON/RVV, bazel modules, abort handler
  * Remove DASSERT for negative Gather indices
  * Support opting out of GUnit dependency
  * Use SPR/ZEN4 bf16 dot product

 -- Jan Wassenberg <janwas@google.com>  Mon, 13 May 2024 16:00:00 +0200

highway (1.1.0-1) UNRELEASED; urgency=medium

  * Add BitCastScalar, DispatchedTarget, Foreach
  * Add Div/Mod and MaskedDiv/ModOr, SaturatedAbs, SaturatedNeg
  * Add InterleaveWholeLower/Upper, Dup128VecFromValues
  * Add IsInteger, IsIntegerLaneType, RemoveVolatile, RemoveCvRef
  * Add MaskedAdd/Sub/Mul/Div/Gather/Min/Max/SatAdd/SatSubOr
  * Add MaskFalse, IfNegativeThenNegOrUndefIfZero, PromoteEven/OddTo
  * Add ReduceMin/Max, 8-bit reductions, f16 <-> f64 conversions
  * Add Span, AlignedArray, matrix-vector mul
  * Add SumsOf2/4, I8 SumsOf8, SumsOfAdjQuadAbsDiff, SumsOfShuffledQuadAbsDiff
  * Add ThreadPool, hierarchical profiler
  * Build: use bazel_platforms
  * Enable clang16 Arm/PPC runtime dispatch, F16 for GCC AVX3_SPR
  * Extend Dot to f32*bf16, FMA to integer
  * Fix: RVV 8-bit overflow, UB in vqsort, big-endian bugs, PPC HTM
  * Improved codegen in various ops, fp16/bf16 tests and conversions
  * New targets: HWY_Z14, HWY_Z15
  * Test: add foreign_arch builders, CodeQL

 -- Jan Wassenberg <janwas@google.com>  Sat, 17 Feb 2024 12:00:00 +0100

highway (1.0.7-1) UNRELEASED; urgency=medium

  * Add LoadNOr, GatherIndexN, ScatterIndexN
  * Add additional float<->int conversions
  * Codegen improvements for 8-bit shift, PPC Compress/Expand
  * Fixes for MSVC, PPC, RVV, WASM, GCC 13, GCC 8.2, i686, f16 type, QEMU 7.2
  * Support CMake args in Debian packaging

 -- Jan Wassenberg <janwas@google.com>  Tue, 29 Aug 2023 19:00:00 +0200

highway (1.0.6-1) UNRELEASED; urgency=medium

  * Add MaskedGatherIndex, MaskedScatterIndex, LoadN, StoreN
  * Add SatWidenMulPairwiseAdd, SumOfMulQuadAccumulate, PromoteUpperLowerTo
  * Add F64 for Wasm, F64 AbsDiff
  * Add F16 support to AVX3_SPR, RVV tuple (both not yet enabled)
  * Validate all D args in x86 function signatures
  * License: now dual Apache2/BSD3
  * Doc: new users, vcpkg install instructions, AVX10 plans
  * Doc: advice on dynamic dispatch plus -march flags
  * Build: avoid installing hwy_test if !HWY_ENABLE_TESTS
  * Codegen: improved PPC9 Find*True, variable-length CopyBytes
  * Fix: GCC 8.2, MSVC, ICC, PPC9, SVE, arm64 MSVC issues
  * Fix: IfNegativeThenElse, MulFixedPoint15, Debian changelog format
  * Tests: faster builds (split up), use release builds

 -- Jan Wassenberg <janwas@google.com>  Fri, 11 Aug 2023 14:00:00 +0200

highway (1.0.5-1) UNRELEASED; urgency=medium

  * Add Insert/ExtractBlock, BroadcastBlock/Lane, NumBlocks
  * Add integer Le/Ge and [Neg]MulAdd, extend DemoteTo/PromoteTo
  * Add Leading/TrailingZeroCount, HighestSetBitIndex, ReverseBits
  * Add MaskedLoadOr, tuple Get/Set/Create, ReduceSum, WidenMulPairwiseAdd
  * Add [ZeroExtend]ResizeBitCast, BitwiseIfThenElse, Find[Known]LastTrue
  * Add AESRoundInv, AESKeyGenAssist
  * Add contrib/math Atan2/SinCos, contrib/unroller
  * Add fp16/bf16 support (Armv8, SVE, RVV), HWY_DYNAMIC_POINTER
  * Add OrderedTruncate2To, Per4LaneBlockShuffle, TwoTablesLookupLanes
  * Add SlideUp/Down[Blocks/Lanes], Slide1Up/Down, ReverseLaneBytes
  * Add SetBeforeFirst, SetAtOrBefore/AfterFirst, SetOnlyFirst
  * Add 8-bit Reverse2/4/8, Shl/Shr, RotateRight, Reverse, Mul
  * Add 8/16-bit DupEven/Odd, TableLookupLanes
  * Add F64 ApproximateReciprocal[Sqrt], 32/64-bit SaturatedAdd/Sub
  * Build: Support Bazel modules
  * Codegen improvements
  * Compiler: support Clang 15/16
  * Doc: add Github pages, support policy, evaluation
  * Doc: publish AVX-512 throttling/startup findings
  * Release: add signing
  * Test: add GCC to Github Actions
  * VQSort: small N speedups: fix seeding, func ptr, 8-wide network.
  * VQSort: add BenchAllColdSort, VQSortStatic
  * VQSort: fix subnormal/inf/NaN, support fp16, fix KV types
  * Workarounds: RVV VXRM, x87 excess precision, missing intrinsics

 -- Jan Wassenberg <janwas@google.com>  Wed, 19 Jul 2023 15:00:00 +0200

highway (1.0.4-1) UNRELEASED; urgency=medium

  * Add PPC8..10, SSE2, AVX3_ZEN4, NEON_WITHOUT_AES targets
  * Add Expand, LoadExpand, integer AbsDiff, SumsOf8AbsDiff
  * Improved Half/Twice support, codegen for Shift*Same
  * Support Wasm in Godbolt
  * Faster KV128 sorting
  * Fix armv7 build config, CMake config mode
  * Update RVV intrinsics for 1.0-draft

 -- Jan Wassenberg <janwas@google.com>  Fri, 17 Mar 2023 15:00:00 +0200

highway (1.0.3-1) UNRELEASED; urgency=medium

  * Add RearrangeToOddPlusEven, Xor3, 8-bit CompressStore, HWY_ASSUME
  * Add contrib/bit_pack for 8/16-bit lanes
  * Add WASM_EMU256 target
  * Documentation improvements
  * Allow opting out of C++ stdlib usage for Compiler Explorer
  * Update for new RVV intrinsics; faster WASM min/max and extmul/q15mul
  * Fix UB, GCC atomic

 -- Jan Wassenberg <janwas@google.com>  Thu, 19 Jan 2023 13:00:00 +0200

highway (1.0.2-1) UNRELEASED; urgency=medium

  * Add ExclusiveNeither, FindKnownFirstTrue, Ne128
  * Add 16-bit SumOfLanes/ReorderWidenMulAccumulate/ReorderDemote2To
  * Faster sort for low-entropy input, improved pivot selection
  * Add GN build system, Highway FAQ, k32v32 type to vqsort
  * CMake: Support find_package(GTest), add rvv-inl.h, add HWY_ENABLE_TESTS
  * Fix MIPS and C++20 build, Apple LLVM 10.3 detection, EMU128 AllTrue on RVV
  * Fix missing exec_prefix, RVV build, warnings, libatomic linking
  * Work around GCC 10.4 issue, disabled RDCYCLE, arm7 with vfpv3
  * Documentation/example improvements
  * Support static dispatch to SVE2_128 and SVE_256

 -- Jan Wassenberg <janwas@google.com>  Thu, 27 Oct 2022 17:00:00 +0200

highway (1.0.1-1) UNRELEASED; urgency=medium

  * Add Eq128, i64 Mul, unsigned->float ConvertTo
  * Faster sort for few unique keys, more robust pivot selection
  * Fix: floating-point generator for sort tests, Min/MaxOfLanes for i16
  * Fix: avoid always_inline in debug, link atomic
  * GCC warnings: string.h, maybe-uninitialized, ignored-attributes
  * GCC warnings: preprocessor int overflow, spurious use-after-free/overflow
  * Doc: <=HWY_AVX3, Full32/64/128, how to use generic-inl

 -- Jan Wassenberg <janwas@google.com>  Tue, 23 Aug 2022 10:00:00 +0200

highway (1.0.0-1) UNRELEASED; urgency=medium

  * ABI change: 64-bit target values, more room for expansion
  * Add CompressBlocksNot, CompressNot, Lt128Upper, Min/Max128Upper, TruncateTo
  * Add HWY_SVE2_128 target
  * Sort speedups especially for 128-bit
  * Documentation clarifications
  * Faster NEON CountTrue/FindFirstTrue/AllFalse/AllTrue
  * Improved SVE codegen
  * Fix u16x8 ConcatEven/Odd, SSSE3 i64 Lt
  * MSVC 2017 workarounds
  * Support for runtime dispatch on Arm/GCC/Linux

 -- Jan Wassenberg <janwas@google.com>  Wed, 27 Jul 2022 10:00:00 +0200

highway (0.17.0-1) UNRELEASED; urgency=medium

  * Add ExtractLane, InsertLane, IsInf, IsFinite, IsNaN
  * Add StoreInterleaved2, LoadInterleaved2/3/4, BlendedStore, SafeFillN
  * Add MulFixedPoint15, Or3
  * Add Copy[If], Find[If], Generate, Replace[If] algos
  * Add HWY_EMU128 target (replaces HWY_SCALAR)
  * HWY_RVV is feature-complete
  * Add HWY_ENABLE_CONTRIB build flag, HWY_NATIVE_FMA, HWY_WANT_SSSE3/SSE4 macros
  * Extend ConcatOdd/Even and StoreInterleaved* to all types
  * Allow CappedTag<T, nonPowerOfTwo>
  * Sort speedups: 2x for AVX2, 1.09x for AVX3; avoid x86 malloc
  * Expand documentation
  * Fix RDTSCP crash in nanobenchmark
  * Fix XCR0 check (was ignoring AVX3 on ICL)
  * Support Arm/RISC-V timers

 -- Jan Wassenberg <janwas@google.com>  Fri, 20 May 2022 10:00:00 +0200

highway (0.16.0-1) UNRELEASED; urgency=medium

  * Add contrib/sort (vectorized quicksort)
  * Add IfNegativeThenElse, IfVecThenElse
  * Add Reverse2,4,8, ReverseBlocks, DupEven/Odd, AESLastRound
  * Add OrAnd, Min128, Max128, Lt128, SumsOf8
  * Support capped/partial vectors on RVV/SVE, int64 in WASM
  * Support SVE2, shared library build
  * Remove deprecated overloads without the required d arg (UpperHalf etc.)

 -- Jan Wassenberg <janwas@google.com>  Thu, 03 Feb 2022 11:00:00 +0100

highway (0.15.0-1) UNRELEASED; urgency=medium

  * New ops: CompressBlendedStore, ConcatOdd/Even, IndicesFromVec
  * New ops: OddEvenBlocks, SwapAdjacentBlocks, Reverse, RotateRight
  * Add bf16, unsigned comparisons, more lane types for Reverse/TableLookupLanes
  * Contrib: add sort(ing network) and dot(product)
  * Targets: update RVV for LLVM, add experimental WASM2
  * Separate library hwy_test for test utils
  * Add non-macro Simd<> aliases
  * Fixes: const V& for GCC, AVX3 BZHI, POPCNT with AVX on MSVC, avoid %zu

 -- Jan Wassenberg <janwas@google.com>  Wed, 10 Nov 2021 10:00:00 +0100

highway (0.14.2-1) UNRELEASED; urgency=medium

  * Add MaskedLoad
  * Fix non-glibc PPC, Windows GCC, MSVC 19.14
  * Opt-in for -Werror; separate design_philosophy.md

 -- Jan Wassenberg <janwas@google.com>  Tue, 24 Aug 2021 15:00:00 +0200

highway (0.14.1-1) UNRELEASED; urgency=medium

  * Add LoadMaskBits, CompressBits[Store]
  * Fix CPU feature check (AES/F16C) and warnings
  * Improved DASSERT - disabled in optimized builds

 -- Jan Wassenberg <janwas@google.com>  Tue, 17 Aug 2021 14:00:00 +0200

highway (0.14.0-1) UNRELEASED; urgency=medium

  * Add SVE, S-SSE3, AVX3_DL targets
  * Support partial vectors in all ops
  * Add PopulationCount, FindFirstTrue, Ne, TableLookupBytesOr0
  * Add AESRound, CLMul, MulOdd, HWY_CAP_FLOAT16

 -- Jan Wassenberg <janwas@google.com>  Thu, 29 Jul 2021 15:00:00 +0200

highway (0.12.2-1) UNRELEASED; urgency=medium

  * fix scalar-only test and Windows macro conflict with Load/StoreFence
  * replace deprecated wasm intrinsics

 -- Jan Wassenberg <janwas@google.com>  Mon, 31 May 2021 16:00:00 +0200

highway (0.12.1-1) UNRELEASED; urgency=medium

  * doc updates, ARM GCC support, fix s390/ppc, complete partial vectors
  * fix warnings, faster ARM div/sqrt, separate hwy_contrib library
  * add Abs(i64)/FirstN/Pause, enable AVX2 on MSVC

 -- Jan Wassenberg <janwas@google.com>  Wed, 19 May 2021 15:00:00 +0200

highway (0.12.0-1) UNRELEASED; urgency=medium

  * Add Shift*8, Compress16, emulated Scatter/Gather, StoreInterleaved3/4
  * Remove deprecated HWY_*_LANES, deprecate HWY_GATHER_LANES
  * Proper IEEE rounding, reduce libstdc++ usage, inlined math

 -- Jan Wassenberg <janwas@google.com>  Thu, 15 Apr 2021 20:00:00 +0200

highway (0.11.1-1) UNRELEASED; urgency=medium

  * Fix clang7 asan error, finish f16 conversions and add test

 -- Jan Wassenberg <janwas@google.com>  Thu, 25 Feb 2021 16:00:00 +0200

highway (0.11.0-1) UNRELEASED; urgency=medium

  * Add RVV+mask logical ops, allow Shl/ShiftLeftSame on all targets, more math

 -- Jan Wassenberg <janwas@google.com>  Thu, 18 Feb 2021 20:00:00 +0200

highway (0.7.0-1) UNRELEASED; urgency=medium

  * Added API stability notice, Compress[Store], contrib/, SignBit, CopySign

 -- Jan Wassenberg <janwas@google.com>  Tue, 5 Jan 2021 17:00:00 +0200

highway (0.1-1) UNRELEASED; urgency=medium

  * Initial debian package.

 -- Alex Deymo <deymo@google.com>  Mon, 19 Oct 2020 16:48:07 +0200