This library is a very fast UTF-8 validator using AVX2/SSE4 instructions. As far as I am aware, it is the fastest validator in the world on CPUs that support these instructions (...and not AVX-512). Using AVX2, it can validate random UTF-8 text as fast as about 0.26 cycles/byte, and random ASCII text at about 0.08 cycles/byte (figures taken from the benchmark table below). For UTF-8, this is roughly 1.5-1.7x faster than the fastvalidate-utf-8 library.
This repository contains the library (one C file), a build script for the make.py build system, and a Lua test script (which requires LuaJIT due to its use of the `ffi` module).
A detailed description of the algorithm can be found in `z_validate.c`. This algorithm should map fairly nicely to AVX-512, and should in fact be a bit faster than 2x the speed of AVX2 since a few instructions can be saved. But I don't have an AVX-512 machine, so I haven't tried it yet.
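
For readers who want a plain statement of what a validator has to accept and reject, here is a minimal scalar sketch. It is not the SIMD algorithm described in `z_validate.c`, and the name `ref_validate_utf8` is hypothetical rather than part of this library. It simply decodes one sequence at a time and rejects stray or missing continuation bytes, overlong encodings, surrogates, and code points above U+10FFFF. Something along these lines could serve as a slow baseline to cross-check a fast validator against.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference helper (not part of this library).
 * Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
static int ref_validate_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = buf[i];
        size_t n;      /* number of continuation bytes expected */
        uint32_t cp;   /* code point being decoded */
        uint32_t min;  /* smallest code point allowed for this length */

        if (b < 0x80) {                 /* ASCII: single byte */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) { /* 110xxxxx: 2-byte sequence */
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) { /* 1110xxxx: 3-byte sequence */
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) { /* 11110xxx: 4-byte sequence */
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or 0xF8..0xFF */
        }

        if (len - i <= n)
            return 0;                   /* input ends mid-sequence */

        for (size_t k = 1; k <= n; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return 0;               /* not a 10xxxxxx continuation byte */
            cp = (cp << 6) | (uint32_t)(c & 0x3F);
        }

        if (cp < min)
            return 0;                   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;                   /* beyond the Unicode range */

        i += n + 1;
    }
    return 1;
}
```

A call such as `ref_validate_utf8((const unsigned char *)text, strlen(text))` returns 1 for well-formed input and 0 otherwise. The branch-per-byte structure is exactly what a SIMD validator avoids, so this is useful for checking correctness, not speed.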
Validator | (K UTF-8) | (K ASCII) | 16M UTF-8 | 16M ASCII |
---|---|---|---|---|
validate_utf8_fast_avx | 0.410 | 0.410 | 0.496 | 0.429 |
validate_utf8_fast_avx_asciipath | 0.436 | 0.074 | 0.457 | 0.156 |
z_validate_utf8_avx2 | 0.264 | 0.079 | 0.290 | 0.160 |
z_validate_utf8_sse4 | 0.568 | 0.163 | 0.596 | 0.202 |
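
The ASCII columns above are much lower than the UTF-8 columns for every validator that has an ASCII path. The usual reason is that a block of pure ASCII can be accepted with a single test of the high bits, skipping the multi-byte checks entirely. The following is a portable sketch of that idea using 64-bit words instead of SIMD registers. It is an illustration only, not the code used by this library, and `ascii_prefix_len` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of this library).
 * Returns the number of leading bytes of buf[0..len) that are ASCII,
 * testing 8 bytes at a time while possible. */
static size_t ascii_prefix_len(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    /* Word loop: a byte is non-ASCII exactly when its top bit is set,
     * so one AND against 0x80 repeated 8 times covers 8 bytes at once. */
    while (len - i >= 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);   /* safe unaligned load */
        if (w & UINT64_C(0x8080808080808080))
            break;                       /* some byte in this word is >= 0x80 */
        i += 8;
    }

    /* Byte loop: finish the tail, or locate the first non-ASCII byte
     * inside the word that stopped the loop above. */
    while (i < len && buf[i] < 0x80)
        i++;

    return i;
}
```

A caller would skip the returned prefix and hand only the remainder to the full validator. Because the prefix is pure ASCII, the remainder always starts on a character boundary. A SIMD version would apply the same test to 16 or 32 bytes per step.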