This library is a very fast UTF-8 validator using AVX2/SSE4 instructions. As far as I am aware, it is the fastest validator in the world on CPUs that support these instructions (… and not AVX-512). Using AVX2, it can validate random UTF-8 text as fast as … cycles/byte, and random ASCII text at … cycles/byte. For UTF-8, this is roughly 1.5-1.7x faster than the fastvalidate-utf-8 library.
This repository contains the library (one C file), a build script for the `make.py` build system, and a Lua test script (which requires LuaJIT due to use of the
A detailed description of the algorithm can be found in `z_validate.c`. This algorithm should map fairly nicely to AVX-512, and should in fact be a bit faster than 2x the speed of AVX2 since a few instructions can be saved. But I don't have an AVX-512 machine, so I haven't tried it yet.
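For readers who want a concrete picture of what any UTF-8 validator has to check, here is a minimal scalar sketch in C. It is not the vectorized algorithm described in `z_validate.c` and is not part of this library; the function name `scalar_validate_utf8` is hypothetical, chosen only for illustration. It rejects stray or missing continuation bytes, overlong encodings, surrogates, and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-at-a-time reference validator (an illustration only,
 * not the library's SIMD implementation). Returns true if s[0..len) is
 * well-formed UTF-8. */
bool scalar_validate_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char b = s[i];
        size_t n;      /* number of continuation bytes expected */
        uint32_t cp;   /* code point decoded so far */
        uint32_t min;  /* smallest code point allowed at this length */

        if (b < 0x80) {            /* ASCII: one byte, always valid */
            i++;
            continue;
        } else if ((b & 0xE0) == 0xC0) {
            n = 1; cp = b & 0x1F; min = 0x80;
        } else if ((b & 0xF0) == 0xE0) {
            n = 2; cp = b & 0x0F; min = 0x800;
        } else if ((b & 0xF8) == 0xF0) {
            n = 3; cp = b & 0x07; min = 0x10000;
        } else {
            return false;          /* stray continuation byte or 0xF8..0xFF */
        }

        if (len - i - 1 < n)
            return false;          /* sequence truncated at end of input */

        for (size_t k = 1; k <= n; k++) {
            unsigned char c = s[i + k];
            if ((c & 0xC0) != 0x80)
                return false;      /* not a continuation byte (10xxxxxx) */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min)
            return false;          /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;          /* UTF-16 surrogates are not allowed */
        if (cp > 0x10FFFF)
            return false;          /* beyond the Unicode range */

        i += n + 1;
    }
    return true;
}
```

A scalar routine like this branches on every byte. A SIMD validator instead checks a whole vector of bytes per iteration, which is how rates well below one cycle per byte become possible.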
| Validator | (K UTF-8) | (K ASCII) | 16M UTF-8 | 16M ASCII |
|---|---|---|---|---|
| | 0.410 | 0.410 | 0.496 | 0.429 |
| | 0.436 | 0.074 | 0.457 | 0.156 |
| | 0.264 | 0.079 | 0.290 | 0.160 |
| | 0.568 | 0.163 | 0.596 | 0.202 |