xxhash: improve performance by using a different memory access method
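xxhash defaults to memcpy() for unaligned memory access, which is terribly slow with musl. We can get a 10x to 30x speed-up with the `__packed` statement, selected by `XXH_FORCE_MEMORY_ACCESS=1` (about 15x for XXH64 and 28x for XXH32 in the runs below).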
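For readers unfamiliar with the two access methods, here is a minimal sketch of the difference, modeled on how xxhash reads a 32-bit word. The names `read32_memcpy`, `read32_packed` and `unalign32` are hypothetical and chosen for illustration, and the packed variant assumes a compiler that supports the GCC/Clang `__attribute__((packed))` extension.

```c
#include <stdint.h>
#include <string.h>

/* Default method: copy the bytes into a local variable.
 * Portable, but fast only when the compiler inlines memcpy();
 * otherwise every 4-byte read costs a function call. */
static uint32_t read32_memcpy(const void *ptr)
{
    uint32_t val;
    memcpy(&val, ptr, sizeof(val));
    return val;
}

/* Packed method: a packed union tells the compiler that the field may
 * sit at any address, so it emits an unaligned-safe load directly.
 * This relies on a compiler extension (GCC/Clang), not standard C. */
typedef union { uint32_t u32; } __attribute__((packed)) unalign32;

static uint32_t read32_packed(const void *ptr)
{
    return ((const unalign32 *)ptr)->u32;
}
```

Both functions should return the same value for the same address, including odd offsets into a byte buffer. Only the generated code differs, which is presumably where the gap in the numbers below comes from.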
With the default method:
./xxhsum -b
./xxhsum 0.6.5 (64-bits little endian), by Yann Collet
Sample of 100 KB…
XXH32 : 102400 ->1061 it/s ( 103.6 MB/s)
XXH32 unaligned : 102400 ->1067 it/s ( 104.2 MB/s)
XXH64 : 102400 ->3233 it/s ( 315.7 MB/s)
XXH64 unaligned : 102400 ->3242 it/s ( 316.6 MB/s)
Now compiled with `-DXXH_FORCE_MEMORY_ACCESS=1`:
./xxhsum -b
./xxhsum 0.6.5 (64-bits little endian), by Yann Collet
Sample of 100 KB…
XXH32 : 102400 ->29917 it/s ( 2921.6 MB/s)
XXH32 unaligned : 102400 ->29318 it/s ( 2863.1 MB/s)
XXH64 : 102400 ->49090 it/s ( 4793.9 MB/s)
XXH64 unaligned : 102400 ->48262 it/s ( 4713.1 MB/s)
(from redmine: issue id 9771, created on 2018-12-15)