The Meow hash is a high-speed hash function named after the character Meow in Meow the Infinite
. We developed the hash function at Molly Rocket
for use in the asset pipeline of 1935
Because we have to process hundreds of gigabytes of art assets to build game packages, we wanted a fast, non-cryptographic hash for use in change detection and deduplication. We had been using a cryptographic hash (SHA-1
), but it was unnecessarily slowing things down.
To our surprise, we found a lack of published, well-optimized, large-data hash functions. Most hash work seems to focus on small input sizes (for things like dictionary lookup) or on cryptographic quality. We wanted the fastest possible hash that would be collision-free in practice (like SHA-1 was), and we didn’t need any cryptograhic security.
We ended up creating Meow to fill this niche.
It’s the fastest hash function we know of, and we have benchmarked all the ones we could find. On modern Intel x64 CPUs, it hashes 16 bytes per cycle
single-threaded. This means in cache it can hash at a rate of 64 gigabytes
per second on a 4.2gHz machine. Out of cache, it hashes at whatever speed your main memory bus can provide to a single core, since that is usually the limiting factor on modern x64 CPUs.
It has also now been tuned to be the fastest hash on small
inputs, too. Despite the fact that it is a full 128-bit hash, it still outperforms
“fast” 64-bit hashes across all input sizes. So whether you have data buffers of a few bytes or a few gigabytes, Meow is the fastest hash available.
The Meow hash is not designed for cryptography and therefore we make no claims about its security. Assume it is completely insecure
That said, it is extremely robust for its designed purpose. It cleanly passes every test in smhasher
and has produced no collisions on our large datasets as yet. Also, unlike other “fast hashes”, it is not a small 32-bit or 64-bit hash. It produces a full 128 bits of usable hash every time.
The Meow hash also passes smhasher
cleanly at every truncation level down to 32 bits, so you can safely truncate a Meow hash value to the size you want to store. Both 32 bit and 64 bit truncations have been tested themselves and separate hashes, and pass smhasher cleanly.
It is a very basic license and should make it easy to use.
The source code
includes a number of files, but only meow_hash.h
is actually necessary. The rest of the files may or may not be necessary depending on how you want to use Meow. Currently, the distribution supports x64, ARM, and vanilla C implementations.
To use the Meow hash in a program, just #include
the headers then call it:
Note that there are two headers to include: meow_intrinsics.h
for the platform support, and meow_hash.h
for the function itself. This is intentional — you do not
need to include meow_intrinsics.h
if you don’t want to! Instead, you can choose to make your own set of #define
’s before including meow_hash.h
, so that you can provide the foundation pieces for Meow to use. This can be very useful if you are trying to integrate Meow into a codebase that already has its own conventions for 128-bit SIMD.
The Meow hash was designed for very specific scenarios. There are two very good reasons not
to use it:
Never use the Meow hash for anything involving security unless you are a cryptographer and know exactly what you are doing. The Meow hash was not designed for cryptography and has no cryptographic security proofs or analysis whatsoever.
Don’t base a system on the Meow hash if your primary deployment target uses a non-AES-capable CPU. Modern ARM and x64 cores usually have the AES instructions necessary to implement a Meow hash at high speed, but other CPUs might not. CPUs that don’t have AES instructions will be signficantly slower at running the Meow hash, so you do not want to use it in deployments that expect to use primarily non-AES-enabled CPU cores.
Building the entire Meow suite
If you download the entire Meow repository, you can build it all using simple batch files on both Windows and Linux. From the root of the repository, on Windows:
Both will build several executables for testing. On Linux, there will only be a CLANG build, and it will be placed in a build
directory. On Windows, it will attempt to build with cl into build_msvc
and CLANG into build_clang
, depending on which compilers you have installed.
The executable meow_example.exe
(built from meow_example.cpp
) is an example of the basic ways you might use a Meow hash. You can run it from the command line with no arguments to have it hash a random buffer, with one argument to print the hash of a file, and with two arguments to have it compare the hashes of two files. meow_example.cpp
is designed to be very simple, and should provide an easy way to see how Meow hashes are used in practice.
The executable meow_test.exe
will test the Meow hash build for obvious defects. It’s not a replacement for serious testing (like smhasher
), but it will give you a quick sanity check that you haven’t done something horribly wrong with the build.
Hash collisions are subject to the birthday paradox
, which means that once you process enough hashes, collisions are expected
for various hash truncation levels. You should expect to see Meow32 collisions once you reach 16 bits worth of files (32-64 thousand files), Meow64 collisions once you get 32 bits worth of files (2-4 billion files), and Meow128 collisions once… well… you really shouldn’t be able to amass enough files for that! So if you find a collision with Meow128, please report it to us on the GitHub
! We may need to look into it in more detail.
You can also use Meow hash via a streaming API and a vanilla C API for compatibility. These are defined in meow_more.h
. To use the streaming API:
Generally speaking, if you are trying to use the C version for compatibility purposes, you will need to do some kind of CPU detection appropriate to your platform and then pick whether to call MeowHash_Accelerated
. The proper way to detect this varies from platform to platform, and is outside the scope of Meow’s implementation, so it is expected that you will implement that part of the code yourself during the startup of your application. An example implementation using try/catch is given in meow_more_example.cpp
The Meow hash is still in alpha form. We would love to get feedback on people who can test it on their datasets, especially if you find any collisions or other problems with the hash quality. Please post issues on the GitHub
as you find them!