The Meow hash is a high-speed hash function named after the character Meow in
Meow the Infinite. We developed the hash function at
Molly Rocket for use in the asset pipeline of
1935.
Because we have to process hundreds of gigabytes of art assets to build game packages, we wanted a fast, non-cryptographic hash for use in change detection and deduplication. We had been using a cryptographic hash (
SHA-1), but it was unnecessarily slowing things down.
To our surprise, we found a lack of published, well-optimized, large-data hash functions. Most hash work seems to focus on small input sizes (for things like dictionary lookup) or on cryptographic quality. We wanted the fastest possible hash that would be collision-free in practice (like SHA-1 was), and we didn’t need any cryptograhic security.
We ended up creating Meow to fill this niche.
It’s the fastest hash function we know of, and we have benchmarked all the ones we could find. On modern Intel x64 CPUs, it hashes 16 bytes
per cycle single-threaded. This means in cache it can hash at a rate of 64
gigabytes per second on a 4.2gHz machine. Out of cache, it hashes at whatever speed your main memory bus can provide to a single core, since that is usually the limiting factor on modern x64 CPUs.
It has also now been tuned to be the fastest hash on
small inputs, too. Despite the fact that it is a full 128-bit hash, it
still outperforms “fast” 64-bit hashes across all input sizes. So whether you have data buffers of a few bytes or a few gigabytes, Meow is the fastest hash available.
The Meow hash is not designed for cryptography and therefore we make no claims about its security. Assume it is
completely insecure.
That said, it is extremely robust for its designed purpose. It cleanly passes every test in
smhasher and has produced no collisions on our large datasets as yet. Also, unlike other “fast hashes”, it is not a small 32-bit or 64-bit hash. It produces a full 128 bits of usable hash every time.
The Meow hash also passes
smhasher cleanly at every truncation level down to 32 bits, so you can safely truncate a Meow hash value to the size you want to store. Both 32 bit and 64 bit truncations have been tested themselves and separate hashes, and pass smhasher cleanly.
It is a very basic license and should make it easy to use.
The
source code includes a number of files, but only
meow_hash.h is actually necessary. The rest of the files may or may not be necessary depending on how you want to use Meow. Currently, the distribution supports x64, ARM, and vanilla C implementations.
To use the Meow hash in a program, just
#include the headers then call it:
Note that there are two headers to include:
meow_intrinsics.h for the platform support, and
meow_hash.h for the function itself. This is intentional — you
do not need to include
meow_intrinsics.h if you don’t want to! Instead, you can choose to make your own set of
#define’s before including
meow_hash.h, so that you can provide the foundation pieces for Meow to use. This can be very useful if you are trying to integrate Meow into a codebase that already has its own conventions for 128-bit SIMD.
The Meow hash was designed for very specific scenarios. There are two very good reasons
not to use it:
• | Never use the Meow hash for anything involving security unless you are a cryptographer and know exactly what you are doing. The Meow hash was not designed for cryptography and has no cryptographic security proofs or analysis whatsoever. |
• | Don’t base a system on the Meow hash if your primary deployment target uses a non-AES-capable CPU. Modern ARM and x64 cores usually have the AES instructions necessary to implement a Meow hash at high speed, but other CPUs might not. CPUs that don’t have AES instructions will be signficantly slower at running the Meow hash, so you do not want to use it in deployments that expect to use primarily non-AES-enabled CPU cores. |
Building the entire Meow suite
If you download the entire Meow repository, you can build it all using simple batch files on both Windows and Linux. From the root of the repository, on Windows:
Both will build several executables for testing. On Linux, there will only be a CLANG build, and it will be placed in a
build directory. On Windows, it will attempt to build with cl into
build_msvc and CLANG into
build_clang, depending on which compilers you have installed.
The executable
meow_example.exe (built from
meow_example.cpp) is an example of the basic ways you might use a Meow hash. You can run it from the command line with no arguments to have it hash a random buffer, with one argument to print the hash of a file, and with two arguments to have it compare the hashes of two files.
meow_example.cpp is designed to be very simple, and should provide an easy way to see how Meow hashes are used in practice.
The executable
meow_test.exe will test the Meow hash build for obvious defects. It’s not a replacement for serious testing (like
smhasher), but it will give you a quick sanity check that you haven’t done something horribly wrong with the build.
Hash collisions are subject to the
birthday paradox, which means that once you process enough hashes, collisions are
expected for various hash truncation levels. You should expect to see Meow32 collisions once you reach 16 bits worth of files (32-64 thousand files), Meow64 collisions once you get 32 bits worth of files (2-4 billion files), and Meow128 collisions once… well… you really shouldn’t be able to amass enough files for that! So if you find a collision with Meow128, please report it to us on the
GitHub! We may need to look into it in more detail.
You can also use Meow hash via a streaming API and a vanilla C API for compatibility. These are defined in
meow_more.h. To use the streaming API:
Generally speaking, if you are trying to use the C version for compatibility purposes, you will need to do some kind of CPU detection appropriate to your platform and then pick whether to call
MeowHash_Accelerated or
MeowHash_C. The proper way to detect this varies from platform to platform, and is outside the scope of Meow’s implementation, so it is expected that you will implement that part of the code yourself during the startup of your application. An example implementation using try/catch is given in
meow_more_example.cpp.
The Meow hash is still in alpha form. We would love to get feedback on people who can test it on their datasets, especially if you find any collisions or other problems with the hash quality. Please
post issues on the GitHub as you find them!