# Hash Functions for GPU Rendering

Back in 2013, I wrote a somewhat popular article about pseudorandom number generation on the GPU. In the eight years since, a number of new PRNGs and hash functions have been developed, and a few months ago an excellent paper on the topic appeared in JCGT: Hash Functions for GPU Rendering, by Mark Jarzynski and Marc Olano. I thought it was time to update my earlier post in light of this paper’s findings.

Jarzynski and Olano’s paper compares GPU implementations of a large number of different hash functions along two axes: performance (measured by time to render a quad evaluating the hash at each pixel) and statistical quality (quantified by the number of failures on the TestU01 “BigCrush” tests). Naturally, there is quite a spread of results in both performance and quality. Jarzynski and Olano then identify the few hash functions that lie along the Pareto frontier—meaning they are the best choices along the whole spectrum of performance/quality trade-offs.

When choosing a hash function, we might sometimes prioritize performance, and other times might prefer to sacrifice performance in favor of higher quality (real-time versus offline applications, for example). The Pareto frontier provides the set of optimal choices for any point along that balance—ranging from LCGs at the extreme performance-oriented end, to some quite expensive but very high-quality hashes at the other end.

In my 2013 article, I recommended the “Wang hash” as a general-purpose 32-bit-to-32-bit integer hash
function. The Wang hash was among those tested by Jarzynski and Olano, but unfortunately it did not
lie along the Pareto frontier—not even close! The solution that dominates it—and one of the best
balanced choices between performance and quality overall—is **PCG**. In particular, the 32-bit PCG
hash used by Jarzynski and Olano goes as follows:

```
uint pcg_hash(uint input)
{
    uint state = input * 747796405u + 2891336453u;
    uint word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}
```

This has slightly better performance and *much* better statistical quality than the Wang hash. It’s
fast enough to be useful for real-time, while also being high-quality enough for almost any graphics
use case (short of applications that call for precomputed blue noise or low-discrepancy sequences).
It should probably be your default GPU hash function.
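In practice you usually want a float in [0, 1) rather than raw bits. Here’s one common way to do the conversion, sketched in C (`uint32_t` standing in for the GLSL `uint`; the helper name is my own):

```c
#include <stdint.h>

// Same PCG hash as above, in C (uint32_t in place of GLSL's uint).
uint32_t pcg_hash(uint32_t x)
{
    uint32_t state = x * 747796405u + 2891336453u;
    uint32_t word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

// Map a 32-bit hash to a float in [0, 1). Using only the top 24 bits
// keeps the conversion exact in single precision (24-bit mantissa).
float hash_to_float01(uint32_t h)
{
    return (float)(h >> 8) * (1.0f / 16777216.0f);
}
```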

Just to prove it works, here’s the bit pattern generated by a few thousand invocations of the above function on consecutive inputs:

Yep, looks random! 👍

## PCG Variants

Incidentally, you might notice that the PCG function posted above doesn’t match that found in other sources, such as the minimal C implementation on the PCG website. This is because “PCG” isn’t a single function, but more of a recipe for constructing PRNG functions. It works by starting with an LCG, and then applying a permutation function to mix around the bits and improve the quality of the results. There are many possible permutation functions, and O’Neill’s original PCG paper provides a set of building blocks that can be combined in various ways to get generators with different characteristics. In particular, the PCG used by Jarzynski and Olano corresponds to the 32-bit “RXS-M-XS” variant described in §6.3.4 of O’Neill. (See also the list of variants on Wikipedia.)
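To make the recipe concrete, here’s the hash from earlier re-expressed in C with the building blocks spelled out (the decomposition into labeled steps is my annotation; the two functions compute exactly the same result):

```c
#include <stdint.h>

// Original one-liner form of the hash, for comparison.
uint32_t pcg_hash(uint32_t x)
{
    uint32_t state = x * 747796405u + 2891336453u;
    uint32_t word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

// The same computation as an LCG step followed by the RXS-M-XS
// output permutation.
uint32_t pcg_hash_steps(uint32_t x)
{
    uint32_t state = x * 747796405u + 2891336453u; // LCG step
    state ^= state >> ((state >> 28u) + 4u);       // RXS: data-dependent xorshift
    state *= 277803737u;                           // M:   multiply
    state ^= state >> 22u;                         // XS:  fixed xorshift
    return state;
}
```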

## Hash or PRNG?

One of the main points I discussed in my 2013 article was the distinction between PRNGs and hash
functions: the former are designed for a good distribution *within* a single stateful stream, but do
not necessarily provide good distribution *across* streams with consecutive seeds; hash functions are
stateless and designed to give a good distribution even with consecutive (or otherwise highly
correlated) inputs.
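A tiny C example (my illustration, not from the paper) makes the cross-stream problem concrete: the first output of a bare LCG for consecutive seeds differs by exactly the multiplier, so those “streams” start out perfectly correlated until an output permutation mixes them.

```c
#include <stdint.h>

// One LCG step, with the same constants as the PCG hash above.
// For consecutive seeds s and s+1, the outputs differ by exactly
// the multiplier (mod 2^32) -- a strong cross-stream correlation.
uint32_t lcg_step(uint32_t seed)
{
    return seed * 747796405u + 2891336453u;
}
```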

PCG is actually designed to be a PRNG, *not* a hash function, so it may surprise you to see it being
used as a hash here. What gives? Well, apparently PCG is just so good that it works well as a hash
function too! ¯\_(ツ)_/¯

It’s worth noting that PCG *does* support efficient jump-ahead, owing to the LCG at its core: an LCG
can be advanced by $n$ steps in only $O(\log n)$ work, using a repeated-squaring technique akin to
modular exponentiation.
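For reference, here’s that jump-ahead sketched in C (my code, not from the paper, using the constants from the hash above; this is the standard repeated-squaring algorithm, which also appears in O’Neill’s PCG library):

```c
#include <stdint.h>

// Advance an LCG by n steps in O(log n) via repeated squaring of the
// affine update. Unlike plain modular exponentiation, we track both the
// accumulated multiplier and the accumulated increment.
uint32_t lcg_advance(uint32_t state, uint32_t n)
{
    uint32_t cur_mult = 747796405u, cur_add = 2891336453u; // one-step transform
    uint32_t acc_mult = 1u, acc_add = 0u;                  // identity transform
    while (n > 0u) {
        if (n & 1u) {
            // Compose the accumulated transform with the current 2^k-step one.
            acc_mult *= cur_mult;
            acc_add = acc_add * cur_mult + cur_add;
        }
        // Square the current transform: 2^k steps -> 2^(k+1) steps.
        cur_add = (cur_mult + 1u) * cur_add;
        cur_mult *= cur_mult;
        n >>= 1u;
    }
    return acc_mult * state + acc_add;
}
```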
However, that is not what Jarzynski and Olano’s code does: it’s not jumping ahead to the $n$th
value in a single PCG sequence, but essentially just taking the first value from each of $n$
sequences with consecutive initial states. The fact that this works at all is somewhat surprising,
and a testament to the power of permutation functions.

In my previous article, I also recommended that if you need multiple random values per pixel, you could start with a hash function and then iterate either LCG or Xorshift using the hash output as an initial state. You can still do that, using PCG as the initial hash—but it might be just as fast to iterate PCG. The interesting thing about PCG’s design is that only the LCG portion of it actually carries data dependencies from one iteration to the next, and LCGs are super fast. The permutation parts are independent of each other and can be pipelined to exploit instruction-level parallelism when doing multiple iterations.

For completeness, the “PRNG form” of the above PCG variant looks like:

```
uint rng_state;

uint rand_pcg()
{
    uint state = rng_state;
    rng_state = rng_state * 747796405u + 2891336453u;
    uint word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}
```
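As a usage sketch (my example, again in C with `uint32_t` for the GLSL `uint`, and a hypothetical `shade_pixel` to show the pattern): seed the stream with the hash of a per-pixel index, then iterate for as many samples as you need.

```c
#include <stdint.h>

static uint32_t rng_state;

// RXS-M-XS output permutation, shared by the hash and the PRNG form.
static uint32_t pcg_output(uint32_t state)
{
    uint32_t word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

uint32_t pcg_hash(uint32_t x)
{
    return pcg_output(x * 747796405u + 2891336453u);
}

uint32_t rand_pcg(void)
{
    uint32_t state = rng_state;
    rng_state = rng_state * 747796405u + 2891336453u; // LCG carries the state
    return pcg_output(state);                          // permutation per output
}

// Hypothetical per-pixel usage: decorrelate pixels with the hash,
// then draw multiple samples by iterating the PRNG.
void shade_pixel(uint32_t pixel_index)
{
    rng_state = pcg_hash(pixel_index);
    uint32_t s0 = rand_pcg(); // first sample for this pixel
    uint32_t s1 = rand_pcg(); // second sample, and so on
    (void)s0; (void)s1;
}
```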

That’s about it! Be sure to check out Jarzynski and Olano’s paper for some more tidbits, including a discussion of hashes with multi-dimensional inputs and outputs.