Nathan Reed’s coding blog
https://www.reedbeta.com/
Latest posts on Nathan Reed’s coding blog
Wed, 06 Apr 2022 16:53:42 +0000
Texture Gathers and Coordinate Precision
https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/
Nathan Reed, Sat, 15 Jan 2022 08:21:17 -0800
Tags: Graphics, GPU
<p>A few years ago I came across an interesting problem. I was trying to implement some custom texture
filtering logic in a pixel shader. It was for a shadow map, and I wanted to experiment with filters
beyond the usual hardware bilinear.</p>
<p>I went about it by using texture gathers to retrieve a neighborhood of texels, then
performing my own filtering math in the shader. I used <code>frac</code> on the scaled texture coordinates to
figure out where in the texel I was, emulating the logic the GPU texture unit would have used to
calculate weights for bilinear filtering.</p>
<p>To my surprise, I noticed a strange artifact in the resulting image when I got the camera close to a
surface. A grid of flickery, stipply lines appeared, delineating the texels in the soft edges of the
shadows—but not in areas that were fully shadowed or fully lit. What was going on?</p>
<!--more-->
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#into-the-texture-verse">Into the Texture-Verse</a></li>
<li><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#precision-limited-edition">Precision, Limited Edition</a></li>
<li><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#eight-is-a-magic-number">Eight is a Magic Number</a></li>
<li><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#interlude-nearest-filtering">Interlude: Nearest Filtering</a></li>
<li><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#conclusion">Conclusion</a></li>
</ul>
</div>
<figure class="not-too-wide" alt="Artifacts in shadow due to gather mismatch" title="Artifacts in shadow due to gather mismatch" >
<a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/shadow-artifact.jpg"><img src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/shadow-artifact.jpg"/></a> <figcaption><p>Dramatic reenactment of the artifact that started me on this investigation.</p></figcaption>
</figure>
<p>After some head-scratching and experimenting, I understood a little more about the source of these
errors. In the affected pixels, there was a mismatch between the texels returned by
the gather and the texels that the shader <em>thought</em> it was working with.</p>
<p>You see, the objective of a gather operation is to retrieve the set of four texels that would be
used for bilinear filtering, if that’s what we were doing. You give it a UV position, and it finds
the 2×2 quad of texels whose centers surround that point, and returns all four of them in a vector
(one channel at a time).</p>
<p>As the UV position moves through the texture, when it crosses the line between texel centers, the
gather will switch to returning the next set of four texels.</p>
<p><img alt="Diagram of texels returned by a gather operation" class="not-too-wide only-light-theme" src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/gather.png" title="Diagram of texels returned by a gather operation" />
<img alt="Diagram of texels returned by a gather operation" class="not-too-wide only-dark-theme" src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/gather-dark.png" title="Diagram of texels returned by a gather operation" /></p>
<p>In this diagram, the large labeled squares are texels. Whenever the input UV position is within
the solid blue box, the gather returns texels ABCD. If the input point moves
to the right and crosses into the dotted blue box, then the gather will suddenly start returning
BEDF instead. It’s a step function—a discontinuity.</p>
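<p>As a rough CPU-side illustration of that step function, here is a minimal Python sketch. It is an idealized model, not how any particular GPU does it: the function name <code>gather_footprint</code> is made up for this example, it uses an exact <code>floor</code>, and it ignores wrap/clamp addressing.</p>

```python
import math

def gather_footprint(u, v, width, height):
    """Return the (x, y) coordinates of the four texels a gather would fetch.

    Idealized model: exact floor on the scaled coordinates, no fixed-point
    snapping and no wrap/clamp addressing.
    """
    # Shift by half a texel so that integer values land on texel centers.
    x0 = math.floor(u * width - 0.5)
    y0 = math.floor(v * height - 0.5)
    return [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]

# On an 8x8 texture, texel centers sit at (i + 0.5) / 8.
left = gather_footprint(0.30, 0.30, 8, 8)   # 0.30 * 8 - 0.5 = 1.9  -> quad starts at (1, 1)
right = gather_footprint(0.32, 0.30, 8, 8)  # 0.32 * 8 - 0.5 = 2.06 -> quad starts at (2, 1)
print(left)
print(right)
```

<p>A small move in <code>u</code> across a texel center switches the whole quad, and the two quads share a column of texels, just like ABCD and BEDF in the diagram.</p>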
<p>Meanwhile, in my pixel shader I’m calculating weights for combining these texels according to some
filter. To do that, I need to know where I am within the current gather quad. The expression for
this is:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">float2</span><span class="w"> </span><span class="n">texelFrac</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">frac</span><span class="p">(</span><span class="n">uv</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">textureSize</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">0.5</span><span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>(The <code>- 0.5</code> here is to make coordinates relative to texel centers instead of texel edges.)</p>
<p>This <code>frac</code> is <em>supposed</em> to wrap around from 1 back to 0 at the exact same place where the gather switches
to the next set of texels. The <code>frac</code> has a discontinuity, and it needs to match <em>exactly</em> with the
discontinuity in the gather result, for the filter calculation to be consistent.</p>
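<p>To see why the two discontinuities have to coincide, here is a small 1D sketch in Python. It is a hypothetical CPU emulation (the function <code>manual_linear_1d</code> and the texel values are invented for illustration), in which the texel pair and the weight are derived from the same scaled value, so they agree by construction. The last few lines then apply a not-yet-wrapped weight to the next texel pair, which is the kind of inconsistency described above.</p>

```python
import math

def manual_linear_1d(texels, u):
    """Linearly filter a 1D list of texel values at normalized coordinate u.

    The texel pair and the weight both come from the same scaled value,
    so their discontinuities coincide by construction.
    """
    scaled = u * len(texels) - 0.5   # relative to texel centers
    left = math.floor(scaled)        # which texel pair (the "gather")
    frac = scaled - left             # position within the pair (the "frac")
    return texels[left] * (1.0 - frac) + texels[left + 1] * frac

texels = [0.0, 10.0, 30.0, 20.0]

# Just below and just above the center of texel 2 (u = 0.625):
below = manual_linear_1d(texels, 0.625 - 1e-6)
above = manual_linear_1d(texels, 0.625 + 1e-6)
print(below, above)  # both very close to 30.0

# If the texel pair had already switched to (2, 3) while frac had not yet
# wrapped, the weights would be applied to the wrong texels:
stale_frac = (0.625 - 1e-6) * len(texels) - 0.5 - 1
mismatched = texels[2] * (1.0 - stale_frac) + texels[3] * stale_frac
print(mismatched)    # close to 20.0 instead of 30.0
```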
<p>But in my shader, they didn’t match. As I discovered, there was a region—a very small
region, but large enough to be visible—where the gather switched to the next set of texels
<em>before</em> the <code>frac</code> wrapped around to 0. Then, the shader blithely made its weight calculations for
the wrong set of texels, with ugly results.</p>
<p><img alt="Texel squares according to frac (blue) and gather (yellow)" class="not-too-wide only-light-theme" src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/gather-and-frac.png" title="Texel squares according to frac (blue) and gather (yellow)" />
<img alt="Texel squares according to frac (blue) and gather (yellow)" class="not-too-wide only-dark-theme" src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/gather-and-frac-dark.png" title="Texel squares according to frac (blue) and gather (yellow)" /></p>
<p>This diagram is not to scale—the actual mismatch is much smaller than depicted here—but it
illustrates what was going on. It was as if the texel squares as judged by the gather
were the yellow squares, ever so slightly offset from the blue ones that I got by calculating
directly in the shader. Those flickery lines in the shadow will make their entrance whenever some
pixels happen to fall into the tiny slivers of space between these two conflicting accounts of
“where the texel grid is”.</p>
<p>Now on the one hand, this suggests a simple fix. We can add a small offset to our calculation:</p>
<div class="codehilite"><pre><span></span><code><span class="k">const</span><span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="cm">/* TBD */</span><span class="p">;</span><span class="w"></span>
<span class="kt">float2</span><span class="w"> </span><span class="n">texelFrac</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">frac</span><span class="p">(</span><span class="n">uv</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">textureSize</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="o">-</span><span class="mf">0.5</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">offset</span><span class="p">));</span><span class="w"></span>
</code></pre></div>
<p>Then we can empirically hand-tweak the value of <code>offset</code>, and see if we can find a value that makes
the artifact go away.</p>
<p>On the other hand, we’d really like to understand why this mismatch exists in the first place. And
as it turns out, once we understand it properly, we’ll be able to deduce the exact, correct value
for <code>offset</code>—no hand-tweaking necessary.</p>
<h2 id="into-the-texture-verse"><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#into-the-texture-verse" title="Permalink to this section">Into the Texture-Verse</a></h2>
<p>Texture gathers and samples are performed by a GPU’s “texture units”—fixed-function hardware
blocks that shaders call out to. From a shader author’s
point of view, texture units are largely a black box: put UVs in, get filtered results back.
But to address our questions about the behavior of gathers, we’ll need to dig down a bit into
what goes on inside that black box.</p>
<p>We won’t (and can’t) go all the way down to the exact hardware architecture, as those
details are proprietary, and GPU vendors don’t share a lot about them. Fortunately, we won’t need to,
as we can get a general <em>logical</em> picture of what’s happening on the basis of formal API specs,
which all the vendors’ texture units need to comply with.</p>
<p>In particular, we can look at the
<a href="https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm">Direct3D functional spec</a>
(written for D3D11, but applies to D3D12 as well),
and the <a href="https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html">Vulkan spec</a>.
We could also look at OpenGL, but we won’t bother, as Vulkan generally specifies GPU
behavior the same or more tightly than OpenGL.</p>
<p>Let’s start with Direct3D. What does it have to say about how texture sampling works?</p>
<p>Quite a bit—that’s
the topic of a lengthy section, <a href="https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.18%20Texture%20Sampling">§7.18 Texture Sampling</a>.
There are numerous steps described for the sampling pipeline, including range reduction, texel
addressing modes, mipmap selection and anisotropy, and filtering. Let’s focus in on how the texels
to sample are determined in the case of (bi)linear filtering:</p>
<blockquote>
<p class="attribution"><a href="https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.18.8%20Linear%20Sample%20Addressing">D3D §7.18.8 Linear Sample Addressing</a></p>
<p>…Linear sampling in 1D selects the nearest two texels to the sample location and weights the texels
based on the proximity of the sample location to them.</p>
<ul>
<li>Given a 1D texture coordinate in normalized space U, assumed to be any float32 value.</li>
<li>U is scaled by the Texture1D size, and 0.5f is subtracted. Call this scaledU.</li>
<li>scaledU is converted to at least 16.8 Fixed Point. Call this fxpScaledU.</li>
<li>The integer part of fxpScaledU is the chosen left texel. Call this tFloorU. Note that the
conversion to Fixed Point basically accomplished: tFloorU = floor(scaledU).</li>
<li>
<p>The right texel, tCeilU is simply tFloorU + 1.</p>
<p>…</p>
</li>
</ul>
<p>The procedure described above applies to linear sampling of a given miplevel of a Texture2D as well…</p>
</blockquote>
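<p>The quoted steps can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function name <code>linear_taps_1d</code> is made up, the rounding rule used for the fixed-point conversion is an assumption at this point (the quote does not say), and the weight computation stands in for the part of the spec elided above, using the standard bilinear rule of taking the fractional part as the weight of the right texel.</p>

```python
FRAC_BITS = 8  # the spec minimum; hardware is allowed to use more

def linear_taps_1d(u, size, frac_bits=FRAC_BITS):
    """Follow the quoted addressing steps for one 1D coordinate.

    Returns (tFloorU, tCeilU, weight of the right texel).
    """
    one = 2 ** frac_bits
    scaled_u = u * size - 0.5
    # Snap to fixed point. Assumption: round to nearest, ties to even,
    # which is what Python's round() does.
    fxp_scaled_u = round(scaled_u * one)
    t_floor_u = fxp_scaled_u // one   # integer part, i.e. floor
    t_ceil_u = t_floor_u + 1
    # Assumption: the fractional part becomes the right-hand weight.
    weight_right = (fxp_scaled_u % one) / one
    return t_floor_u, t_ceil_u, weight_right

print(linear_taps_1d(0.4140625, 16))  # (6, 7, 0.125)
print(linear_taps_1d(0.0, 16))        # (-1, 0, 0.5)
```

<p>Note that the integer part is taken <em>after</em> the snap to fixed point, so whatever the conversion does to the value also affects which texels get selected.</p>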
<p>OK, here’s something interesting: “<strong>scaledU is converted to at least 16.8 Fixed Point.</strong>” What’s that
about? Why would we want the texture sample coordinates to be in fixed-point, rather than staying in
the usual 32-bit floating-point?</p>
<p>One reason is uniformity of precision. Another section of the D3D spec explains:</p>
<blockquote>
<p class="attribution"><a href="https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#3.2.4%20Fixed%20Point%20Integers">D3D §3.2.4 Fixed Point Integers</a></p>
<p>Fixed point integer representations are used in a couple of places in D3D11…</p>
<ul>
<li>Texture coordinates for sampling operations are snapped to fixed point (after being scaled by
texture size), to uniformly distribute precision across texture space, in choosing filter tap
locations/weights. Weight values are converted back to floating point before actual filtering
arithmetic is performed.</li>
</ul>
</blockquote>
<p>As you may know, floating-point values are designed to have finer precision when the value is closer
to 0. That means texture coordinates would be more precise near the origin of UV space, and less
elsewhere. However, image-space operations such as filtering should behave identically no matter their
position within the image. Fixed-point formats have the same precision everywhere, so they are
well-suited for this.</p>
<figure class="not-too-wide only-light-theme" alt="Fixed-point texture coordinate grid (3 subpixel bits)" title="Fixed-point texture coordinate grid (3 subpixel bits)" >
<img src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/fixed-point.png"/> <figcaption><p>Illustration of fixed-point texture coordinates, if there were only 3 subpixel bits (2<sup>3</sup> = 8 subdivisions). Each dot is a possible fixed-point value. Two adjacent bilinear/gather footprints are highlighted in yellow and cyan.</p></figcaption>
</figure>
<figure class="not-too-wide only-dark-theme" alt="Fixed-point texture coordinate grid (3 subpixel bits)" title="Fixed-point texture coordinate grid (3 subpixel bits)" >
<img src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/fixed-point-dark.png"/> <figcaption><p>Illustration of fixed-point texture coordinates, if there were only 3 subpixel bits (2<sup>3</sup> = 8 subdivisions). Each dot is a possible fixed-point value. Two adjacent bilinear/gather footprints are highlighted in yellow and cyan.</p></figcaption>
</figure>
<p>(Incidentally, you might wonder: don’t we already have non-uniform precision in the original float32
coordinates that the shader passed into the texture unit? Yes—but given current API limits on
texture sizes, the 24-bit float mantissa gives precision equal to or better than 16.8 fixed-point,
throughout at least the [0,1]² UV rectangle. You can still lose too much precision if you work with
too-large UV values in float32 format, though.)</p>
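<p>That claim is easy to check numerically. The sketch below measures the gap between adjacent float32 values at a few UV positions and converts it to texels. It assumes a maximum texture dimension of 16384 (the D3D11 limit, used here as a stand-in for “current API limits”), and <code>float32_spacing</code> is a helper written for this example.</p>

```python
import struct

def float32_spacing(x):
    """Gap between float32(x) and the next float32 above it (for x > 0)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    here = struct.unpack(">f", struct.pack(">I", bits))[0]
    nxt = struct.unpack(">f", struct.pack(">I", bits + 1))[0]
    return nxt - here

TEXTURE_SIZE = 16384      # assumed maximum texture dimension
FIXED_STEP = 1.0 / 256.0  # one step of 8 fractional bits, in texels

for u in (0.001, 0.1, 0.75, 300.0):
    step_in_texels = float32_spacing(u) * TEXTURE_SIZE
    print(u, step_in_texels, FIXED_STEP >= step_in_texels)
```

<p>Inside [0,1] the float32 step stays at or below 1/1024 of a texel, comfortably finer than the 1/256 grid. At <code>u = 300</code> it has grown to half a texel.</p>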
<p>Another possible reason for using fixed-point in texture units is just that integer ALUs are
smaller and cheaper than floating-point ones. But there are a lot of other operations in
the texture pipeline still done in full float32 format, so this likely isn’t a major design concern.</p>
<h2 id="precision-limited-edition"><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#precision-limited-edition" title="Permalink to this section">Precision, Limited Edition</a></h2>
<p>At this point, we can surmise that our mysterious gather discrepancy may have something to do with
coordinates being converted to “at least 16.8 fixed point”, per the D3D spec.</p>
<p>These are the scaled texel coordinates, so the integer part of the value (the 16 bits in
front of the radix point) determines which texels we’re looking at, and then there are at least 8
more bits in the fractional part, specifying where we are within the texel.</p>
<p>The minimum 8 bits of <em>sub-texel</em> precision is also re-stated in various other locations in the spec,
such as:</p>
<blockquote>
<p class="attribution"><a href="https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.18.16.1%20Texture%20Addressing%20and%20LOD%20Precision">D3D §7.18.16.1 Texture Addressing and LOD Precision</a></p>
<p>The amount of subtexel precision required (after scaling texture coordinates by texture size) is
at least 8-bits of fractional precision (2<sup>8</sup> subdivisions).</p>
</blockquote>
<p>The D3D spec text is also clear that conversion to fixed-point occurs <em>before</em> taking the
integer part of the coordinate to determine which texels are filtered.</p>
<p>But how does this end up inducing a tiny offset to the locations of texel squares, when we compare
the 32-bit float inputs to the fixed-point versions?</p>
<p>There’s one more ingredient we need to look at, which is <em>how</em> the conversion to fixed-point is
accomplished. Specifically: how does it do rounding? The 16.8 fixed-point has coarser precision than
the input floats in most cases, so floats will need to be snapped to one of the available 16.8 values.</p>
<p>Back to our best friend, the D3D spec, which gives detailed rules about the various numeric formats,
the arithmetic rules they need to satisfy, and the processes for conversion amongst them. Regarding
conversion of floats to fixed-point:</p>
<blockquote>
<p class="attribution"><a href="https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#3.2.4.1%20FLOAT%20-%3E%20Fixed%20Point%20Integer">D3D §3.2.4.1 FLOAT -> Fixed Point Integer</a></p>
<p>For D3D11 implementations are permitted 0.6f ULP tolerance in the integer result vs. the
infinitely precise value n*2^f after the last step above.</p>
<p>The diagram below depicts the ideal/reference float to fixed conversion (including round-to-nearest-even),
yielding 1/2 ULP accuracy to an infinitely precise result, which is more accurate than required by
the tolerance defined above. Future D3D versions will require exact conversion like this reference.</p>
<p><em>[in the “float32 -> Fixed Point Conversion” diagram:]</em></p>
<ul>
<li>Round the 32-bit value to a decimal that is extraBits to the left of the LSB end, using
nearest-even.</li>
</ul>
</blockquote>
<p>There’s the answer: the conversion uses rounding to nearest-even (the same as
the default mode for float math). This means floating-point values will be snapped to the nearest
fixed-point value, with ties breaking to the even side.</p>
<p>Now, we’re finally in a position to explain the artifact that started this whole quest. When we pass
our float32 UVs into the texture unit, they get rounded to the nearest
fixed-point value at 8 subpixel bits (in other words, the nearest 1/256th of a texel). This means
that any coordinate in the last <em>half</em> of a fixed-point step (the last 1/512th of a texel) will round up to the next higher integer
texel value.</p>
<figure class="not-too-wide only-light-theme" alt="Rounding to the nearest fixed-point texture coordinate" title="Rounding to the nearest fixed-point texture coordinate" >
<img src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/round-nearest.png"/> <figcaption><p>When fixed-point conversion is done by round-to-nearest, all the points in the yellow square end up rounded to one of the yellow dots, and assigned the corresponding set of texels; likewise the cyan ones.</p>
<p>Note how the squares are offset from the texel centers by half the grid spacing.</p></figcaption>
</figure>
<figure class="not-too-wide only-dark-theme" alt="Rounding to the nearest fixed-point texture coordinate" title="Rounding to the nearest fixed-point texture coordinate" >
<img src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/round-nearest-dark.png"/> <figcaption><p>When fixed-point conversion is done by round-to-nearest, all the points in the yellow square end up rounded to one of the yellow dots, and assigned the corresponding set of texels; likewise the cyan ones.</p>
<p>Note how the squares are offset from the texel centers by half the grid spacing.</p></figcaption>
</figure>
<p>Therefore, in that last 1/512th, bilinear filtering operations and gathers will choose a one-higher
set of texels to interpolate between—while the shader computing <code>frac</code> on the original float32
values will still think it’s in the original set of texels. This is exactly what we saw in
the original artifact!</p>
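<p>Here is a minimal Python emulation of that disagreement for a single coordinate. It assumes exactly 8 fractional bits and round-to-nearest-even, as described above. The two function names are invented for the example and only model the texel selection, not the rest of the texture unit.</p>

```python
import math

FRAC_BITS = 8
ONE = 2 ** FRAC_BITS

def texel_from_texture_unit(scaled):
    """Left texel index after snapping to 8-bit fixed point (ties to even)."""
    return round(scaled * ONE) // ONE

def texel_from_shader(scaled):
    """Left texel index implied by floor()/frac() on the unsnapped float."""
    return math.floor(scaled)

# scaled = uv * textureSize - 0.5; pick a value in the last 1/512 of texel 3.
scaled = 4.0 - 1.0 / 1024.0
print(texel_from_texture_unit(scaled))  # 4
print(texel_from_shader(scaled))        # 3
print(scaled - math.floor(scaled))      # frac is still just below 1
```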
<p>Accordingly, we can now see that the <code>frac</code> input needs to be shifted by exactly 1/512th texel in
order to make its wrap point line up. It’s very much like the old C/C++ trick of adding 0.5 before
converting a float to integer, to obtain rounding instead of truncation.</p>
<div class="codehilite"><pre><span></span><code><span class="k">const</span><span class="w"> </span><span class="kt">float</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1.0</span><span class="o">/</span><span class="mf">512.0</span><span class="p">;</span><span class="w"></span>
<span class="kt">float2</span><span class="w"> </span><span class="n">texelFrac</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">frac</span><span class="p">(</span><span class="n">uv</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">textureSize</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="p">(</span><span class="o">-</span><span class="mf">0.5</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">offset</span><span class="p">));</span><span class="w"></span>
</code></pre></div>
<p>Lo and behold, the flickery lines on the shadow are now completely gone. 👌🎉😎</p>
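<p>The same conclusion can be checked on the CPU with a brute-force sweep. This Python sketch is an emulation under the assumptions above (8 fractional bits, round-to-nearest-even), not a test of real hardware. It steps through four texels at 1/4096-texel resolution and counts how often the shader-side <code>floor</code> disagrees with the emulated texture unit, with and without the offset.</p>

```python
import math

FRAC_BITS = 8
ONE = 2 ** FRAC_BITS
OFFSET = 1.0 / 512.0

def texture_unit_texel(scaled):
    """Texel index after snapping to 8-bit fixed point (ties to even)."""
    return round(scaled * ONE) // ONE

def count_mismatches(offset, steps_per_texel=4096, texels=4):
    """Count sample positions where the shader and texture unit disagree."""
    mismatches = 0
    for i in range(steps_per_texel * texels):
        scaled = i / steps_per_texel  # exactly representable
        if math.floor(scaled + offset) != texture_unit_texel(scaled):
            mismatches += 1
    return mismatches

print(count_mismatches(0.0))     # 32
print(count_mismatches(OFFSET))  # 0
```

<p>Without the offset, 32 of the 16384 sample positions disagree, which is exactly 1/512 of them. With the offset, none do.</p>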
<h2 id="eight-is-a-magic-number"><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#eight-is-a-magic-number" title="Permalink to this section">Eight is a Magic Number</a></h2>
<p>All GPUs that support D3D11—which means essentially all PC desktop/laptop GPUs from the last decade
and a half—should be compliant with the D3D spec, so they should all be rounding and converting their
texture coordinates the same way. Except that there’s still some wiggle room there: the
spec only prescribes 8 subtexel bits as a <em>minimum</em>. GPU designers have the option to use <em>more</em> than 8,
if they wish. How many bits do they actually use?</p>
<p>Let’s see what Vulkan has to say about it. The Vulkan spec’s chapter
<a href="https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#textures">§16 Image Operations</a>
describes much the same operations as the D3D spec, but at a more abstract mathematical level—it
doesn’t nail down the exact sequence of operations and precision the way D3D does. In particular,
Vulkan doesn’t say what numeric format should be used for the <code>floor</code> operation that extracts the
integer texel coordinates. However, it does say:</p>
<blockquote>
<p class="attribution"><a href="https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#_unnormalized_texel_coordinate_operations">VK §16.6 Unnormalized Texel Coordinate Operations</a></p>
<p>…the number of fraction bits retained is specified by <code>VkPhysicalDeviceLimits::subTexelPrecisionBits</code>.</p>
</blockquote>
<p>So, Vulkan doesn’t out-and-out <em>say</em> that texture coordinates should be converted to a fixed-point
format, but that seems to be implied or assumed, given the specification of a number of “fraction bits”
retained.</p>
<p>Also, in Vulkan the number of subtexel bits can be queried in the physical device properties.
That means we can use Sascha Willems’ fantastic <a href="http://vulkan.gpuinfo.org/">Vulkan Hardware Database</a>
to get an idea of what <code>subTexelPrecisionBits</code> values <a href="http://vulkan.gpuinfo.org/displaydevicelimit.php?name=subTexelPrecisionBits">are reported for actual GPUs out there</a>.</p>
<p>The results as of this writing show about 89% of devices returning 8, and the rest returning 4.
There are no devices returning more than 8.</p>
<figure class="max-width-50 invert-when-dark" alt="Report on the distribution of subTexelPrecisionBits across Vulkan GPUs" title="Report on the distribution of subTexelPrecisionBits across Vulkan GPUs" >
<img src="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/subtexelprecisionbits.png"/> <figcaption><p>The distribution of <code>subTexelPrecisionBits</code> as reported by the Vulkan Hardware Database. The reports of values 0 and 6 look bogus, as do most of the reports of 4.</p></figcaption>
</figure>
<p>The Vulkan spec minimum for <code>subTexelPrecisionBits</code> is also 4, not 8 (see
<a href="https://www.khronos.org/registry/vulkan/specs/1.2-extensions/html/vkspec.html#limits-required">Table 53 – Required Limits</a>).
And it seems there’s a significant minority of GPUs that have only 4 subtexel bits. Or is there?
Let’s poke at that a little further.</p>
<p>Of the reports that return 4 bits, a majority of them seem to be from Apple platforms.
Now, Apple doesn’t implement Vulkan directly, so these must be going through <a href="https://github.com/KhronosGroup/MoltenVK">MoltenVK</a>.
And it turns out that MoltenVK
<a href="https://github.com/KhronosGroup/MoltenVK/blob/9986e92f35d957e3760fa468a53ecad3c9b86478/MoltenVK/MoltenVK/GPUObjects/MVKDevice.mm#L2184-L2189">hardcodes <code>subTexelPrecisionBits</code> to 4</a>,
at the time of this writing. The associated comment suggests that Metal doesn’t publicly
expose or specify this value, so they’re just setting it to the minimum. This value
shouldn’t be taken as meaningful!
In fact, I would bet money that all the Apple GPUs have 8 subtexel bits,
just like everyone else. (The only one I’ve tested directly is the M1, and it indeed seems to be 8.)
However, I don’t think there is any public documentation from Apple to confirm or refute this.</p>
<p>Many other reports of 4 subtexel bits come from older Linux drivers for GPUs that definitely have 8
subtexel bits; those might also be incomplete Vulkan implementations, or some other odd
happenstance. Some Android GPUs also have both 4 and 8 reported in the database for the same GPU;
I assume 8 is the correct value for those. Finally, there are
software rasterizers such as SwiftShader and llvmpipe, which also seem to just return the spec minimum.</p>
<p>The fact that the Vulkan spec minimum is 4, rather than 8, suggests that there are (or were) some GPUs
out there that actually only have 4 subtexel bits—or why wouldn’t the spec minimum be 8? But I
haven’t been able to find out what GPUs those could be.</p>
<p>Moreover, there’s a very practical reason why 8 bits is the standard value!
Subtexel precision is directly related to bilinear filtering, and most textures in 3D apps are
in 8-bit-per-channel formats. If you’re going to interpolate 8-bit texture values and store them in
an 8-bit framebuffer, then you <em>need</em> 8-bit subtexel precision; otherwise, you’re likely to see
banding whenever a texture is magnified—whenever the camera gets close to a surface. Lots of
effects like reflection cubemaps, skyboxes, and bloom filters would also be really messed up if you
had less than 8 subtexel bits!</p>
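<p>A quick way to see the banding argument is to count how many distinct output levels are reachable when interpolating between two neighboring texels. The Python sketch below uses a deliberately simplified model (an assumption for illustration, not a description of real filtering hardware): the blend weight is quantized to the subtexel grid and the result is rounded to an 8-bit value.</p>

```python
def distinct_levels(subtexel_bits, lo=0, hi=255):
    """Number of distinct 8-bit results when blending lo -> hi with
    weights quantized to the given number of subtexel bits."""
    steps = 2 ** subtexel_bits
    values = {round(lo + (hi - lo) * k / steps) for k in range(steps + 1)}
    return len(values)

print(distinct_levels(8))  # 256
print(distinct_levels(4))  # 17
```

<p>With 8 bits, every 8-bit level between black and white is reachable. With 4 bits, a magnified black-to-white edge would be limited to 17 steps.</p>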
<p>Overall, it seems very safe to assume that any GPU you’d actually want to run on will have exactly 8
bits of subtexel precision—no more, no less.</p>
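<p>If you did want to hedge against a different bit count, the reasoning from the previous section suggests the offset would be half of one fixed-point step. The sketch below is speculative: it assumes round-to-nearest conversion and assumes the bit count you feed in reflects the actual hardware, which, as discussed above, a reported value of 4 often does not. The function name <code>half_step_offset</code> is invented for this example.</p>

```python
def half_step_offset(sub_texel_precision_bits):
    """Half of one fixed-point step, in texels (assumes round-to-nearest)."""
    return 0.5 / (2 ** sub_texel_precision_bits)

print(half_step_offset(8) == 1.0 / 512.0)  # True
print(half_step_offset(4))                 # 0.03125
```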
<p>What about the rounding mode? Unfortunately, as noted earlier, the Vulkan spec doesn’t actually say that
texture coordinates should be converted to fixed-point, and thus doesn’t specify rounding behavior
for that operation.</p>
<p>Given that the D3D behavior is more tightly specified here, we can expect that behavior to hold
whenever we’re on a D3D-supporting GPU (even if we’re running with Vulkan or OpenGL on that GPU).
The question is a little trickier for other GPUs, such as Apple’s and the assorted mobile GPUs. They
don’t support D3D, so they’re under no obligation to follow D3D’s spec. That said, it seems
probable that they do also use round-to-nearest here, especially Apple. (I’d be a little more
hesitant to assume this across the board with the mobile crowd.)</p>
<p>I can tell you that from my experiments, the 1/512 offset consistently fixes the gather mismatch
across all desktop GPU vendors, OSes, and APIs that I’ve been able to try, including Apple’s. However,
I haven’t had the chance to test this on mobile GPUs so far.</p>
<h2 id="interlude-nearest-filtering"><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#interlude-nearest-filtering" title="Permalink to this section">Interlude: Nearest Filtering</a></h2>
<p>I initially followed a bit of a red herring with this investigation. I wanted to verify whether the
1/512 offset was correct across a wider range of hardware, so I created a Shadertoy to test it, and
asked people to run it and let me know the results. (By the way, thanks, everyone!)</p>
<p>The results I got were all over the place. For some GPU vendors an offset was required, and for
others, it wasn’t. In some cases, it seemed like it might have changed between different architectures
of the same vendor. There was even some evidence that it depended on which API you were using, with
D3D and OpenGL giving different results on the same GPU—although I wasn’t able to conclusively
verify that. Oh jeez. What the heck?</p>
<p>As it turns out, I’d taken a shortcut that was actually kind of a long-cut. You see, Shadertoy is built on
WebGL, which doesn’t actually support texture gathers currently (they’re planned to be in the next
version of WebGL). So, I substituted with something that’s similar in many ways: nearest-neighbor
filtering mode.</p>
<p>Just like gathers, nearest-neighbor filtering also has to select a texel based on the texture unit’s
judgement of which texel square your coordinates are in, and there is again the possibility of a
mismatch versus the shader’s version of the calculation. The only difference is that there isn’t a 0.5
texel offset—otherwise, I expected it to work the same way as a gather, using the same math and
rounding modes.</p>
<p>Surprise! It doesn’t. The results of nearest-neighbor filtering suggest that GPUs aren’t consistent
in how they compute the nearest texel to the sample point. To find the nearest texel, we need to apply
<code>floor</code> to the scaled texel coordinates; but it looks like some GPUs round off the coordinates to
8 subpixel bits before taking the <code>floor</code>, and others might truncate instead of rounding—or they
might just be applying <code>floor</code> to the floating-point value directly, rather than converting it to
fixed-point at all.</p>
<p>Now, the D3D11 functional spec does say (<a href="https://microsoft.github.io/DirectX-Specs/d3d/archive/D3D11_3_FunctionalSpec.htm#7.18.7%20Point%20Sample%20Addressing">§7.18.7 Point Sample Addressing</a>)
that point sampling (aka nearest filtering) is supposed to use the same fixed-point conversion and
rounding as in the bilinear case. And some GPUs out there are definitely in violation of that, to
the tune of 1/512th texel, unless I’ve misunderstood something!</p>
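<p>To make the candidate behaviors concrete, here is a Python sketch of the three possibilities mentioned above, evaluated at a coordinate in the last 1/512th of a texel. These are hypothetical models for comparison (the function names are invented), not documented behavior of any specific GPU.</p>

```python
import math

FRAC_BITS = 8
ONE = 2 ** FRAC_BITS

def nearest_round_to_fixed(scaled):
    """Snap to fixed point with round-to-nearest-even, then take the floor."""
    return round(scaled * ONE) // ONE

def nearest_truncate_to_fixed(scaled):
    """Snap to fixed point by truncating extra bits, then take the floor."""
    return math.floor(scaled * ONE) // ONE

def nearest_float_floor(scaled):
    """Apply floor directly to the floating-point value."""
    return math.floor(scaled)

# For point sampling, scaled = u * textureSize (no half-texel shift).
scaled = 2.0 - 1.0 / 1024.0  # inside the last 1/512 of texel 1
print(nearest_round_to_fixed(scaled))     # 2
print(nearest_truncate_to_fixed(scaled))  # 1
print(nearest_float_floor(scaled))        # 1
```

<p>Truncation and a direct float <code>floor</code> pick the same texel, so a test like this can only separate “rounds” from “doesn’t round”.</p>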
<p><a href="https://www.shadertoy.com/view/flyGRd">Here’s the Shadertoy</a>, if you want to check it out (see the
code comments for an explanation).</p>
<p>Happily, however, if you’re actually interested in gathers, the behavior of those appears to be
completely consistent. (Honestly, surprising for anything to do with GPU hardware!)</p>
<h2 id="conclusion"><a href="https://www.reedbeta.com/blog/texture-gathers-and-coordinate-precision/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>The inner workings of texture units are something we can usually gloss over as GPU programmers.
For the most part, once we’ve prepared the mipmaps and configured the sampler settings, things Just
Work™ and we don’t need to think about it a lot.</p>
<p>Once in a while, though, something comes along that brings the texture unit’s internal behavior to
the fore, and this was a great example. If you ever try to build a custom filter in a shader using
texture gathers, the mismatch in the texture unit’s internal precision versus the float32
calculations in the shader will create a very noticeable visual issue.</p>
<p>Fortuitously, we were able to get a good read on what’s going on from a close
perusal of API specs, and hardware survey data plus a few directed tests helped to confirm that gathers
really do work the way it says in the spec, across a wide range of GPUs. And best of all, the fix is
simple and universal once we’ve understood the problem.</p>
git-partial-submodule
https://www.reedbeta.com/made/git-partial-submodule/
Nathan Reed, Sat, 04 Sep 2021 11:47:29 -0700
<p><a class="biglink" href="https://github.com/Reedbeta/git-partial-submodule/">View on GitHub</a></p>
<p>Have you ever thought about adding a submodule to your git project, but you didn’t want to bear the
burden of downloading and storing the submodule’s entire history, or you only need a handful of
files out of the submodule?</p>
<p>Git provides <a href="https://github.blog/2020-12-21-get-up-to-speed-with-partial-clone-and-shallow-clone/">partial clone</a>
and <a href="https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/">sparse checkout</a>
features that can make this happen for top-level repositories, but so far they aren’t available for
submodules. That’s a hole I aimed to fill with this project. <strong>git-partial-submodule</strong> is a tool for
setting up submodules with blobless clones. It can also save sparse-checkout patterns in your
<code>.gitmodules</code> file, allowing them to be managed by version control, and automatically applied when
the submodules are cloned.</p>
<!--more-->
<p>As a motivating example, a fresh clone of <a href="https://github.com/ocornut/imgui">Dear ImGui</a> consumes
about 80 MB (of which 75 MB is in the <code>.git</code> directory) and takes about 10 seconds to clone on a
fast connection. It also brings in roughly 200 files, including numerous examples and backends and
various other ancillary files. The actual ImGui implementation—the part you need for your app—is
in 11 files totaling 2.5 MB.</p>
<p>In contrast, a blobless, sparse clone of Dear ImGui requires only about 7 MB (4.5 MB in the <code>.git</code>
directory), takes ~2 seconds to clone, and checks out only the files you want.</p>
<p>(This is not to pick on Dear ImGui at all! These issues arise with any healthy, long-lived project,
and the history bloat in particular is an artifact of git’s design.)</p>
<p>One way developers might address this is by “vendoring”, or copying the ImGui files they need into
their own repository and checking them in. That can be a legitimate solution, but it has various
downsides.</p>
<p>Another solution supported out of the box by git is “shallow” clones, which essentially only
download the latest commit and no history. Submodules can be configured to be cloned shallowly.
This works, and is useful in some cases such as cloning on a build machine where you’re not going to
be manipulating the repository at all. However, shallow clones make it difficult to do normal
development workflows with the submodule. In contrast, a blobless clone functions normally with
most workflows, as it can download missing data on demand.</p>
<p>Since git’s own submodule commands do not (yet) allow specifying blobless mode or sparse checkout,
I built git-partial-submodule to work around this. It’s a single-file Python script that you use
just for the initial setup of submodules. Instead of <code>git submodule add</code>, you do
<code>git-partial-submodule.py add</code>. When cloning a repository with existing submodules, you use
<code>git-partial-submodule.py clone</code> instead of recursively cloning or <code>git submodule update --init</code>.</p>
<p>It works by manually calling <code>git clone</code> with the blobless/sparse options, setting up the submodule
repo in your <code>.git/modules</code> directory, and hooking everything up so git sees it as a legit submodule.
Afterward, ordinary submodule operations such as fetches and updates <em>should</em> work normally—although
I haven’t done super extensive testing on this, and I’ve been warned that blobless/sparse are still
experimental git features that may have sharp edges.</p>
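<p>To make the “blobless/sparse options” concrete, here is a minimal Python sketch of the kind of
<code>git clone</code> command line involved. The helper name and the example paths are hypothetical, and this
is not taken from the git-partial-submodule source; it only assembles real <code>git clone</code> flags
(<code>--filter=blob:none</code>, <code>--sparse</code>, <code>--separate-git-dir</code>) into an argument list without
running anything.</p>

```python
def blobless_sparse_clone_args(url, worktree_path, git_dir):
    """Build an argument list for a blobless, sparse `git clone`.

    Hypothetical helper for illustration only; the real tool may
    assemble its command differently.
    """
    return [
        "git", "clone",
        "--filter=blob:none",                # blobless: blobs are fetched on demand
        "--sparse",                          # start out with a sparse checkout
        f"--separate-git-dir={git_dir}",     # keep the repo data outside the worktree
        url,
        worktree_path,
    ]

# Example paths are assumptions, chosen to resemble a submodule layout.
args = blobless_sparse_clone_args(
    "https://github.com/ocornut/imgui",
    "external/imgui",
    ".git/modules/external/imgui",
)
print(" ".join(args))
```

<p>Passing the list to something like <code>subprocess.run</code> would perform the clone; the remaining
work of registering the result as a submodule is not shown here.</p>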
<p>The other thing git-partial-submodule does is to save and restore sparse-checkout patterns in your
<code>.gitmodules</code> for each submodule. When you only need a subset of the submodule’s file tree, this
lets you manage those patterns under version control in the superproject, so that others who clone
the project (and are also using git-partial-submodule) will automatically get the right set of
files. You can configure this using the ordinary <code>git sparse-checkout</code> commands, but currently you
have to remember to do the extra step of saving the patterns to <code>.gitmodules</code> when changing them, or
restoring the patterns <em>from</em> <code>.gitmodules</code> after pulling/merging. It might be possible to
automate this further with some git hooks, but I haven’t looked into it yet.</p>
<p>I’m excited to try out this workflow for some of my own projects, replacing vendored projects with
partial submodules, and I hope it will be helpful to some others out there as well. Issues and PRs
are open on GitHub, and contributions are welcome. If you end up trying this, let me know if it
works for you!</p>Slope Space in BRDF Theory
https://www.reedbeta.com/blog/slope-space-in-brdf-theory/
https://www.reedbeta.com/blog/slope-space-in-brdf-theory/Nathan ReedFri, 16 Jul 2021 15:34:37 -0700https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#commentsGraphicsMath<p>When you read BRDF theory papers, you’ll often see mention of <em>slope space</em>. Sometimes, components
of the BRDF such as NDFs or masking-shadowing functions are defined in slope space, or operations
are done in slope space before being converted back to ordinary vectors or polar coordinates.
However, the meaning and intuition of slope space is rarely explained. Since it may not be obvious
exactly what slope space is, why it is useful, or how to transform things to and from it, I thought
I would write down a gentler introduction to it. <!--more--></p>
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#slope-refresher">Slope Refresher</a></li>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#normals-and-slopes">Normals and Slopes</a></li>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#slope-space">Slope Space</a></li>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#converting-to-polar-coordinates">Converting to Polar Coordinates</a></li>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#properties-of-slope-space">Properties of Slope Space</a></li>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#distributions-in-slope-space">Distributions in Slope Space</a><ul>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#the-jacobian">The Jacobian</a></li>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#some-common-distributions">Some Common Distributions</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="slope-refresher"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#slope-refresher" title="Permalink to this section">Slope Refresher</a></h2>
<p>First off, what even is this “slope” thing we’re talking about? If you think back to your high school
algebra class, the slope of a line was defined as “rise over run”, or the ratio $\Delta y / \Delta x$
between some two points on the line.</p>
<p><img alt="Slope of a line" class="invert-when-dark" src="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/line-slope.png" style="max-height:16em" title="Slope of a line" /></p>
<p>The steeper the line, the larger the magnitude of its slope. The sign of the slope indicates which
direction the line is sloping in. The slope is infinite if the line is vertical.</p>
<p>The concept of slope can readily be generalized to planes as well as lines. Planes have <em>two</em> slopes,
one for $\Delta z / \Delta x$ and one for $\Delta z / \Delta y$ (using $z$-up coordinates, and
assuming the surface is not vertical):</p>
<p><img alt="Slopes of a plane" class="invert-when-dark" src="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/plane-slope.png" style="max-height:26em" title="Slopes of a plane" /></p>
<p>These values describe how much the surface rises or falls in $z$ if you take a step along either
$x$ or $y$. This completely specifies the orientation of a planar surface, as steps in any other
direction can be derived from the $x$ and $y$ slopes.</p>
<p>In calculus, the slope of a line is generalized to the derivative or “instantaneous slope” of a curve,
$\mathrm{d}y/\mathrm{d}x$. For curved surfaces, so long as they can be expressed as a heightfield
(where $z$ is a function of $x, y$), slopes become partial derivatives $\partial z / \partial x$ and
$\partial z / \partial y$.</p>
<p>It’s worth noting that slopes are completely <em>coordinate-dependent</em> quantities. If you transform
to a different coordinate system, the slopes of $z$ with respect to $x, y$ will be totally different
values, or even infinite (if the surface is not a heightfield anymore in the new coordinates).</p>
<h2 id="normals-and-slopes"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#normals-and-slopes" title="Permalink to this section">Normals and Slopes</a></h2>
<p>We usually describe surfaces in 3D by their normal vector rather than their slopes, as the normal is
able to gracefully handle surfaces in any orientation without infinities, and is easier to transform
into different coordinate systems. However, there is a simple relationship between a surface’s
normal and its slopes, as this diagram should hopefully convince you:</p>
<p><img alt="Normal vector compared with slope" class="invert-when-dark" src="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/normal-slope.png" style="max-height:20em" title="Normal vector compared with slope" /></p>
<p>The two triangles with the dotted lines in the figure are congruent (same angles and sizes), but
rotated by 90 degrees. As the normal is, by definition, perpendicular to the surface, the normal’s
components have the same proportionality as coordinate deltas along the surface, just swapped around.
This diagram shows the $xz$ projection, but the same holds true of the $yz$ components:
$$
\begin{aligned}
\frac{\Delta z}{\Delta x} &= -\frac{\mathbf{n}_x}{\mathbf{n}_z} \\[1em]
\frac{\Delta z}{\Delta y} &= -\frac{\mathbf{n}_y}{\mathbf{n}_z}
\end{aligned}
$$
The negative sign is because $\Delta z$ is going down while $\mathbf{n}_z$ is going up (or vice
versa, depending on the orientation).</p>
<p>Just for completeness, when you have a heightfield surface $z(x, y)$, the partial derivatives are
related to its normal at a point in the same way:
$$
\begin{aligned}
\frac{\partial z}{\partial x} &= -\frac{\mathbf{n}_x}{\mathbf{n}_z} \\[1em]
\frac{\partial z}{\partial y} &= -\frac{\mathbf{n}_y}{\mathbf{n}_z}
\end{aligned}
$$</p>
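<p>As a quick numerical sanity check of this relationship, here is a small Python sketch. The
heightfield <code>height</code> is an arbitrary example I made up for illustration; the slopes are estimated
with central differences, turned into an upward unit normal, and then recovered again from the
normal’s components.</p>

```python
import math

def height(x, y):
    # Hypothetical example heightfield: a tilted plane plus a gentle bump.
    return 0.5 * x - 0.25 * y + 0.1 * math.sin(x) * math.cos(y)

def slopes(f, x, y, h=1e-5):
    """Central-difference estimates of (dz/dx, dz/dy) at (x, y)."""
    dzdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dzdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dzdx, dzdy

def normal_from_slopes(dzdx, dzdy):
    """Upward unit normal of a surface with the given slopes."""
    length = math.sqrt(dzdx * dzdx + dzdy * dzdy + 1.0)
    return (-dzdx / length, -dzdy / length, 1.0 / length)

sx, sy = slopes(height, 0.3, -0.7)
n = normal_from_slopes(sx, sy)

# The slopes come back out of the normal as -n_x/n_z and -n_y/n_z.
print(sx, -n[0] / n[2])
print(sy, -n[1] / n[2])
```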
<h2 id="slope-space"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#slope-space" title="Permalink to this section">Slope Space</a></h2>
<p>Now we’re finally ready to define slope space. Due to the relationship between slopes and normal
vectors, slopes act as an alternate parameterization of unit vectors in the $z > 0$ hemisphere.
Given any vector, we can treat it as a normal and find the slopes of a surface perpendicular to it.
“Slope space” refers to this domain: the 2D space of all the possible slope values. As slopes can be
any real numbers, slope space is just the real plane, $\mathbb{R}^2$, but with a special meaning.</p>
<p>A good way to visualize slope space is to identify it with the plane $z = 1$. Then, vectors at the
origin can be converted to slope space by intersecting them with the plane:</p>
<p><img alt="Slope space as the z=1 plane" class="invert-when-dark" src="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/normal-z-1.png" style="max-height:16em" title="Slope space as the z=1 plane" /></p>
<p>Here I’ve introduced the notation $\tilde{\mathbf{n}}$ for the 2D vector in slope space corresponding
to the 3D vector $\mathbf{n}$. The tilde ($\sim$) notation for slope-space quantities is commonly
used in the BRDF literature, and I’ll follow it here.</p>
<p>Intersecting a ray with the $z = 1$ plane is equivalent to rescaling the vector so that $\mathbf{n}_z = 1$,
and then the slopes can be read off as the negated $x, y$ components of the rescaled vector. You can
visualize the slope plane as having inverted $x, y$ axes compared to the base coordinates to take
care of this. (Note the $x$-axis on the slope plane, pointing to the left, in the diagram above.)</p>
<p>So, you can picture the hemisphere being blown up and stretched onto the plane, by projecting each
point away from the origin until it hits the plane. This establishes a bijection (one-to-one mapping)
between the unit vectors with $z > 0$ and points on the plane.</p>
<p>To make it official, the slope-space parameterization of an arbitrary vector $\mathbf{v}$ with
$\mathbf{v}_z > 0$ is defined by:
$$
\begin{aligned}
\tilde{\mathbf{v}}_x &= -\frac{\mathbf{v}_x}{\mathbf{v}_z} \\[1em]
\tilde{\mathbf{v}}_y &= -\frac{\mathbf{v}_y}{\mathbf{v}_z}
\end{aligned}
$$
This assumes that the vector is upward-pointing, so that $\mathbf{v}_z > 0$. Finite slopes cannot
represent horizontal vectors (normal to vertical surfaces), and they cannot distinguish between
upward- and downward-pointing vectors, as slopes have no sense of orientation—reverse the normal,
and you still get the same slopes.</p>
<p>Converting back from slopes to an ordinary unit normal vector is also simple:
$$
\mathbf{v} = \text{normalize}(-\tilde{\mathbf{v}}_x, -\tilde{\mathbf{v}}_y, 1)
$$</p>
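<p>The two conversions above can be sketched in a few lines of Python. The function names
<code>to_slope</code> and <code>from_slope</code> are my own, picked for illustration; a shader version would look
much the same.</p>

```python
import math

def to_slope(v):
    """Slope-space coordinates of a 3D vector with v_z > 0."""
    x, y, z = v
    if z <= 0.0:
        raise ValueError("slope space only covers vectors with z > 0")
    return (-x / z, -y / z)

def from_slope(s):
    """Unit vector in the upper hemisphere for the slope-space point s."""
    sx, sy = s
    length = math.sqrt(sx * sx + sy * sy + 1.0)
    return (-sx / length, -sy / length, 1.0 / length)

v = (0.6, 0.0, 0.8)              # already unit length
s = to_slope(v)                  # roughly (-0.75, 0)
v_back = from_slope(s)           # round-trips to v, up to rounding

# Rescaling the input does not change its slopes.
s_scaled = to_slope((1.2, 0.0, 1.6))
```

<p>Note that <code>to_slope</code> doesn’t need its input to be normalized, since the division by $z$
cancels any overall scale.</p>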
<h2 id="converting-to-polar-coordinates"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#converting-to-polar-coordinates" title="Permalink to this section">Converting to Polar Coordinates</a></h2>
<p>Another common parameterization of unit vectors is the polar coordinates $\theta, \phi$.
It’s straightforward to work out the direct conversion between slope space and polar coordinates.</p>
<p>Following common conventions, we define the polar coordinates so that $\theta$ measures downward
from the $+z$ axis, and $\phi$ measures counterclockwise from the $+x$ axis. The conversion between
polar and 3D unit vectors is:
$$
\begin{aligned}
\theta &= \text{acos}(z) \\
\phi &= \text{atan2}(y, x)
\end{aligned}
\qquad
\begin{aligned}
x &= \sin\theta \cos\phi \\
y &= \sin\theta \sin\phi \\
z &= \cos\theta
\end{aligned}
$$
and the conversion between polar and slope space is:
$$
\begin{aligned}
\theta &= \text{atan}(\sqrt{\tilde x^2 + \tilde y^2}) \\
\phi &= \text{atan2}(-\tilde y, -\tilde x)
\end{aligned}
\qquad
\begin{aligned}
\tilde x &= -\!\tan\theta \cos\phi \\
\tilde y &= -\!\tan\theta \sin\phi \\
\end{aligned}
$$
This can be derived by setting $\tilde x = -x/z$ and substituting the conversion from polar, then
using the identity $\sin/\cos = \tan$.</p>
<p>A fact worth noting here is that the magnitude of a slope-space vector, $|\tilde{\mathbf{v}}|$, is
equal to $\tan\theta_\mathbf{v}$.</p>
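<p>Here is a short Python sketch of the polar/slope conversions, mainly to check the signs and the
$|\tilde{\mathbf{v}}| = \tan\theta$ fact numerically. The function names and test angles are
arbitrary choices of mine.</p>

```python
import math

def polar_to_slope(theta, phi):
    """Slope-space point for polar angles (theta measured from +z)."""
    t = math.tan(theta)
    return (-t * math.cos(phi), -t * math.sin(phi))

def slope_to_polar(sx, sy):
    """Polar angles for a slope-space point."""
    theta = math.atan(math.hypot(sx, sy))
    phi = math.atan2(-sy, -sx)
    return theta, phi

theta, phi = 0.9, 2.0                    # arbitrary angles, theta < pi/2
sx, sy = polar_to_slope(theta, phi)
theta2, phi2 = slope_to_polar(sx, sy)    # recovers (theta, phi)

# The slope-space magnitude equals tan(theta).
print(math.hypot(sx, sy), math.tan(theta))
```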
<h2 id="properties-of-slope-space"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#properties-of-slope-space" title="Permalink to this section">Properties of Slope Space</a></h2>
<p>Now we’ve seen how to define slope space and convert back and forth from it. But why is it useful?
Why would we want to represent vectors or functions in this way?</p>
<p>In microfacet BRDF theory, we usually assume the microsurface is a heightfield for simplicity (which
is a pretty reasonable assumption for a lot of everyday materials). If the microsurface is a
heightfield, then its normals are constrained to the upper hemisphere. Slope space, which
parameterizes exactly the upper hemisphere, is a good match for this.</p>
<p>From a performance perspective, slope space is also much cheaper to transform to and from than polar
coordinates, which makes it nicer to use in shaders. It requires only some divides or a normalize,
as opposed to a bunch of forward or inverse trigonometric functions.</p>
<p>Slope space also has no boundaries, in contrast to other representations of unit vectors. The origin
(0, 0) of the slope plane represents a flat surface normal, and the farther away you get, the more
extreme the slope, but you can’t make the surface turn upside down or produce an invalid normal. So,
you can freely do various manipulations on vectors in slope space without worrying about exceeding
any bounds.</p>
<p>Another useful fact about slope space is that many linear transformations of a surface, such as
scaling or shearing, map to transformations of its slope space in simple ways. For example, scaling
a surface by a factor $\alpha$ along its $z$-axis causes its normal vectors’ $z$-components to scale
by $1/\alpha$ (due to normals taking the inverse transpose), but then since $\mathbf{n}_z$ is in the
denominator in the definition of slope space, we have that the slopes of the surface are scaled by
$\alpha$.</p>
<p>Here’s a table of how transformations of the microsurface map to transformations of slope space:</p>
<table>
<thead>
<tr>
<th>Surface</th>
<th>Slope Space</th>
</tr>
</thead>
<tbody>
<tr>
<td>Horizontal scale by $(\alpha_x, \alpha_y)$</td>
<td>Scale by $(1/\alpha_x, 1/\alpha_y)$</td>
</tr>
<tr>
<td>Vertical scale by $\alpha$</td>
<td>Scale by $\alpha$</td>
</tr>
<tr>
<td>Horizontal rotate ($xy$) by $\theta$</td>
<td>Rotate by $\theta$</td>
</tr>
<tr>
<td>Vertical rotate ($xz, yz$)</td>
<td>Projective transform<br/><em>(not recommended)</em></td>
</tr>
<tr>
<td>Horizontal shear ($xy$) by
$\begin{bmatrix}
1 & k_2 \\
k_1 & 1
\end{bmatrix}$
</td>
<td>Shear by
$\begin{bmatrix}
1 & -k_1 \\
-k_2 & 1
\end{bmatrix}$
</td>
</tr>
<tr>
<td>Vertical shear by
$\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
k_x & k_y & 1
\end{bmatrix}$
</td>
<td>Translate by $(k_x, k_y)$</td>
</tr>
<tr>
<td>Vertical shear by
$\begin{bmatrix}
1 & 0 & k_x \\
0 & 1 & k_y \\
0 & 0 & 1
\end{bmatrix}$
</td>
<td>Projective transform<br/><em>(not recommended)</em></td>
</tr>
</tbody>
</table>
<p>These transformations in slope space are often exploited by parameterized BRDF models; they can
implement roughness, anisotropy, and such as transformations applied to a single canonical BRDF
(see for example <a href="http://jcgt.org/published/0003/02/03/">Heitz 2014</a>, section 5).</p>
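<p>A few rows of the table can be checked numerically without invoking the inverse transpose at all.
In the Python sketch below (helper names are mine), a plane with given slopes is represented by two
tangent vectors; the transformation is applied to the tangents, and the new slopes are read off
from the normal of the transformed plane.</p>

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def transformed_slopes(m, slope_x, slope_y):
    """Slopes of a plane with slopes (slope_x, slope_y) after applying m."""
    t1 = (1.0, 0.0, slope_x)      # tangent along x
    t2 = (0.0, 1.0, slope_y)      # tangent along y
    n = cross(mat_vec(m, t1), mat_vec(m, t2))
    return (-n[0] / n[2], -n[1] / n[2])

a, b = 0.3, -0.2                  # arbitrary starting slopes

vertical_scale = [[1, 0, 0], [0, 1, 0], [0, 0, 2.0]]
horizontal_scale = [[4.0, 0, 0], [0, 0.5, 0], [0, 0, 1]]
vertical_shear = [[1, 0, 0], [0, 1, 0], [0.1, 0.7, 1]]

print(transformed_slopes(vertical_scale, a, b))    # slopes scaled by 2
print(transformed_slopes(horizontal_scale, a, b))  # slopes scaled by (1/4, 1/0.5)
print(transformed_slopes(vertical_shear, a, b))    # slopes translated by (0.1, 0.7)
```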
<h2 id="distributions-in-slope-space"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#distributions-in-slope-space" title="Permalink to this section">Distributions in Slope Space</a></h2>
<p>One of the key ingredients in a microfacet BRDF is its normal distribution function (NDF), and one
of the key uses for slope space is defining NDFs. Because slope space is an unbounded 2D plane, we
can import existing 1D or 2D distribution functions and manipulate them in various ways, just as we
would in any 2D domain. As long as we end up with a valid, normalized probability distribution in
the slope plane (sometimes called a slope distribution function, or a $P^{22}$ function—I’m not
sure where the latter term comes from), we can transform it to a properly normalized NDF expressed in
polar or vector form. Let’s see how to do that.</p>
<h3 id="the-jacobian"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#the-jacobian" title="Permalink to this section">The Jacobian</a></h3>
<p>When mapping distribution functions from one space to another, it’s important to remember that the
values of these functions are not dimensionless numbers; they are <em>densities</em> with respect to the area
or volume measure of the underlying space. Therefore, it’s not enough just to change variables to
express the function in the new coordinates; you also have to correct for the way the mapping
stretches or squeezes the volume, which can vary from place to place.</p>
<p>Symbolically, suppose we have a domain $A$ with a probability density $p(a)$ defined on it. We want
to map this to a domain $B$ parameterized by some new coordinates $b$. What we want is <em>not</em> just
$p(a) = p(b)$ when $a \mapsto b$ under the mapping. Rather, we need to maintain:
$$
p(a) \, \mathrm{d}A = p(b) \, \mathrm{d}B
$$
where $\mathrm{d}A, \mathrm{d}B$ are matching volume elements of the respective spaces, with
$\mathrm{d}A \mapsto \mathrm{d}B$ under the mapping we’re using. This says that the amount of
probability (or whatever thing whose density we’re measuring) in the infinitesimal volume $\mathrm{d}A$
is conserved under the mapping; the same amount of probability is present in $\mathrm{d}B$.</p>
<p>This equation can be rewritten:
$$
p(b) = p(a) \frac{\mathrm{d}A}{\mathrm{d}B}
$$
The factor $\mathrm{d}A / \mathrm{d}B$ here is called the Jacobian, referring to the determinant of
the <a href="https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant">Jacobian matrix</a> which contains
all the derivatives of the change of variables from $a$ to $b$. Actually, this is the <em>inverse</em>
Jacobian, as the forward Jacobian for $A \to B$ would be $\mathrm{d}B / \mathrm{d}A$. The forward
Jacobian is the factor by which the mapping stretches or squeezes volumes locally around a point.
Because a probability density has volume in the denominator, it transforms using the inverse Jacobian.</p>
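<p>A one-dimensional toy case may help make this concrete. In the Python sketch below, the density
$p(a) = 2a$ on $[0, 1]$ and the mapping $b = a^2$ are examples I chose purely for illustration.
Multiplying by the inverse Jacobian $\mathrm{d}A/\mathrm{d}B = 1/(2a)$ turns the ramp-shaped density
into a uniform one on $B$, which a plain substitution of variables would not.</p>

```python
import math

def p_a(a):
    """Density on A = [0, 1]: p(a) = 2a, which integrates to 1."""
    return 2.0 * a

def a_to_b(a):
    """Example mapping A -> B; it stretches A unevenly."""
    return a * a

def p_b(b):
    """Density on B = [0, 1] induced by p_a under b = a^2."""
    a = math.sqrt(b)
    da_db = 1.0 / (2.0 * a)       # inverse Jacobian dA/dB
    return p_a(a) * da_db         # works out to 1 for every b in (0, 1]

# Probability of a in [0.2, 0.5] is 0.5^2 - 0.2^2; the image interval in B
# has exactly that length, so a uniform p_b conserves the probability.
mass_a = 0.5 ** 2 - 0.2 ** 2
mass_b = a_to_b(0.5) - a_to_b(0.2)
```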
<p>So, when converting a slope-space distribution to an NDF, we have to multiply by the appropriate
Jacobian. But how do we find out what that is? First off, we have to recall that NDFs are defined
not as a density over solid angle in the hemisphere, but
<a href="/blog/hows-the-ndf-really-defined/">as a density over projected area on the $xy$ plane</a>.
Thus, it’s not enough to just find the Jacobian from slope space to polar coordinates; we also need
to find the Jacobian from polar coordinates to projected area.</p>
<p>To do this, I find it easiest to use the formalism of <a href="https://en.wikipedia.org/wiki/Differential_form">differential forms</a>.
Explaining how those work is out of the scope of this article, but
<a href="https://www.math.purdue.edu/~arapura/preprints/diffforms.pdf">here’s an exposition I found useful</a>.
They’re essentially fields of <a href="/blog/normals-inverse-transpose-part-3/">dual $k$-vectors</a>.</p>
<p>First, we can write down the $xy$ projected area element, $\mathrm{d}x \wedge \mathrm{d}y$, in terms
of polar coordinates by differentiating the mapping from polar to Cartesian, which I’ll repeat here
for convenience:
$$
\begin{gathered}
\left\{
\begin{aligned}
x &= \sin\theta \cos\phi \\
y &= \sin\theta \sin\phi \\
z &= \cos\theta
\end{aligned}
\right. \\[2em]
\begin{aligned}
\mathrm{d}x \wedge \mathrm{d}y
&= (\cos\theta\cos\phi\,\mathrm{d}\theta - \sin\theta\sin\phi\,\mathrm{d}\phi) \ \wedge \\
&\qquad (\cos\theta\sin\phi\,\mathrm{d}\theta + \sin\theta\cos\phi\,\mathrm{d}\phi) \\[0.5em]
&= \cos\theta\sin\theta\cos^2\phi\,(\mathrm{d}\theta \wedge \mathrm{d}\phi) \ - \\
&\qquad \cos\theta\sin\theta\sin^2\phi\,(\mathrm{d}\phi \wedge \mathrm{d}\theta) \\[0.5em]
&= \cos\theta\sin\theta\,(\mathrm{d}\theta \wedge \mathrm{d}\phi)
\end{aligned}
\end{gathered}
$$
Then, we can do the same thing with the slope-space area element:
$$
\begin{gathered}
\left\{
\begin{aligned}
\tilde x &= -\!\tan\theta \cos\phi \\
\tilde y &= -\!\tan\theta \sin\phi \\
\end{aligned}
\right. \\[1.5em]
\begin{aligned}
\mathrm{d}\tilde x \wedge \mathrm{d} \tilde y
&= -(\cos^{-2}\theta\cos\phi\,\mathrm{d}\theta - \tan\theta\sin\phi\,\mathrm{d}\phi) \ \wedge \\
&\qquad -(\cos^{-2}\theta\sin\phi\,\mathrm{d}\theta + \tan\theta\cos\phi\,\mathrm{d}\phi) \\[0.5em]
&= \tan\theta\cos^{-2}\theta\cos^2\phi\,(\mathrm{d}\theta \wedge \mathrm{d}\phi) \ - \\
&\qquad \tan\theta\cos^{-2}\theta\sin^2\phi\,(\mathrm{d}\phi \wedge \mathrm{d}\theta) \\[0.5em]
&= \frac{\tan\theta}{\cos^2\theta} \, (\mathrm{d}\theta \wedge \mathrm{d}\phi)
\end{aligned}
\end{gathered}
$$
Now, all we have to do is divide:
$$
\begin{aligned}
\frac{\mathrm{d}\tilde x \wedge \mathrm{d} \tilde y}{\mathrm{d}x \wedge \mathrm{d}y} &=
\frac{\tan\theta}{\cos^2\theta} \frac{1}{\cos\theta\sin\theta} \\[1em]
&= \frac{1}{\cos^4\theta}
\end{aligned}
$$
Et voilà! The Jacobian for converting densities from slope space to NDF form is $1/\cos^4\theta$.
We’ll have to multiply by this factor in addition to changing variables.</p>
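<p>If differential forms aren’t your thing, the result is easy to spot-check numerically. This Python
sketch (function names are mine) maps a point $(x, y)$ in the projected unit disk directly to slope
space, estimates the determinant of that mapping’s Jacobian matrix with central differences, and
compares it with $1/\cos^4\theta$.</p>

```python
import math

def project_to_slope(x, y):
    """Map (x, y), the xy projection of a unit normal, to slope space."""
    z = math.sqrt(1.0 - x * x - y * y)
    return (-x / z, -y / z)

def jacobian_det(x, y, h=1e-6):
    """Central-difference determinant of d(slope) / d(x, y)."""
    fxp, fxm = project_to_slope(x + h, y), project_to_slope(x - h, y)
    fyp, fym = project_to_slope(x, y + h), project_to_slope(x, y - h)
    dsx_dx = (fxp[0] - fxm[0]) / (2 * h)
    dsy_dx = (fxp[1] - fxm[1]) / (2 * h)
    dsx_dy = (fyp[0] - fym[0]) / (2 * h)
    dsy_dy = (fyp[1] - fym[1]) / (2 * h)
    return dsx_dx * dsy_dy - dsx_dy * dsy_dx

x, y = 0.3, 0.4                                  # arbitrary point in the disk
cos_theta = math.sqrt(1.0 - x * x - y * y)       # n_z = cos(theta)
numeric = jacobian_det(x, y)
analytic = 1.0 / cos_theta ** 4
print(numeric, analytic)
```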
<h3 id="some-common-distributions"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#some-common-distributions" title="Permalink to this section">Some Common Distributions</a></h3>
<p>As an example of the conversion from slope space to NDF, let’s take the standard (bivariate)
Gaussian distribution defined on slope space:
$$
D(\tilde{\mathbf{m}}, \sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{|\tilde{\mathbf{m}}|^2}{2\sigma^2}\right)
$$
To turn this into an NDF, we need to change variables from $\tilde{\mathbf{m}}$ to $(\theta_\mathbf{m}, \phi_\mathbf{m})$,
and also multiply by the Jacobian $1/\cos^4\theta_\mathbf{m}$. Recalling that $|\tilde{\mathbf{m}}| = \tan\theta_\mathbf{m}$,
this becomes:
$$
D(\mathbf{m}, \sigma) = \frac{1}{2\pi\sigma^2\cos^4\theta_\mathbf{m}} \exp\left(-\frac{\tan^2\theta_\mathbf{m}}{2\sigma^2}\right)
$$
Hey, that looks familiar—it’s the Beckmann NDF! (Although it’s more usually seen with the roughness
parameter $\alpha = \sqrt{2}\sigma$.) The Beckmann distribution is a Gaussian in slope space.</p>
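<p>As a check that the Jacobian really does produce a properly normalized NDF, the Python sketch below
integrates $D \cos\theta$ over the hemisphere with a simple midpoint rule; the result should come
out very close to 1. The value $\sigma = 0.3$ and the step count are arbitrary choices of mine.</p>

```python
import math

def beckmann_ndf(cos_theta, sigma):
    """Slope-space Gaussian times the Jacobian 1 / cos^4(theta)."""
    cos2 = cos_theta * cos_theta
    tan2 = (1.0 - cos2) / cos2
    gaussian = math.exp(-tan2 / (2.0 * sigma * sigma)) / (2.0 * math.pi * sigma * sigma)
    return gaussian / (cos2 * cos2)

def projected_area_integral(ndf, steps=20000):
    """Midpoint-rule estimate of the integral of D * cos(theta) over the hemisphere."""
    d_theta = (math.pi / 2.0) / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        # Solid angle element is sin(theta) d(theta) d(phi); D is isotropic,
        # so the phi integral contributes a factor of 2*pi below.
        total += ndf(math.cos(theta)) * math.cos(theta) * math.sin(theta) * d_theta
    return 2.0 * math.pi * total

integral = projected_area_integral(lambda c: beckmann_ndf(c, sigma=0.3))
print(integral)
```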
<p>The isotropic GGX NDF (<a href="https://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf">Walter et al 2007</a>)
looks like this:
$$
D(\mathbf{m}, \alpha) = \frac{\alpha^2}{\pi \cos^4\theta_\mathbf{m} \bigl(\alpha^2 + \tan^2\theta_\mathbf{m} \bigr)^2 }
$$
You might now recognize those familiar-looking $\cos^4\theta_\mathbf{m}$ and $\tan\theta_\mathbf{m}$
factors. Yep, this NDF is also a convert from slope space! Working backwards, we can see that it was
originally:
$$
D(\tilde{\mathbf{m}}, \alpha) = \frac{\alpha^2}{\pi \bigl(\alpha^2 + |\tilde{\mathbf{m}}|^2 \bigr)^2 }
$$
Although this formula is probably less familiar, it matches the pdf of the bivariate
<a href="https://en.wikipedia.org/wiki/Multivariate_t-distribution">Student’s <span style="white-space:nowrap">$t$-distribution</span></a> with the
degrees-of-freedom (“normality”) parameter $\nu$ set to 2, and scaled by $\alpha/\sqrt{2}$. (You can also create a family of NDFs
that interpolate between GGX and Beckmann, by exposing a user parameter that controls $\nu$; see
<a href="https://mribar03.bitbucket.io/projects/eg_2017/distribution.pdf">Ribardière et al 2017</a>.)</p>
<p>(Incidentally, the GGX NDF is often seen written in this alternate form:
$$
D(\mathbf{m}, \alpha) = \frac{\alpha^2}{\pi \bigl( (\alpha^2 - 1)\cos^2\theta_\mathbf{m} + 1 \bigr)^2 }
$$
This is the same function as the form above (which is from the original GGX paper), but rearranged
to make it cheaper to evaluate, as it eliminates the $\tan^2$ using the identity
<span style="white-space:nowrap">$\tan^2 = (1 - \cos^2)/\cos^2$</span>. However, this form also
introduces numerical precision problems, and <a href="https://github.com/google/filament">Filament</a> has a
<a href="https://google.github.io/filament/Filament.html#materialsystem/specularbrdf/normaldistributionfunction(speculard)">numerically stable form</a>:
$$
D(\mathbf{m}, \alpha) = \frac{\alpha^2}{\pi \bigl(\alpha^2 \cos^2\theta_\mathbf{m} + \sin^2\theta_\mathbf{m} \bigr)^2 }
$$
which is <em>again</em> the same function, rearranged some more; you’re meant to calculate $\sin^2\theta_\mathbf{m}$
as the squared magnitude of the cross product $|\mathbf{n} \times \mathbf{m}|^2$. This has nothing to
do with slope space; I just thought it was neat and worth knowing.)</p>
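<p>To confirm that the three GGX forms really are the same function, here is a small Python sketch
that evaluates them side by side. The function names are mine, and on the CPU in double precision
this only demonstrates the algebraic equivalence, not the precision differences you would see in
a shader.</p>

```python
import math

def ggx_tan_form(cos_theta, alpha):
    """Form from the original GGX paper, with cos^4 and tan^2."""
    cos2 = cos_theta * cos_theta
    tan2 = (1.0 - cos2) / cos2
    return alpha ** 2 / (math.pi * cos2 * cos2 * (alpha ** 2 + tan2) ** 2)

def ggx_cos_form(cos_theta, alpha):
    """Common rearrangement using only cos^2."""
    cos2 = cos_theta * cos_theta
    return alpha ** 2 / (math.pi * ((alpha ** 2 - 1.0) * cos2 + 1.0) ** 2)

def ggx_sin_form(cos_theta, alpha):
    """Rearrangement with cos^2 and sin^2 kept separate."""
    cos2 = cos_theta * cos_theta
    sin2 = 1.0 - cos2      # a renderer would use |n x m|^2 here instead
    return alpha ** 2 / (math.pi * (alpha ** 2 * cos2 + sin2) ** 2)

for c in (0.2, 0.7, 0.99):
    print(ggx_tan_form(c, 0.5), ggx_cos_form(c, 0.5), ggx_sin_form(c, 0.5))
```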
<h2 id="conclusion"><a href="https://www.reedbeta.com/blog/slope-space-in-brdf-theory/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>To recap, the most important thing to take away about slope space is that it provides an alternate
representation for unit vectors in the upper hemisphere, by projecting them out onto an infinite
plane. This enables us to work with distributions in plain old 2D space, and then map them back into
functions on the hemisphere. Slope space also provides convenient mappings from some linear
transformations of the microsurface to linear or affine transformations in the slope plane.</p>
<p>I hope this has demystified the concept of slope space a little bit, and now you won’t be confused
by it anymore when reading BRDF papers! 😄</p>Hash Functions for GPU Rendering
https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/
https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/Nathan ReedFri, 21 May 2021 17:52:07 -0700https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#commentsCodingGPUGraphics<p>Back in 2013, I wrote a <a href="/blog/quick-and-easy-gpu-random-numbers-in-d3d11/">somewhat popular article</a>
about pseudorandom number generation on the GPU. In the eight years since, a number of new PRNGs and
hash functions have been developed; and a few months ago, an excellent paper on the topic appeared
in JCGT: <a href="http://jcgt.org/published/0009/03/02/">Hash Functions for GPU Rendering</a>, by Mark Jarzynski
and Marc Olano. I thought it was time to update my former post in light of this paper’s findings.</p>
<!--more-->
<p>Jarzynski and Olano’s paper compares GPU implementations of a large number of different hash functions
along dual axes of performance (measured by time to render a quad evaluating the hash at each pixel)
and statistical quality (quantified by the count of failures of
<a href="https://en.wikipedia.org/wiki/TestU01">TestU01 “BigCrush”</a> tests). Naturally, there is quite a
spread of results in both performance and quality. Jarzynski and Olano then identify the few hash
functions that lie along the Pareto frontier—meaning they are the best choices along the whole
spectrum of performance/quality trade-offs.</p>
<p>When choosing a hash function, we might sometimes prioritize performance, and other times might
prefer to sacrifice performance in favor of higher quality (real-time versus offline applications,
for example). The Pareto frontier provides the set of optimal choices for any point along that
balance—ranging from LCGs at the extreme performance-oriented end, to some quite expensive but
very high-quality hashes at the other end.</p>
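<p>In case the Pareto frontier idea is unfamiliar, here is a minimal Python sketch of how one could be
extracted from a list of candidates. The candidate names and numbers are made-up placeholders, not
measurements from the paper; each entry is (name, cost, test failures), with lower being better on
both axes.</p>

```python
def pareto_frontier(candidates):
    """Return the names of candidates not dominated by any other.

    A candidate is dominated if some other candidate is at least as good
    on both axes and strictly better on at least one.
    """
    frontier = []
    for name, cost, failures in candidates:
        dominated = any(
            c2 <= cost and f2 <= failures and (c2 < cost or f2 < failures)
            for _, c2, f2 in candidates
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Made-up placeholder numbers, NOT results from Jarzynski and Olano.
candidates = [
    ("hash_a", 1.0, 90),   # fastest, lowest quality
    ("hash_b", 1.5, 20),
    ("hash_c", 1.6, 60),   # dominated by hash_b: slower and lower quality
    ("hash_d", 4.0, 0),    # slowest, highest quality
]
print(pareto_frontier(candidates))  # ['hash_a', 'hash_b', 'hash_d']
```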
<p>In my 2013 article, I recommended the “Wang hash” as a general-purpose 32-bit-to-32-bit integer hash
function. The Wang hash was among those tested by Jarzynski and Olano, but unfortunately it did not
lie along the Pareto frontier—not even close! The solution that dominates it—and one of the best
balanced choices between performance and quality overall—is <strong>PCG</strong>. In particular, the 32-bit PCG
hash used by Jarzynski and Olano goes as follows:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">uint</span><span class="w"> </span><span class="n">pcg_hash</span><span class="p">(</span><span class="kt">uint</span><span class="w"> </span><span class="n">input</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">input</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">747796405</span><span class="n">u</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2891336453</span><span class="n">u</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">((</span><span class="n">state</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="p">((</span><span class="n">state</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">28</span><span class="n">u</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">4</span><span class="n">u</span><span class="p">))</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="n">state</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">277803737</span><span class="n">u</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="n">word</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">22</span><span class="n">u</span><span class="p">)</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="n">word</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>This has slightly better performance and <em>much</em> better statistical quality than the Wang hash. It’s
fast enough to be useful for real-time, while also being high-quality enough for almost any graphics
use-case (if you’re not using precomputed blue noise, or low-discrepancy sequences). It should
probably be your default GPU hash function.</p>
<p>Just to prove it works, here’s the bit pattern generated by a few thousand invocations of the above
function on consecutive inputs:</p>
<p><img alt="Bit pattern generated by PCG hash" src="https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/pcg.png" title="Bit pattern generated by PCG hash" /></p>
<p>Yep, looks random! 👍</p>
<h2 id="pcg-variants"><a href="https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#pcg-variants" title="Permalink to this section">PCG Variants</a></h2>
<p>Incidentally, you might notice that the PCG function posted above doesn’t match that found in other
sources, such as the <a href="https://www.pcg-random.org/download.html">minimal C implementation on the PCG website</a>.
This is because “PCG” isn’t a single function, but more of a recipe for constructing PRNG functions.
It works by starting with an LCG, and then applying a permutation function to mix around the bits
and improve the quality of the results. There are many possible permutation functions, and
<a href="https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf">O’Neill’s original PCG paper</a>
provides a set of building blocks that can be combined in various ways to get generators with
different characteristics. In particular, the PCG used by Jarzynski and Olano corresponds to the
32-bit “RXS-M-XS” variant described in §6.3.4 of O’Neill. (See also the list of variants on
<a href="https://en.wikipedia.org/wiki/Permuted_congruential_generator#Variants">Wikipedia</a>.)</p>
<h2 id="hash-or-prng"><a href="https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#hash-or-prng" title="Permalink to this section">Hash or PRNG?</a></h2>
<p>One of the main points I discussed in my 2013 article was the distinction between PRNGs and hash
functions: the former are designed for a good distribution <em>within</em> a single stateful stream, but do
not necessarily provide good distribution <em>across</em> streams with consecutive seeds; hash functions are
stateless and designed to give a good distribution even with consecutive (or otherwise highly
correlated) inputs.</p>
<p>PCG is actually designed to be a PRNG, <em>not</em> a hash function, so it may surprise you to see it being
used as a hash here. What gives? Well, apparently PCG is just so good that it works well as a hash
function too! ¯\_(ツ)_/¯</p>
<p>It’s worth noting that PCG <em>does</em> support more or less efficient jump-ahead, owing to the LCG at its
core; it’s possible to advance an LCG by $n$ steps in only $O(\log n)$ work using
<a href="https://www.nayuki.io/page/fast-skipping-in-a-linear-congruential-generator">modular exponentiation</a>.
However, that is not what Jarzynski and Olano’s code does: it’s not jumping ahead to the $n$th
value in a single PCG sequence, but essentially just taking the first value from each of $n$
sequences with consecutive initial states. The fact that this works at all is somewhat surprising,
and a testament to the power of permutation functions.</p>
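<p>For reference, the $O(\log n)$ jump-ahead mentioned above can be sketched as follows. This is only an illustration of the general technique, not code from Jarzynski and Olano: each LCG step is an affine map $x \mapsto ax + c$, and composing affine maps gives another affine map, so square-and-multiply works on the $(a, c)$ pair. The names <code>lcg_step</code> and <code>lcg_skip</code> are made up for this sketch, and the constants are the ones from the listing above.</p>

```cpp
#include <cstdint>

// Multiplier and increment of the 32-bit LCG inside the PCG hash above.
constexpr std::uint32_t kMul = 747796405u;
constexpr std::uint32_t kInc = 2891336453u;

// One ordinary LCG step (hypothetical helper name).
constexpr std::uint32_t lcg_step(std::uint32_t state)
{
    return state * kMul + kInc;
}

// Advance the LCG by n steps in O(log n) work (hypothetical helper name).
constexpr std::uint32_t lcg_skip(std::uint32_t state, std::uint32_t n)
{
    std::uint32_t cur_mul = kMul, cur_inc = kInc; // affine map for 2^k steps
    std::uint32_t acc_mul = 1u,   acc_inc = 0u;   // accumulated map, starts as identity
    while (n > 0)
    {
        if (n & 1u)
        {
            // Compose the accumulated map with the current 2^k-step map.
            acc_mul = acc_mul * cur_mul;
            acc_inc = acc_inc * cur_mul + cur_inc;
        }
        // Compose the 2^k-step map with itself to get the 2^(k+1)-step map.
        cur_inc = (cur_mul + 1u) * cur_inc;
        cur_mul = cur_mul * cur_mul;
        n >>= 1;
    }
    return acc_mul * state + acc_inc;
}
```

<p>All arithmetic is unsigned and wraps modulo $2^{32}$, which is exactly the modulus of the LCG, so no explicit reduction is needed.</p>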
<p>In my previous article, I also recommended that if you need multiple random values per pixel, you
could start with a hash function and then iterate either LCG or Xorshift using the hash output as an
initial state. You can still do that, using PCG as the initial hash—but it might be just as fast
to iterate PCG. The interesting thing about PCG’s design is that only the LCG portion of it actually
carries data dependencies from one iteration to the next, and LCGs are super fast. The permutation
parts are independent of each other and can be pipelined to exploit instruction-level parallelism
when doing multiple iterations.</p>
<p>For completeness, the “PRNG form” of the above PCG variant looks like:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">uint</span><span class="w"> </span><span class="n">rng_state</span><span class="p">;</span><span class="w"></span>
<span class="kt">uint</span><span class="w"> </span><span class="n">rand_pcg</span><span class="p">()</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="n">state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rng_state</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">rng_state</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rng_state</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">747796405</span><span class="n">u</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2891336453</span><span class="n">u</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">((</span><span class="n">state</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="p">((</span><span class="n">state</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">28</span><span class="n">u</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">4</span><span class="n">u</span><span class="p">))</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="n">state</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">277803737</span><span class="n">u</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="n">word</span><span class="w"> </span><span class="o">>></span><span class="w"> </span><span class="mi">22</span><span class="n">u</span><span class="p">)</span><span class="w"> </span><span class="o">^</span><span class="w"> </span><span class="n">word</span><span class="p">;</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
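<p>To make the relationship between the two forms concrete, here’s a rough C++ port with the shared permutation factored out. The names <code>pcg_permute</code>, <code>PcgRng</code> and <code>rng_for_pixel</code> are invented for this sketch, and the state is held in a struct rather than a global. Note that the hash permutes the state <em>after</em> the LCG step, while the PRNG form permutes the state it had <em>before</em> stepping, so <code>pcg_hash(x)</code> is the second value produced by a generator seeded with <code>x</code>.</p>

```cpp
#include <cstdint>

// The output permutation shared by both forms (same constants as the listings above).
inline std::uint32_t pcg_permute(std::uint32_t state)
{
    std::uint32_t word = ((state >> ((state >> 28u) + 4u)) ^ state) * 277803737u;
    return (word >> 22u) ^ word;
}

// Hash form: one LCG step, then permute.
inline std::uint32_t pcg_hash(std::uint32_t input)
{
    return pcg_permute(input * 747796405u + 2891336453u);
}

// PRNG form, with the state passed around explicitly (hypothetical wrapper).
struct PcgRng
{
    std::uint32_t state;

    std::uint32_t next()
    {
        std::uint32_t old = state;
        // Only this line carries a dependency from one call to the next.
        state = state * 747796405u + 2891336453u;
        return pcg_permute(old);
    }
};

// Hypothetical usage: seed one generator per pixel from a hashed pixel index,
// then draw as many values as that pixel needs.
inline PcgRng rng_for_pixel(std::uint32_t pixel_index)
{
    return PcgRng{ pcg_hash(pixel_index) };
}
```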
<p>That’s about it! Be sure to check out <a href="http://jcgt.org/published/0009/03/02/">Jarzynski and Olano’s paper</a>
for some more tidbits, including a discussion of hashes with multi-dimensional inputs and outputs.</p>Making Your Own Container Compatible With C++20 Ranges
https://www.reedbeta.com/blog/ranges-compatible-containers/
https://www.reedbeta.com/blog/ranges-compatible-containers/Nathan ReedSat, 20 Mar 2021 17:23:15 -0700https://www.reedbeta.com/blog/ranges-compatible-containers/#commentsCoding<p>With some of my spare time lately, I’ve been enjoying learning about some of the new features in
C++20. <a href="https://en.cppreference.com/w/cpp/language/constraints">Concepts</a> and the closely-related
<a href="https://akrzemi1.wordpress.com/2020/03/26/requires-clause/"><code>requires</code> clauses</a> are two great
extensions to template syntax that remove the necessity for all the SFINAE junk we used to have to
do, making our code both more readable and more precise, and providing much better error messages
(although MSVC has sadly been <a href="https://developercommunity.visualstudio.com/t/786814">lagging in the error messages department</a>,
at the time of this writing).</p>
<p>Another interesting C++20 feature is the addition of the <a href="https://en.cppreference.com/w/cpp/ranges">ranges library</a>
(also <a href="https://en.cppreference.com/w/cpp/algorithm/ranges">ranges algorithms</a>), which provides a
nicer, more composable abstraction for operating on containers and sequences of objects. At the most
basic level, a range wraps an iterator begin/end pair, but there’s much more to it than that. This
article isn’t going to be a tutorial on ranges, but <a href="https://www.youtube.com/watch?v=VmWS-9idT3s">here’s a talk</a>
to watch if you want to see more of what it’s all about.</p>
<p>What I’m going to discuss today is the process of adding “ranges compatibility” to your own container
class. Many of the C++ codebases we work in have their own set of container classes beyond the STL
ones, for a variety of reasons—<a href="/blog/data-oriented-hash-table/">better performance</a>, more control
over memory layouts, more customized interfaces, and so on. With a little work, it’s possible to
make your custom containers also function as ranges and interoperate with the C++20 ranges library.
Here’s how to do it.</p>
<!--more-->
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#making-your-container-an-input-range">Making Your Container an Input Range</a><ul>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#range-concepts">Range Concepts</a></li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#defining-range-compatible-iterators">Defining Range-Compatible Iterators</a></li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#begin-end-size">Begin, End, Size</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#accepting-output-from-ranges">Accepting Output From Ranges</a><ul>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#constructor-from-a-range">Constructor From A Range</a></li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#output-iterators">Output Iterators</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="making-your-container-an-input-range"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#making-your-container-an-input-range" title="Permalink to this section">Making Your Container an Input Range</a></h2>
<p>At the high level, there are two basic ways that a container class can interact with ranges. First,
it can be <em>readable</em> as a range, meaning that we can iterate over it, pipe it into views and pass it
to range algorithms, and so forth. In the parlance of the ranges library, this is known as being an
<em>input range</em>: a range that can provide input to other things.</p>
<p>The other direction is to accept output <em>from</em> ranges, storing the output into your container.
We’ll do that later. To begin with, let’s see how to make your container act as an input range.</p>
<h3 id="range-concepts"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#range-concepts" title="Permalink to this section">Range Concepts</a></h3>
<p>The first decision we have to make is what particular kind of input range we can model. The C++20
STL defines a number of different <a href="https://en.cppreference.com/w/cpp/ranges#Range_concepts">concepts for ranges</a>,
depending on the capabilities of their iterators and other things. Several of these form a hierarchy
from more general to more specific kinds of ranges with tighter requirements. Generally speaking, it’s
best for your container to implement the most specific range concept it’s able to. This enables code
that works with ranges to make better decisions and use more optimal code paths. (We’ll see some
examples of this in a minute.)</p>
<p>The relevant input range concepts are:</p>
<ul>
<li><code>std::ranges::input_range</code>: the most bare-bones version. It requires only that you have iterators
that can retrieve the contents of the range. In particular, it <em>doesn’t</em> require that the range
can be iterated more than once: iterators are not required to be copyable, and <code>begin</code>/<code>end</code> are
not required to give you the iterators more than once. This could be an appropriate concept for
ranges that are actually generating their contents as the result of some algorithm that’s not
easily/cheaply repeatable, or receiving data from a network connection or suchlike.</li>
<li><code>std::ranges::forward_range</code>: the range can be iterated as many times as you like, but only in
the forward direction. Iterators can be copied and saved off to later resume iteration from an
earlier point, for example.</li>
<li><code>std::ranges::bidirectional_range</code>: iterators can be decremented as well as incremented.</li>
<li><code>std::ranges::random_access_range</code>: you can efficiently do arithmetic on iterators—you can
offset them forward or backward by a given number of steps, or subtract them to find the number
of steps between.</li>
<li><code>std::ranges::contiguous_range</code>: the elements are actually stored as a contiguous array in memory;
the iterators are essentially fancy pointers (or literally <em>are</em> just pointers).</li>
</ul>
<p>In addition to this hierarchy of input range concepts, there are a couple of other standalone ones
worth mentioning:</p>
<ul>
<li><code>std::ranges::sized_range</code>: you can efficiently get the size of the range, i.e. how many elements
from begin to end. Note that this is a much looser constraint than <code>random_access_range</code>: the
latter requires you be able to efficiently measure the distance between <em>any pair</em> of iterators
inside the range, while <code>sized_range</code> only requires that the size of the <em>whole range</em> is known.</li>
<li><code>std::ranges::borrowed_range</code>: indicates that a range doesn’t own its data, i.e. it’s referencing
(“borrowing”) data that lives somewhere else. This can be useful because it allows references/iterators
into the data to survive beyond the lifetime of the range object itself.</li>
</ul>
<p>The reason all these concepts are important is that if I’m writing code that operates on ranges, I might need to
require some of these concepts in order to do my work efficiently. For example, a sorting routine
would be very difficult to write for anything less than a <code>random_access_range</code> (and indeed you’ll
see that <a href="https://en.cppreference.com/w/cpp/algorithm/ranges/sort"><code>std::ranges::sort</code> requires that</a>).
In other cases, I might be able to do things more optimally when the range satisfies certain
concepts—for instance, if it’s a <code>sized_range</code>, I could preallocate some storage for results,
while if it’s only an <code>input_range</code> and no more, then I’ll have to dynamically reallocate, as I have
no idea how many elements there are going to be.</p>
<p>The rest of the ranges library is written in terms of these concepts (and you can write your own
code that operates generically on ranges using these concepts as well). So, once your container
satisfies the relevant concepts, it will automatically be recognized and function as a range!</p>
<p>In C++20, concepts act as boolean expressions, so you can check whether your container satisfies the
concepts you expect by just writing asserts for them:</p>
<div class="codehilite"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><ranges></span><span class="cp"></span>
<span class="k">static_assert</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">forward_range</span><span class="o"><</span><span class="n">MyCoolContainer</span><span class="o"><</span><span class="kt">int</span><span class="o">>></span><span class="p">);</span><span class="w"></span>
<span class="c1">// int is just an arbitrarily chosen element type, since we</span>
<span class="c1">// can't assert a concept for an uninstantiated template</span>
</code></pre></div>
<p>Checks like this are great to add to your test suite—I’m big in favor of writing <em>compile-time</em>
tests for generic/metaprogramming stuff, in addition to the usual runtime tests.</p>
<p>However, when you first drop that assert into your code, it will almost certainly fail. Let’s see
now what you need to do to actually satisfy the range concepts.</p>
<h3 id="defining-range-compatible-iterators"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#defining-range-compatible-iterators" title="Permalink to this section">Defining Range-Compatible Iterators</a></h3>
<p>In order to satisfy the input range concepts, you need to do two things:</p>
<ul>
<li>Have <code>begin</code> and <code>end</code> functions that return some iterator and sentinel types. (We’ll discuss
these in a little bit.)</li>
<li>The iterator type must satisfy the iterator concept that matches your range concept.</li>
</ul>
<p>Each one of the concepts from <code>input_range</code> down to <code>contiguous_range</code> has a corresponding
<a href="https://en.cppreference.com/w/cpp/header/iterator#Iterator_concepts">iterator concept</a>:
<code>std::input_iterator</code>, <code>std::forward_iterator</code>, and so on. It’s these concepts that contain the real
meat of the requirements that define the different types of ranges: they list all the operations
each kind of iterator must support.</p>
<p>To begin with, there are a couple of member type aliases that any iterator class will need to define:</p>
<ul>
<li><code>difference_type</code>: some signed integer type, usually <code>std::ptrdiff_t</code></li>
<li><code>value_type</code>: the type of elements that the iterator references</li>
</ul>
<p>The second one seems pretty understandable, but I honestly have no idea why the <code>difference_type</code>
requirement is here. Taking the difference between iterators doesn’t make sense until you get to
random-access iterators, which actually define that operation. As far as I can tell, the
<code>difference_type</code> for more general iterators isn’t actually <em>used</em> by anything. Nevertheless,
according to the C++ standard, it has to be there. It seems that the usual idiom is to set it to
<code>std::ptrdiff_t</code> in such cases, although it can be any signed integer type.</p>
<p>(Technically you can also define these types by specializing <code>std::iterator_traits</code> for your iterator,
but here we’re just going to put them in the class.)</p>
<p>Beyond that, the requirements for <code>std::input_iterator</code> are pretty straightforward:</p>
<ul>
<li>The iterator must be default-initializable and movable. (It doesn’t have to be copyable.)</li>
<li>It must be equality-comparable with its sentinel (the value marking the end of the range). It
doesn’t have to be equality-comparable with other iterators.</li>
<li>It must implement <code>operator ++</code>, in <em>both</em> preincrement and postincrement positions. However, the
postincrement version does not have to return anything.</li>
<li>It must have an <code>operator *</code> that returns a reference to whatever the <code>value_type</code> is.</li>
</ul>
<p>One point of interest here is that the default-initializable requirement means that the iterator class
can’t contain references, e.g. a reference to the container it comes from. It can store pointers,
though.</p>
<p>A prototype input iterator class could look like this:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span><span class="w"> </span><span class="o"><</span><span class="k">typename</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w"></span>
<span class="k">class</span><span class="w"> </span><span class="nc">Iterator</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="k">public</span><span class="o">:</span><span class="w"></span>
<span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">difference_type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="kt">ptrdiff_t</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">using</span><span class="w"> </span><span class="n">value_type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">T</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">Iterator</span><span class="p">();</span><span class="w"> </span><span class="c1">// default-initializable</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">Sentinel</span><span class="o">&</span><span class="p">)</span><span class="w"> </span><span class="k">const</span><span class="p">;</span><span class="w"> </span><span class="c1">// equality with sentinel</span>
<span class="w"> </span><span class="n">T</span><span class="o">&</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="p">;</span><span class="w"> </span><span class="c1">// dereferenceable</span>
<span class="w"> </span><span class="n">Iterator</span><span class="o">&</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="c1">// pre-incrementable</span>
<span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="cm">/*do stuff...*/</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">*</span><span class="k">this</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="c1">// post-incrementable</span>
<span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="o">++*</span><span class="k">this</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="k">private</span><span class="o">:</span><span class="w"></span>
<span class="w"> </span><span class="c1">// implementation...</span>
<span class="p">};</span><span class="w"></span>
</code></pre></div>
<p>For a <code>std::forward_iterator</code>, the requirements are just slightly tighter:</p>
<ul>
<li>The iterator must be copyable.</li>
<li>It must be equality-comparable with other iterators of the same container.</li>
<li>The postincrement operator must return a copy of the iterator before modification.</li>
</ul>
<p>A prototype forward iterator class could look like:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span><span class="w"> </span><span class="o"><</span><span class="k">typename</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w"></span>
<span class="k">class</span><span class="w"> </span><span class="nc">Iterator</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="k">public</span><span class="o">:</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...same as the previous one, except:</span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">Iterator</span><span class="o">&</span><span class="p">)</span><span class="w"> </span><span class="k">const</span><span class="p">;</span><span class="w"> </span><span class="c1">// equality with iterators</span>
<span class="w"> </span><span class="n">Iterator</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="c1">// post-incrementable, returns prev value</span>
<span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">Iterator</span><span class="w"> </span><span class="n">temp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">*</span><span class="k">this</span><span class="p">;</span><span class="w"> </span><span class="o">++*</span><span class="k">this</span><span class="p">;</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">temp</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">};</span><span class="w"></span>
</code></pre></div>
<p>I’m not going to go through the rest of them in detail; you can read the details
<a href="https://en.cppreference.com/w/cpp/header/iterator#Iterator_concepts">on cppreference</a>.</p>
<h3 id="begin-end-size"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#begin-end-size" title="Permalink to this section">Begin, End, Size</a></h3>
<p>Once your container is equipped with an iterator class that satisfies the relevant concepts, you’ll
need to provide <code>begin</code> and <code>end</code> functions to get those iterators. There are three ways to do this:
they can be member functions on the container, they can be free functions that live next to the
container in the same namespace, or they can be <a href="https://www.justsoftwaresolutions.co.uk/cplusplus/hidden-friends.html">“hidden friends”</a>;
the non-member versions just need to be findable by <a href="https://en.cppreference.com/w/cpp/language/adl">ADL</a>.</p>
<p>The return types from <code>begin</code> and <code>end</code> don’t have to be the same. In some cases, it can be useful
to have <code>end</code> return a different type of object, a “sentinel”, which isn’t actually an iterator; it
just needs to be equality-comparable with iterators, so you can tell when you’ve gotten to the end
of the container.</p>
<p>Also, these are the same <code>begin</code>/<code>end</code> used for <a href="https://en.cppreference.com/w/cpp/language/range-for">range-based <code>for</code> loops</a>.</p>
<p>One oddity worth mentioning here is that if you go the free/friend functions route, you’ll need to
add overloads for both const and non-const versions of your container:</p>
<div class="codehilite"><pre><span></span><code><span class="k">class</span><span class="w"> </span><span class="nc">MyCoolContainer</span><span class="p">;</span><span class="w"></span>
<span class="k">auto</span><span class="w"> </span><span class="n">begin</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">MyCoolContainer</span><span class="o">&</span><span class="w"> </span><span class="n">c</span><span class="p">);</span><span class="w"></span>
<span class="k">auto</span><span class="w"> </span><span class="n">end</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">MyCoolContainer</span><span class="o">&</span><span class="w"> </span><span class="n">c</span><span class="p">);</span><span class="w"></span>
<span class="k">auto</span><span class="w"> </span><span class="n">begin</span><span class="p">(</span><span class="n">MyCoolContainer</span><span class="o">&</span><span class="w"> </span><span class="n">c</span><span class="p">);</span><span class="w"></span>
<span class="k">auto</span><span class="w"> </span><span class="n">end</span><span class="p">(</span><span class="n">MyCoolContainer</span><span class="o">&</span><span class="w"> </span><span class="n">c</span><span class="p">);</span><span class="w"></span>
</code></pre></div>
<p>You might think it would be enough to provide just the const overloads, but if you do that, only the
const version of the container will be recognized as a range! The non-const overloads must be
present as well for non-const containers to work.</p>
<p>Curiously, if you provide <code>begin</code>/<code>end</code> as member functions instead, then this doesn’t come up:
const overloads will work for both.</p>
<p>This behavior is surprising, and I’m not sure if it was intended. However, it’s worth noting that
iterators generally need to remember the constness of the container they came from: a const
container should give you a “const iterator” that doesn’t allow mutating its elements. Therefore,
the const and non-const overloads of <code>begin</code>/<code>end</code> will generally need to return <em>different</em>
iterator types, and so you’ll need to have both in any case. (The exception would be if you’re
building an immutable container; then it only needs a const iterator type.)</p>
<p>In addition to <code>begin</code> and <code>end</code>, you’ll also want to implement a <code>size</code> function, if applicable.
Again, this can be a member function, a free function, or a hidden friend. The
presence of this function satisfies <code>std::ranges::sized_range</code>, which (as mentioned earlier) can
enable range algorithms to operate more efficiently.</p>
<p>So, to sum up: to allow your custom container class to be readable as a range, you’ll need to:</p>
<ol>
<li>Decide which range concept(s) you can model, which mainly comes down to what level of iterator
capabilities you can provide;</li>
<li>Implement iterator classes (both const and non-const, if applicable) that fulfill all the
requirements of the chosen iterator concept;</li>
<li>Implement <code>begin</code>, <code>end</code>, and <code>size</code> functions.</li>
</ol>
<p>Once you’ve done this, the ranges library should recognize your container as a range. It will
automatically be accepted by range algorithms, and you can take views of it, iterate over it in
range-for loops, and so on.</p>
<p>As before, you can test that you’ve done everything correctly by asserting that your container
satisfies the expected range concepts. If you’re working with gcc or clang, this will even give you
some pretty reasonable error messages if you didn’t get it right! (In MSVC, for the time being, you’ll
have to narrow down errors by popping open the hood and asserting each of the concept’s sub-clauses
one at a time, to see which one(s) failed.)</p>
<h2 id="accepting-output-from-ranges"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#accepting-output-from-ranges" title="Permalink to this section">Accepting Output From Ranges</a></h2>
<p>We’ve discussed how to make a custom container serve as input <em>to</em> the C++20 ranges library. Now, we
need to come back to the other direction: how to let your container capture output <em>from</em> the
ranges library.</p>
<p>There are a couple of different forms this can take. One way is to accept generic ranges as
parameters to a constructor (or other methods, such as append or insert methods) of your container
class. This allows, for example, easily converting other containers (that are also range-compatible)
to your container. It also allows capturing the output of a ranges “pipeline” (a series of views
chained together).</p>
<p>Another form of range output, which comes up with certain of the <a href="https://en.cppreference.com/w/cpp/algorithm/ranges">range algorithms</a>,
is via <em>output iterators</em>, which are iterators that allow storing or inserting values into your
container.</p>
<h3 id="constructor-from-a-range"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#constructor-from-a-range" title="Permalink to this section">Constructor From A Range</a></h3>
<p>To write a constructor (or other method) that takes a generic range parameter, we can use the same
range concepts we saw earlier. One neat new feature in C++20 is writing functions with a parameter
type (or return type) constrained to match a given concept. The syntax looks like this:</p>
<div class="codehilite"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><ranges></span><span class="cp"></span>
<span class="k">class</span><span class="w"> </span><span class="nc">MyCoolContainer</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="k">public</span><span class="o">:</span><span class="w"></span>
<span class="w"> </span><span class="k">explicit</span><span class="w"> </span><span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span><span class="w"> </span><span class="k">auto</span><span class="o">&&</span><span class="w"> </span><span class="n">range</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="o">&&</span><span class="w"> </span><span class="n">item</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">range</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// process the item</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">};</span><span class="w"></span>
</code></pre></div>
<p>The syntax <code>concept-name auto</code> for the parameter type reminds us that concepts aren’t types; this
is still, under the hood, a template function that’s performing argument type deduction (hence the
<code>auto</code>). In other words, the above is syntactic sugar for:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span><span class="w"> </span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span><span class="w"> </span><span class="n">R</span><span class="o">></span><span class="w"></span>
<span class="k">explicit</span><span class="w"> </span><span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">R</span><span class="o">&&</span><span class="w"> </span><span class="n">range</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>which is in turn sugar for:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span><span class="w"> </span><span class="o"><</span><span class="k">typename</span><span class="w"> </span><span class="nc">R</span><span class="o">></span><span class="w"></span>
<span class="k">requires</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span><span class="o"><</span><span class="n">R</span><span class="o">></span><span class="p">)</span><span class="w"></span>
<span class="k">explicit</span><span class="w"> </span><span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">R</span><span class="o">&&</span><span class="w"> </span><span class="n">range</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>I prefer the shorthand <code>std::ranges::input_range auto</code> syntax, but <del>at the time of this writing
MSVC’s support for it is still shaky</del>. (<em>Update: fixed in 16.10!</em> 😊) If in doubt, use
the syntax <code>template <std::ranges::input_range R></code>.</p>
<p>In any case, constraining the parameter type to satisfy <code>input_range</code> allows this constructor
overload to accept anything out there that implements <code>begin</code>, <code>end</code>, and iterators, as we’ve seen
in previous sections. You can then iterate over it generically and do whatever you want with the
results.</p>
<p>The range parameter is declared as <code>auto&&</code> to make it a <a href="https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers">universal reference</a>,
meaning that it can accept either lvalues or rvalues; in particular, it can accept the result of a
function call returning a range, and it can accept the result of a pipeline:</p>
<div class="codehilite"><pre><span></span><code><span class="n">MyCoolContainer</span><span class="w"> </span><span class="n">c</span><span class="p">{</span><span class="w"> </span><span class="n">another_range</span><span class="w"> </span><span class="o">|</span><span class="w"></span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">views</span><span class="o">::</span><span class="n">transform</span><span class="p">(</span><span class="n">blah</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="w"></span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">views</span><span class="o">::</span><span class="n">filter</span><span class="p">(</span><span class="n">blah</span><span class="p">)</span><span class="w"> </span><span class="p">};</span><span class="w"></span>
</code></pre></div>
<p>A completely generic range-accepting method like this might not be the most useful thing. If we have
a container storing <code>int</code> values, for example, it wouldn’t make a lot of sense for us to accept
ranges of strings or other arbitrary types. We’d like to be able to put some additional constraints
on the <em>element type</em> of the range: perhaps we only want element types that are convertible to <code>int</code>.</p>
<p>Helpfully, the ranges library provides a template <a href="https://en.cppreference.com/w/cpp/ranges/iterator_t"><code>range_value_t</code></a>
that retrieves the element type of a range—namely, the <code>value_type</code> declared by the range’s
iterator. With this, we can state additional constraints like so:</p>
<div class="codehilite"><pre><span></span><code><span class="k">explicit</span><span class="w"> </span><span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span><span class="w"> </span><span class="k">auto</span><span class="o">&&</span><span class="w"> </span><span class="n">range</span><span class="p">)</span><span class="w"></span>
<span class="k">requires</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">convertible_to</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">range_value_t</span><span class="o"><</span><span class="k">decltype</span><span class="p">(</span><span class="n">range</span><span class="p">)</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="o">></span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>We can even define a concept that wraps up these requirements:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span><span class="w"> </span><span class="o"><</span><span class="k">typename</span><span class="w"> </span><span class="nc">R</span><span class="p">,</span><span class="w"> </span><span class="k">typename</span><span class="w"> </span><span class="nc">T</span><span class="o">></span><span class="w"></span>
<span class="k">concept</span><span class="w"> </span><span class="nc">input_range_of</span><span class="w"> </span><span class="o">=</span><span class="w"></span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span><span class="o"><</span><span class="n">R</span><span class="o">></span><span class="w"> </span><span class="o">&&</span><span class="w"></span>
<span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">convertible_to</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">range_value_t</span><span class="o"><</span><span class="n">R</span><span class="o">></span><span class="p">,</span><span class="w"> </span><span class="n">T</span><span class="o">></span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>and then use it as follows:</p>
<div class="codehilite"><pre><span></span><code><span class="k">explicit</span><span class="w"> </span><span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">input_range_of</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="w"> </span><span class="k">auto</span><span class="o">&&</span><span class="w"> </span><span class="n">range</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// ...</span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Something like this should be in the standard library, IMO.</p>
<p>You can also choose to require one of the more specialized concepts, like <code>forward_range</code> or
<code>random_access_range</code>, if you need those extra capabilities for whatever you’re doing.
However, just as a container should generally implement the most <em>specific</em> range concept it can
provide, a function that takes a range parameter should generally require the most <em>general</em> range
concept it can deal with, or it will unduly restrict what kind of ranges can be passed to it.</p>
<p>That said, there might be cases where you can switch to a more efficient implementation if the range
satisfies some extra requirements. For example, if it’s a <code>sized_range</code>, then you might be able to
reserve storage before inserting the elements. You can test for this inside your function body using
<code>if constexpr</code>:</p>
<div class="codehilite"><pre><span></span><code><span class="k">explicit</span><span class="w"> </span><span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">input_range_of</span><span class="o"><</span><span class="kt">int</span><span class="o">></span><span class="w"> </span><span class="k">auto</span><span class="o">&&</span><span class="w"> </span><span class="n">range</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">constexpr</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">sized_range</span><span class="o"><</span><span class="k">decltype</span><span class="p">(</span><span class="n">range</span><span class="p">)</span><span class="o">></span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">reserve</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">size</span><span class="p">(</span><span class="n">range</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="o">&&</span><span class="w"> </span><span class="n">item</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">range</span><span class="p">)</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// process the item</span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>Here, <a href="https://en.cppreference.com/w/cpp/ranges/size"><code>std::ranges::size</code></a> is a convenience wrapper
that knows how to call the range’s associated <code>size</code> function, whether it’s implemented as a method
or a free function.</p>
<p>You could also do things like: check if the range is a <code>contiguous_range</code> and the item is something
trivially copyable, and switch to <code>memcpy</code> rather than iterating over all the items.</p>
<h3 id="output-iterators"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#output-iterators" title="Permalink to this section">Output Iterators</a></h3>
<p>Range views and pipelines operate on a “pull” model, where the pipeline is represented by a proxy
range object that generates its results lazily when you iterate it. Taking generic range objects as
parameters to your container is an easy and useful way to consume such objects, and that probably suffices
for most uses. However, there are a handful of bits in the ranges library that operate on a “push”
model, where you call a function that wants to store values into your container via an output
iterator. This comes up with <a href="https://en.cppreference.com/w/cpp/algorithm/ranges#Modifying_sequence_operations">certain ranges algorithms</a>
like <code>ranges::copy</code>, <code>ranges::transform</code>, and <code>ranges::generate</code>.</p>
<p>Personally, I don’t see a hugely compelling reason to worry about these, as it’s also possible to
use views to express the same operations; but for the sake of completeness, I’ll discuss them
briefly here.</p>
<p>At this point, it won’t surprise you to learn that just as there were concepts for input ranges,
there are also concepts <code>std::ranges::output_range</code> and <a href="https://en.cppreference.com/w/cpp/iterator/output_iterator"><code>std::output_iterator</code></a>.
In this case there’s just the one concept of each kind, not a hierarchy of refinements; however, if you
peruse the definitions of some of the ranges algorithms, you’ll find that many of them don’t actually
use <code>output_iterator</code>, but state slightly different, less- or more-specific requirements of their
own. (This part of the standard library feels a little less fully baked than the rest; I wouldn’t be
surprised if some of this gets elaborated or polished a bit more in C++23 or later revisions.)</p>
<p>The requirements for an output iterator (broadly construed) are very similar to those for an input
iterator, only adding that the value returned by dereferencing the iterator must be writable by
assigning to it: you must be able to do <code>*iter = foo;</code> for some appropriate type of <code>foo</code>. If you’ve
implemented a non-const input iterator, it probably satisfies the requirement already.</p>
<p>It’s also possible to do slightly more exotic things with an output iterator, like returning a proxy
object that accepts assignment and does “something” with the value assigned. An example of this is
the STL’s <a href="https://en.cppreference.com/w/cpp/iterator/back_insert_iterator"><code>std::back_insert_iterator</code></a>,
which takes whatever is assigned to it and <em>appends</em> to its container (as opposed to overwriting an
existing value in the container). The STL has a few more things like that, including an iterator
that writes characters out to an <code>ostream</code>.</p>
<p>There are also some cases amongst the ranges algorithms of “input-output” iterators, such as for
operations that reorder a range in place, like sorting. These often have a bidirectional or
random-access iterator requirement, plus needing the dereferenced types to be swappable, movable,
and varying other constraints. Those details probably aren’t going to be relevant to you unless
you’re doing something tricky, like making a container that generates elements on the fly somehow,
or returns proxy objects rather than direct references to elements (like <code>std::vector<bool></code>).</p>
<h2 id="conclusion"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>The C++20 ranges library provides a lot of powerful, composable tools for manipulating sequences of
objects, and a range of specificity from the most generic and abstract container-shaped things down
to the very concrete, efficient, and practical. When working with your own container types, it
would be nice to be able to take advantage of these tools.</p>
<p>As we’ve seen, it’s hardly an onerous task to implement ranges compatibility for your own containers.
Most of the necessaries are things you were probably already doing: you probably already had an
iterator class and begin/end methods. It only takes a little bit of attention to satisfying certain
details—like adding the <code>difference_type</code> and <code>value_type</code> aliases, and making sure you can both
preincrement and postincrement—to make your iterators satisfy the STL iterator concepts, and thus
have your containers recognized as ranges. It’s also no sweat to write functions accepting generic
ranges as input, letting you store the output of other range operations into your container.</p>
<p>I hope this has been a useful peek under the hood and has given you some ideas about how your
container classes can benefit from the new C++20 features.</p>Python-Like enumerate() In C++17
https://www.reedbeta.com/blog/python-like-enumerate-in-cpp17/
http://reedbeta.com/blog/python-like-enumerate-in-cpp17/Nathan ReedSat, 24 Nov 2018 22:42:04 -0800https://www.reedbeta.com/blog/python-like-enumerate-in-cpp17/#commentsCoding<p>Python has a handy built-in function called <a href="https://docs.python.org/3/library/functions.html?highlight=enumerate#enumerate"><code>enumerate()</code></a>,
which lets you iterate over an object (e.g. a list) and have access to both the <em>index</em> and the
<em>item</em> in each iteration. You use it in a <code>for</code> loop, like this:</p>
<div class="codehilite"><pre><span></span><code><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">thing</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">listOfThings</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"The </span><span class="si">%d</span><span class="s2">th thing is </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">thing</span><span class="p">))</span>
</code></pre></div>
<p>Iterating over <code>listOfThings</code> directly would give you <code>thing</code>, but not <code>i</code>, and there are plenty of
situations where you’d want both (looking up the index in another data structure, progress reports,
error messages, generating output filenames, etc).</p>
<p>C++ <a href="https://en.cppreference.com/w/cpp/language/range-for">range-based <code>for</code> loops</a> work a lot like
Python’s <code>for</code> loops. Can we implement an analogue of Python’s <code>enumerate()</code> in C++? We can!</p>
<!--more-->
<p>C++17 added <a href="https://en.cppreference.com/w/cpp/language/structured_binding">structured bindings</a>
(also known as “destructuring” in other languages), which allow you to pull apart a tuple type and
assign the pieces to different variables, in a single statement. It turns out that this is also
allowed in range <code>for</code> loops. If the iterator returns a tuple, you can pull it apart and assign the
pieces to different loop variables.</p>
<p>The syntax for this looks like:</p>
<div class="codehilite"><pre><span></span><code><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">ThingA</span><span class="p">,</span><span class="w"> </span><span class="n">ThingB</span><span class="o">>></span><span class="w"> </span><span class="n">things</span><span class="p">;</span><span class="w"></span>
<span class="p">...</span><span class="w"></span>
<span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">]</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">things</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// a gets the ThingA and b gets the ThingB from each tuple</span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
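<p>One detail worth spelling out is that plain <code>auto</code> copies each tuple, while
<code>auto&amp;</code> or <code>const auto&amp;</code> binds to the element in place. The sketch below
shows both a mutating and a read-only loop; the function names and the (name, count) tuple layout are
hypothetical, chosen just for the example.</p>

```cpp
#include <string>
#include <tuple>
#include <vector>

// Hypothetical helper: bumps the count in every (name, count) pair.
// `auto&` binds to the tuple in place, so `count` aliases the stored element;
// plain `auto` would copy each tuple and the edits would be lost.
inline void increment_counts(std::vector<std::tuple<std::string, int>>& entries)
{
    for (auto& [name, count] : entries)
        ++count;
}

// Read-only traversal: `const auto&` avoids copying the strings.
inline int total_count(const std::vector<std::tuple<std::string, int>>& entries)
{
    int total = 0;
    for (const auto& [name, count] : entries)
        total += count;
    return total;
}
```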
<p>So, we can implement <code>enumerate()</code> by creating an iterable object that wraps another iterable and
generates the indices during iteration. Then we can use it like this:</p>
<div class="codehilite"><pre><span></span><code><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Thing</span><span class="o">></span><span class="w"> </span><span class="n">things</span><span class="p">;</span><span class="w"></span>
<span class="p">...</span><span class="w"></span>
<span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">thing</span><span class="p">]</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">enumerate</span><span class="p">(</span><span class="n">things</span><span class="p">))</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="c1">// i gets the index and thing gets the Thing in each iteration</span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>The implementation of <code>enumerate()</code> is pretty short, and I present it here for your use:</p>
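<div class="codehilite"><pre><span></span><code><span class="cp">#include</span><span class="w"> </span><span class="cpf"><iterator></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><tuple></span><span class="cp"></span>
<span class="cp">#include</span><span class="w"> </span><span class="cpf"><utility></span><span class="cp"></span>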
<span class="k">template</span><span class="w"> </span><span class="o"><</span><span class="k">typename</span><span class="w"> </span><span class="nc">T</span><span class="p">,</span><span class="w"></span>
<span class="w"> </span><span class="k">typename</span><span class="w"> </span><span class="nc">TIter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">decltype</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">begin</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">declval</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">())),</span><span class="w"></span>
<span class="w"> </span><span class="k">typename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">decltype</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">end</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">declval</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">()))</span><span class="o">></span><span class="w"></span>
<span class="k">constexpr</span><span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">enumerate</span><span class="p">(</span><span class="n">T</span><span class="w"> </span><span class="o">&&</span><span class="w"> </span><span class="n">iterable</span><span class="p">)</span><span class="w"></span>
<span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">iterator</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">i</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">TIter</span><span class="w"> </span><span class="n">iter</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">iterator</span><span class="w"> </span><span class="o">&</span><span class="w"> </span><span class="n">other</span><span class="p">)</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">other</span><span class="p">.</span><span class="n">iter</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="o">++</span><span class="n">i</span><span class="p">;</span><span class="w"> </span><span class="o">++</span><span class="n">iter</span><span class="p">;</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="k">operator</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">tie</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">iter</span><span class="p">);</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">};</span><span class="w"></span>
<span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">iterable_wrapper</span><span class="w"></span>
<span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="n">iterable</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">begin</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">iterator</span><span class="p">{</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">begin</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span><span class="w"> </span><span class="p">};</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">end</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">iterator</span><span class="p">{</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">end</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span><span class="w"> </span><span class="p">};</span><span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">};</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">iterable_wrapper</span><span class="p">{</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span><span class="w"> </span><span class="p">};</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</code></pre></div>
<p>This uses SFINAE to ensure it can only be applied to iterable types, and will generate readable
error messages if used on something else. It accepts its parameter as a forwarding (universal)
reference, so you can apply it to temporary values (e.g. directly to the return value of a function
call) as well as to variables and members. When given a variable, the wrapper just holds a reference
to it; when given a temporary, the wrapper takes ownership of it by move.</p>
<p>This compiles without warnings in C++17 mode on gcc 8.2, clang 6.0, and MSVC 15.9. I’ve banged on it
a bit to ensure it doesn’t incur any extra copies, and it works as expected with either const or
non-const containers. It seems to optimize away pretty cleanly, too! 🤘</p>Using A Custom Toolchain In Visual Studio With MSBuild
https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/
http://reedbeta.com/blog/custom-toolchain-with-msbuild/Nathan ReedTue, 20 Nov 2018 13:34:01 -0800https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#commentsCoding<p>Like many of you, when I work on a graphics project I sometimes have a need to compile some shaders.
Usually, I’m writing in C++ using Visual Studio, and I’d like to get my shaders built using the
same workflow as the rest of my code. Visual Studio these days has built-in support for HLSL via
<code>fxc</code>, but what if we want to use the next-gen <a href="https://github.com/Microsoft/DirectXShaderCompiler"><code>dxc</code></a>
compiler?</p>
<p>This post is a how-to for adding support for a custom toolchain—such as <code>dxc</code>, or any other
command-line-invokable tool—to a Visual Studio project, by scripting MSBuild (the underlying build
system Visual Studio uses). We won’t quite make it to parity with a natively integrated language,
but we’re going to get as close as we can.</p>
<!--more-->
<p>If you don’t want to read all the explanation but just want some working code to look at, jump down
to the <a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project">Example Project</a> section.</p>
<p>This article is written against Visual Studio 2017, but it may also work in some earlier VSes
(I haven’t tested).</p>
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#msbuild">MSBuild</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#adding-a-custom-target">Adding A Custom Target</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#invoking-the-tool">Invoking The Tool</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#incremental-builds">Incremental Builds</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#header-dependencies">Header Dependencies</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#errorwarning-parsing">Error/Warning Parsing</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project">Example Project</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#the-next-level">The Next Level</a></li>
</ul>
</div>
<h2 id="msbuild"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#msbuild" title="Permalink to this section">MSBuild</a></h2>
<p>Before we begin, it’s important you understand what we’re getting into. Not to mince words, but
MSBuild is a <a href="http://wiki.c2.com/?StringlyTyped">stringly typed</a>, semi-documented, XML-guzzling,
paradigmatically muddled, cursed hellmaze. However, it <em>does</em> ship with Visual Studio, so if you
can use it for your custom build steps, then you don’t need to deal with any extra add-ins or
software installs.</p>
<p>To be fair, MSBuild is <a href="https://github.com/Microsoft/msbuild">open-source on GitHub</a>, so at least
in principle you can dive into it and see what the cursed hellmaze is doing. However, I’ll warn you
up front that many of the most interesting parts vis-à-vis Visual Studio integration are <em>not</em>
included in the Git repo, but are hidden away in VS’s build extension DLLs. (More about that later.)</p>
<p>My jumping-off point for this enterprise was <a href="http://miken-1gam.blogspot.com/2013/01/visual-studio-and-custom-build-rules.html">this blog post by Mike Nicolella</a>.
Mike showed how to set up an MSBuild <code>.targets</code> file to create an association between a specific file
extension in your project, and a build rule (“target”, in MSBuild parlance) to process those files.
We’ll review how that works, then extend it and jazz it up a bit to get some more quality-of-life
features.</p>
<p>MSBuild docs (such as they are) can be found <a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild?view=vs-2017">on MSDN here</a>.
Some more information can be gleaned by looking at the C++ build rules installed with Visual
Studio; on my machine they’re in <code>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets</code>.
For example, the file <code>Microsoft.CppCommon.targets</code> in that directory contains most of the target
definitions for C++ compilation, linking, resources and manifests, and so on.</p>
<h2 id="adding-a-custom-target"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#adding-a-custom-target" title="Permalink to this section">Adding A Custom Target</a></h2>
<p>As shown in Mike’s blog post, we can define our own build rule using a couple of XML files which
will be imported into the VS project. (I’ll keep using shader compilation with <code>dxc</code> as my running
example, but this approach can be adapted for a lot of other things, too.)</p>
<p>First, create a file <code>dxc.targets</code>—in your project directory, or really anywhere—containing
the following:</p>
<div class="codehilite"><pre><span></span><code><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><Project</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/developer/msbuild/2003"</span><span class="nt">></span>
<span class="nt"><ItemGroup></span>
<span class="cm"><!-- Include definitions from dxc.xml, which defines the DXCShader item. --></span>
<span class="nt"><PropertyPageSchema</span> <span class="na">Include=</span><span class="s">"$(MSBuildThisFileDirectory)dxc.xml"</span> <span class="nt">/></span>
<span class="cm"><!-- Hook up DXCShader items to be built by the DXC target. --></span>
<span class="nt"><AvailableItemName</span> <span class="na">Include=</span><span class="s">"DXCShader"</span><span class="nt">></span>
<span class="nt"><Targets></span>DXC<span class="nt"></Targets></span>
<span class="nt"></AvailableItemName></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><Target</span>
<span class="na">Name=</span><span class="s">"DXC"</span>
<span class="na">Condition=</span><span class="s">"'@(DXCShader)' != ''"</span>
<span class="na">BeforeTargets=</span><span class="s">"ClCompile"</span><span class="nt">></span>
<span class="nt"><Message</span> <span class="na">Importance=</span><span class="s">"High"</span> <span class="na">Text=</span><span class="s">"Building shaders!!!"</span> <span class="nt">/></span>
<span class="nt"></Target></span>
<span class="nt"></Project></span>
</code></pre></div>
<p>And another file <code>dxc.xml</code> containing:</p>
<div class="codehilite"><pre><span></span><code><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><ProjectSchemaDefinitions</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/build/2009/properties"</span><span class="nt">></span>
<span class="cm"><!-- Associate DXCShader item type with .hlsl files --></span>
<span class="nt"><ItemType</span> <span class="na">Name=</span><span class="s">"DXCShader"</span> <span class="na">DisplayName=</span><span class="s">"DXC Shader"</span> <span class="nt">/></span>
<span class="nt"><ContentType</span> <span class="na">Name=</span><span class="s">"DXCShader"</span> <span class="na">ItemType=</span><span class="s">"DXCShader"</span> <span class="na">DisplayName=</span><span class="s">"DXC Shader"</span> <span class="nt">/></span>
<span class="nt"><FileExtension</span> <span class="na">Name=</span><span class="s">".hlsl"</span> <span class="na">ContentType=</span><span class="s">"DXCShader"</span> <span class="nt">/></span>
<span class="nt"></ProjectSchemaDefinitions></span>
</code></pre></div>
<p>Let’s pause for a moment and take stock of what’s going on here. First, we’re creating a new “item
type”, called <code>DXCShader</code>, and associating it with the extension <code>.hlsl</code>. That way, any files we
add to our project with that extension will automatically have this item type applied.</p>
<p>Second, we’re instructing MSBuild that <code>DXCShader</code> items are to be built with the <code>DXC</code> target, and
we’re defining what that target does. For now, all it does is print a message in the build output,
but we’ll get it doing some actual work shortly.</p>
<p>A few miscellaneous syntax notes:</p>
<ul>
<li>Yes, you need two separate files. No, there’s no way to combine them, AFAICT. This is just the
way MSBuild works.</li>
<li>The syntax <code>@(DXCShader)</code> means “the list of all <code>DXCShader</code> items in the project”. The <code>Condition</code>
attribute on a target says under what conditions that target should execute: if the condition is
false, the target is skipped. Here, we’re executing the target if the list <code>@(DXCShader)</code> is non-empty.</li>
<li><code>BeforeTargets="ClCompile"</code> means this target will run before the <code>ClCompile</code> target, i.e. before
C/C++ source files are compiled with <code>cl.exe</code>. This is because we’re going to output our shader
bytecode to headers which will get included into C++, so the shader compile step needs to run
earlier.</li>
<li><code>Importance="High"</code> is needed on the <code><Message></code> task for it to show up in the VS IDE on the
default verbosity setting. Lower importances will be masked unless you turn up the verbosity.</li>
</ul>
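<p>As a quick sanity check on the item list, a throwaway <code><Message></code> inside the DXC target can print what <code>@(DXCShader)</code> expands to. This is only a sketch, and the message text is arbitrary. By default an item list expands to its items joined with semicolons; the <code>@(Item, 'separator')</code> form substitutes a different separator.</p>

```xml
<!-- Temporary debugging aid: goes inside <Target Name="DXC"> -->
<Message Importance="High" Text="DXCShader items: @(DXCShader)" />
<!-- Same list, comma-separated instead of the default semicolons -->
<Message Importance="High" Text="DXCShader items: @(DXCShader, ', ')" />
```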
<p>To get this into your project, in the VS IDE right-click the project → Build Dependencies… → Build Customizations,
then click “Find Existing” and point it at <code>dxc.targets</code>. Alternatively, add this line to your <code>.vcxproj</code>
(as a child of the root <code><Project></code> element, doesn’t matter where):</p>
<div class="codehilite"><pre><span></span><code><span class="nt"><Import</span> <span class="na">Project=</span><span class="s">"dxc.targets"</span> <span class="nt">/></span>
</code></pre></div>
<p>Now, if you add a <code>.hlsl</code> file to your project it should automatically show up as type “DXC Shader”
in the properties; and when you build, you should see the message <code>Building shaders!!!</code> in the
output.</p>
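<p>For reference, once a file has the new item type, the corresponding entry in the <code>.vcxproj</code> should look roughly like the following (<code>Shader.hlsl</code> is a placeholder name). If an existing file is still listed under a different item type, such as <code>None</code> or <code>FxCompile</code>, changing the element name by hand should have the same effect as switching the item type in the properties window.</p>

```xml
<ItemGroup>
  <!-- The element name is the item type declared in dxc.xml -->
  <DXCShader Include="Shader.hlsl" />
</ItemGroup>
```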
<p>Incidentally, in <code>dxc.xml</code> you can also set up property pages that will show up in the VS IDE on
<code>DXCShader</code>-type files. This lets you define your own metadata and let users configure it per
file. I haven’t done this, but for example, you could have properties to indicate which shader
stages or profiles the file should be compiled for. The <code><Target></code> element can then have logic that refers
to those properties. Many examples of the XML to define property pages can be found in <code>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\1033</code>
(or a corresponding location depending on which version of VS you have). For example,
<code>custom_build_tool.xml</code> in that directory defines the properties for the built-in Custom Build
Tool item type.</p>
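<p>As an untested sketch of what such a property page might look like, modeled on the rule files in that directory: the rule name <code>DXCShaderOptions</code> and the <code>EntryPoint</code> property are made up for illustration, and the exact attributes VS requires may differ between versions.</p>

```xml
<!-- Hypothetical addition to dxc.xml, inside <ProjectSchemaDefinitions> -->
<Rule Name="DXCShaderOptions" DisplayName="DXC Shader" PageTemplate="tool" Order="200">
  <Rule.DataSource>
    <!-- Store the values as per-file metadata on DXCShader items in the project file -->
    <DataSource Persistence="ProjectFile" ItemType="DXCShader" />
  </Rule.DataSource>
  <Rule.Categories>
    <Category Name="General" DisplayName="General" />
  </Rule.Categories>
  <!-- EntryPoint is a made-up metadata name, not something dxc or VS defines -->
  <StringProperty Name="EntryPoint" DisplayName="Entry Point" Category="General"
                  Description="Name of the shader entry point function." />
</Rule>
```

<p>If that works as expected, the value would be available to the target as ordinary item metadata, e.g. <code>%(EntryPoint)</code> in a command line.</p>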
<h2 id="invoking-the-tool"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#invoking-the-tool" title="Permalink to this section">Invoking The Tool</a></h2>
<p>Okay, now it’s time to get our custom target to actually do something. Mike’s blog post used the MSBuild
<a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/exec-task?view=vs-2017"><code><Exec></code> task</a> to
run a command on each source file. However, we’re going to take a different tack and use the
Visual Studio <code><CustomBuild></code> task instead.</p>
<p>The <code><CustomBuild></code> task is the same one that ends up getting executed if you manually set your
files to “Custom Build Tool” and fill in the command/inputs/outputs metadata in the property pages.
But instead of putting that in by hand, we’re going to set up our target to <em>generate</em> the metadata
and then pass it in to <code><CustomBuild></code>. Doing it this way is going to let us access a couple handy
features later that we wouldn’t get with the plain <code><Exec></code> task.</p>
<p>Add this inside the DXC <code><Target></code> element:</p>
<div class="codehilite"><pre><span></span><code><span class="cm"><!-- Setup metadata for custom build tool --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><DXCShader></span>
<span class="nt"><Message></span>%(Filename)%(Extension)<span class="nt"></Message></span>
<span class="nt"><Command></span>
"$(WDKBinRoot)\x86\dxc.exe" -T vs_6_0 -E vs_main %(Identity) -Fh %(Filename).vs.h -Vn %(Filename)_vs
"$(WDKBinRoot)\x86\dxc.exe" -T ps_6_0 -E ps_main %(Identity) -Fh %(Filename).ps.h -Vn %(Filename)_ps
<span class="nt"></Command></span>
<span class="nt"><Outputs></span>%(Filename).vs.h;%(Filename).ps.h<span class="nt"></Outputs></span>
<span class="nt"></DXCShader></span>
<span class="nt"></ItemGroup></span>
<span class="cm"><!-- Compile by forwarding to the Custom Build Tool infrastructure --></span>
<span class="nt"><CustomBuild</span> <span class="na">Sources=</span><span class="s">"@(DXCShader)"</span> <span class="nt">/></span>
</code></pre></div>
<p>Now, given some valid HLSL source files in the project, this will invoke <code>dxc.exe</code> twice on each
one—first compiling a vertex shader, then a pixel shader. The bytecode will be output as C arrays in
header files (<code>-Fh</code> option). I’ve just put the output headers in the main project directory, but
in production you’d probably want to put them in a subdirectory somewhere.</p>
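<p>A minimal sketch of the subdirectory variant, assuming the intermediate directory <code>$(IntDir)</code> is an acceptable place for generated headers (that choice is mine, not part of the setup above). Only the <code>-Fh</code> paths and the <code><Outputs></code> list change, and they have to stay in sync with each other.</p>

```xml
<ItemGroup>
  <DXCShader>
    <Message>%(Filename)%(Extension)</Message>
    <Command>
      "$(WDKBinRoot)\x86\dxc.exe" -T vs_6_0 -E vs_main %(Identity) -Fh "$(IntDir)%(Filename).vs.h" -Vn %(Filename)_vs
      "$(WDKBinRoot)\x86\dxc.exe" -T ps_6_0 -E ps_main %(Identity) -Fh "$(IntDir)%(Filename).ps.h" -Vn %(Filename)_ps
    </Command>
    <!-- Must list exactly the files the commands write -->
    <Outputs>$(IntDir)%(Filename).vs.h;$(IntDir)%(Filename).ps.h</Outputs>
  </DXCShader>
</ItemGroup>
```

<p>The C++ side would then need that directory on its include path in order to find the generated headers.</p>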
<p>Let’s back up and look at the syntax in this snippet. First, the <code><ItemGroup><DXCShader></code> combo
basically says “iterate over the <code>DXCShader</code> items”, i.e. the HLSL source files in the project.
Then what we’re doing is adding metadata: each of the child elements—<code><Message></code>, <code><Command></code>,
and <code><Outputs></code>—becomes a metadata key/value pair attached to a <code>DXCShader</code>.</p>
<p>The <code>%(Foo)</code> syntax accesses item metadata (within a previously established context for “which item”,
which is here created by the iteration over the shaders). All MSBuild items have certain
<a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild-well-known-item-metadata?view=vs-2017">built-in metadata</a>
like path, filename, and extension; we’re building on those to construct additional
metadata, in the format expected by the <code><CustomBuild></code> task. (It matches the metadata that would be
created if you set up the command line etc. manually in the Custom Build Tool property pages.)</p>
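<p>One way to see this per-item context in action is a <code><Message></code> that references metadata using the qualified <code>%(ItemType.Name)</code> form. Outside of an item definition there is no implicit “current item”, so the item type has to be named; MSBuild then runs the task once per batch of items sharing the referenced metadata values, which here means once per file. This is just a debugging sketch.</p>

```xml
<!-- Inside <Target Name="DXC">: logs one line per shader source file -->
<Message Importance="High"
         Text="%(DXCShader.Identity): name=%(DXCShader.Filename), ext=%(DXCShader.Extension)" />
```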
<p>Incidentally, the <code>$(WDKBinRoot)</code> variable (“property”, in MSBuild-ese) is the path to the Windows
SDK <code>bin</code> folder, where lots of tools like <code>dxc</code> live. It needs to be quoted because it can (and
usually does) contain spaces. You can find out these things by running MSBuild with “diagnostic”
verbosity (in VS, go to Tools → Options → Projects and Solutions → Build and Run → “MSBuild project
build output verbosity”)—this will spit out all the defined properties plus a ton of logging about
which targets are running and what they’re doing.</p>
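<p>If diagnostic verbosity is more output than you want, a lighter-weight sketch is to print the one property of interest and fail early when the tool isn’t where the command line expects it. The error text is arbitrary; the <code>x86</code> subfolder simply mirrors the command lines above.</p>

```xml
<!-- Inside <Target Name="DXC">, before the metadata setup -->
<Message Importance="High" Text="WDKBinRoot = $(WDKBinRoot)" />
<Error Condition="!Exists('$(WDKBinRoot)\x86\dxc.exe')"
       Text="dxc.exe not found under $(WDKBinRoot)\x86" />
```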
<p>Finally, after setting up all the required metadata, we simply pass it to the <code><CustomBuild></code> task.
(This task isn’t part of core MSBuild, but is defined in <code>Microsoft.Build.CPPTasks.Common.dll</code>—an
extension plugin to MSBuild that comes with Visual Studio.) Again we see the <code>@(DXCShader)</code> syntax,
meaning to pass in the list of all <code>DXCShader</code> items in the project. Internally, <code><CustomBuild></code>
iterates over it and invokes your specified command lines.</p>
<h2 id="incremental-builds"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#incremental-builds" title="Permalink to this section">Incremental Builds</a></h2>
<p>At this point, we have a working custom build! We can simply add <code>.hlsl</code> files to our project, and
they’ll automatically be compiled by <code>dxc</code> as part of the build process, without us having to do
anything. <em>Hurrah!</em></p>
<p>However, while working with this setup you will notice a couple of problems.</p>
<ol>
<li>When you modify an HLSL source file, Visual Studio will <em>not</em> reliably detect that it
needs to recompile it. If the project was up-to-date before, hitting Build will do nothing!
However, if you have also modified something else (such as a C++ source file), <em>then</em> the build
will pick up the shaders in addition.</li>
<li>Anytime anything else gets built, <em>all</em> the shaders get built. In other words, MSBuild doesn’t
yet understand that if an individual shader is already up-to-date then it can be skipped.</li>
</ol>
<p>Fortunately, we can easily fix these. But first, why are these problems happening at all?</p>
<p>VS and MSBuild depend on <a href="https://docs.microsoft.com/en-us/visualstudio/extensibility/visual-cpp-project-extensibility?view=vs-2017#tlog-files"><code>.tlog</code> (tracker log) files</a>
to cache information about source file dependencies and efficiently determine whether a build is
up-to-date. Somewhere inside your build output directory there will be a folder full of these logs,
listing what source files have gotten built, what inputs they depended on (e.g. headers), and
what outputs they generated (e.g. object files). The problem is that our custom target isn’t
producing any <code>.tlog</code>s.</p>
<p>Conveniently for us, the <code><CustomBuild></code> task supports <code>.tlog</code> handling right out of the box; we
just have to turn it on! Change the <code><CustomBuild></code> invocation in the targets file to this:</p>
<div class="codehilite"><pre><span></span><code><span class="cm"><!-- Compile by forwarding to the Custom Build Tool infrastructure,</span>
<span class="cm"> so it will take care of .tlogs --></span>
<span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span> <span class="nt">/></span>
</code></pre></div>
<p>That’s all there is to it—now, modified HLSL files will be properly detected and rebuilt, and
<em>unmodified</em> ones will be properly detected and <em>not</em> rebuilt. This also takes care of deleting the
previous output files when you do a clean build. This is one reason to prefer using the <code><CustomBuild></code>
task rather than the simpler <code><Exec></code> task (we’ll see another reason a bit later).</p>
<p><em>Thanks to Olga Arkhipova at Microsoft for helping me figure out this part!</em></p>
<h2 id="header-dependencies"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#header-dependencies" title="Permalink to this section">Header Dependencies</a></h2>
<p>Now that we have dependencies hooked up for our custom toolchain, a logical next step is to look
into how we can specify extra input dependencies—so that our shaders can have <code>#include</code>s, for
example, and modifications to the headers will automatically trigger rebuilds properly.</p>
<p>The good news is that yes, we can do this by adding an <code><AdditionalInputs></code> metadata key to our
<code>DXCShader</code> items. Files listed there will get registered as inputs in the <code>.tlog</code>, and the build
system will do the rest. The bad news is that there doesn’t seem to be an easy way to detect <em>on
a file-by-file level</em> which additional inputs are needed.</p>
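<p>For a small project, one fallback is to list the dependencies by hand as metadata on each item in the <code>.vcxproj</code>. The following is a sketch with placeholder file names. Note the assumption that nothing in the target later assigns <code><AdditionalInputs></code> for every item, since that would replace these per-file values.</p>

```xml
<ItemGroup>
  <DXCShader Include="Shader.hlsl">
    <!-- Hand-maintained list of headers this shader includes -->
    <AdditionalInputs>Common.hlsli;Lighting.hlsli</AdditionalInputs>
  </DXCShader>
</ItemGroup>
```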
<p>This is frustrating because Visual Studio actually includes a utility for tracking
file accesses in an external tool! It’s called <code>tracker.exe</code> and lives somewhere in your VS
installation. You give it a command line, and it’ll detect all files opened for reading by the
launched process (presumably by injecting a DLL and detouring <code>CreateFile()</code>, or something along
those lines). I believe this is what VS uses internally to track <code>#include</code>s for C++—and it
would be perfect if we could get access to the same functionality for custom toolchains as well.</p>
<p>Unfortunately, the <code><CustomBuild></code> task <em>explicitly disables</em> this tracking functionality. I was
able to find this out by using <a href="https://github.com/icsharpcode/ILSpy">ILSpy</a> to decompile the
<code>Microsoft.Build.CPPTasks.Common.dll</code>. It’s a .NET assembly, so it decompiles pretty cleanly, and
you can examine the innards of the <code>CustomBuild</code> class. It contains this snippet, in the
<code>ExecuteTool()</code> method:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">bool</span><span class="w"> </span><span class="n">trackFileAccess</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span><span class="p">;</span><span class="w"></span>
<span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">false</span><span class="p">;</span><span class="w"></span>
<span class="n">num</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">base</span><span class="p">.</span><span class="n">TrackerExecuteTool</span><span class="p">(</span><span class="n">pathToTool2</span><span class="p">,</span><span class="w"> </span><span class="n">responseFileCommands</span><span class="p">,</span><span class="w"> </span><span class="n">commandLineCommands</span><span class="p">);</span><span class="w"></span>
<span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">trackFileAccess</span><span class="p">;</span><span class="w"></span>
</code></pre></div>
<p>That is, it’s turning off file access tracking before calling the base class
method that would otherwise invoke the tracker. I’m sure there’s a reason why they did that, but
sadly it’s stymied my attempts to get automatic <code>#include</code> tracking to work for shaders.</p>
<p>(We could also invoke <code>tracker.exe</code> manually in our command line, but then we face the problem of
merging the tracker-generated <code>.tlog</code> into that of the <code><CustomBuild></code> task. They’re just text files,
so it’s potentially doable…but that is <em>way</em> more programming than I’m prepared to attempt in an
XML-based scripting language.)</p>
<p>Although we can’t get fine-grained file-by-file header dependencies, we can still set up <em>conservative</em>
dependencies by making every HLSL source file depend on every header. This will result in rebuilding
all the shaders whenever any header is modified—but better to rebuild too much than not enough.
We can find all the headers using a wildcard pattern and an <code><ItemGroup></code>. Add this to the DXC
<code><Target></code>, before the “setup metadata” section:</p>
<div class="codehilite"><pre><span></span><code><span class="cm"><!-- Find all shader headers (.hlsli files) --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><ShaderHeader</span> <span class="na">Include=</span><span class="s">"*.hlsli"</span> <span class="nt">/></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><PropertyGroup></span>
<span class="nt"><ShaderHeaders></span>@(ShaderHeader)<span class="nt"></ShaderHeaders></span>
<span class="nt"></PropertyGroup></span>
</code></pre></div>
<p>You could also set this to find <code>.h</code> files under a <code>Shaders</code> subdirectory, or whatever you prefer.
The <code>**</code> wildcard is available for recursively searching subdirectories, too.</p>
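<p>For instance, a recursive variant might look like this, assuming a hypothetical <code>Shaders</code> folder next to the project file. Multiple patterns can be combined in one <code>Include</code> with semicolons.</p>

```xml
<ItemGroup>
  <!-- Hypothetical layout: headers live anywhere under a Shaders folder -->
  <ShaderHeader Include="Shaders\**\*.hlsli;Shaders\**\*.h" />
</ItemGroup>
```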
<p>Then add this inside the <code><ItemGroup><DXCShader></code> section:</p>
<div class="codehilite"><pre><span></span><code><span class="nt"><AdditionalInputs></span>$(ShaderHeaders)<span class="nt"></AdditionalInputs></span>
</code></pre></div>
<p>We have to do a little dance here, first forming the <code>ShaderHeader</code> item list, then expanding it
into the <code>ShaderHeaders</code> <em>property</em>, and finally referencing that in the metadata. I’m not sure why,
but if I try to use <code>@(ShaderHeader)</code> directly in the metadata it just comes out blank. Perhaps
it’s not allowed to have nested iteration over item lists in MSBuild.</p>
<p>In any case, after making these changes and rebuilding, the build should now pick up any changes to
shader headers. <em>Woohoo!</em></p>
<h2 id="errorwarning-parsing"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#errorwarning-parsing" title="Permalink to this section">Error/Warning Parsing</a></h2>
<p>There’s just one more bit of sparkle we can easily add. When you compile C++ and you get an error
or warning, the VS IDE recognizes it and produces a clickable link that takes you to the source
location. If a custom build step emits error messages in the same format, they’ll be picked up as
well—but what if your custom toolchain has a different format?</p>
<p>The <code>dxc</code> compiler emits errors and warnings in gcc/clang format, looking something like this:</p>
<div class="codehilite"><pre><span></span><code>Shader.hlsl:12:15: error: cannot convert from 'float3' to 'float4'
</code></pre></div>
<p>It turns out that Visual Studio already recognizes this format (at least as of version 15.9),
which is great! But if it didn’t, or if you’ve got a tool with some other message format, you can
provide a regular expression to find errors and warnings in the tool output. The regex
can even supply source file/line information, and the errors will become clickable in the IDE, just
as with C++. (This is all <em>totally undocumented</em> and I only know about it because I spotted the
code while browsing through the decompiled CPPTasks DLL. If you want to take a look for yourself,
the juicy bit is the <code>VCToolTask.ParseLine()</code> method.)</p>
<p>The regex uses <a href="https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference">.NET regex syntax</a>,
and the parsing code expects a certain set of <a href="https://docs.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#named_matched_subexpression">named captures</a>
to provide metadata. By way of example, here’s the regex I wrote for gcc/clang-format errors:</p>
<div class="codehilite"><pre><span></span><code>(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)
</code></pre></div>
<p><code>FILENAME</code>, <code>LINE</code>, etc. are the names the parsing code expects for the metadata. There’s one more
I didn’t use: <code>CODE</code>, for an error code (like <a href="https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2440?view=vs-2017">C2440</a>,
etc.). The only required one is <code>CATEGORY</code>, without which the message won’t be clickable (and it
must be one of the words “error”, “warning”, or “note”); all the others are optional.</p>
<p>To use it, pass the regex to the <code><CustomBuild></code> task like so:</p>
<div class="codehilite"><pre><span></span><code><span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span>
<span class="na">ErrorListRegex=</span><span class="s">"(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)"</span> <span class="nt">/></span>
</code></pre></div>
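<p>Since “note” is also an accepted category, a small variation is to add it to the alternation so that clang-style <code>note:</code> follow-up lines are matched as well. This is a sketch that only changes the <code>CATEGORY</code> group; how the IDE displays notes is not something the text above establishes.</p>

```xml
<CustomBuild
  Sources="@(DXCShader)"
  MinimalRebuildFromTracking="true"
  TrackerLogDirectory="$(TLogLocation)"
  ErrorListRegex="(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning|note): (?'TEXT'.*)" />
```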
<h2 id="example-project"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project" title="Permalink to this section">Example Project</a></h2>
<p>Here’s a complete VS2017 project with all the features we’ve discussed, a couple demo shaders, and a
C++ file that includes the compiled bytecode (just to show that works).</p>
<p><a class="biglink" href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/buildcust3.zip">Download Example Project (.zip, 4.3 KB)</a></p>
<p>And for completeness, here’s the final contents of <code>dxc.targets</code>:</p>
<div class="codehilite"><pre><span></span><code><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><Project</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/developer/msbuild/2003"</span><span class="nt">></span>
<span class="nt"><ItemGroup></span>
<span class="cm"><!-- Include definitions from dxc.xml, which defines the DXCShader item. --></span>
<span class="nt"><PropertyPageSchema</span> <span class="na">Include=</span><span class="s">"$(MSBuildThisFileDirectory)dxc.xml"</span> <span class="nt">/></span>
<span class="cm"><!-- Hook up DXCShader items to be built by the DXC target. --></span>
<span class="nt"><AvailableItemName</span> <span class="na">Include=</span><span class="s">"DXCShader"</span><span class="nt">></span>
<span class="nt"><Targets></span>DXC<span class="nt"></Targets></span>
<span class="nt"></AvailableItemName></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><Target</span>
<span class="na">Name=</span><span class="s">"DXC"</span>
<span class="na">Condition=</span><span class="s">"'@(DXCShader)' != ''"</span>
<span class="na">BeforeTargets=</span><span class="s">"ClCompile"</span><span class="nt">></span>
<span class="nt"><Message</span> <span class="na">Importance=</span><span class="s">"High"</span> <span class="na">Text=</span><span class="s">"Building shaders!!!"</span> <span class="nt">/></span>
<span class="cm"><!-- Find all shader headers (.hlsli files) --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><ShaderHeader</span> <span class="na">Include=</span><span class="s">"*.hlsli"</span> <span class="nt">/></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><PropertyGroup></span>
<span class="nt"><ShaderHeaders></span>@(ShaderHeader)<span class="nt"></ShaderHeaders></span>
<span class="nt"></PropertyGroup></span>
<span class="cm"><!-- Setup metadata for custom build tool --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><DXCShader></span>
<span class="nt"><Message></span>%(Filename)%(Extension)<span class="nt"></Message></span>
<span class="nt"><Command></span>
"$(WDKBinRoot)\x86\dxc.exe" -T vs_6_0 -E vs_main %(Identity) -Fh %(Filename).vs.h -Vn %(Filename)_vs
"$(WDKBinRoot)\x86\dxc.exe" -T ps_6_0 -E ps_main %(Identity) -Fh %(Filename).ps.h -Vn %(Filename)_ps
<span class="nt"></Command></span>
<span class="nt"><AdditionalInputs></span>$(ShaderHeaders)<span class="nt"></AdditionalInputs></span>
<span class="nt"><Outputs></span>%(Filename).vs.h;%(Filename).ps.h<span class="nt"></Outputs></span>
<span class="nt"></DXCShader></span>
<span class="nt"></ItemGroup></span>
<span class="cm"><!-- Compile by forwarding to the Custom Build Tool infrastructure,</span>
<span class="cm"> so it will take care of .tlogs and error/warning parsing --></span>
<span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span>
<span class="na">ErrorListRegex=</span><span class="s">"(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)"</span> <span class="nt">/></span>
<span class="nt"></Target></span>
<span class="nt"></Project></span>
</code></pre></div>
<h2 id="the-next-level"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#the-next-level" title="Permalink to this section">The Next Level</a></h2>
<p>At this point, we have a pretty usable MSBuild customization for compiling shaders, or using other
kinds of custom toolchains! I’m pretty happy with it. However, there are still a couple of areas for
improvement.</p>
<ul>
<li>As mentioned before, I’d like to get file access tracking to work so we can have exact
dependencies for included files, rather than conservative (overly broad) dependencies.</li>
<li>I haven’t done anything with parallel building. Currently, <code><CustomBuild></code> tasks are run one at a
time. There <em>is</em> a <code><ParallelCustomBuild></code> task in the CPPTasks assembly…unfortunately, it
doesn’t support <code>.tlog</code> updating or the error/warning regex, so it’s not directly usable here.</li>
</ul>
<p>To obtain these features, I think I’d need to write my own build extension in C#, defining a custom
task and calling it in place of <code><CustomBuild></code> in the targets file. It might not be too hard to get
that working, but I haven’t attempted it yet.</p>
<p>In the meantime, now that the hard work of circumventing the weird gotchas and reverse-engineering
the undocumented innards has been done, it should be pretty easy to adapt this <code>.targets</code> setup to
other needs for code generation or external tools, and have them act mostly like first-class
citizens in our Visual Studio builds. Cheers!</p>Mesh Shader Possibilities
https://www.reedbeta.com/blog/mesh-shader-possibilities/
http://reedbeta.com/blog/mesh-shader-possibilities/Nathan ReedSat, 29 Sep 2018 11:42:26 -0700https://www.reedbeta.com/blog/mesh-shader-possibilities/#commentsCodingGPUGraphics<p>NVIDIA recently announced their latest GPU architecture, called Turing. Although its headlining feature is
<a href="https://arstechnica.com/gadgets/2018/08/microsoft-announces-the-next-step-in-gaming-graphics-directx-raytracing/">hardware-accelerated ray tracing</a>,
Turing also includes <a href="https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/">several other developments</a>
that look quite intriguing in their own right.</p>
<p>One of these is the new concept of <a href="https://devblogs.nvidia.com/introduction-turing-mesh-shaders/"><em>mesh shaders</em></a>,
details of which dropped a couple weeks ago—and the graphics programming community was agog, with many
enthusiastic discussions taking place on Twitter and elsewhere. So what are mesh shaders (and their
counterparts, task shaders), why are graphics programmers so excited about them, and what might we
be able to do with them?</p>
<!--more-->
<h2 id="the-gpu-geometry-pipeline-has-gotten-cluttered"><a href="https://www.reedbeta.com/blog/mesh-shader-possibilities/#the-gpu-geometry-pipeline-has-gotten-cluttered" title="Permalink to this section">The GPU Geometry Pipeline Has Gotten Cluttered</a></h2>
<p>The process of submitting geometry—triangles to be drawn—to the GPU has a simple underlying
paradigm: you put your vertices into a buffer, point the GPU at it, and issue a draw call to say
how many primitives to render. The vertices get slurped linearly out of the buffer, each is
processed by a vertex shader, the triangles are rasterized and shaded, and Bob’s your uncle.</p>
<p>But over decades of GPU development, various extra features have gotten bolted onto this basic pipeline
in the name of greater performance and efficiency. Indexed triangles and vertex caches were created to exploit
vertex reuse. Complex vertex stream format descriptions are needed to prepare data for shading.
Instancing, and later multi-draw, allowed certain sets of draw calls to be combined together;
indirect draws could be generated on the GPU itself. Then came
the extra shader stages: geometry shaders, to allow programmable operations on primitives and even
inserting or deleting primitives on the fly, and then tessellation shaders, letting you submit a
low-res mesh and dynamically subdivide it to a programmable level.</p>
<p>While these features and more were all added for good reasons (or at least what <em>seemed</em> like
good reasons at the time), the compound of all of them has become unwieldy. Which subset of the
many available options do you reach for in a given situation? Will your choice be efficient across
all the GPU architectures your software must run on?</p>
<p>Moreover, this elaborate pipeline is still not as flexible as we would sometimes like—or, where
flexible, it is not performant. Instancing can only draw copies of a single mesh at a time;
multi-draw is still inefficient for large numbers of small draws. Geometry shaders’ programming model is <a href="http://www.joshbarczak.com/blog/?p=667">not
conducive to efficient implementation</a> on wide SIMD cores in
GPUs, and its <a href="https://fgiesen.wordpress.com/2011/07/20/a-trip-through-the-graphics-pipeline-2011-part-10/">input/output buffering presents difficulties too</a>.
Hardware tessellation, though very handy for certain things, is often <a href="https://www.sebastiansylvan.com/post/the-problem-with-tessellation-in-directx-11/">difficult to use well</a>
due to the limited granularity at which you can set tessellation factors, the limited set of baked-in
<a href="/blog/tess-quick-ref/">tessellation modes</a>, and performance issues on some GPU architectures.</p>
<h2 id="simplicity-is-golden"><a href="https://www.reedbeta.com/blog/mesh-shader-possibilities/#simplicity-is-golden" title="Permalink to this section">Simplicity Is Golden</a></h2>
<p>Mesh shaders represent a radical simplification of the geometry pipeline. With a mesh shader
enabled, all the shader stages and fixed-function features described above are swept away. Instead, we get
a clean, straightforward pipeline using a compute-shader-like programming model. Importantly, this
new pipeline is both highly flexible—enough to handle the existing geometry tasks in a typical game,
plus enable new techniques that are challenging to do on the GPU today—<em>and</em> it looks
like it should be quite performance-friendly, with no apparent architectural barriers to efficient
GPU execution.</p>
<p>Like a compute shader, a mesh shader defines work groups of parallel-running threads, and they can
communicate via on-chip shared memory as well as <a href="http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/07/GDC2017-Wave-Programming-D3D12-Vulkan.pdf">wave intrinsics</a>.
In lieu of a draw call, the app launches some number of mesh shader work groups. Each work group
is responsible for writing out a small, self-contained chunk of geometry, called a
“meshlet”, expressed in arrays of vertex attributes and corresponding indices. These meshlets
then get tossed directly into the rasterizer, and Bob’s your uncle.</p>
<p>(More details can be found in <a href="https://devblogs.nvidia.com/introduction-turing-mesh-shaders/">NVIDIA’s blog post</a>,
a <a href="http://on-demand.gputechconf.com/siggraph/2018/video/sig1811-3-christoph-kubisch-mesh-shaders.html">talk by Christoph Kubisch</a>,
and the <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_mesh_shader.txt">OpenGL extension spec</a>.)</p>
<p>The appealing thing about this model is how data-driven and freeform it is. The mesh shader pipeline
has very relaxed expectations about the shape of your data and the kinds of things you’re going to do.
Everything’s up to the programmer: you can pull the vertex and index data from buffers, generate
them algorithmically, or any combination.</p>
<p>At the same time, the mesh shader model sidesteps the issues that hampered geometry shaders, by explicitly embracing
SIMD execution (in the form of the compute “work group” abstraction). Instead of each shader <em>thread</em>
generating geometry on its own—which leads to divergence, and large input/output data sizes—we
have the whole work group outputting a meshlet cooperatively. This means we can use
compute-style tricks, like: first do some work on the vertices in parallel, then have a barrier, then work on
the triangles in parallel. It also means the input/output bandwidth needs are a lot more reasonable.
And, because meshlets are indexed triangle lists, they don’t break vertex reuse, as geometry shaders often did.</p>
<h2 id="an-upgrade-path"><a href="https://www.reedbeta.com/blog/mesh-shader-possibilities/#an-upgrade-path" title="Permalink to this section">An Upgrade Path</a></h2>
<p>The other really neat thing about mesh shaders is that they don’t require you to drastically rework
how your game engine handles geometry to take advantage of them. It looks like it should be pretty
easy to convert most common geometry types to mesh shaders, making it an approachable upgrade path for
developers.</p>
<p>(You don’t have to convert <em>everything</em> to mesh shaders straight away, though; it’s possible
to switch between the old geometry pipeline and the new mesh-shader-based one at different points in
the frame.)</p>
<p>Suppose you have an ordinary authored mesh that you want to load and render. You’ll
need to break it up into meshlets, which have a static maximum size declared in the
shader—NVIDIA’s blog post recommends 64 vertices and 126 triangles as a default. How do we do this?</p>
<p>Fortunately, most game engines currently do some form of <a href="https://tomforsyth1000.github.io/papers/fast_vert_cache_opt.html">vertex cache optimization</a>,
which already organizes the primitives by locality—triangles sharing one or two vertices will tend
to be close together in the index buffer. So, a quite viable
strategy for creating meshlets is: just scan the index buffer linearly, accumulating the set of
vertices used, until you hit either 64 vertices or 126 triangles; reset and repeat until you’ve gone
through the whole mesh. This could be done at art build time, or it’s simple enough that you could even do it
in the engine at level load time.</p>
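<p>The linear scan described above can be sketched roughly as follows. This is only an illustrative CPU-side sketch in Python, not code from NVIDIA or any engine: the function name <code>build_meshlets</code>, its parameters, and the <code>(vertices, local_indices)</code> output layout are all assumptions made for the example. It takes a triangle-list index buffer and greedily closes a meshlet whenever the next triangle would exceed either limit.</p>

```python
def build_meshlets(indices, max_vertices=64, max_triangles=126):
    """Greedily split a triangle-list index buffer into meshlets.

    Hypothetical helper for illustration. Returns a list of
    (vertices, local_indices) pairs: `vertices` holds indices into the
    original vertex buffer, and `local_indices` is a flat triangle list
    that indexes into `vertices`.
    """
    meshlets = []
    vertices, remap, local = [], {}, []
    for t in range(0, len(indices) - 2, 3):
        tri = indices[t:t + 3]
        # Vertices of this triangle not yet present in the current meshlet.
        new = {v for v in tri if v not in remap}
        # Close the current meshlet if this triangle would overflow a limit.
        if local and (len(vertices) + len(new) > max_vertices
                      or len(local) // 3 + 1 > max_triangles):
            meshlets.append((vertices, local))
            vertices, remap, local = [], {}, []
        for v in tri:
            if v not in remap:
                remap[v] = len(vertices)
                vertices.append(v)
            local.append(remap[v])
    if local:
        meshlets.append((vertices, local))
    return meshlets
```

<p>Because the input is assumed to be vertex-cache optimized already, consecutive triangles tend to share vertices, so the vertex limit fills up slowly and most meshlets should end up reasonably full.</p>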
<p>Alternatively, vertex cache optimization algorithms can probably be modified to produce meshlets directly.
For GPUs without mesh shader support, you can concatenate all the meshlet vertex buffers
together, and rapidly generate a traditional index buffer by offsetting and concatenating all the
meshlet index buffers. It’s pretty easy to go back and forth.</p>
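<p>As a rough sketch of that fallback path, assuming each meshlet is stored as a pair of a vertex list and a flat list of meshlet-local triangle indices (a layout picked purely for illustration, as is the name <code>flatten_meshlets</code>):</p>

```python
def flatten_meshlets(meshlets):
    """Rebuild one vertex buffer and one index buffer from meshlets.

    Hypothetical helper for illustration. `meshlets` is a list of
    (vertices, local_indices) pairs, where local_indices index into
    that meshlet's own vertex list.
    """
    vertex_buffer, index_buffer = [], []
    for vertices, local_indices in meshlets:
        base = len(vertex_buffer)  # offset of this meshlet's first vertex
        vertex_buffer.extend(vertices)
        index_buffer.extend(base + i for i in local_indices)
    return vertex_buffer, index_buffer
```

<p>Note that a vertex shared by two meshlets appears twice in the concatenated vertex buffer; that duplication is the price of keeping each meshlet self-contained.</p>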
<p>In either case, the mesh shader would be mostly just acting as a vertex shader, with some extra
code to fetch vertex and index data from their buffers and plug them into the mesh outputs.</p>
<p>What about other kinds of geometry found in games?</p>
<p>Instanced draws are straightforward: multiply the meshlet count and put in a bit of
shader logic to hook up instance parameters. A more interesting case is multi-draw, where we want
to draw a lot of meshes that <em>aren’t</em> all copies of the same thing. For this, we can employ
<em>task shaders</em>—a secondary feature of the mesh shader pipeline. Task shaders
add an extra layer of compute-style work groups, running before the mesh shader, and they control
<em>how many</em> mesh shader work groups to launch. They can also write output variables to be consumed by the
mesh shader. A very efficient multi-draw should be possible by launching task shaders with a thread
per draw, which in turn launch the mesh shaders for all the individual draws.</p>
<p>If we need to draw a lot of <em>very</em> small meshes, such as quads for particles/imposters/text/point-based rendering,
or boxes for occlusion tests / projected decals and whatnot, then we can pack a bunch of them
into each mesh shader workgroup. The geometry can be generated entirely in-shader rather than relying
on a pre-initialized index buffer from the CPU. (This was one of the original use cases that, it was
hoped, could be done with geometry shaders—e.g. submitting point primitives, and having the GS expand them
into quads.) There’s also a lot of flexibility to do stuff with variable topology, like particle
beams/strips/ribbons, which would otherwise need to be generated either on the CPU or in a separate
compute pre-pass.</p>
<p>(By the way, the <em>other</em> original use case that, it was hoped, could be done with geometry shaders
was multi-view rendering: drawing the same geometry to, say, multiple faces of a cubemap or slices
of a cascaded shadow map within a single draw call. You could do that with mesh shaders, too—but
Turing actually has a separate hardware multi-view capability for these applications.)</p>
<p>What about tessellated meshes?</p>
<p>The two-layer structure of task and mesh shaders is broadly
similar to that of tessellation hull and domain shaders. While it doesn’t appear that mesh shaders
have any kind of access to the fixed-function tessellator unit, it’s also
not too hard to imagine that we could write code in task/mesh shaders to reproduce tessellation
functionality (or at least some of it). Figuring out the details would be a bit of a research project
for sure—maybe someone has already worked on this?—and perf would be a question mark. However,
we’d get the benefit of being able to <em>change</em> how tessellation works, instead of being stuck with
whatever Microsoft decided on in the late 2000s.</p>
<h2 id="new-possibilities"><a href="https://www.reedbeta.com/blog/mesh-shader-possibilities/#new-possibilities" title="Permalink to this section">New Possibilities</a></h2>
<p>It’s great that mesh shaders can subsume our current geometry tasks, and in some cases make them
more efficient. But mesh shaders also open up possibilities for new kinds of geometry processing
that wouldn’t have been feasible on the GPU before, or would have required expensive compute
pre-passes storing data out to memory and then reading it back in through the traditional geometry
pipeline.</p>
<p>With our meshes already in meshlet form, we can do <a href="https://www.slideshare.net/gwihlidal/optimizing-the-graphics-pipeline-with-compute-gdc-2016">finer-grained culling</a>
at the meshlet level, and even at the triangle level within each meshlet. With task shaders, we can
potentially do mesh LOD selection on the GPU, and if we want to get fancy we could even try dynamically
packing together very small draws (from coarse LODs) to get better meshlet utilization.</p>
<p>In place of tile-based forward lighting, or as an extension to it, it might be useful to cull
lights (and projected decals, etc.) per meshlet, assuming there’s a good way to pass the variable-size
light list from a mesh shader down to the fragment shader. (This suggestion from <a href="https://twitter.com/sebaaltonen">Seb Aaltonen</a>.)</p>
<p>Having access to the topology in the mesh shader should enable us to calculate dynamic normals,
tangents, and curvatures for a mesh that’s deforming due to complex skinning, displacement mapping,
or procedural vertex animation. We can also do voxel meshing, or isosurface extraction—marching
cubes or tetrahedra, plus generating normals etc. for the isosurface—directly in a mesh shader,
for rendering fluids and volumetric data.</p>
<p>Geometry for hair/fur, foliage, or other surface cover might be feasible to generate on the fly,
with view-dependent detail.</p>
<p>3D modeling and CAD apps may be able to apply mesh shaders to dynamically triangulate quad meshes or
n-gon meshes, as well as things like dynamically insetting/outsetting geometry for
visualizations.</p>
<p>For rendering displacement-mapped terrain, water, and so forth, mesh shaders may be able to assist
us with <a href="https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter02.html">geometry clipmaps</a>
and geomorphing; they might also be interesting for <a href="http://hhoppe.com/proj/vdrpm/">progressive meshing</a>
schemes.</p>
<p>And last but not least, we might be able to render <a href="https://ia601908.us.archive.org/16/items/GDC2014Brainerd/GDC2014-Brainerd.pdf">Catmull–Clark subdivision surfaces</a>,
or other subdivision schemes, more easily and efficiently than it can be done on the GPU today.</p>
<p>To be clear, a great deal of the above is speculation and handwaving on my part—I don’t want to
mislead you that all of these things are <em>for sure</em> doable with the new mesh and task shader
pipeline. There will certainly be algorithmic difficulties and architectural hindrances that will
come up as graphics programmers have a chance to dig into this. Still, I’m quite excited to see what
people will do with this capability over the next few years, and I hope and expect that it won’t be
an NVIDIA-exclusive feature for too long.</p>Normals and the Inverse Transpose, Part 3: Grassmann On Duals
https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/
http://reedbeta.com/blog/normals-inverse-transpose-part-3/Nathan ReedSun, 22 Jul 2018 22:18:10 -0700https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#commentsGraphicsMath<p>Welcome back! In the last couple of articles, we learned about different ways to understand normal
vectors in 3D space—either as bivectors (<a href="/blog/normals-inverse-transpose-part-1/">part 1</a>), or as
dual vectors (<a href="/blog/normals-inverse-transpose-part-2/">part 2</a>). Both can be valid interpretations,
but they carry different units, and react differently to transformations.</p>
<p>In this third and final installment, we’re going to leave behind the focus on normal vectors, and explore
a couple of other unitful vector quantities. We’ve seen how Grassmann bivectors and trivectors act as
oriented areas and volumes, respectively; and we saw how dual vectors act as oriented <em>line densities</em>, with
units of inverse length. Now, we’re going to put these two geometric concepts together, and find out
what they can accomplish with their combined powers. (Get it? Powers? Like powers of a scale factor?
Uh, you know what, never mind.)</p>
<!--more-->
<p>I’m going to dive right in, so if you need a refresher on either Grassmann algebra or dual spaces,
you may want to re-skim the previous articles.</p>
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#wedge-products-of-dual-vectors">Wedge Products of Dual Vectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-bivectors">Dual Bivectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-trivectors">Dual Trivectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#a-few-more-topics">A Few More Topics</a><ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-interior-product">The Interior Product</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-hodge-star">The Hodge Star</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-inner-product-or-forgetting-about-duals">The Inner Product, or Forgetting About Duals</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#whats-the-use-of-all-this">What’s The Use of All This?</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#organizing-the-zoo">Organizing the Zoo</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="wedge-products-of-dual-vectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#wedge-products-of-dual-vectors" title="Permalink to this section">Wedge Products of Dual Vectors</a></h2>
<p>Grassmann algebra allows us to take wedge products of vectors, producing higher-grade algebraic
entities such as bivectors and trivectors. Just as we can do this with base vectors, we can do the
same thing on dual vectors, producing <em>dual bivectors</em> and <em>dual trivectors</em>.</p>
<p>A dual bivector is formed by wedging two dual vectors, like:
$$
{\bf e_x^*} \wedge {\bf e_y^*} = {\bf e_{xy}^*}
$$
and a dual trivector is the product of three:
$$
{\bf e_x^*} \wedge {\bf e_y^*} \wedge {\bf e_z^*} = {\bf e_{xy}^*} \wedge {\bf e_z^*} = {\bf e_{xyz}^*}
$$
This works exactly the same way that wedge products of ordinary vectors do; in particular, the same
anticommutative law applies.</p>
<p>So what’s the geometric meaning of these dual $k$-vectors? Recall that a dual vector is defined as
a linear form—a function from some vector space $V$ to scalars $\Bbb R$. Conveniently, the wedge
products of dual vectors turn out to be isomorphic to the duals of wedge products of vectors.
(Mathematically, we can say, for finite-dimensional $V$:
$$
\textstyle
\bigwedge^k \bigl( V^* \bigr) \cong \bigl(\bigwedge^k V \bigr)^*
$$
where $\bigwedge^k$ is the operation to construct the set of $k$-vectors over a given base
vector space.)</p>
<p>The upshot is that dual $k$-vectors can be understood as <em>linear forms on $k$-vectors</em>: a dual
bivector is a linear function from bivectors to scalars, and a dual trivector is a linear function
from trivectors to scalars. Let’s see how this works in more detail.</p>
<h2 id="dual-bivectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-bivectors" title="Permalink to this section">Dual Bivectors</a></h2>
<p>In the previous article, we saw how a dual vector can be visualized as a field of parallel, uniformly
spaced planes, representing the level sets of a linear form:</p>
<figure alt="A dual vector in 3D, visualized as a set of parallel planes" class="invert-when-dark" style="height:15em" title="A dual vector in 3D, visualized as a set of parallel planes" >
<img src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/1-form.png"/> <figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure>
<p>You can think of the discrete planes in this picture as representing intervals of one unit
in the output of the linear form. Keep in mind, though, that there are actually a <em>continuous
infinity</em> of these planes, filling space—one for every possible output value of the linear form.
When you evaluate the linear form—i.e. pair a dual vector with a vector—the result represents <em>how
many planes</em> the vector crosses, from its tail to its tip (in a continuous-measure sense of “how many”).
This will depend on both the length and orientation of the vector: for example, a vector parallel to
the planes will return zero, no matter its length.</p>
<p>A dual <em>bivector</em> can be thought of in a similar way—but instead of planes, we now picture a field
of parallel <em>lines</em>, uniformly spaced over the plane perpendicular to them.</p>
<figure alt="A dual bivector in 3D, visualized as a set of parallel lines, formed as the intersections of the planes of two dual vectors" class="invert-when-dark" style="height:20em" title="A dual bivector in 3D, visualized as a set of parallel lines, formed as the intersections of the planes of two dual vectors" >
<img src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/2-form.png"/> <figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure>
<p>As suggested by this diagram, when you wedge two dual vectors, the resulting dual bivector consists
of all the <em>lines of intersection</em> of the two dual vectors’ respective planes.</p>
<p>What happens when we pair this dual bivector with a base bivector? As before, the
result is a scalar—this time representing <em>how many lines</em> the bivector crosses! If you visualize
the bivector as a parallelogram, or circle or any other shape, it will have a certain area. It
will therefore intersect some quantity of the continuous mass of lines. This quantity won’t depend on
the <em>shape</em> of the bivector—remember, bivectors don’t actually <em>have</em> any defined shape—only on
its area (magnitude) and orientation. A bivector whose plane runs parallel to the lines will return
zero, no matter its area.</p>
<p>Because dual vectors have units of inverse length, and a dual bivector is a product of dual vectors,
<strong>a dual bivector has units of inverse area</strong>. It represents an oriented areal
density, such as a probability density over a surface! When you pair the dual bivector with a
bivector, the result tells you how much probability (or whatever else) is covered by that bivector’s
area. And as implied by their units, dual bivectors scale as $1/a^2$. (If you scale an object <em>up</em> by
a factor of $a$, the probability density on its surface goes <em>down</em> by a factor of $a^2$, because the
same total probability is now spread over an $a^2$-larger area.)</p>
<p>How about the transformation rule for dual bivectors? Well, we learned in part 1 that bivectors transform as
$\text{cofactor}(M)$; and in part 2, we found that dual vectors transform as the inverse transpose,
$M^{-T}$. It follows that dual bivectors transform as $\text{cofactor}\bigl(M^{-T}\bigr)$,
or equivalently $\bigl(\text{cofactor}(M)\bigr)^{-T}$. Startlingly, for 3×3 matrices these formulas
reduce to just
$$
\frac{M}{\det M}
$$
So, dual bivectors simply transform using $M$ divided by its own determinant.</p>
<h2 id="dual-trivectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-trivectors" title="Permalink to this section">Dual Trivectors</a></h2>
<p>Follow the pattern: if a dual vector in 3D looks like a stack of parallel planes, and a dual bivector
looks like a field of parallel lines, then a dual <em>trivector</em> looks like a cloud of parallel <em>points</em>.
Well, drop the “parallel”—it doesn’t mean anything. It’s just uniformly spaced points.</p>
<figure alt="A dual trivector in 3D, visualized as a set of points formed as the intersections of the planes of three dual vectors" class="invert-when-dark" style="height:20em" title="A dual trivector in 3D, visualized as a set of points formed as the intersections of the planes of three dual vectors" >
<img src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/3-form.png"/> <figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure>
<p>As before, the wedge product of three dual vectors—or a dual vector and dual bivector—constructs
the continuous point cloud made of all the intersection points of the wedge factors. This quantity
scales as $1/a^3$ and represents a volume density. When you pair it with a trivector, the result
tells you how much of the point cloud is enclosed in that trivector’s volume.</p>
<p>The transformation rule for this one is easy—dual trivectors in 3D just get multiplied by $1/\det M$.</p>
<h2 id="a-few-more-topics"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#a-few-more-topics" title="Permalink to this section">A Few More Topics</a></h2>
<p>With the introduction of dual bi- and trivectors, our “scaling zoo” is now complete! We’ve got the
full ecosystem of vectorial quantities with scaling powers from −3 to +3, each with its proper units
and matching transformation formula.</p>
<p>In the rest of this section, I’ll quickly touch on a few more mathematical aspects of this extended
Grassmann algebra with dual spaces.</p>
<h3 id="the-interior-product"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-interior-product" title="Permalink to this section">The Interior Product</a></h3>
<p>As we saw in part 2, a vector space and its dual have a “natural pairing” operation, much like
an inner product, between vectors and dual vectors. This pairing
extends to $k$-vectors and their duals, too. In fact, we can further extend the natural pairing to
work between $k$-vectors and duals <em>of different grades</em>. For example, we can define a
way to “pair” a dual vector $w$ with a bivector $B = u \wedge v$, yielding a vector:
$$
\langle w, B \rangle = \langle w, u \rangle v - u \langle w, v \rangle
$$
Geometrically, the resulting vector lies in the plane of $B$, and runs parallel to the level planes
of $w$. In some sense, $w$ is “eating” the dimension of $B$ that lies along the direction of $w$’s
density, and leaving the leftover dimension behind as a vector.</p>
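<p>As a small worked example (with the basis elements chosen here purely for illustration), take $w = {\bf e_x^*}$ and $B = {\bf e_x} \wedge {\bf e_y}$, so that $u = {\bf e_x}$ and $v = {\bf e_y}$ in the formula above:
$$
\langle {\bf e_x^*}, {\bf e_x} \wedge {\bf e_y} \rangle
= \langle {\bf e_x^*}, {\bf e_x} \rangle \, {\bf e_y} - {\bf e_x} \, \langle {\bf e_x^*}, {\bf e_y} \rangle
= 1 \cdot {\bf e_y} - {\bf e_x} \cdot 0
= {\bf e_y}
$$
The result ${\bf e_y}$ lies in the $xy$ plane of $B$ and runs parallel to the level planes of ${\bf e_x^*}$ (the planes of constant $x$), as expected: the $x$ dimension has been “eaten”, leaving $y$ behind.</p>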
<p>This extended pairing operation is known as the <a href="https://en.wikipedia.org/wiki/Exterior_algebra#Interior_product">interior product</a>
or contraction product, although different references often define it in slightly different ways
(there are various conventions in the literature). I’m not going to go into it too deeply.
The key point is that you can combine a $k$-vector with a dual $\ell$-vector, for any grades
$k$ and $\ell$; the result will be a $(k-\ell)$-vector, interpreting negative grades as duals.</p>
<h3 id="the-hodge-star"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-hodge-star" title="Permalink to this section">The Hodge Star</a></h3>
<p>In addition to the vector-space duality we’ve been talking about, Grassmann algebra contains another,
distinct notion of duality: Hodge duality, represented by the Hodge star operator, $\star$. (Note
that this is a different symbol from the asterisk $*$ used for the dual vector space!)</p>
<p>The vector-space notion of duality relates $k$-vectors to duals of <em>equal grade</em>—vectors to dual
vectors, bivectors to dual bivectors, and so on. Hodge duality, however, connects things to duals of a
complementary grade. Applying the Hodge star to a $k$-vector produces an element of grade $n - k$,
where $n$ is the dimension of space. In 3D, it interchanges vectors (grade 1) with bivectors (grade 2),
and scalars (grade 0) with trivectors (grade 3).</p>
<p>The way I’ll define the Hodge star initially is a bit different than the standard way. In fact,
there are actually <em>two</em> Hodge star operations: one that goes from $k$-vectors to dual $(n-k)$-vectors,
and another that goes the other way. I’ll denote these by $\star$ and $-\star$ respectively. The
two are inverses of each other (in 3D, at least). They’re defined as follows:
$$
\begin{aligned}
\star&: \textstyle\bigwedge^k V \to \textstyle\bigwedge^{n-k}V^* &&:
& v^\star &= \langle {\bf e_{xyz}^*}, v \rangle \\
-\star&: \textstyle\bigwedge^k V^* \to \textstyle\bigwedge^{n-k}V &&:
& v^{-\star} &= \langle v, {\bf e_{xyz}} \rangle
\end{aligned}
$$
The angle brackets on the right here are the interior product. What we’re saying is: to do
the Hodge star on a $k$-vector, we take its interior product with ${\bf e_{xyz}^*}$, the standard
unit dual trivector (or, in $n$ dimensions, the unit dual $n$-vector). This results in a dual
$(n-k)$-vector, which geometrically represents a density over all the dimensions <em>not</em> included in
the original $k$-vector.</p>
<p>Conversely, to do the anti-Hodge-star on a dual $k$-vector, we take its interior product with
${\bf e_{xyz}}$, giving an $(n-k)$-vector containing all the dimensions <em>not</em> represented by the
original dual $k$-vector, i.e. all the dimensions perpendicular to its level sets.</p>
<p>(These two operations are <em>almost</em> defined on disjoint domains, and could therefore be combined into
one “smart” star that automatically knows what to do based on the type of its argument…except for
the $k = 0$ case: when you hodge a scalar, does it go to a trivector, or to a dual trivector? Both
are possible; that’s why we need two distinct operations here.)</p>
<p>For 3D geometry, the interesting cases are vectors interchanging with bivectors:</p>
<ul>
<li>A vector $v$ hodges to a dual bivector whose “field lines” run parallel to $v$.</li>
<li>A bivector $B$ hodges to a dual vector whose level planes are parallel to $B$.</li>
<li>A dual vector $w$ unhodges to a bivector parallel to $w$’s level planes.</li>
<li>A dual bivector $D$ unhodges to a vector parallel to $D$’s field lines.</li>
</ul>
<p>Although the formal definition was somewhat involved, you can see that the geometric result of the
Hodge operations is actually pretty simple. It’s all about swapping between the geometry of a
$k$-vector and the corresponding level-set geometry of a dual $(n-k)$-vector. The Hodge stars are
a very useful tool for working with Grassmann and dual-Grassmann quantities in practice.</p>
<h3 id="the-inner-product-or-forgetting-about-duals"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-inner-product-or-forgetting-about-duals" title="Permalink to this section">The Inner Product, or Forgetting About Duals</a></h3>
<p>In most treatments of Grassmann or geometric algebra, dual spaces are hardly mentioned. The more conventional
definition of the Hodge star has it mapping directly between $k$-vectors and $(n-k)$-vectors—no
duals in sight. How does this work?</p>
<p>It turns out that if we have an inner product defined on our vector space, we can use it to
convert back and forth between vectors and dual vectors, or $k$-vectors and their duals.</p>
<p>So far, we haven’t discussed any means of mapping individual vectors back and forth between the base and
dual spaces. Although they’re both vector spaces of the same dimension, there’s no natural isomorphism
that would enable us to map them in a non-arbitrary way. However, the presence of an inner
product does pick out a specific isomorphism with the dual space: that which maps each vector $v$ to
a dual vector $v^*$ that implements <em>dotting with $v$</em>, using the inner product.</p>
<p>Symbolically, for all vectors $u \in V$, we have $\langle v^*, u \rangle = v \cdot u$. This can be
extended to inner products and isomorphisms for all $k$-vectors as well (see
<a href="https://en.wikipedia.org/wiki/Exterior_algebra#Inner_product">Wikipedia</a> for details).</p>
<p>Note, however, that this map is <em>not</em> preserved by scaling, or by transformations in general, because
$v^*$ transforms as $M^{-T}$ while $v$ transforms as $M$. (The exceptions are orthogonal
transformations such as rotations, for which $M^{-T} = M$.)</p>
<p>With this correspondence, it becomes possible to largely ignore the existence of dual spaces and dual
elements altogether—we have the fiction that they’re not distinct from the base elements. In an
orthonormal basis, even the <em>coordinates</em> of a vector and its corresponding dual will be identical.</p>
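<p>As a rough illustration of that last point, here is a minimal Python sketch. The helper names
<code>gram_matrix</code> and <code>lower_index</code> are made up for this example, and it assumes
the basis vectors are given by their coordinates in some orthonormal frame. The dual-basis
coordinates of $v^*$ are $\langle v^*, {\bf e}_i \rangle = v \cdot {\bf e}_i$, which works out to
the Gram matrix of the basis applied to the coordinates of $v$:</p>

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gram_matrix(basis):
    # G[i][j] = e_i . e_j; this is the identity exactly when the basis is orthonormal.
    return [[dot(ei, ej) for ej in basis] for ei in basis]

def lower_index(coords, basis):
    # Coordinates of v* in the dual basis: <v*, e_i> = v . e_i = sum_j G[i][j] * c_j
    return [dot(row, coords) for row in gram_matrix(basis)]

orthonormal = [(1, 0), (0, 1)]
skewed = [(1, 0), (1, 1)]   # a non-orthonormal basis

print(lower_index([2, 3], orthonormal))  # [2, 3]: same coordinates as the vector
print(lower_index([2, 3], skewed))       # [5, 8]: the coordinates differ
```

<p>In the skewed basis the vector and its dual no longer share coordinates, so the "fiction" of
ignoring duals only holds up when the basis is orthonormal.</p>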
<p>For an example of “forgetting” about duals: the Hodge star operations can be defined using the inner
product to invisibly dualize their input or output as well as hodging it. Then the two Hodge stars I
defined above collapse into one operation, mapping between $\bigwedge^k V$ and $\bigwedge^{n-k} V$.</p>
<h2 id="whats-the-use-of-all-this"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#whats-the-use-of-all-this" title="Permalink to this section">What’s The Use of All This?</a></h2>
<p>This is kind of a lot. We started with just vectors and normal vectors—two kinds of vector-shaped
things with different rules, which was confusing enough. But now we have <em>four</em>: vectors, dual vectors,
bivectors, and dual bivectors. And on top of that we have three scalar-shaped things, too: true
unitless scalars, trivectors, and dual trivectors.</p>
<p>Evidently, lots of people manage to get along well enough without being totally aware of
all these distinctions! Even texts on Grassmann or geometric algebra may not fully delve into
the “duals” story, instead treating $k$-vectors and their duals as the same thing (implicitly using
the isomorphism defined above). Their differing transformation behavior becomes sort of a curiosity,
an unsystematic ornamental detail. And this comes at the cost of making some aspects of the algebra
require an inner product or a metric, and only work properly in an orthonormal basis. In contrast,
when you’re “cooking with duals”, you can derive formulas that work properly in any basis.</p>
<p>As a quick example of this, let’s look at a concrete problem you might encounter in graphics. Let’s
say you have a triangle mesh and you want to select a random point on it, chosen uniformly over the
surface area. To do this, we must first select a random triangle, with probability
proportional to area. The standard technique is to precompute the areas of all the triangles
and build a prefix-sum table; then, to select a triangle, we take a uniform random value and
binary-search on it in the table.</p>
<p>Let’s throw in another wrinkle, though. What if the triangle mesh is transformed—possibly by a
nonuniform scaling, or a shear? In general, this will alter the areas of all the triangles, in an
orientation-dependent way. A uniform distribution over surface area in the mesh’s <em>local</em> space
will no longer be uniform in world space. We could address this by pre-transforming the whole mesh
into world space and doing the sampling process there—but that’s more expensive than necessary.</p>
<p>We can use bivectors to help. Instead of calculating just a scalar area for each triangle, calculate
the bivector representing its orientation and area. (If the triangle’s vertices are $p_1, p_2, p_3$,
this is $\tfrac{1}{2}(p_2 - p_1) \wedge (p_3 - p_1)$.) Now we can transform all the bivectors into
world space, using their transformation rule (the cofactor matrix of the transform), and they will accurately represent the areas of the
transformed triangles. Then we can calculate their magnitudes and build the prefix-sum table, as
before.</p>
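<p>The procedure above might be sketched in Python roughly as follows. This is only a sketch under
some assumptions: the function names are invented for illustration, bivectors are stored as three
components on the basis $({\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}})$ (the same numbers as a cross
product), and the transformation is a plain $3 \times 3$ matrix given as a tuple of rows.</p>

```python
import bisect
import itertools
import math
import random

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def mat_vec(m, v):
    return tuple(sum(mij * vj for mij, vj in zip(row, v)) for row in m)

def cofactor(m):
    # Each row of the cofactor matrix is the cross product of the other two rows of m.
    return (cross(m[1], m[2]), cross(m[2], m[0]), cross(m[0], m[1]))

def triangle_bivector(p1, p2, p3):
    # Components of 1/2 (p2 - p1) ^ (p3 - p1) on the basis (e_yz, e_zx, e_xy).
    return tuple(0.5 * c for c in cross(sub(p2, p1), sub(p3, p1)))

def build_area_table(local_bivectors, m):
    # Transform each bivector to world space, then prefix-sum the magnitudes.
    cof = cofactor(m)
    areas = [math.sqrt(sum(c * c for c in mat_vec(cof, b))) for b in local_bivectors]
    return list(itertools.accumulate(areas))

def pick_triangle(table, u):
    # u is a uniform random value in [0, 1); binary-search it in the prefix-sum table.
    index = bisect.bisect_right(table, u * table[-1])
    return min(index, len(table) - 1)

triangles = [((0, 0, 0), (1, 0, 0), (0, 1, 0)),
             ((0, 0, 0), (0, 1, 0), (0, 0, 1))]
bivectors = [triangle_bivector(*t) for t in triangles]  # precomputed once, in local space
M = ((2, 0.5, 0), (0, 1, 0), (0, 0, 3))                 # nonuniform scale plus a shear
table = build_area_table(bivectors, M)
print(pick_triangle(table, random.random()))
```

<p>Only one small bivector per triangle gets transformed here, rather than every vertex of the mesh,
and the local-space bivectors can be reused whenever the transform changes.</p>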
<p>Conversely, suppose we have an existing, non-uniform areal probability measure defined over our
triangle mesh. (Maybe it’s a light source with a texture defining its emissive brightness, and we
want to sample with respect to emitted power; or maybe we want to sample with respect to solid angle
subtended at some point, or some sort of visual importance, etc.) We can represent these
probability densities as dual bivectors, and again we can take them back and forth between local and
world space—even in the presence of shear or nonuniform scaling—with confidence that we’re still
representing the same distribution.</p>
<p>Some other examples where dual $k$-vectors show up:</p>
<ul>
<li>The derivative (gradient) of a scalar field, such as an SDF, is naturally a dual vector.</li>
<li>Dual vectors represent spatial frequencies (wavevectors) in Fourier analysis.</li>
<li>The radiance carried by a ray is a density with respect to projected area, and can therefore be
represented, at least in part, as a dual bivector.</li>
</ul>
<p>Like many theoretical math concepts, I think these ideas are mostly useful for enriching your own
mental models of geometry, strengthening your thought process, and deriving results that you can
then use in code in a more “conventional” way. I’m not <em>necessarily</em> suggesting we should
all go off and start implementing $k$-vectors and their duals as classes in our math libraries.
(Frankly, our math libraries are <a href="/blog/on-vector-math-libraries/">enough of a mess already</a>.)</p>
<h2 id="organizing-the-zoo"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#organizing-the-zoo" title="Permalink to this section">Organizing the Zoo</a></h2>
<p>One more thing to muse on before I leave you. We’ve seen that there is a “scaling zoo” of mathematical
elements with different physical, geometric interpretations and behaviors. Different branches of
science and math have distinct ways of conceptually organizing this zoo, and thinking about its
denizens and their relationships.</p>
<p>In computer science, for example, we would probably understand vectors, bivectors, dual vectors, and
so forth as different <em>types</em>. Each might have an internal structure as a composition of more elementary
values (real numbers), and a suite of allowed operations that define what you can do with them and
how they interact with one another.</p>
<p>Physicists, meanwhile, tend to take a more rough-and-ready approach: geometric elements are
thought of as simply matrices of real (or sometimes complex) numbers, together with <em>transformation
laws</em>—rules that define what happens to a given matrix under a change of coordinates. Algebraic
properties such as anticommutativity are obtained by constructing the matrices in such a way that matrix
multiplication implements the desired algebra. For example, a bivector can be represented as an
antisymmetric matrix; wedging two vectors $u, v$ to make a bivector corresponds to calculating the matrix
$$
uv^T - vu^T
$$
which has the same anticommutative property as a wedge product. Multiplying this matrix by a (dual)
vector $w$ then represents the interior product of the bivector with $w$. Meanwhile, a dual
bivector would be structurally similar, but have a different transformation law (“covariant” versus
“contravariant”).</p>
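<p>To see the matrix representation in action, here is a small Python sketch (plain lists, with
helper names chosen just for this example). It builds $uv^T - vu^T$ and multiplies it by a third
set of coordinates $w$, which should give $u \, (v \cdot w) - v \, (u \cdot w)$:</p>

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def wedge_matrix(u, v):
    # Entry (i, j) of the antisymmetric matrix u v^T - v u^T.
    return [[ui * vj - vi * uj for uj, vj in zip(u, v)] for ui, vi in zip(u, v)]

def mat_vec(m, w):
    return [dot(row, w) for row in m]

u, v, w = (1, 2, 3), (4, 5, 6), (7, 8, 10)
B = wedge_matrix(u, v)

print(B)              # [[0, -3, -6], [3, 0, -3], [6, 3, 0]]
print(mat_vec(B, w))  # [-84, -9, 66]
```

<p>Swapping the arguments, <code>wedge_matrix(v, u)</code>, negates every entry, mirroring
$v \wedge u = -u \wedge v$.</p>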
<p>Lastly, mathematicians like to formalize things by saying that different geometric
quantities are elements of different <em>spaces</em> and/or <em>algebras</em>. Both terms ultimately mean a
set (in the mathematical sense), together with some extra structure—such as algebraic operations,
a topology, a norm or metric, and so on—defined on top of the bare set. The exact kind of structures
you need depends on what you’re doing, and there’s a whole menagerie of such structures that
might be invoked in different contexts.</p>
<p>So which structure is behind the scaling zoo? We know we’ve got the vector space structure, and the
Grassmann algebraic structure. But neither of these fully accounts for the different scaling and transformation
behaviors of dual elements: dual spaces are isomorphic to their base spaces (in finite dimensions),
totally identical insofar as the vector and Grassmann structures are concerned.</p>
<p>I don’t have a fully developed answer yet—but I suspect it’s got to do with the <a href="https://en.wikipedia.org/wiki/Representation_of_a_Lie_group">representation theory of Lie groups</a>.
My guess is that the different types of scaling elements we’ve seen can be codified as vector spaces
acted on by different representations of $GL(n)$, the Lie group of all invertible linear maps on $\Bbb R^n$.
But I’m not going to get into that here. (If you’d like to read more on this, here are
a couple web references: <a href="https://sbseminar.wordpress.com/2008/07/08/how-to-write-down-the-representations-of-gl_n/">one</a>,
<a href="http://www.maths.qmul.ac.uk/~whitty/LSBU/MathsStudyGroup/SeligGLn.pdf">two</a>. Also: <a href="http://inference-review.com/article/woits-way">Peter Woit’s book</a> on the role of representation theory in
particle physics.)</p>
<h2 id="conclusion"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>I hope this has been an entertaining and enlightening tour through some of the layers
beneath the surface of your favorite Euclidean geometry. We started with a seemingly simple
question—why do normal vectors transform using the inverse transpose matrix?—and found
that there was <em>much</em> more rich structure there than meets the eye.</p>
<p>The “scaling zoo” of $k$-vectors and their duals makes a pleasingly complete and symmetrical whole.
Even if I’m not going to be employing these things in practical work every day, I feel that studying
them has helped me understand some things that were vague and foggy in my mind before. It’s worth
appreciating that these subtle distinctions exist. One of my general axioms in life is that
everything is more complicated than it first appears, and nowhere is this more consummately borne
out than mathematics!</p>Normals and the Inverse Transpose, Part 2: Dual Spaces
https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/
http://reedbeta.com/blog/normals-inverse-transpose-part-2/Nathan ReedSat, 19 May 2018 16:13:37 -0700https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#commentsGraphicsMath<p>In the <a href="/blog/normals-inverse-transpose-part-1/">first part</a> of this series, we learned about
Grassmann algebra, and concluded that normal vectors in 3D can be interpreted as bivectors. To
transform bivectors, we need to use a different matrix (in general) than the one that transforms
ordinary vectors. Using a canonical basis for bivectors, we found that the matrix required is the <em>cofactor
matrix</em>, which is proportional to the inverse transpose. This provides at least a partial
explanation of why the inverse transpose is used to transform normal vectors.</p>
<p>However, we also left a few loose ends untied. We found out about the cofactor matrix,
but we didn’t really see how that connects to the
<a href="https://computergraphics.stackexchange.com/a/1506/48">algebraic derivation</a> that transforming a
plane equation $N \cdot x + d = 0$ involves the inverse transpose. I just sort of handwaved the
proportionality between the two.</p>
<!--more-->
<p>Moreover, we saw that Grassmann $k$-vectors provide vectorial geometric objects with a natural
interpretation as carrying units of length, area, and volume, owing to their scaling behavior. But
we didn’t find anything similar for densities—units of <em>inverse</em> length, area, or volume.</p>
<p>As we’ll see in this article, there’s one more geometric concept we need to complete the
picture. Putting this new concept together with the Grassmann algebra we’ve already learned will
turn out to clarify and resolve these remaining issues.</p>
<p>Without further ado, let’s dive in!</p>
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#functions-as-vectors">Functions As Vectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#linear-forms-and-the-dual-space">Linear Forms and the Dual Space</a><ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#the-natural-pairing">The Natural Pairing</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#the-dual-basis">The Dual Basis</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#transforming-dual-vectors">Transforming Dual Vectors</a><ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#uniform-scaling">Uniform Scaling</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#sheared-dual-vectors-and-the-inverse-transpose">Sheared Dual Vectors and the Inverse Transpose</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#so-whats-a-normal-vector-anyway">So What’s a Normal Vector, Anyway?</a></li>
</ul>
</div>
<h2 id="functions-as-vectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#functions-as-vectors" title="Permalink to this section">Functions As Vectors</a></h2>
<p>Most of this article will be concerned with functions taking and returning vectors of various kinds.
To understand what follows, it’s necessary to make a bit of a mental flip, which you
might find quite counterintuitive if you haven’t encountered it before.</p>
<p>The flip is this: <strong>functions that map into a vector space <em>are themselves</em> vectors</strong>.</p>
<p>That statement might not appear to make any sense at first! Vectors and functions are totally
different kinds of things, right, like apples and…chairs? How can a function literally <em>be</em> a
vector?</p>
<p>If you look up the <a href="https://en.wikipedia.org/wiki/Vector_space#Definition">technical definition of a vector space</a>,
you’ll find that it’s quite nonspecific about what the <em>structure</em> of a vector has to be. We often
think of them as arrows with a magnitude and direction, or as ordered lists of numbers (coordinates).
However, all you truly need for a vector space is a set of <em>things</em> that support two basic operations:
being added together, and being multiplied by scalars (here, real numbers). These operations just
need to obey a few reasonable axioms.</p>
<p>Well, functions can be added together! If we have two functions $f$ and $g$, we can add them
<em>pointwise</em> to produce a new function $h$, defined by $h(x) = f(x) + g(x)$ for every point $x$ in
the domain. Likewise, we can multiply a function pointwise by a scalar: $g(x) = a \cdot f(x)$.
These operations do satisfy the vector space axioms, and therefore any set of compatible
functions forms a vector space in its own right: a <em>function space</em>.</p>
<p>To put it a bit more formally: given a domain set $X$ (any kind of set, not necessarily a vector
space itself) and a range vector space $V$, the set of functions $f: X \to V$ forms a vector space
under pointwise addition and scalar multiplication. You need the range to be a vector space so you
can add and multiply the outputs of the functions, but the domain isn’t required to be a vector space—or
even a “space” per se at all; it could be a discrete set.</p>
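<p>If it helps to see this in code, here is a tiny Python sketch, with <code>add</code> and
<code>scale</code> as made-up helper names. The domain is the set of strings, which is certainly
not a vector space, while the range is the real numbers:</p>

```python
def add(f, g):
    # Pointwise sum: (f + g)(x) = f(x) + g(x)
    return lambda x: f(x) + g(x)

def scale(a, f):
    # Pointwise scalar multiple: (a f)(x) = a * f(x)
    return lambda x: a * f(x)

def count_vowels(s):
    return sum(ch in "aeiou" for ch in s)

# A linear combination of two functions on strings is again a function on strings.
h = add(len, scale(2, count_vowels))
print(h("hello"))  # 9, i.e. 5 + 2 * 2
```

<p>The functions <code>len</code> and <code>count_vowels</code> play the role of vectors here, and
<code>h</code> is their linear combination.</p>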
<p>This realization that functions can be treated as vectors then lets us apply linear-algebra techniques
to understand and work with functions—a large branch of mathematics called
<a href="https://en.wikipedia.org/wiki/Functional_analysis">functional analysis</a>.</p>
<h2 id="linear-forms-and-the-dual-space"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#linear-forms-and-the-dual-space" title="Permalink to this section">Linear Forms and the Dual Space</a></h2>
<p>From this point forward, we’ll be concerned with a specific class of functions known as
<a href="https://en.wikipedia.org/wiki/Linear_form"><strong>linear forms</strong></a>.</p>
<p>If we have some vector space $V$ (such as 3D space $\Bbb R^3$, for instance), then a linear form
on $V$ is defined as a linear function $f: V \to \Bbb R$. That is, it’s a linear
function that takes a vector argument and returns a scalar.</p>
<p><em>(A note for the mathematicians: in this article I’m only talking about finite-dimensional vector
spaces over $\Bbb R$, so I may occasionally make a statement that doesn’t hold for general vector
spaces. Sorry!)</em></p>
<p>I like to visualize a linear form as a set of parallel, uniformly spaced planes (3D) or lines (2D): the
<a href="https://en.wikipedia.org/wiki/Level_set">level sets</a> of
the function at intervals of one unit in the output. Here are some examples:</p>
<p class="image-array"><img alt="A linear form, x + y" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-1.png" style="width:14em" title="A linear form, x + y" />
<img alt="A linear form, −⅓x + ½y" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-2.png" style="width:14em" title="A linear form, −⅓x + ½y" />
<img alt="A linear form, 2x + y" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-3.png" style="width:14em" title="A linear form, 2x + y" /></p>
<p>The gradients here indicate the linear form’s orientation—the function is increasing with the
gradient’s opacity; the discrete lines mark where its output crosses an integer, and the opacity
wraps around to zero. Note that “bigger”
linear forms (in the sense of bigger output values) have more tightly-spaced lines, and vice versa.</p>
<p>As elaborated in the previous section, linear forms on a given vector space can themselves be treated as
vectors, in their own function space. Linear combinations of linear
functions are still linear, so they do form a closed vector space in their own right.</p>
<p>This vector space—the set of all linear forms on $V$—is important enough that
it has its own name: the <a href="https://en.wikipedia.org/wiki/Dual_space"><strong>dual space</strong></a> of $V$. It’s
denoted $V^*$. The elements of the dual space (the linear forms) are then called <strong>dual vectors</strong>,
or sometimes <em>covectors</em>.</p>
<h3 id="the-natural-pairing"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#the-natural-pairing" title="Permalink to this section">The Natural Pairing</a></h3>
<p>The fact that dual vectors are <em>linear</em> functions, and not general functions from $V$ to
$\Bbb R$, strongly restricts their behavior. Linear forms on an $n$-dimensional
vector space have only $n$ degrees of freedom, versus the infinite degrees of freedom that a
general function has. To put it another way, $V^*$ has the same dimensionality as $V$.</p>
<p>To see this more concretely: a linear form on $\Bbb R^n$ can be
fully specified by the values it returns when you evaluate it on the $n$ vectors of a basis.
The result it returns for any <em>other</em> vector can then be derived by linearity. For
example, if $f$ is a linear form on $\Bbb R^3$, and $v = (x, y, z)$ is an arbitrary vector, then:
$$
\begin{aligned}
f(v) &= f(x {\bf e_x} + y {\bf e_y} + z {\bf e_z}) \\
&= x \, f({\bf e_x}) + y \, f({\bf e_y}) + z \, f({\bf e_z})
\end{aligned}
$$
If you’re thinking that the above looks awfully like a dot product between $(x, y, z)$ and
$\bigl(f({\bf e_x}), f({\bf e_y}), f({\bf e_z}) \bigr)$—you’re right!</p>
<p>Indeed, the operation of evaluating a linear form has the properties of a <em>product</em> between the dual space and
the base vector space: $V^* \times V \to \Bbb R$. This product is called the <strong>natural pairing</strong>.</p>
<p>Like the vector dot product, the natural pairing results in a real number, and is bilinear—linear
on both sides. However, here we’re taking a product not of two vectors, but of a dual vector with
a “plain” vector. The linearity on the left side comes from pointwise adding/multiplying linear forms;
that on the right comes from the linear forms being, well, linear in their vector argument.</p>
<p>Going forward, I’ll denote the natural pairing by angle brackets, like this: $\langle w, v \rangle$.
Here $w$ is a dual vector in $V^*$, and $v$ is a vector in $V$. To reiterate, this is simply
<em>evaluating</em> the linear form $w$, as a function, on the vector $v$. But because functions are vectors,
and dual vectors in particular are <em>linear</em> functions, this operation also has the properties of a product.</p>
<p>The above equation looks like this in angle-bracket notation:
$$
\begin{aligned}
\langle w, v \rangle
&= \bigl\langle w, \, x {\bf e_x} + y {\bf e_y} + z {\bf e_z} \bigr\rangle \\
&= x \langle w, {\bf e_x} \rangle + y \langle w, {\bf e_y} \rangle + z \langle w, {\bf e_z} \rangle
\end{aligned}
$$
Note how this now looks like “just” an application of the distributive property—which it is!</p>
<h3 id="the-dual-basis"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#the-dual-basis" title="Permalink to this section">The Dual Basis</a></h3>
<p>The above construction can also be used to define a canonical basis for $V^*$, for a given basis
on $V$. Namely, we want to make the numbers $\langle w, {\bf e_x} \rangle, \langle w, {\bf e_y} \rangle, \langle w, {\bf e_z} \rangle$
be the <em>coordinates</em> of $w$ with respect to this basis, the same way that $x, y, z$ are coordinates
with respect to $V$’s basis. We can do this by defining <strong>dual basis vectors</strong> ${\bf e_x^*}, {\bf e_y^*}, {\bf e_z^*}$,
according to the following constraints:
$$
\begin{aligned}
\langle {\bf e_x^*}, {\bf e_x} \rangle &= 1 \\
\langle {\bf e_x^*}, {\bf e_y} \rangle &= 0 \\
\langle {\bf e_x^*}, {\bf e_z} \rangle &= 0
\end{aligned}
$$
and similarly for ${\bf e_y^*}, {\bf e_z^*}$. The nine total constraints can be summarized as:
$$
\langle {\bf e}_i^*, {\bf e}_j \rangle =
\begin{cases}
1 & \text{if } i = j, \\
0 & \text{if } i \neq j,
\end{cases}
\quad i, j \in \{ {\bf x, y, z} \}
$$
This dual basis always exists and is unique, given a valid basis on $V$ to start from.</p>
<p>Geometrically speaking, the dual basis consists of linear forms that measure the distance along
each axis—but the level sets of those linear forms are parallel to <em>all the other axes</em>. They’re
not necessarily perpendicular to the same axis that they’re measuring, unless the basis happens to be
orthonormal. This feature will be important a bit later!</p>
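<p>One way to compute a dual basis concretely, sketched here in 2D with Python's
<code>fractions</code> module for exact arithmetic: put the basis vectors in the columns of a
matrix, invert it, and read the dual basis off the <em>rows</em> of the inverse. The function names
are invented for this example, and coordinates are assumed to be given in the standard basis.</p>

```python
from fractions import Fraction

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    return ((d / det, -b / det), (-c / det, a / det))

def dual_basis(e1, e2):
    # Columns of the matrix are the basis vectors; rows of its inverse are the dual basis.
    return inverse_2x2(((e1[0], e2[0]), (e1[1], e2[1])))

def pair(w, v):
    # Natural pairing <w, v>, with w and v both in standard coordinates.
    return w[0] * v[0] + w[1] * v[1]

e1, e2 = (1, 0), (1, 1)            # a non-orthonormal basis
d1, d2 = dual_basis(e1, e2)

print(pair(d1, e1), pair(d1, e2))  # 1 0
print(pair(d2, e1), pair(d2, e2))  # 0 1
```

<p>Here <code>d1</code> comes out as the form $x - y$, whose level lines run parallel to
<code>e2</code> rather than perpendicular to <code>e1</code>.</p>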
<p>By way of example, here are a couple of vector bases together with their corresponding dual bases:</p>
<p class="image-array"><img alt="An orthonormal basis and its corresponding dual basis" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-1.png" style="width:20em;padding:0 10pt" title="An orthonormal basis and its corresponding dual basis" />
<img alt="A non-orthonormal basis and its corresponding dual basis" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-2.png" style="width:20em;padding:0 10pt" title="A non-orthonormal basis and its corresponding dual basis" /></p>
<p>Here’s an example of a linear form decomposed into basis components, $w = p {\bf e_x^*} + q {\bf e_y^*}$:</p>
<p><img alt="A linear form as a sum of x and y basis components" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-3.png" style="width:20em" title="A linear form as a sum of x and y basis components" /></p>
<p>With the dual basis defined as above, if we express both a dual vector $w$ and a vector $v$ in terms
of their respective bases,
then the natural pairing $\langle w, v \rangle$ boils down to just the dot product of the
respective coordinates:
$$
\begin{aligned}
\langle w, v \rangle
&= \bigl\langle p {\bf e_x^*} + q {\bf e_y^*} + r {\bf e_z^*}, \; x {\bf e_x} + y {\bf e_y} + z {\bf e_z} \bigr\rangle \\
&= px + qy + rz
\end{aligned}
$$</p>
<h2 id="transforming-dual-vectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#transforming-dual-vectors" title="Permalink to this section">Transforming Dual Vectors</a></h2>
<p>In the preceding article, we learned that although vectors and <em>bivectors</em> may appear
structurally similar (they both have three components, in 3D space), they have different geometric
meanings and different behavior when subject to transformations—in particular, to scaling.</p>
<p>With dual vectors, we have a third example in this class! Dual vectors are again
“vectorial” objects (obeying the vector space axioms), again structurally similar to vectors and
bivectors (having three components, in 3D space), but with a different geometric meaning (linear
forms). This immediately suggests we look into dual vectors’ transformation behavior!</p>
<p>Dual vectors are linear forms, which are functions. So how do we transform a function?</p>
<p>The way I like to think about this is that the function’s output values are carried along with the
points of its domain when they’re transformed. Imagine labeling every point in the domain with the
function’s value at that point. Then apply the transformation to all the points; they move somewhere
else, but carry their label along with them. (Another way of thinking about it is that you’re
transforming the <em>graph</em> of the function, considered as a point-set in a one-higher-dimensional
space.)</p>
<p>To formalize this a bit more: suppose we transform vectors by some matrix $M$, and we want to
apply this transformation also to a function $f(v)$, yielding a new function $g(v)$. What we want is
that $g$ evaluated on a <em>transformed</em> vector should equal $f$ evaluated on the original vector:
$$
g(Mv) = f(v)
$$
Or, equivalently,
$$
g(v) = f(M^{-1}v)
$$
In other words, we can apply a transformation to a function by making a new function that first
applies the <em>inverse</em> transformation to its argument, then passes that to the old function.</p>
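<p>In code, this rule is nearly a one-liner. Here is a minimal Python sketch; the name
<code>transform_function</code> is hypothetical, the inverse matrix is supplied by hand, and the
function $f$ is deliberately nonlinear to show that the rule doesn't depend on linearity:</p>

```python
def mat_vec(m, v):
    return tuple(sum(mij * vj for mij, vj in zip(row, v)) for row in m)

def transform_function(f, m_inv):
    # g(v) = f(M^-1 v): undo the transformation, then evaluate the old function.
    return lambda v: f(mat_vec(m_inv, v))

M = ((1, 0.5), (0, 1))       # a shear
M_inv = ((1, -0.5), (0, 1))  # its inverse, written out by hand

def f(v):
    x, y = v
    return x * x + y         # any function of position will do

g = transform_function(f, M_inv)
v = (2, 4)
print(f(v), g(mat_vec(M, v)))  # the two values agree: g(Mv) = f(v)
```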
<p>Note that this only works if $M$ is invertible. If it isn’t, then our picture of “carrying the output
values along with the domain points” falls apart: a noninvertible $M$ can collapse many distinct domain
points into one, and then how could we decide what the function’s output should be at those points?</p>
<h3 id="uniform-scaling"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#uniform-scaling" title="Permalink to this section">Uniform Scaling</a></h3>
<p>Now that we understand how to apply a transformation to a function, let’s look at uniform scaling
as an example. We’ll scale by a factor $a > 0$, so that vectors transform as $v \mapsto av$. Then
functions will transform as $f(v) \mapsto f(v/a)$, per the previous section.</p>
<p>Let’s switch back to looking at this from a “dual vector” point of view instead of a
“function” point of view.
So, if $f(v) = \langle w, v \rangle$ for some dual vector $w$, then what happens when we scale by $a$?
$$
\begin{aligned}
\langle w, v \rangle \mapsto & \left\langle w, \frac{v}{a} \right\rangle \\
= & \left\langle \frac{w}{a}, v \right\rangle
\end{aligned}
$$
I’ve just moved the $1/a$ factor from one side of the angle brackets to the other, which is allowed
because it’s a bilinear operation. To summarize, we’ve found that the dual vector $w$ transforms as:
$$
w \mapsto \frac{w}{a}
$$</p>
<p>Hmm, interesting! When we scale vectors by $a$, then <strong>dual vectors scale by $\bm{1/a}$</strong>.
If you recall the previous article, we justified assigning units like “area” and “volume” to bivectors
and trivectors on the basis of their scaling behavior. Following that line of reasoning, we can now
conclude that <strong>dual vectors carry units of inverse length!</strong></p>
<p>In fact, dual vectors represent <em>oriented linear densities</em>. They provide a quantitative way
of talking about situations where some kind of scalar “stuff”—such as probability, texel count,
opacity, a change in voltage/temperature/pressure, etc.—is spread out along one dimension in space. When you
pair the dual vector with a vector (i.e. evaluate the linear form on a vector), you’re asking “how
much of that ‘stuff’ does this vector span?”</p>
<p>Under a scaling, we want to preserve the amount of “stuff”. If we’re scaling <em>up</em>, then the density
of “stuff” will need to go <em>down</em>, as the same amount of stuff is now spread over a longer distance;
and vice versa. This property is implemented by the inverse scaling behavior of dual vectors.</p>
<h3 id="sheared-dual-vectors-and-the-inverse-transpose"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#sheared-dual-vectors-and-the-inverse-transpose" title="Permalink to this section">Sheared Dual Vectors and the Inverse Transpose</a></h3>
<p>We’ve seen how uniform scaling applies inversely to dual vectors. We could study nonuniform scaling
now, too, but it turns out that axis-aligned nonuniform scaling isn’t that interesting—it just
applies inversely to each axis, as you might expect. It’ll be more illuminating at this point to
look at what happens with a <em>shear</em>.</p>
<p>I’ll stick to 2D for this one. As an example transformation, we’ll shear the $y$ axis toward $x$ a
little bit:
$$
M = \begin{bmatrix}
1 & \tfrac{1}{2} \\
0 & 1
\end{bmatrix}
$$
Here’s what it looks like:</p>
<p><img alt="The shear applied to a standard vector basis" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-1.png" style="width:28em" title="The shear applied to a standard vector basis" /></p>
<p>When we perform this transformation on a dual vector, what happens? When you look at it
visually, it’s pretty straightforward—the level sets (isolines) of the linear form will tilt to
follow the shear.</p>
<p><img alt="Animation of a linear form shearing" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-2.gif" style="width:20em" title="Animation of a linear form shearing" /></p>
<p>But how do we express this as a matrix acting on the dual vector’s coordinates? Let’s focus on the
$\bf e_x^*$ component. Note that our transformation $M$ doesn’t affect the $x$-axis—it maps
$\bf e_x$ to itself. But what about $\bf e_x^*$?</p>
<p><img alt="Animation of eₓ* shearing" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-3.gif" style="width:20em" title="Animation of eₓ* shearing" /></p>
<p>The $\bf e_x^*$ component of a dual vector <em>does</em> change under this transformation, because
the isolines pick up the shear! Or, to put it another way:
although distances along the $x$ axis (which $\bf e_x^*$ measures) don’t change here,
$\bf e_x^*$ still cares about what the other axes are doing because <em>it has to stay parallel to them</em>.
That’s one of the defining conditions for the dual basis to do its job.</p>
<p>In particular, we have that $\bf e_x^*$ maps to ${\bf e_x^*} - \tfrac{1}{2}{\bf e_y^*}$.
If we work it out the rest of the way, the full matrix that applies to the coordinates of
a dual vector is:
$$
\begin{bmatrix}
1 & 0 \\
-\tfrac{1}{2} & 1
\end{bmatrix}
$$
This is the inverse transpose of $M$!</p>
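<p>As a quick check on that claim, here’s the computation spelled out, using only the $M$ defined above. Inverting the shear flips the sign of the off-diagonal entry, and transposing moves it below the diagonal:
$$
M^{-1} = \begin{bmatrix}
1 & -\tfrac{1}{2} \\
0 & 1
\end{bmatrix},
\qquad
M^{-T} = \left(M^{-1}\right)^T = \begin{bmatrix}
1 & 0 \\
-\tfrac{1}{2} & 1
\end{bmatrix}
$$
The first column is the image of $\bf e_x^*$, namely ${\bf e_x^*} - \tfrac{1}{2}{\bf e_y^*}$. The second column says that $\bf e_y^*$ maps to itself, which makes sense: its isolines are horizontal, and this shear keeps horizontal lines where they are.</p>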
<p>We can loosely relate the effect of the inverse transpose here to that of the cofactor
matrix for bivectors, as seen in the preceding article. Like a bivector, each dual
basis element cares about what’s happening to the <em>other</em> axes (because it needs to keep parallel
to them)—but it also must scale inversely along its <em>own</em> axis. The determinant of $M$
gives the cumulative scaling along <em>all</em> the axes:
$$
\det M = \text{scaling on my axis} \cdot \text{scaling on other axes}
$$
We can algebraically rearrange this to:
$$
\frac{1}{\text{scaling on my axis}} = \frac{1}{\det M} \cdot \text{scaling on other axes}
$$
This matches the relation between the inverse transpose and the cofactor matrix:
$$
M^{-T} = \frac{1}{\det M} \cdot \text{cofactor}(M)
$$
I’m handwaving a lot here—a detailed geometric demonstration would take us off
into the weeds—but hopefully this gives at least a little bit of intuition for why the inverse
transpose matrix is the right thing to use for dual vectors.</p>
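<p>To see the relation with concrete numbers, here’s a small worked example. The matrix $N$ below is a hypothetical one chosen purely for illustration (it’s the shear $M$ with an extra scale of 2 along $x$, so that the determinant isn’t 1). I’m taking the cofactor matrix of a 2×2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ to be $\begin{bmatrix} d & -c \\ -b & a \end{bmatrix}$.
$$
N = \begin{bmatrix}
2 & \tfrac{1}{2} \\
0 & 1
\end{bmatrix},
\qquad
\det N = 2,
\qquad
\text{cofactor}(N) = \begin{bmatrix}
1 & 0 \\
-\tfrac{1}{2} & 2
\end{bmatrix}
$$
$$
N^{-T} = \frac{1}{\det N} \cdot \text{cofactor}(N) = \begin{bmatrix}
\tfrac{1}{2} & 0 \\
-\tfrac{1}{4} & 1
\end{bmatrix}
$$
The diagonal of the cofactor matrix holds 1 and 2, the scaling on the <em>other</em> axis in each case. After dividing by the determinant, the diagonal of $N^{-T}$ holds $\tfrac{1}{2}$ and 1, the reciprocals of the scaling that $N$ applies along $x$ and $y$ respectively.</p>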
<h2 id="so-whats-a-normal-vector-anyway"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#so-whats-a-normal-vector-anyway" title="Permalink to this section">So What’s a Normal Vector, Anyway?</a></h2>
<p>As we’ve seen, the level sets of a linear form are parallel lines in 2D, or planes in 3D. This implies
that we can define a plane by picking out a specific level set of a given dual vector:
$$
\langle w, v \rangle = d
$$
The dual vector $w$ is acting as a signed distance field for the plane.</p>
<p>We’ve also seen that when expressed in terms of matched basis-and-dual-basis components,
the natural pairing product $\langle w, v \rangle$ reduces to a dot product $w \cdot v$. And then
the above equation looks like the familiar plane equation:
$$
w \cdot v = d
$$
This shows that the dual vector’s coordinates with respect to the dual basis are <em>also</em> the coordinates
of a normal vector to the plane, in the standard vector basis.</p>
<p>So, normal vectors can be interpreted as dual vectors expressed in the dual basis, and that’s
why they transform with the inverse transpose!</p>
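<p>Here’s a small check of this using the shear $M$ from earlier. The particular line and the parameter $t$ are just choices made for illustration. Take $w = (1, 0)$ and $d = 1$, which gives the vertical line $x = 1$, whose points are $v = (1, t)$. Transforming the points with $M$ and the normal with $M^{-T}$:
$$
Mv = \begin{bmatrix} 1 + \tfrac{1}{2}t \\ t \end{bmatrix},
\qquad
M^{-T}w = \begin{bmatrix} 1 \\ -\tfrac{1}{2} \end{bmatrix}
$$
$$
(M^{-T}w) \cdot (Mv) = \left(1 + \tfrac{1}{2}t\right) - \tfrac{1}{2}t = 1 = d
$$
So the transformed normal still satisfies the plane equation at every transformed point. If we had transformed the normal with $M$ itself, it would have stayed at $(1, 0)$, and the dot product would be $1 + \tfrac{1}{2}t$, which isn’t constant along the line. The general version of the same check is $(M^{-T}w) \cdot (Mv) = w^T M^{-1} M v = w \cdot v$.</p>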
<p>But wait—in the last article, didn’t I just say that normal vectors should be interpreted as
bivectors, and therefore they transform with the cofactor matrix? Which one is it?</p>
<p>Ultimately, I don’t think there’s a definitive answer to this question! “Normal vector” as an idea is
a bit too vague—bivectors and dual vectors are <em>both</em> defensible ways to formalize the “normal vector”
concept. As we’ve seen, the way they transform is equivalent as far as <em>orientation</em>:
bivectors and dual vectors both transform to stay perpendicular to the plane they define, by either
$B \wedge v = 0$ or $\langle w, v \rangle = 0$, respectively. The difference between them is in
the units they carry and their scaling behavior: bivectors are areas, while dual vectors are inverse
lengths.</p>
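<p>The simplest case that separates the two is a uniform scale. As an illustration (the scale factor $s$ is a symbol introduced just for this example, assumed positive), take $M = sI$ in 3D:
$$
\det M = s^3,
\qquad
\text{cofactor}(M) = s^2 I,
\qquad
M^{-T} = \frac{1}{\det M} \cdot \text{cofactor}(M) = s^{-1} I
$$
Both matrices leave the direction of a normal alone, but a bivector normal is scaled by $s^2$, like an area, while a dual-vector normal is scaled by $s^{-1}$, like an inverse length. For $s > 0$, renormalizing the result afterward hides the difference.</p>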
<p>That’s all I have to say about transforming normal vectors! But we’ve got another question
still dangling. At the end of Part 1, I asked about vectorial quantities with negative scaling powers.
In dual vectors, we’ve now achieved scaling power −1. But what about −2 and −3? To find those,
we’re going to have to combine dual spaces with Grassmann algebra. We’ll do that in the
<a href="/blog/normals-inverse-transpose-part-3/">third and final part</a> of this series.</p>