Nathan Reed’s coding blog
https://www.reedbeta.com/
Latest posts on Nathan Reed’s coding blogen-usWed, 26 May 2021 02:27:04 +0000Hash Functions for GPU Rendering
https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/
https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/Nathan ReedFri, 21 May 2021 17:52:07 -0700https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#commentsCodingGPUGraphics<p>Back in 2013, I wrote a <a href="/blog/quick-and-easy-gpu-random-numbers-in-d3d11/">somewhat popular article</a>
about pseudorandom number generation on the GPU. In the eight years since, a number of new PRNGs and
hash functions have been developed; and a few months ago, an excellent paper on the topic appeared
in JCGT: <a href="http://jcgt.org/published/0009/03/02/">Hash Functions for GPU Rendering</a>, by Mark Jarzynski
and Marc Olano. I thought it was time to update my former post in light of this paper’s findings.</p>
<!--more-->
<p>Jarzynski and Olano’s paper compares GPU implementations of a large number of different hash functions
along dual axes of performance (measured by time to render a quad evaluating the hash at each pixel)
and statistical quality (quantified by the count of failures of
<a href="https://en.wikipedia.org/wiki/TestU01">TESTU01 “Big Crush”</a> tests). Naturally, there is quite a
spread of results in both performance and quality. Jarzynski and Olano then identify the few hash
functions that lie along the Pareto frontier—meaning they are the best choices along the whole
spectrum of performance/quality trade-offs.</p>
<p>When choosing a hash function, we might sometimes prioritize performance, and other times might
prefer to sacrifice performance in favor of higher quality (real-time versus offline applications,
for example). The Pareto frontier provides the set of optimal choices for any point along that
balance—ranging from LCGs at the extreme performance-oriented end, to some quite expensive but
very high-quality hashes at the other end.</p>
<p>In my 2013 article, I recommended the “Wang hash” as a general-purpose 32-bit-to-32-bit integer hash
function. The Wang hash was among those tested by Jarzynski and Olano, but unfortunately it did not
lie along the Pareto frontier—not even close! The solution that dominates it—and one of the best
balanced choices between performance and quality overall—is <strong>PCG</strong>. In particular, the 32-bit PCG
hash used by Jarzynski and Olano goes as follows:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">uint</span> <span class="n">pcg_hash</span><span class="p">(</span><span class="kt">uint</span> <span class="n">input</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint</span> <span class="n">state</span> <span class="o">=</span> <span class="n">input</span> <span class="o">*</span> <span class="mi">747796405</span><span class="n">u</span> <span class="o">+</span> <span class="mi">2891336453</span><span class="n">u</span><span class="p">;</span>
<span class="kt">uint</span> <span class="n">word</span> <span class="o">=</span> <span class="p">((</span><span class="n">state</span> <span class="o">>></span> <span class="p">((</span><span class="n">state</span> <span class="o">>></span> <span class="mi">28</span><span class="n">u</span><span class="p">)</span> <span class="o">+</span> <span class="mi">4</span><span class="n">u</span><span class="p">))</span> <span class="o">^</span> <span class="n">state</span><span class="p">)</span> <span class="o">*</span> <span class="mi">277803737</span><span class="n">u</span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="n">word</span> <span class="o">>></span> <span class="mi">22</span><span class="n">u</span><span class="p">)</span> <span class="o">^</span> <span class="n">word</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>This has slightly better performance and <em>much</em> better statistical quality than the Wang hash. It’s
fast enough to be useful for real-time, while also being high-quality enough for almost any graphics
use-case (if you’re not using precomputed blue noise, or low-discrepancy sequences). It should
probably be your default GPU hash function.</p>
<p>Just to prove it works, here’s the bit pattern generated by a few thousand invocations of the above
function on consecutive inputs:</p>
<p><img alt="Bit pattern generated by PCG hash" src="https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/pcg.png" title="Bit pattern generated by PCG hash" /></p>
<p>Yep, looks random! 👍</p>
<h2 id="pcg-variants"><a href="https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#pcg-variants" title="Permalink to this section">PCG Variants</a></h2>
<p>Incidentally, you might notice that the PCG function posted above doesn’t match that found in other
sources, such as the <a href="https://www.pcg-random.org/download.html">minimal C implementation on the PCG website</a>.
This is because “PCG” isn’t a single function, but more of a recipe for constructing PRNG functions.
It works by starting with an LCG, and then applying a permutation function to mix around the bits
and improve the quality of the results. There many possible permutation functions, and
<a href="https://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf">O’Neill’s original PCG paper</a>
provides a set of building blocks that can be combined in various ways to get generators with
different characteristics. In particular, the PCG used by Jarzynski and Olano corresponds to the
32-bit “RXS-M-XS” variant described in §6.3.4 of O’Neill. (See also the list of variants on
<a href="https://en.wikipedia.org/wiki/Permuted_congruential_generator#Variants">Wikipedia</a>).</p>
<h2 id="hash-or-prng"><a href="https://www.reedbeta.com/blog/hash-functions-for-gpu-rendering/#hash-or-prng" title="Permalink to this section">Hash or PRNG?</a></h2>
<p>One of the main points I discussed in my 2013 article was the distinction between PRNGs and hash
functions: the former are designed for a good distribution <em>within</em> a single stateful stream, but do
not necessarily provide good distribution <em>across</em> streams with consecutive seeds; hash functions are
stateless and designed to give a good distribution even with consecutive (or otherwise highly
correlated) inputs.</p>
<p>PCG is actually designed to be a PRNG, <em>not</em> a hash function, so it may surprise you to see it being
used as a hash here. What gives? Well, apparently PCG is just so good that it works well as a hash
function too! ¯\_(ツ)_/¯</p>
<p>It’s worth noting that PCG <em>does</em> support more or less efficient jump-ahead, owing to the LCG at its
core; it’s possible to advance an LCG by $n$ steps in only $O(\log n)$ work using
<a href="https://www.nayuki.io/page/fast-skipping-in-a-linear-congruential-generator">modular exponentiation</a>.
However, that is not what Jarzynski and Olano’s code does: it’s not jumping ahead to the $n$th
value in a single PCG sequence, but essentially just taking the first value from each of $n$
sequences with consecutive initial states. The fact that this works at all is somewhat surprising,
and a testament to the power of permutation functions.</p>
<p>In my previous article, I also recommended that if you need multiple random values per pixel, you
could start with a hash function and then iterate either LCG or Xorshift using the hash output as an
initial state. You can still do that, using PCG as the initial hash—but it might be just as fast
to iterate PCG. The interesting thing about PCG’s design is that only the LCG portion of it actually
carries data dependencies from one iteration to the next, and LCGs are super fast. The permutation
parts are independent of each other and can be pipelined to exploit instruction-level parallelism
when doing multiple iterations.</p>
<p>For completeness, the “PRNG form” of the above PCG variant looks like:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">uint</span> <span class="n">rng_state</span><span class="p">;</span>
<span class="kt">uint</span> <span class="n">rand_pcg</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">uint</span> <span class="n">state</span> <span class="o">=</span> <span class="n">rng_state</span><span class="p">;</span>
<span class="n">rng_state</span> <span class="o">=</span> <span class="n">rng_state</span> <span class="o">*</span> <span class="mi">747796405</span><span class="n">u</span> <span class="o">+</span> <span class="mi">2891336453</span><span class="n">u</span><span class="p">;</span>
<span class="kt">uint</span> <span class="n">word</span> <span class="o">=</span> <span class="p">((</span><span class="n">state</span> <span class="o">>></span> <span class="p">((</span><span class="n">state</span> <span class="o">>></span> <span class="mi">28</span><span class="n">u</span><span class="p">)</span> <span class="o">+</span> <span class="mi">4</span><span class="n">u</span><span class="p">))</span> <span class="o">^</span> <span class="n">state</span><span class="p">)</span> <span class="o">*</span> <span class="mi">277803737</span><span class="n">u</span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="n">word</span> <span class="o">>></span> <span class="mi">22</span><span class="n">u</span><span class="p">)</span> <span class="o">^</span> <span class="n">word</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div>
<p>That’s about it! Be sure to check out <a href="http://jcgt.org/published/0009/03/02/">Jarzynski and Olano’s paper</a>
for some more tidbits, including a discussion of hashes with multi-dimensional inputs and outputs.</p>Making Your Own Container Compatible With C++20 Ranges
https://www.reedbeta.com/blog/ranges-compatible-containers/
https://www.reedbeta.com/blog/ranges-compatible-containers/Nathan ReedSat, 20 Mar 2021 17:23:15 -0700https://www.reedbeta.com/blog/ranges-compatible-containers/#commentsCoding<p>With some of my spare time lately, I’ve been enjoying learning about some of the new features in
C++20. <a href="https://en.cppreference.com/w/cpp/language/constraints">Concepts</a> and the closely-related
<a href="https://akrzemi1.wordpress.com/2020/03/26/requires-clause/"><code>requires</code> clauses</a> are two great
extensions to template syntax that remove the necessity for all the SFINAE junk we used to have to
do, making our code both more readable and more precise, and providing much better error messages
(although MSVC has sadly been <a href="https://developercommunity.visualstudio.com/t/786814">lagging in the error messages department</a>,
at the time of this writing).</p>
<p>Another interesting C++20 feature is the addition of the <a href="https://en.cppreference.com/w/cpp/ranges">ranges library</a>
(also <a href="https://en.cppreference.com/w/cpp/algorithm/ranges">ranges algorithms</a>), which provides a
nicer, more composable abstraction for operating on containers and sequences of objects. At the most
basic level, a range wraps an iterator begin/end pair, but there’s much more to it than that. This
article isn’t going to be a tutorial on ranges, but <a href="https://www.youtube.com/watch?v=VmWS-9idT3s">here’s a talk</a>
to watch if you want to see more of what it’s all about.</p>
<p>What I’m going to discuss today is the process of adding “ranges compatibility” to your own container
class. Many of the C++ codebases we work in have their own set of container classes beyond the STL
ones, for a variety of reasons—<a href="/blog/data-oriented-hash-table/">better performance</a>, more control
over memory layouts, more customized interfaces, and so on. With a little work, it’s possible to
make your custom containers also function as ranges and interoperate with the C++20 ranges library.
Here’s how to do it.</p>
<!--more-->
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#making-your-container-an-input-range">Making Your Container an Input Range</a><ul>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#range-concepts">Range Concepts</a></li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#defining-range-compatible-iterators">Defining Range-Compatible Iterators</a></li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#begin-end-size">Begin, End, Size</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#accepting-output-from-ranges">Accepting Output From Ranges</a><ul>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#constructor-from-a-range">Constructor From A Range</a></li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#output-iterators">Output Iterators</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="making-your-container-an-input-range"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#making-your-container-an-input-range" title="Permalink to this section">Making Your Container an Input Range</a></h2>
<p>At the high level, there are two basic ways that a container class can interact with ranges. First,
it can be <em>readable</em> as a range, meaning that we can iterate over it, pipe it into views and pass it
to range algorithms, and so forth. In the parlance of the ranges library, this is known as being an
<em>input range</em>: a range that can provide input to other things.</p>
<p>The other direction is to accept output <em>from</em> ranges, storing the output into your container.
We’ll do that later. To begin with, let’s see how to make your container act as an input range.</p>
<h3 id="range-concepts"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#range-concepts" title="Permalink to this section">Range Concepts</a></h3>
<p>The first decision we have to make is what particular kind of input range we can model. The C++20
STL defines a number of different <a href="https://en.cppreference.com/w/cpp/ranges#Range_concepts">concepts for ranges</a>,
depending on the capabilities of their iterators and other things. Several of these form a hierarchy
from more general to more specific kinds of ranges with tighter requirements. Generally speaking, it’s
best for your container to implement the most specific range concept it’s able to. This enables code
that works with ranges to make better decisions and use more optimal code paths. (We’ll see some
examples of this in a minute.)</p>
<p>The relevant input range concepts are:</p>
<ul>
<li><code>std::ranges::input_range</code>: the most bare-bones version. It requires only that you have iterators
that can retrieve the contents of the range. In particular, it <em>doesn’t</em> require that the range
can be iterated more than once: iterators are not required to be copyable, and <code>begin</code>/<code>end</code> are
not required to give you the iterators more than once. This could be an appropriate concept for
ranges that are actually generating their contents as the result of some algorithm that’s not
easily/cheaply repeatable, or receiving data from a network connection or suchlike.</li>
<li><code>std::ranges::forward_range</code>: the range can be iterated as many times as you like, but only in
the forward direction. Iterators can be copied and saved off to later resume iteration from an
earlier point, for example.</li>
<li><code>std::ranges::bidirectional_range</code>: iterators can be decremented as well as incremented.</li>
<li><code>std::ranges::random_access_range</code>: you can efficiently do arithmetic on iterators—you can
offset them forward or backward by a given number of steps, or subtract them to find the number
of steps between.</li>
<li><code>std::ranges::contiguous_range</code>: the elements are actually stored as a contiguous array in memory;
the iterators are essentially fancy pointers (or literally <em>are</em> just pointers).</li>
</ul>
<p>In addition to this hierarchy of input range concepts, there are a couple of other standalone ones
worth mentioning:</p>
<ul>
<li><code>std::ranges::sized_range</code>: you can efficiently get the size of the range, i.e. how many elements
from begin to end. Note that this is a much looser constraint than <code>random_access_range</code>: the
latter requires you be able to efficiently measure the distance between <em>any pair</em> of iterators
inside the range, while <code>sized_range</code> only requires that the size of the <em>whole range</em> is known.</li>
<li><code>std::ranges::borrowed_range</code>: indicates that a range doesn’t own its data, i.e. it’s referencing
(“borrowing”) data that lives somewhere else. This can be useful because it allows references/iterators
into the data to survive beyond the lifetime of the range object itself.</li>
</ul>
<p>The reason all these concepts are important is that if I’m writing code that operates on ranges, I might need to
require some of these concepts in order to do my work efficiently. For example, a sorting routine
would be very difficult to write for anything less than a <code>random_access_range</code> (and indeed you’ll
see that <a href="https://en.cppreference.com/w/cpp/algorithm/ranges/sort"><code>std::ranges::sort</code> requires that</a>).
In other cases, I might be able to do things more optimally when the range satisfies certain
concepts—for instance, if it’s a <code>sized_range</code>, I could preallocate some storage for results,
while if it’s only an <code>input_range</code> and no more, then I’ll have to dynamically reallocate, as I have
no idea how many elements there are going to be.</p>
<p>The rest of the ranges library is written in terms of these concepts (and you can write your own
code that operates generically on ranges using these concepts as well). So, once your container
satisfies the relevant concepts, it will automatically be recognized and function as a range!</p>
<p>In C++20, concepts act as boolean expressions, so you can check whether your container satisfies the
concepts you expect by just writing asserts for them:</p>
<div class="codehilite"><pre><span></span><code><span class="cp">#include</span> <span class="cpf"><ranges></span><span class="cp"></span>
<span class="k">static_assert</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">forward_range</span><span class="o"><</span><span class="n">MyCoolContainer</span><span class="o"><</span><span class="kt">int</span><span class="o">>></span><span class="p">);</span>
<span class="c1">// int is just an arbitrarily chosen element type, since we</span>
<span class="c1">// can't assert a concept for an uninstantiated template</span>
</code></pre></div>
<p>Checks like this are great to add to your test suite—I’m big in favor of writing <em>compile-time</em>
tests for generic/metaprogramming stuff, in addition to the usual runtime tests.</p>
<p>However, when you first drop that assert into your code, it will almost certainly fail. Let’s see
now what you need to do to actually satisfy the range concepts.</p>
<h3 id="defining-range-compatible-iterators"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#defining-range-compatible-iterators" title="Permalink to this section">Defining Range-Compatible Iterators</a></h3>
<p>In order to satisfy the input range concepts, you need to do two things:</p>
<ul>
<li>Have <code>begin</code> and <code>end</code> functions that return some iterator and sentinel types. (We’ll discuss
these in a little bit.)</li>
<li>The iterator type must satisfy the iterator concept that matches your range concept.</li>
</ul>
<p>Each one of the concepts from <code>input_range</code> down to <code>contiguous_range</code> has a corresponding
<a href="https://en.cppreference.com/w/cpp/header/iterator#Iterator_concepts">iterator concept</a>:
<code>std::input_iterator</code>, <code>std::forward_iterator</code>, and so on. It’s these concepts that contain the real
meat of the requirements that define the different types of ranges: they list all the operations
each kind of iterator must support.</p>
<p>To begin with, there are a couple of member type aliases that any iterator class will need to define:</p>
<ul>
<li><code>difference_type</code>: some signed integer type, usually <code>std::ptrdiff_t</code></li>
<li><code>value_type</code>: the type of elements that the iterator references</li>
</ul>
<p>The second one seems pretty understandable, but I honestly have no idea why the <code>difference_type</code>
requirement is here. Taking the difference between iterators doesn’t make sense until you get to
random-access iterators, which actually define that operation. As far as I can tell, the
<code>difference_type</code> for more general iterators isn’t actually <em>used</em> by anything. Nevertheless,
according to the C++ standard, it has to be there. It seems that the usual idiom is to set it to
<code>std::ptrdiff_t</code> in such cases, although it can be any signed integer type.</p>
<p>(Technically you can also define these types by specializing <code>std::iterator_traits</code> for your iterator,
but here we’re just going to put them in the class.)</p>
<p>Beyond that, the requirements for <code>std::input_iterator</code> are pretty straightforward:</p>
<ul>
<li>The iterator must be default-initializable and movable. (It doesn’t have to be copyable.)</li>
<li>It must be equality-comparable with its sentinel (the value marking the end of the range). It
doesn’t have to be equality-comparable with other iterators.</li>
<li>It must implement <code>operator ++</code>, in <em>both</em> preincrement and postincrement positions. However, the
postincrement version does not have to return anything.</li>
<li>It must have an <code>operator *</code> that returns a reference to whatever the <code>value_type</code> is.</li>
</ul>
<p>One point of interest here is that the default-initializable requirement means that the iterator class
can’t contain references, e.g. a reference to the container it comes from. It can store pointers,
though.</p>
<p>A prototype input iterator class could look like this:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">Iterator</span>
<span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="k">using</span> <span class="n">difference_type</span> <span class="o">=</span> <span class="n">std</span><span class="o">::</span><span class="kt">ptrdiff_t</span><span class="p">;</span>
<span class="k">using</span> <span class="n">value_type</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span>
<span class="n">Iterator</span><span class="p">();</span> <span class="c1">// default-initializable</span>
<span class="kt">bool</span> <span class="k">operator</span> <span class="o">==</span> <span class="p">(</span><span class="k">const</span> <span class="n">Sentinel</span><span class="o">&</span><span class="p">)</span> <span class="k">const</span><span class="p">;</span> <span class="c1">// equality with sentinel</span>
<span class="n">T</span><span class="o">&</span> <span class="k">operator</span> <span class="o">*</span> <span class="p">()</span> <span class="k">const</span><span class="p">;</span> <span class="c1">// dereferenceable</span>
<span class="n">Iterator</span><span class="o">&</span> <span class="k">operator</span> <span class="o">++</span> <span class="p">()</span> <span class="c1">// pre-incrementable</span>
<span class="p">{</span> <span class="cm">/*do stuff...*/</span> <span class="k">return</span> <span class="o">*</span><span class="k">this</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">void</span> <span class="k">operator</span> <span class="o">++</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="c1">// post-incrementable</span>
<span class="p">{</span> <span class="o">++*</span><span class="k">this</span><span class="p">;</span> <span class="p">}</span>
<span class="k">private</span><span class="o">:</span>
<span class="c1">// implementation...</span>
<span class="p">};</span>
</code></pre></div>
<p>For a <code>std::forward_iterator</code>, the requirements are just slightly tighter:</p>
<ul>
<li>The iterator must be copyable.</li>
<li>It must be equality-comparable with other iterators of the same container.</li>
<li>The postincrement operator must return a copy of the iterator before modification.</li>
</ul>
<p>A prototype forward iterator class could look like:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">Iterator</span>
<span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="c1">// ...same as the previous one, except:</span>
<span class="kt">bool</span> <span class="k">operator</span> <span class="o">==</span> <span class="p">(</span><span class="k">const</span> <span class="n">Iterator</span><span class="o">&</span><span class="p">)</span> <span class="k">const</span><span class="p">;</span> <span class="c1">// equality with iterators</span>
<span class="n">Iterator</span> <span class="k">operator</span> <span class="o">++</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span> <span class="c1">// post-incrementable, returns prev value</span>
<span class="p">{</span> <span class="n">Iterator</span> <span class="n">temp</span> <span class="o">=</span> <span class="o">*</span><span class="k">this</span><span class="p">;</span> <span class="o">++*</span><span class="k">this</span><span class="p">;</span> <span class="k">return</span> <span class="n">temp</span><span class="p">;</span> <span class="p">}</span>
<span class="p">};</span>
</code></pre></div>
<p>I’m not going to go through the rest of them in detail; you can read the details
<a href="https://en.cppreference.com/w/cpp/header/iterator#Iterator_concepts">on cppreference</a>.</p>
<h3 id="begin-end-size"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#begin-end-size" title="Permalink to this section">Begin, End, Size</a></h3>
<p>Once your container is equipped with an iterator class that satisfies the relevant concepts, you’ll
need to provide <code>begin</code> and <code>end</code> functions to get those iterators. There are three ways to do this:
they can be member functions on the container, they can be free functions that live next to the
container in the same namespace, or they can be <a href="https://www.justsoftwaresolutions.co.uk/cplusplus/hidden-friends.html">“hidden friends”</a>;
they just need to be findable by <a href="https://en.cppreference.com/w/cpp/language/adl">ADL</a>.</p>
<p>The return types from <code>begin</code> and <code>end</code> don’t have to be the same. In some cases, it can be useful
to have <code>end</code> return a different type of object, a “sentinel”, which isn’t actually an iterator; it
just needs to be equality-comparable with iterators, so you can tell when you’ve gotten to the end
of the container.</p>
<p>Also, these are the same <code>begin</code>/<code>end</code> used for <a href="https://en.cppreference.com/w/cpp/language/range-for">range-based <code>for</code> loops</a>.</p>
<p>One oddity worth mentioning here is that if you go the free/friend functions route, you’ll need to
add overloads for both const and non-const versions of your container:</p>
<div class="codehilite"><pre><span></span><code><span class="k">class</span> <span class="nc">MyCoolContainer</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">begin</span><span class="p">(</span><span class="k">const</span> <span class="n">MyCoolContainer</span><span class="o">&</span> <span class="n">c</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">end</span><span class="p">(</span><span class="k">const</span> <span class="n">MyCoolContainer</span><span class="o">&</span> <span class="n">c</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">begin</span><span class="p">(</span><span class="n">MyCoolContainer</span><span class="o">&</span> <span class="n">c</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">end</span><span class="p">(</span><span class="n">MyCoolContainer</span><span class="o">&</span> <span class="n">c</span><span class="p">);</span>
</code></pre></div>
<p>You might think it would be enough to provide just the const overloads, but if you do that, only the
const version of the container will be recognized as a range! The non-const overloads must be
present as well for non-const containers to work.</p>
<p>Curiously, if you provide <code>begin</code>/<code>end</code> as member functions instead, then this doesn’t come up:
const overloads will work for both.</p>
<p>This behavior is surprising, and I’m not sure if it was intended. However, it’s worth noting that
iterators generally need to remember the constness of the container they came from: a const
container should give you a “const iterator” that doesn’t allow mutating its elements. Therefore,
the const and non-const overloads of <code>begin</code>/<code>end</code> will generally need to return <em>different</em>
iterator types, and so you’ll need to have both in any case. (The exception would be if you’re
building an immutable container; then it only needs a const iterator type.)</p>
<p>In addition to <code>begin</code> and <code>end</code>, you’ll also want to implement a <code>size</code> function, if applicable.
Again, this can be either a member function, a free function, or a hidden friend. The
presence of this function satisfies <code>std::ranges::sized_range</code>, which (as mentioned earlier) can
enable range algorithms to operate more efficiently.</p>
<p>So, to sum up: to allow your custom container class to be readable as a range, you’ll need to:</p>
<ol>
<li>Decide which range concept(s) you can model, which mainly comes down to what level of iterator
capabilities you can provide;</li>
<li>Implement iterator classes (both const and non-const, if applicable) that fulfill all the
requirements of the chosen iterator concept;</li>
<li>Implement <code>begin</code>, <code>end</code>, and <code>size</code> functions.</li>
</ol>
<p>Once we’ve done this, the ranges library should recognize your container as a range. It will
automatically be accepted by range algorithms, we can take views of it, we can iterate over it in
range-for loops, and so on.</p>
<p>As before, you can test that you’ve done everything correctly by asserting that your container
satisfies the expected range concepts. If you’re working with gcc or clang, this will even give you
some pretty reasonable error messages if you didn’t get it right! (In MSVC, for the time being, you’ll
have to narrow down errors by popping open the hood and asserting each of the concept’s sub-clauses
one at a time, to see which one(s) failed.)</p>
<h2 id="accepting-output-from-ranges"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#accepting-output-from-ranges" title="Permalink to this section">Accepting Output From Ranges</a></h2>
<p>We’ve discussed how to make a custom container serve as input <em>to</em> the C++20 ranges library. Now, we
need to come back to the other direction: how to let your container capture output <em>from</em> the
ranges library.</p>
<p>There are a couple of different forms this can take. One way is to accept generic ranges as
parameters to a constructor (or other methods, such as append or insert methods) of your container
class. This allows, for example, easily converting other containers (that are also range-compatible)
to your container. It also allows capturing the output of a ranges “pipeline” (a series of views
chained together).</p>
<p>Another form of range output, which comes up with certain of the <a href="https://en.cppreference.com/w/cpp/algorithm/ranges">range algorithms</a>,
is via <em>output iterators</em>, which are iterators that allow storing or inserting values into your
container.</p>
<h3 id="constructor-from-a-range"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#constructor-from-a-range" title="Permalink to this section">Constructor From A Range</a></h3>
<p>To write a constructor (or other method) that takes a generic range parameter, we can use the same
range concepts we saw earlier. One neat new feature in C++20 is writing functions with a parameter
type (or return type) constrained to match a given concept. The syntax looks like this:</p>
<div class="codehilite"><pre><span></span><code><span class="cp">#include</span> <span class="cpf"><ranges></span><span class="cp"></span>
<span class="k">class</span> <span class="nc">MyCoolContainer</span>
<span class="p">{</span>
<span class="k">public</span><span class="o">:</span>
<span class="k">explicit</span> <span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span> <span class="k">auto</span><span class="o">&&</span> <span class="n">range</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="k">auto</span><span class="o">&&</span> <span class="nl">item</span> <span class="p">:</span> <span class="n">range</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// process the item</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">};</span>
</code></pre></div>
<p>The syntax <code>concept-name auto</code> for the parameter type reminds us that concepts aren’t types; this
is still, under the hood, a template function that’s performing argument type deduction (hence the
<code>auto</code>). In other words, the above is syntactic sugar for:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span> <span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span> <span class="n">R</span><span class="o">></span>
<span class="k">explicit</span> <span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">R</span><span class="o">&&</span> <span class="n">range</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div>
<p>which is in turn sugar for:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">R</span><span class="o">></span>
<span class="k">requires</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span><span class="o"><</span><span class="n">R</span><span class="o">></span><span class="p">)</span>
<span class="k">explicit</span> <span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">R</span><span class="o">&&</span> <span class="n">range</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div>
<p>I prefer the shorthand <code>std::ranges::input_range auto</code> syntax, but <del>at the time of this writing
MSVC’s support for it is still shaky</del>. (<em>Update: fixed in 16.10!</em>) If in doubt, use
the syntax <code>template <std::ranges::input_range R></code>.</p>
<p>In any case, constraining the parameter type to satisfy <code>input_range</code> allows this constructor
overload to accept anything out there that implements <code>begin</code>, <code>end</code>, and iterators, as we’ve seen
in previous sections. You can then iterate over it generically and do whatever you want with the
results.</p>
<p>The range parameter is declared as <code>auto&&</code> to make it a <a href="https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers">universal reference</a>,
meaning that it can accept either lvalues or rvalues; in particular, it can accept the result of a
function call returning a range, and it can accept the result of a pipeline:</p>
<div class="codehilite"><pre><span></span><code><span class="n">MyCoolContainer</span> <span class="n">c</span><span class="p">{</span> <span class="n">another_range</span> <span class="o">|</span>
<span class="n">std</span><span class="o">::</span><span class="n">views</span><span class="o">::</span><span class="n">transform</span><span class="p">(</span><span class="n">blah</span><span class="p">)</span> <span class="o">|</span>
<span class="n">std</span><span class="o">::</span><span class="n">views</span><span class="o">::</span><span class="n">filter</span><span class="p">(</span><span class="n">blah</span><span class="p">)</span> <span class="p">};</span>
</code></pre></div>
<p>A completely generic range-accepting method like this might not be the most useful thing. If we have
a container storing <code>int</code> values, for example, it wouldn’t make a lot of sense for us to accept
ranges of strings or other arbitrary types. We’d like to be able to put some additional constraints
on the <em>element type</em> of the range: perhaps we only want element types that are convertible to <code>int</code>.</p>
<p>Helpfully, the ranges library provides a template <a href="https://en.cppreference.com/w/cpp/ranges/iterator_t"><code>range_value_t</code></a>
that retrieves the element type of a range—namely, the <code>value_type</code> declared by the range’s
iterator. With this, we can state additional constraints like so:</p>
<div class="codehilite"><pre><span></span><code><span class="k">explicit</span> <span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span> <span class="k">auto</span><span class="o">&&</span> <span class="n">range</span><span class="p">)</span>
<span class="k">requires</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">convertible_to</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">range_value_t</span><span class="o"><</span><span class="k">decltype</span><span class="p">(</span><span class="n">range</span><span class="p">)</span><span class="o">></span><span class="p">,</span> <span class="kt">int</span><span class="o">></span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div>
<p>We can even define a concept that wraps up these requirements:</p>
<div class="codehilite"><pre><span></span><code><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">R</span><span class="p">,</span> <span class="k">typename</span> <span class="nc">T</span><span class="o">></span>
<span class="k">concept</span> <span class="nc">input_range_of</span> <span class="o">=</span>
<span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">input_range</span><span class="o"><</span><span class="n">R</span><span class="o">></span> <span class="o">&&</span>
<span class="n">std</span><span class="o">::</span><span class="n">convertible_to</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">range_value_t</span><span class="o"><</span><span class="n">R</span><span class="o">></span><span class="p">,</span> <span class="n">T</span><span class="o">></span><span class="p">;</span>
</code></pre></div>
<p>and then use it as follows:</p>
<div class="codehilite"><pre><span></span><code><span class="k">explicit</span> <span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">input_range_of</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="k">auto</span><span class="o">&&</span> <span class="n">range</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// ...</span>
<span class="p">}</span>
</code></pre></div>
<p>Something like this should be in the standard library, IMO.</p>
<p>You can also choose to require one of the more specialized concepts, like <code>forward_range</code> or
<code>random_access_range</code>, if you need those extra capabilities for whatever you’re doing.
However, just as a container should generally implement the most <em>specific</em> range concept it can
provide, a function that takes a range parameter should generally require the most <em>general</em> range
concept it can deal with, or it will unduly restrict what kind of ranges can be passed to it.</p>
<p>That said, there might be cases where you can switch to a more efficient implementation if the range
satisfies some extra requirements. For example, if it’s a <code>sized_range</code>, then you might be able to
reserve storage before inserting the elements. You can test for this inside your function body using
<code>if constexpr</code>:</p>
<div class="codehilite"><pre><span></span><code><span class="k">explicit</span> <span class="n">MyCoolContainer</span><span class="p">(</span><span class="n">input_range_of</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="k">auto</span><span class="o">&&</span> <span class="n">range</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="k">constexpr</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">sized_range</span><span class="o"><</span><span class="k">decltype</span><span class="p">(</span><span class="n">range</span><span class="p">)</span><span class="o">></span><span class="p">)</span>
<span class="p">{</span>
<span class="n">reserve</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">ranges</span><span class="o">::</span><span class="n">size</span><span class="p">(</span><span class="n">range</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">for</span> <span class="p">(</span><span class="k">auto</span><span class="o">&&</span> <span class="nl">item</span> <span class="p">:</span> <span class="n">range</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// process the item</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div>
<p>Here, <a href="https://en.cppreference.com/w/cpp/ranges/size"><code>std::ranges::size</code></a> is a convenience wrapper
that knows how to call the range’s associated <code>size</code> function, whether it’s implemented as a method
or a free function.</p>
<p>You could also do things like: check if the range is a <code>contiguous_range</code> and the item is something
trivially copyable, and switch to <code>memcpy</code> rather than iterating over all the items.</p>
<h3 id="output-iterators"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#output-iterators" title="Permalink to this section">Output Iterators</a></h3>
<p>Range views and pipelines operate on a “pull” model, where the pipeline is represented by a proxy
range object that generates its results lazily when you iterate it. Taking generic range objects as
parameters to your container is an easy and useful way to consume such objects, and that probably suffices
for most uses. However, there are a handful of bits in the ranges library that operate on a “push”
model, where you call a function that wants to store values into your container via an output
iterator. This comes up with <a href="https://en.cppreference.com/w/cpp/algorithm/ranges#Modifying_sequence_operations">certain ranges algorithms</a>
like <code>ranges::copy</code>, <code>ranges::transform</code>, and <code>ranges::generate</code>.</p>
<p>Personally, I don’t see a hugely compelling reason to worry about these, as it’s also possible to
use views to express the same operations; but for the sake of completeness, I’ll discuss them
briefly here.</p>
<p>At this point, it won’t surprise you to learn that just as there were concepts for input ranges,
there are also concepts <code>std::ranges::output_range</code> and <a href="https://en.cppreference.com/w/cpp/iterator/output_iterator"><code>std::output_iterator</code></a>.
In this case there’s just that one concept, not a hierarchy of refinements of them; however, if you
peruse the definitions of some of the ranges algorithms, you’ll find that many of them don’t actually
use <code>output_iterator</code>, but state slightly different, less- or more-specific requirements of their
own. (This part of the standard library feels a little less fully baked than the rest; I wouldn’t be
surprised if some of this gets elaborated or polished a bit more in C++23 or later revisions.)</p>
<p>The requirements for an output iterator (broadly construed) are very similar to those for an input
iterator, only adding that the value returned by dereferencing the iterator must be writable by
assigning to it: you must be able to do <code>*iter = foo;</code> for some appropriate type of <code>foo</code>. If you’ve
implemented a non-const input iterator, it probably satisfies the requirement already.</p>
<p>It’s also possible to do slightly more exotic things with an output iterator, like returning a proxy
object that accepts assignment and does “something” with the value assigned. An example of this is
the STL’s <a href="https://en.cppreference.com/w/cpp/iterator/back_insert_iterator"><code>std::back_insert_iterator</code></a>,
which takes whatever is assigned to it and <em>appends</em> to its container (as opposed to overwriting an
existing value in the container). The STL has a few more things like that, including an iterator
that writes characters out to an <code>ostream</code>.</p>
<p>There are also some cases amongst the ranges algorithms of “input-output” iterators, such as for
operations that reorder a range in place, like sorting. These often have a bidirectional or
random-access iterator requirement, plus needing the dereferenced types to be swappable, movable,
and varying other constraints. Those details probably aren’t going to be relevant to you unless
you’re doing something tricky, like making a container that generates elements on the fly somehow,
or returns proxy objects rather than direct references to elements (like <code>std::vector<bool></code>).</p>
<h2 id="conclusion"><a href="https://www.reedbeta.com/blog/ranges-compatible-containers/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>The C++20 ranges library provides a lot of powerful, composable tools for manipulating sequences of
objects, and a range of specificity from the most generic and abstract container-shaped things down
to the very concrete, efficient, and practical. When working with your own container types, it
would be nice to be able to take advantage of these tools.</p>
<p>As we’ve seen, it’s hardly an onerous task to implement ranges compatibility for your own containers.
Most of the necessaries are things you were probably already doing: you probably already had an
iterator class and begin/end methods. It only takes a little bit of attention to satisfying certain
details—like adding the <code>difference_type</code> and <code>value_type</code> aliases, and making sure you can both
preincrement and postincrement—to make your iterators satisfy the STL iterator concepts, and thus
have your containers recognized as ranges. It’s also no sweat to write functions accepting generic
ranges as input, letting you store the output of other range operations into your container.</p>
<p>I hope this has been a useful peek under the hood and has given you some ideas about how your
container classes can benefit from the new C++20 features.</p>Python-Like enumerate() In C++17
https://www.reedbeta.com/blog/python-like-enumerate-in-cpp17/
http://reedbeta.com/blog/python-like-enumerate-in-cpp17/Nathan ReedSat, 24 Nov 2018 22:42:04 -0800https://www.reedbeta.com/blog/python-like-enumerate-in-cpp17/#commentsCoding<p>Python has a handy built-in function called <a href="https://docs.python.org/3/library/functions.html?highlight=enumerate#enumerate"><code>enumerate()</code></a>,
which lets you iterate over an object (e.g. a list) and have access to both the <em>index</em> and the
<em>item</em> in each iteration. You use it in a <code>for</code> loop, like this:</p>
<div class="codehilite"><pre><span></span><code><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">thing</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">listOfThings</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="s2">"The </span><span class="si">%d</span><span class="s2">th thing is </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">thing</span><span class="p">))</span>
</code></pre></div>
<p>Iterating over <code>listOfThings</code> directly would give you <code>thing</code>, but not <code>i</code>, and there are plenty of
situations where you’d want both (looking up the index in another data structure, progress reports,
error messages, generating output filenames, etc).</p>
<p>C++ <a href="https://en.cppreference.com/w/cpp/language/range-for">range-based <code>for</code> loops</a> work a lot like
Python’s <code>for</code> loops. Can we implement an analogue of Python’s <code>enumerate()</code> in C++? We can!</p>
<!--more-->
<p>C++17 added <a href="https://en.cppreference.com/w/cpp/language/structured_binding">structured bindings</a>
(also known as “destructuring” in other languages), which allow you to pull apart a tuple type and
assign the pieces to different variables, in a single statement. It turns out that this is also
allowed in range <code>for</code> loops. If the iterator returns a tuple, you can pull it apart and assign the
pieces to different loop variables.</p>
<p>The syntax for this looks like:</p>
<div class="codehilite"><pre><span></span><code><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">ThingA</span><span class="p">,</span> <span class="n">ThingB</span><span class="o">>></span> <span class="n">things</span><span class="p">;</span>
<span class="p">...</span>
<span class="k">for</span> <span class="p">(</span><span class="k">auto</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">]</span> <span class="o">:</span> <span class="n">things</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// a gets the ThingA and b gets the ThingB from each tuple</span>
<span class="p">}</span>
</code></pre></div>
<p>So, we can implement <code>enumerate()</code> by creating an iterable object that wraps another iterable and
generates the indices during iteration. Then we can use it like this:</p>
<div class="codehilite"><pre><span></span><code><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Thing</span><span class="o">></span> <span class="n">things</span><span class="p">;</span>
<span class="p">...</span>
<span class="k">for</span> <span class="p">(</span><span class="k">auto</span> <span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">thing</span><span class="p">]</span> <span class="o">:</span> <span class="n">enumerate</span><span class="p">(</span><span class="n">things</span><span class="p">))</span>
<span class="p">{</span>
<span class="c1">// i gets the index and thing gets the Thing in each iteration</span>
<span class="p">}</span>
</code></pre></div>
<p>The implementation of <code>enumerate()</code> is pretty short, and I present it here for your use:</p>
<div class="codehilite"><pre><span></span><code><span class="cp">#include</span> <span class="cpf"><tuple></span><span class="cp"></span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="nc">T</span><span class="p">,</span>
<span class="k">typename</span> <span class="nc">TIter</span> <span class="o">=</span> <span class="k">decltype</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">begin</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">declval</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">())),</span>
<span class="k">typename</span> <span class="o">=</span> <span class="k">decltype</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">end</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">declval</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">()))</span><span class="o">></span>
<span class="k">constexpr</span> <span class="k">auto</span> <span class="n">enumerate</span><span class="p">(</span><span class="n">T</span> <span class="o">&&</span> <span class="n">iterable</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="nc">iterator</span>
<span class="p">{</span>
<span class="kt">size_t</span> <span class="n">i</span><span class="p">;</span>
<span class="n">TIter</span> <span class="n">iter</span><span class="p">;</span>
<span class="kt">bool</span> <span class="k">operator</span> <span class="o">!=</span> <span class="p">(</span><span class="k">const</span> <span class="n">iterator</span> <span class="o">&</span> <span class="n">other</span><span class="p">)</span> <span class="k">const</span> <span class="p">{</span> <span class="k">return</span> <span class="n">iter</span> <span class="o">!=</span> <span class="n">other</span><span class="p">.</span><span class="n">iter</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">void</span> <span class="k">operator</span> <span class="o">++</span> <span class="p">()</span> <span class="p">{</span> <span class="o">++</span><span class="n">i</span><span class="p">;</span> <span class="o">++</span><span class="n">iter</span><span class="p">;</span> <span class="p">}</span>
<span class="k">auto</span> <span class="k">operator</span> <span class="o">*</span> <span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">tie</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="o">*</span><span class="n">iter</span><span class="p">);</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="nc">iterable_wrapper</span>
<span class="p">{</span>
<span class="n">T</span> <span class="n">iterable</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">begin</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">iterator</span><span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">begin</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span> <span class="p">};</span> <span class="p">}</span>
<span class="k">auto</span> <span class="n">end</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">iterator</span><span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">end</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span> <span class="p">};</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">return</span> <span class="n">iterable_wrapper</span><span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span> <span class="p">};</span>
<span class="p">}</span>
</code></pre></div>
<p>This uses SFINAE to ensure it can only be applied to iterable types, and will generate readable
error messages if used on something else. It accepts its parameter as an rvalue reference so you can
apply it to temporary values (e.g. directly to the return value of a function call) as well as to
variables and members.</p>
<p>This compiles without warnings in C++17 mode on gcc 8.2, clang 6.0, and MSVC 15.9. I’ve banged on it
a bit to ensure it doesn’t incur any extra copies, and it works as expected with either const or
non-const containers. It seems to optimize away pretty cleanly, too! 🤘</p>Using A Custom Toolchain In Visual Studio With MSBuild
https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/
http://reedbeta.com/blog/custom-toolchain-with-msbuild/Nathan ReedTue, 20 Nov 2018 13:34:01 -0800https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#commentsCoding<p>Like many of you, when I work on a graphics project I sometimes have a need to compile some shaders.
Usually, I’m writing in C++ using Visual Studio, and I’d like to get my shaders built using the
same workflow as the rest of my code. Visual Studio these days has built-in support for HLSL via
<code>fxc</code>, but what if we want to use the next-gen <a href="https://github.com/Microsoft/DirectXShaderCompiler"><code>dxc</code></a>
compiler?</p>
<p>This post is a how-to for adding support for a custom toolchain—such as <code>dxc</code>, or any other
command-line-invokable tool—to a Visual Studio project, by scripting MSBuild (the underlying build
system Visual Studio uses). We won’t quite make it to parity with a natively integrated language,
but we’re going to get as close as we can.</p>
<!--more-->
<p>If you don’t want to read all the explanation but just want some working code to look at, jump down
to the <a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project">Example Project</a> section.</p>
<p>This article is written against Visual Studio 2017, but it may also work in some earlier VSes
(I haven’t tested).</p>
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#msbuild">MSBuild</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#adding-a-custom-target">Adding A Custom Target</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#invoking-the-tool">Invoking The Tool</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#incremental-builds">Incremental Builds</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#header-dependencies">Header Dependencies</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#errorwarning-parsing">Error/Warning Parsing</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project">Example Project</a></li>
<li><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#the-next-level">The Next Level</a></li>
</ul>
</div>
<h2 id="msbuild"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#msbuild" title="Permalink to this section">MSBuild</a></h2>
<p>Before we begin, it’s important you understand what we’re getting into. Not to mince words, but
MSBuild is a <a href="http://wiki.c2.com/?StringlyTyped">stringly typed</a>, semi-documented, XML-guzzling,
paradigmatically muddled, cursed hellmaze. However, it <em>does</em> ship with Visual Studio, so if you
can use it for your custom build steps, then you don’t need to deal with any extra add-ins or
software installs.</p>
<p>To be fair, MSBuild is <a href="https://github.com/Microsoft/msbuild">open-source on GitHub</a>, so at least
in principle you can dive into it and see what the cursed hellmaze is doing. However, I’ll warn you
up front that many of the most interesting parts vis-à-vis Visual Studio integration are <em>not</em>
included in the Git repo, but are hidden away in VS’s build extension DLLs. (More about that later.)</p>
<p>My jumping-off point for this enterprise was <a href="http://miken-1gam.blogspot.com/2013/01/visual-studio-and-custom-build-rules.html">this blog post by Mike Nicolella</a>.
Mike showed how to set up an MSBuild <code>.targets</code> file to create an association between a specific file
extension in your project, and a build rule (“target”, in MSBuild parlance) to process those files.
We’ll review how that works, then extend it and jazz it up a bit to get some more quality-of-life
features.</p>
<p>MSBuild docs (such as they are) can be found <a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild?view=vs-2017">on MSDN here</a>.
Some more information can be gleaned by looking at the C++ build rules installed with Visual
Studio; on my machine they’re in <code>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets</code>.
For example, the file <code>Microsoft.CppCommon.targets</code> in that directory contains most of the target
definitions for C++ compilation, linking, resources and manifests, and so on.</p>
<h2 id="adding-a-custom-target"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#adding-a-custom-target" title="Permalink to this section">Adding A Custom Target</a></h2>
<p>As shown in Mike’s blog post, we can define our own build rule using a couple of XML files which
will be imported into the VS project. (I’ll keep using shader compilation with <code>dxc</code> as my running
example, but this approach can be adapted for a lot of other things, too.)</p>
<p>First, create a file <code>dxc.targets</code>—in your project directory, or really anywhere—containing
the following:</p>
<div class="codehilite"><pre><span></span><code><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><Project</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/developer/msbuild/2003"</span><span class="nt">></span>
<span class="nt"><ItemGroup></span>
<span class="c"><!-- Include definitions from dxc.xml, which defines the DXCShader item. --></span>
<span class="nt"><PropertyPageSchema</span> <span class="na">Include=</span><span class="s">"$(MSBuildThisFileDirectory)dxc.xml"</span> <span class="nt">/></span>
<span class="c"><!-- Hook up DXCShader items to be built by the DXC target. --></span>
<span class="nt"><AvailableItemName</span> <span class="na">Include=</span><span class="s">"DXCShader"</span><span class="nt">></span>
<span class="nt"><Targets></span>DXC<span class="nt"></Targets></span>
<span class="nt"></AvailableItemName></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><Target</span>
<span class="na">Name=</span><span class="s">"DXC"</span>
<span class="na">Condition=</span><span class="s">"'@(DXCShader)' != ''"</span>
<span class="na">BeforeTargets=</span><span class="s">"ClCompile"</span><span class="nt">></span>
<span class="nt"><Message</span> <span class="na">Importance=</span><span class="s">"High"</span> <span class="na">Text=</span><span class="s">"Building shaders!!!"</span> <span class="nt">/></span>
<span class="nt"></Target></span>
<span class="nt"></Project></span>
</code></pre></div>
<p>And another file <code>dxc.xml</code> containing:</p>
<div class="codehilite"><pre><span></span><code><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><ProjectSchemaDefinitions</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/build/2009/properties"</span><span class="nt">></span>
<span class="c"><!-- Associate DXCShader item type with .hlsl files --></span>
<span class="nt"><ItemType</span> <span class="na">Name=</span><span class="s">"DXCShader"</span> <span class="na">DisplayName=</span><span class="s">"DXC Shader"</span> <span class="nt">/></span>
<span class="nt"><ContentType</span> <span class="na">Name=</span><span class="s">"DXCShader"</span> <span class="na">ItemType=</span><span class="s">"DXCShader"</span> <span class="na">DisplayName=</span><span class="s">"DXC Shader"</span> <span class="nt">/></span>
<span class="nt"><FileExtension</span> <span class="na">Name=</span><span class="s">".hlsl"</span> <span class="na">ContentType=</span><span class="s">"DXCShader"</span> <span class="nt">/></span>
<span class="nt"></ProjectSchemaDefinitions></span>
</code></pre></div>
<p>Let’s pause for a moment and take stock of what’s going on here. First, we’re creating a new “item
type”, called <code>DXCShader</code>, and associating it with the extension <code>.hlsl</code>. That way, any files we
add to our project with that extension will automatically have this item type applied.</p>
<p>Second, we’re instructing MSBuild that <code>DXCShader</code> items are to be built with the <code>DXC</code> target, and
we’re defining what that target does. For now, all it does is print a message in the build output,
but we’ll get it doing some actual work shortly.</p>
<p>A few miscellaneous syntax notes:</p>
<ul>
<li>Yes, you need two separate files. No, there’s no way to combine them, AFAICT. This is just the
way MSBuild works.</li>
<li>The syntax <code>@(DXCShader)</code> means “the list of all <code>DXCShader</code> items in the project”. The <code>Condition</code>
attribute on a target says under what conditions that target should execute: if the condition is
false, the target is skipped. Here, we’re executing the target if the list <code>@(DXCShader)</code> is non-empty.</li>
<li><code>BeforeTargets="ClCompile"</code> means this target will run before the <code>ClCompile</code> target, i.e. before
C/C++ source files are compiled with <code>cl.exe</code>. This is because we’re going to output our shader
bytecode to headers which will get included into C++, so the shader compile step needs to run
earlier.</li>
<li><code>Importance="High"</code> is needed on the <code><Message></code> task for it to show up in the VS IDE on the
default verbosity setting. Lower importances will be masked unless you turn up the verbosity.</li>
</ul>
<p>To get this into your project, in the VS IDE right-click the project → Build Dependencies… → Build Customizations,
then click “Find Existing” and point it at <code>dxc.targets</code>. Alternatively, add this line to your <code>.vcxproj</code>
(as a child of the root <code><Project></code> element, doesn’t matter where):</p>
<div class="codehilite"><pre><span></span><code><span class="nt"><Import</span> <span class="na">Project=</span><span class="s">"dxc.targets"</span> <span class="nt">/></span>
</code></pre></div>
<p>Now, if you add a <code>.hlsl</code> file to your project it should automatically show up as type “DXC Shader”
in the properties; and when you build, you should see the message <code>Building shaders!!!</code> in the
output.</p>
<p>Incidentally, in <code>dxc.xml</code> you can also set up property pages that will show up in the VS IDE on
<code>DXCShader</code>-type files. This lets you define your own metadata and let users configure it per
file. I haven’t done this, but for example, you could have properties to indicate which shader
stages or profiles the file should be compiled for. The <code><Target></code> element can then have logic that refers
to those properties. Many examples of the XML to define property pages can be found in <code>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\1033</code>
(or a corresponding location depending on which version of VS you have). For example,
<code>custom_build_tool.xml</code> in that directory defines the properties for the built-in Custom Build
Tool item type.</p>
<h2 id="invoking-the-tool"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#invoking-the-tool" title="Permalink to this section">Invoking The Tool</a></h2>
<p>Okay, now it’s time to get our custom target to actually do something. Mike’s blog post used the MSBuild
<a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/exec-task?view=vs-2017"><code><Exec></code> task</a> to
run a command on each source file. However, we’re going to take a different tack and use the
Visual Studio <code><CustomBuild></code> task instead.</p>
<p>The <code><CustomBuild></code> task is the same one that ends up getting executed if you manually set your
files to “Custom Build Tool” and fill in the command/inputs/outputs metadata in the property pages.
But instead of putting that in by hand, we’re going to set up our target to <em>generate</em> the metadata
and then pass it in to <code><CustomBuild></code>. Doing it this way is going to let us access a couple handy
features later that we wouldn’t get with the plain <code><Exec></code> task.</p>
<p>Add this inside the DXC <code><Target></code> element:</p>
<div class="codehilite"><pre><span></span><code><span class="c"><!-- Setup metadata for custom build tool --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><DXCShader></span>
<span class="nt"><Message></span>%(Filename)%(Extension)<span class="nt"></Message></span>
<span class="nt"><Command></span>
"$(WDKBinRoot)\x86\dxc.exe" -T vs_6_0 -E vs_main %(Identity) -Fh %(Filename).vs.h -Vn %(Filename)_vs
"$(WDKBinRoot)\x86\dxc.exe" -T ps_6_0 -E ps_main %(Identity) -Fh %(Filename).ps.h -Vn %(Filename)_ps
<span class="nt"></Command></span>
<span class="nt"><Outputs></span>%(Filename).vs.h;%(Filename).ps.h<span class="nt"></Outputs></span>
<span class="nt"></DXCShader></span>
<span class="nt"></ItemGroup></span>
<span class="c"><!-- Compile by forwarding to the Custom Build Tool infrastructure --></span>
<span class="nt"><CustomBuild</span> <span class="na">Sources=</span><span class="s">"@(DXCShader)"</span> <span class="nt">/></span>
</code></pre></div>
<p>Now, given some valid HLSL source files in the project, this will invoke <code>dxc.exe</code> twice on each
one—first compiling a vertex shader, then a pixel shader. The bytecode will be output as C arrays in
header files (<code>-Fh</code> option). I’ve just put the output headers in the main project directory, but
in production you’d probably want to put them in a subdirectory somewhere.</p>
<p>Let’s back up and look at the syntax in this snippet. First, the <code><ItemGroup><DXCShader></code> combo
basically says “iterate over the <code>DXCShader</code> items”, i.e. the HLSL source files in the project.
Then what we’re doing is adding metadata: each of the child elements—<code><Message></code>, <code><Command></code>,
and <code><Outputs></code>—becomes a metadata key/value pair attached to a <code>DXCShader</code>.</p>
<p>The <code>%(Foo)</code> syntax accesses item metadata (within a previously established context for “which item”,
which is here created by the iteration over the shaders). All MSBuild items have certain
<a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild-well-known-item-metadata?view=vs-2017">built-in metadata</a>
like path, filename, and extension; we’re building on those to construct additional
metadata, in the format expected by the <code><CustomBuild></code> task. (It matches the metadata that would be
created if you set up the command line etc. manually in the Custom Build Tool property pages.)</p>
<p>Incidentally, the <code>$(WDKBinRoot)</code> variable (“property”, in MSBuild-ese) is the path to the Windows
SDK <code>bin</code> folder, where lots of tools like <code>dxc</code> live. It needs to be quoted because it can (and
usually does) contain spaces. You can find out these things by running MSBuild with “diagnostic”
verbosity (in VS, go to Tools → Options → Projects and Solutions → Build and Run → “MSBuild project
build output verbosity”)—this will spit out all the defined properties plus a ton of logging about
which targets are running and what they’re doing.</p>
<p>Finally, after setting up all the required metadata, we simply pass it to the <code><CustomBuild></code> task.
(This task isn’t part of core MSBuild, but is defined in <code>Microsoft.Build.CPPTasks.Common.dll</code>—an
extension plugin to MSBuild that comes with Visual Studio.) Again we see the <code>@(DXCShader)</code> syntax,
meaning to pass in the list of all <code>DXCShader</code> items in the project. Internally, <code><CustomBuild></code>
iterates over it and invokes your specified command lines.</p>
<h2 id="incremental-builds"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#incremental-builds" title="Permalink to this section">Incremental Builds</a></h2>
<p>At this point, we have a working custom build! We can simply add <code>.hlsl</code> files to our project, and
they’ll automatically be compiled by <code>dxc</code> as part of the build process, without us having to do
anything. <em>Hurrah!</em></p>
<p>However, while working with this setup you will notice a couple of problems.</p>
<ol>
<li>When you modify an HLSL source file, Visual Studio will <em>not</em> reliably detect that it
needs to recompile it. If the project was up-to-date before, hitting Build will do nothing!
However, if you have also modified something else (such as a C++ source file), <em>then</em> the build
will pick up the shaders in addition.</li>
<li>Anytime anything else gets built, <em>all</em> the shaders get built. In other words, MSBuild doesn’t
yet understand that if an individual shader is already up-to-date then it can be skipped.</li>
</ol>
<p>Fortunately, we can easily fix these. But first, why are these problems happening at all?</p>
<p>VS and MSBuild depend on <a href="https://docs.microsoft.com/en-us/visualstudio/extensibility/visual-cpp-project-extensibility?view=vs-2017#tlog-files"><code>.tlog</code> (tracker log) files</a>
to cache information about source file dependencies and efficiently determine whether a build is
up-to-date. Somewhere inside your build output directory there will be a folder full of these logs,
listing what source files have gotten built, what inputs they depended on (e.g. headers), and
what outputs they generated (e.g. object files). The problem is that our custom target isn’t
producing any <code>.tlog</code>s.</p>
<p>Conveniently for us, the <code><CustomBuild></code> task supports <code>.tlog</code> handling right out of the box; we
just have to turn it on! Change the <code><CustomBuild></code> invocation in the targets file to this:</p>
<div class="codehilite"><pre><span></span><code><span class="c"><!-- Compile by forwarding to the Custom Build Tool infrastructure,</span>
<span class="c"> so it will take care of .tlogs --></span>
<span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span> <span class="nt">/></span>
</code></pre></div>
<p>That’s all there is to it—now, modified HLSL files will be properly detected and rebuilt, and
<em>unmodified</em> ones will be properly detected and <em>not</em> rebuilt. This also takes care of deleting the
previous output files when you do a clean build. This is one reason to prefer using the <code><CustomBuild></code>
task rather than the simpler <code><Exec></code> task (we’ll see another reason a bit later).</p>
<p><em>Thanks to Olga Arkhipova at Microsoft for helping me figure out this part!</em></p>
<h2 id="header-dependencies"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#header-dependencies" title="Permalink to this section">Header Dependencies</a></h2>
<p>Now that we have dependencies hooked up for our custom toolchain, a logical next step is to look
into how we can specify extra input dependencies—so that our shaders can have <code>#include</code>s, for
example, and modifications to the headers will automatically trigger rebuilds properly.</p>
<p>The good news is that yes, we can do this by adding an <code><AdditionalInputs></code> metadata key to our
<code>DXCShader</code> items. Files listed there will get registered as inputs in the <code>.tlog</code>, and the build
system will do the rest. The bad news is that there doesn’t seem to be an easy way to detect <em>on
a file-by-file level</em> which additional inputs are needed.</p>
<p>This is frustrating because Visual Studio actually includes a utility for tracking
file accesses in an external tool! It’s called <code>tracker.exe</code> and lives somewhere in your VS
installation. You give it a command line, and it’ll detect all files opened for reading by the
launched process (presumably by injecting a DLL and detouring <code>CreateFile()</code>, or something along
those lines). I believe this is what VS uses internally to track <code>#include</code>s for C++—and it
would be perfect if we could get access to the same functionality for custom toolchains as well.</p>
<p>Unfortunately, the <code><CustomBuild></code> task <em>explicitly disables</em> this tracking functionality. I was
able to find this out by using <a href="https://github.com/icsharpcode/ILSpy">ILSpy</a> to decompile the
<code>Microsoft.Build.CPPTasks.Common.dll</code>. It’s a .NET assembly, so it decompiles pretty cleanly, and
you can examine the innards of the <code>CustomBuild</code> class. It contains this snippet, in the
<code>ExecuteTool()</code> method:</p>
<div class="codehilite"><pre><span></span><code><span class="kt">bool</span> <span class="n">trackFileAccess</span> <span class="p">=</span> <span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span><span class="p">;</span>
<span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="n">num</span> <span class="p">=</span> <span class="k">base</span><span class="p">.</span><span class="n">TrackerExecuteTool</span><span class="p">(</span><span class="n">pathToTool2</span><span class="p">,</span> <span class="n">responseFileCommands</span><span class="p">,</span> <span class="n">commandLineCommands</span><span class="p">);</span>
<span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span> <span class="p">=</span> <span class="n">trackFileAccess</span><span class="p">;</span>
</code></pre></div>
<p>That is, it’s turning off file access tracking before calling the base class
method that would otherwise invoke the tracker. I’m sure there’s a reason why they did that, but
sadly it’s stymied my attempts to get automatic <code>#include</code> tracking to work for shaders.</p>
<p>(We could also invoke <code>tracker.exe</code> manually in our command line, but then we face the problem of
merging the tracker-generated <code>.tlog</code> into that of the <code><CustomBuild></code> task. They’re just text files,
so it’s potentially doable…but that is <em>way</em> more programming than I’m prepared to attempt in an
XML-based scripting language.)</p>
<p>Although we can’t get fine-grained file-by-file header dependencies, we can still set up <em>conservative</em>
dependencies by making every HLSL source file depend on every header. This will result in rebuilding
all the shaders whenever any header is modified—but better to rebuild too much than not enough.
We can find all the headers using a wildcard pattern and an <code><ItemGroup></code>. Add this to the DXC
<code><Target></code>, before the “setup metadata” section:</p>
<div class="codehilite"><pre><span></span><code><span class="c"><!-- Find all shader headers (.hlsli files) --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><ShaderHeader</span> <span class="na">Include=</span><span class="s">"*.hlsli"</span> <span class="nt">/></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><PropertyGroup></span>
<span class="nt"><ShaderHeaders></span>@(ShaderHeader)<span class="nt"></ShaderHeaders></span>
<span class="nt"></PropertyGroup></span>
</code></pre></div>
<p>You could also set this to find <code>.h</code> files under a <code>Shaders</code> subdirectory, or whatever you prefer.
The <code>**</code> wildcard is available for recursively searching subdirectories, too.</p>
<p>Then add this inside the <code><ItemGroup><DXCShader></code> section:</p>
<div class="codehilite"><pre><span></span><code><span class="nt"><AdditionalInputs></span>$(ShaderHeaders)<span class="nt"></AdditionalInputs></span>
</code></pre></div>
<p>We have to do a little dance here, first forming the <code>ShaderHeader</code> item list, then expanding it
into the <code>ShaderHeaders</code> <em>property</em>, and finally referencing that in the metadata. I’m not sure why,
but if I try to use <code>@(ShaderHeader)</code> directly in the metadata it just comes out blank. Perhaps
it’s not allowed to have nested iteration over item lists in MSBuild.</p>
<p>In any case, after making these changes and rebuilding, the build should now pick up any changes to
shader headers. <em>Woohoo!</em></p>
<h2 id="errorwarning-parsing"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#errorwarning-parsing" title="Permalink to this section">Error/Warning Parsing</a></h2>
<p>There’s just one more bit of sparkle we can easily add. When you compile C++ and you get an error
or warning, the VS IDE recognizes it and produces a clickable link that takes you to the source
location. If a custom build step emits error messages in the same format, they’ll be picked up as
well—but what if your custom toolchain has a different format?</p>
<p>The <code>dxc</code> compiler emits errors and warnings in gcc/clang format, looking something like this:</p>
<div class="codehilite"><pre><span></span><code>Shader.hlsl:12:15: error: cannot convert from 'float3' to 'float4'
</code></pre></div>
<p>It turns out that Visual Studio already does recognize this format (at least as of version 15.9),
which is great! But if it didn’t, or in case you’ve got a tool with some other message format, it turns
out you can provide a regular expression to find errors and warnings in the tool output. The regex
can even supply source file/line information, and the errors will become clickable in the IDE, just
as with C++. (This is all <em>totally undocumented</em> and I only know about it because I spotted the
code while browsing through the decompiled CPPTasks DLL. If you want to take a look for yourself,
the juicy bit is the <code>VCToolTask.ParseLine()</code> method.)</p>
<p>This will use <a href="https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference">.NET regex syntax</a>,
and in particular, expects a certain set of <a href="https://docs.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#named_matched_subexpression">named captures</a>
to provide metadata. By way of example, here’s the regex I wrote for gcc/clang-format errors:</p>
<div class="codehilite"><pre><span></span><code>(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)
</code></pre></div>
<p><code>FILENAME</code>, <code>LINE</code>, etc. are the names the parsing code expects for the metadata. There’s one more
I didn’t use: <code>CODE</code>, for an error code (like <a href="https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2440?view=vs-2017">C2440</a>,
etc.). The only required one is <code>CATEGORY</code>, without which the message won’t be clickable (and it
must be one of the words “error”, “warning”, or “note”); all the others are optional.</p>
<p>To use it, pass the regex to the <code><CustomBuild></code> task like so:</p>
<div class="codehilite"><pre><span></span><code><span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span>
<span class="na">ErrorListRegex=</span><span class="s">"(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)"</span> <span class="nt">/></span>
</code></pre></div>
<h2 id="example-project"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project" title="Permalink to this section">Example Project</a></h2>
<p>Here’s a complete VS2017 project with all the features we’ve discussed, a couple demo shaders, and a
C++ file that includes the compiled bytecode (just to show that works).</p>
<p><a class="biglink" href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/buildcust3.zip">Download Example Project (.zip, 4.3 KB)</a></p>
<p>And for completeness, here’s the final contents of <code>dxc.targets</code>:</p>
<div class="codehilite"><pre><span></span><code><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><Project</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/developer/msbuild/2003"</span><span class="nt">></span>
<span class="nt"><ItemGroup></span>
<span class="c"><!-- Include definitions from dxc.xml, which defines the DXCShader item. --></span>
<span class="nt"><PropertyPageSchema</span> <span class="na">Include=</span><span class="s">"$(MSBuildThisFileDirectory)dxc.xml"</span> <span class="nt">/></span>
<span class="c"><!-- Hook up DXCShader items to be built by the DXC target. --></span>
<span class="nt"><AvailableItemName</span> <span class="na">Include=</span><span class="s">"DXCShader"</span><span class="nt">></span>
<span class="nt"><Targets></span>DXC<span class="nt"></Targets></span>
<span class="nt"></AvailableItemName></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><Target</span>
<span class="na">Name=</span><span class="s">"DXC"</span>
<span class="na">Condition=</span><span class="s">"'@(DXCShader)' != ''"</span>
<span class="na">BeforeTargets=</span><span class="s">"ClCompile"</span><span class="nt">></span>
<span class="nt"><Message</span> <span class="na">Importance=</span><span class="s">"High"</span> <span class="na">Text=</span><span class="s">"Building shaders!!!"</span> <span class="nt">/></span>
<span class="c"><!-- Find all shader headers (.hlsli files) --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><ShaderHeader</span> <span class="na">Include=</span><span class="s">"*.hlsli"</span> <span class="nt">/></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><PropertyGroup></span>
<span class="nt"><ShaderHeaders></span>@(ShaderHeader)<span class="nt"></ShaderHeaders></span>
<span class="nt"></PropertyGroup></span>
<span class="c"><!-- Setup metadata for custom build tool --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><DXCShader></span>
<span class="nt"><Message></span>%(Filename)%(Extension)<span class="nt"></Message></span>
<span class="nt"><Command></span>
"$(WDKBinRoot)\x86\dxc.exe" -T vs_6_0 -E vs_main %(Identity) -Fh %(Filename).vs.h -Vn %(Filename)_vs
"$(WDKBinRoot)\x86\dxc.exe" -T ps_6_0 -E ps_main %(Identity) -Fh %(Filename).ps.h -Vn %(Filename)_ps
<span class="nt"></Command></span>
<span class="nt"><AdditionalInputs></span>$(ShaderHeaders)<span class="nt"></AdditionalInputs></span>
<span class="nt"><Outputs></span>%(Filename).vs.h;%(Filename).ps.h<span class="nt"></Outputs></span>
<span class="nt"></DXCShader></span>
<span class="nt"></ItemGroup></span>
<span class="c"><!-- Compile by forwarding to the Custom Build Tool infrastructure,</span>
<span class="c"> so it will take care of .tlogs and error/warning parsing --></span>
<span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span>
<span class="na">ErrorListRegex=</span><span class="s">"(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)"</span> <span class="nt">/></span>
<span class="nt"></Target></span>
<span class="nt"></Project></span>
</code></pre></div>
<h2 id="the-next-level"><a href="https://www.reedbeta.com/blog/custom-toolchain-with-msbuild/#the-next-level" title="Permalink to this section">The Next Level</a></h2>
<p>At this point, we have a pretty usable MSBuild customization for compiling shaders, or using other
kinds of custom toolchains! I’m pretty happy with it. However, there’s still a couple of areas for
improvement.</p>
<ul>
<li>As mentioned before, I’d like to get file access tracking to work so we can have exact
dependencies for included files, rather than conservative (overly broad) dependencies.</li>
<li>I haven’t done anything with parallel building. Currently, <code><CustomBuild></code> tasks are run one at a
time. There <em>is</em> a <code><ParallelCustomBuild></code> task in the CPPTasks assembly…unfortunately, it
doesn’t support <code>.tlog</code> updating or the error/warning regex, so it’s not directly usable here.</li>
</ul>
<p>To obtain these features, I think I’d need to write my own build extension in C#, defining a custom
task and calling it in place of <code><CustomBuild></code> in the targets file. It might not be too hard to get
that working, but I haven’t attempted it yet.</p>
<p>In the meantime, now that the hard work of circumventing the weird gotchas and reverse-engineering
the undocumented innards has been done, it should be pretty easy to adapt this <code>.targets</code> setup to
other needs for code generation or external tools, and have them act mostly like first-class
citizens in our Visual Studio builds. Cheers!</p>Mesh Shader Possibilities
https://www.reedbeta.com/blog/mesh-shader-possibilities/
http://reedbeta.com/blog/mesh-shader-possibilities/Nathan ReedSat, 29 Sep 2018 11:42:26 -0700https://www.reedbeta.com/blog/mesh-shader-possibilities/#commentsCodingGPUGraphics<p>NVIDIA recently announced their latest GPU architecture, called Turing. Although its headlining feature is
<a href="https://arstechnica.com/gadgets/2018/08/microsoft-announces-the-next-step-in-gaming-graphics-directx-raytracing/">hardware-accelerated ray tracing</a>,
Turing also includes <a href="https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/">several other developments</a>
that look quite intriguing in their own right.</p>
<p>One of these is the new concept of <a href="https://devblogs.nvidia.com/introduction-turing-mesh-shaders/"><em>mesh shaders</em></a>,
details of which dropped a couple weeks ago—and the graphics programming community was agog, with many
enthusiastic discussions taking place on Twitter and elsewhere. So what are mesh shaders (and their
counterparts, task shaders), why are graphics programmers so excited about them, and what might we
be able to do with them?</p>
<!--more-->
<h2 id="the-gpu-geometry-pipeline-has-gotten-cluttered"><a href="https://www.reedbeta.com/blog/mesh-shader-possibilities/#the-gpu-geometry-pipeline-has-gotten-cluttered" title="Permalink to this section">The GPU Geometry Pipeline Has Gotten Cluttered</a></h2>
<p>The process of submitting geometry—triangles to be drawn—to the GPU has a simple underlying
paradigm: you put your vertices into a buffer, point the GPU at it, and issue a draw call to say
how many primitives to render. The vertices get slurped linearly out of the buffer, each is
processed by a vertex shader, the triangles are rasterized and shaded, and Bob’s your uncle.</p>
<p>But over decades of GPU development, various extra features have gotten bolted onto this basic pipeline
in the name of greater performance and efficiency. Indexed triangles and vertex caches were created to exploit
vertex reuse. Complex vertex stream format descriptions are needed to prepare data for shading.
Instancing, and later multi-draw, allowed certain sets of draw calls to be combined together;
indirect draws could be generated on the GPU itself. Then came
the extra shader stages: geometry shaders, to allow programmable operations on primitives and even
inserting or deleting primitives on the fly, and then tessellation shaders, letting you submit a
low-res mesh and dynamically subdivide it to a programmable level.</p>
<p>While these features and more were all added for good reasons (or at least what <em>seemed</em> like
good reasons at the time), the compound of all of them has become unwieldy. Which subset of the
many available options do you reach for in a given situation? Will your choice be efficient across
all the GPU architectures your software must run on?</p>
<p>Moreover, this elaborate pipeline is still not as flexible as we would sometimes like—or, where
flexible, it is not performant. Instancing can only draw copies of a single mesh at a time;
multi-draw is still inefficient for large numbers of small draws. Geometry shaders’ programming model is <a href="http://www.joshbarczak.com/blog/?p=667">not
conducive to efficient implementation</a> on wide SIMD cores in
GPUs, and its <a href="https://fgiesen.wordpress.com/2011/07/20/a-trip-through-the-graphics-pipeline-2011-part-10/">input/output buffering presents difficulties too</a>.
Hardware tessellation, though very handy for certain things, is often <a href="https://www.sebastiansylvan.com/post/the-problem-with-tessellation-in-directx-11/">difficult to use well</a>
due to the limited granularity at which you can set tessellation factors, the limited set of baked-in
<a href="/blog/tess-quick-ref/">tessellation modes</a>, and performance issues on some GPU architectures.</p>
<h2 id="simplicity-is-golden"><a href="https://www.reedbeta.com/blog/mesh-shader-possibilities/#simplicity-is-golden" title="Permalink to this section">Simplicity Is Golden</a></h2>
<p>Mesh shaders represent a radical simplification of the geometry pipeline. With a mesh shader
enabled, all the shader stages and fixed-function features described above are swept away. Instead, we get
a clean, straightforward pipeline using a compute-shader-like programming model. Importantly, this
new pipeline is both highly flexible—enough to handle the existing geometry tasks in a typical game,
plus enable new techniques that are challenging to do on the GPU today—<em>and</em> it looks
like it should be quite performance-friendly, with no apparent architectural barriers to efficient
GPU execution.</p>
<p>Like a compute shader, a mesh shader defines work groups of parallel-running threads, and they can
communicate via on-chip shared memory as well as <a href="http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/07/GDC2017-Wave-Programming-D3D12-Vulkan.pdf">wave intrinsics</a>.
In lieu of a draw call, the app launches some number of mesh shader work groups. Each work group
is responsible for writing out a small, self-contained chunk of geometry, called a
“meshlet”, expressed in arrays of vertex attributes and corresponding indices. These meshlets
then get tossed directly into the rasterizer, and Bob’s your uncle.</p>
<p>(More details can be found in <a href="https://devblogs.nvidia.com/introduction-turing-mesh-shaders/">NVIDIA’s blog post</a>,
a <a href="http://on-demand.gputechconf.com/siggraph/2018/video/sig1811-3-christoph-kubisch-mesh-shaders.html">talk by Christoph Kubisch</a>,
and the <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_mesh_shader.txt">OpenGL extension spec</a>.)</p>
<p>The appealing thing about this model is how data-driven and freeform it is. The mesh shader pipeline
has very relaxed expectations about the shape of your data and the kinds of things you’re doing to do.
Everything’s up to the programmer: you can pull the vertex and index data from buffers, generate
them algorithmically, or any combination.</p>
<p>At the same time, the mesh shader model sidesteps the issues that hampered geometry shaders, by explicitly embracing
SIMD execution (in the form of the compute “work group” abstraction). Instead of each shader <em>thread</em>
generating geometry on its own—which leads to divergence, and large input/output data sizes—we
have the whole work group outputting a meshlet cooperatively. This mean we can use
compute-style tricks, like: first do some work on the vertices in parallel, then have a barrier, then work on
the triangles in parallel. It also means the input/output bandwidth needs are a lot more reasonable.
And, because meshlets are indexed triangle lists, they don’t break vertex reuse, as geometry shaders often did.</p>
<h2 id="an-upgrade-path"><a href="https://www.reedbeta.com/blog/mesh-shader-possibilities/#an-upgrade-path" title="Permalink to this section">An Upgrade Path</a></h2>
<p>The other really neat thing about mesh shaders is that they don’t require you to drastically rework
how your game engine handles geometry to take advantage of them. It looks like it should be pretty
easy to convert most common geometry types to mesh shaders, making it an approachable upgrade path for
developers.</p>
<p>(You don’t have to convert <em>everything</em> to mesh shaders straight away, though; it’s possible
to switch between the old geometry pipeline and the new mesh-shader-based one at different points in
the frame.)</p>
<p>Suppose you have an ordinary authored mesh that you want to load and render. You’ll
need to break it up into meshlets, which have a static maximum size declared in the
shader—NVIDIA’s blog post recommends 64 vertices and 126 triangles as a default. How do we do this?</p>
<p>Fortunately, most game engines currently do some form of <a href="https://tomforsyth1000.github.io/papers/fast_vert_cache_opt.html">vertex cache optimization</a>,
which already organizes the primitives by locality—triangles sharing one or two vertices will tend
to be close together in the index buffer. So, a quite viable
strategy for creating meshlets is: just scan the index buffer linearly, accumulating the set of
vertices used, until you hit either 64 vertices or 126 triangles; reset and repeat until you’ve gone
through the whole mesh. This could be done at art build time, or it’s simple enough that you could even do it
in the engine at level load time.</p>
<p>Alternatively, vertex cache optimization algorithms can probably be modified to produce meshlets directly.
For GPUs without mesh shader support, you can concatenate all the meshlet vertex buffers
together, and rapidly generate a traditional index buffer by offsetting and concatenating all the
meshlet index buffers. It’s pretty easy to go back and forth.</p>
<p>In either case, the mesh shader would be mostly just acting as a vertex shader, with some extra
code to fetch vertex and index data from their buffers and plug them into the mesh outputs.</p>
<p>What about other kinds of geometry found in games?</p>
<p>Instanced draws are straightforward: multiply the meshlet count and put in a bit of
shader logic to hook up instance parameters. A more interesting case is multi-draw, where we want
to draw a lot of meshes that <em>aren’t</em> all copies of the same thing. For this, we can employ
<em>task shaders</em>—a secondary feature of the mesh shader pipeline. Task shaders
add an extra layer of compute-style work groups, running before the mesh shader, and they control
<em>how many</em> mesh shader work groups to launch. They can also write output variables to be consumed by the
mesh shader. A very efficient multi-draw should be possible by launching task shaders with a thread
per draw, which in turn launch the mesh shaders for all the individual draws.</p>
<p>If we need to draw a lot of <em>very</em> small meshes, such as quads for particles/imposters/text/point-based rendering,
or boxes for occlusion tests / projected decals and whatnot, then we can pack a bunch of them
into each mesh shader workgroup. The geometry can be generated entirely in-shader rather than relying
on a pre-initialized index buffer from the CPU. (This was one of the original use cases that, it was
hoped, could be done with geometry shaders—e.g. submitting point primitives, and having the GS expand them
into quads.) There’s also a lot of flexibility to do stuff with variable topology, like particle
beams/strips/ribbons, which would otherwise need to be generated either on the CPU or in a separate
compute pre-pass.</p>
<p>(By the way, the <em>other</em> original use case that, it was hoped, could be done with geometry shaders
was multi-view rendering: drawing the same geometry to, say, multiple faces of a cubemap or slices
of a cascaded shadow map within a single draw call. You could do that with mesh shaders, too—but
Turing actually has a separate hardware multi-view capability for these applications.)</p>
<p>What about tessellated meshes?</p>
<p>The two-layer structure of task and mesh shaders is broadly
similar to that of tessellation hull and domain shaders. While it doesn’t appear that mesh shaders
have any kind of access to the fixed-function tessellator unit, it’s also
not too hard to imagine that we could write code in task/mesh shaders to reproduce tessellation
functionality (or at least some of it). Figuring out the details would be a bit of a research project
for sure—maybe someone has already worked on this?—and perf would be a question mark. However,
we’d get the benefit of being able to <em>change</em> how tessellation works, instead of being stuck with
whatever Microsoft decided on in the late 2000s.</p>
<h2 id="new-possibilities"><a href="https://www.reedbeta.com/blog/mesh-shader-possibilities/#new-possibilities" title="Permalink to this section">New Possibilities</a></h2>
<p>It’s great that mesh shaders can subsume our current geometry tasks, and in some cases make them
more efficient. But mesh shaders also open up possibilities for new kinds of geometry processing
that wouldn’t have been feasible on the GPU before, or would have required expensive compute
pre-passes storing data out to memory and then reading it back in through the traditional geometry
pipeline.</p>
<p>With our meshes already in meshlet form, we can do <a href="https://www.slideshare.net/gwihlidal/optimizing-the-graphics-pipeline-with-compute-gdc-2016">finer-grained culling</a>
at the meshlet level, and even at the triangle level within each meshlet. With task shaders, we can
potentially do mesh LOD selection on the GPU, and if we want to get fancy we could even try dynamically
packing together very small draws (from coarse LODs) to get better meshlet utilization.</p>
<p>In place of tile-based forward lighting, or as an extension to it, it might be useful to cull
lights (and projected decals, etc.) per meshlet, assuming there’s a good way to pass the variable-size
light list from a mesh shader down to the fragment shader. (This suggestion from <a href="https://twitter.com/sebaaltonen">Seb Aaltonen</a>.)</p>
<p>Having access to the topology in the mesh shader should enable us to calculate dynamic normals,
tangents, and curvatures for a mesh that’s deforming due to complex skinning, displacement mapping,
or procedural vertex animation. We can also do voxel meshing, or isosurface extraction—marching
cubes or tetrahedra, plus generating normals etc. for the isosurface—directly in a mesh shader,
for rendering fluids and volumetric data.</p>
<p>Geometry for hair/fur, foliage, or other surface cover might be feasible to generate on the fly,
with view-dependent detail.</p>
<p>3D modeling and CAD apps may be able to apply mesh shaders to dynamically triangulate quad meshes or
n-gon meshes, as well as things like dynamically insetting/outsetting geometry for
visualizations.</p>
<p>For rendering displacement-mapped terrain, water, and so forth, mesh shaders may be able to assist
us with <a href="https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter02.html">geometry clipmaps</a>
and geomorphing; they might also be interesting for <a href="http://hhoppe.com/proj/vdrpm/">progressive meshing</a>
schemes.</p>
<p>And last but not least, we might be able to render <a href="https://ia601908.us.archive.org/16/items/GDC2014Brainerd/GDC2014-Brainerd.pdf">Catmull–Clark subdivision surfaces</a>,
or other subdivision schemes, more easily and efficiently than it can be done on the GPU today.</p>
<p>To be clear, a great deal of the above is speculation and handwaving on my part—I don’t want to
mislead you that all of these things are <em>for sure</em> doable with the new mesh and task shader
pipeline. There will certainly be algorithmic difficulties and architectural hindrances that will
come up as graphics programmers have a chance to dig into this. Still, I’m quite excited to see what
people will do with this capability over the next few years, and I hope and expect that it won’t be
an NVIDIA-exclusive feature for too long.</p>Normals and the Inverse Transpose, Part 3: Grassmann On Duals
https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/
http://reedbeta.com/blog/normals-inverse-transpose-part-3/Nathan ReedSun, 22 Jul 2018 22:18:10 -0700https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#commentsGraphicsMath<p>Welcome back! In the last couple of articles, we learned about different ways to understand normal
vectors in 3D space—either as bivectors (<a href="/blog/normals-inverse-transpose-part-1/">part 1</a>), or as
dual vectors (<a href="/blog/normals-inverse-transpose-part-2/">part 2</a>). Both can be valid interpretations,
but they carry different units, and react differently to transformations.</p>
<p>In this third and final installment, we’re going leave behind the focus on normal vectors, and explore
a couple of other unitful vector quantities. We’ve seen how Grassmann bivectors and trivectors act as
oriented areas and volumes, respectively; and we saw how dual vectors act as oriented <em>line densities</em>, with
units of inverse length. Now, we’re going to put these two geometric concepts together, and find out
what they can accomplish with their combined powers. (Get it? Powers? Like powers of a scale factor?
Uh, you know what, never mind.)</p>
<!--more-->
<p>I’m going to dive right in, so if you need a refresher on either Grassmann algebra or dual spaces,
you may want to re-skim the previous articles.</p>
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#wedge-products-of-dual-vectors">Wedge Products of Dual Vectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-bivectors">Dual Bivectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-trivectors">Dual Trivectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#a-few-more-topics">A Few More Topics</a><ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-interior-product">The Interior Product</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-hodge-star">The Hodge Star</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-inner-product-or-forgetting-about-duals">The Inner Product, or Forgetting About Duals</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#whats-the-use-of-all-this">What’s The Use of All This?</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#organizing-the-zoo">Organizing the Zoo</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="wedge-products-of-dual-vectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#wedge-products-of-dual-vectors" title="Permalink to this section">Wedge Products of Dual Vectors</a></h2>
<p>Grassmann algebra allows us to take wedge products of vectors, producing higher-grade algebraic
entities such as bivectors and trivectors. Just as we can do this with base vectors, we can do the
same thing on dual vectors, producing <em>dual bivectors</em> and <em>dual trivectors</em>.</p>
<p>A dual bivector is formed by wedging two dual vectors, like:
$$
{\bf e_x^*} \wedge {\bf e_y^*} = {\bf e_{xy}^*}
$$
and a dual trivector is the product of three:
$$
{\bf e_x^*} \wedge {\bf e_y^*} \wedge {\bf e_z^*} = {\bf e_{xy}^*} \wedge {\bf e_z^*} = {\bf e_{xyz}^*}
$$
This works exactly the same way that wedge products of ordinary vectors do; in particular, the same
anticommutative law applies.</p>
<p>So what’s the geometric meaning of these dual $k$-vectors? Recall that a dual vector is defined as
a linear form—a function from some vector space $V$ to scalars $\Bbb R$. Conveniently, the wedge
products of dual vectors turn out to be isomorphic to the duals of wedge products of vectors.
(Mathematically, we can say, for finite-dimensional $V$:
$$
\textstyle
\bigwedge^k \bigl( V^* \bigr) \cong \bigl(\bigwedge^k V \bigr)^*
$$
where $\bigwedge^k$ is the operation to construct the set of $k$-vectors over a given base
vector space.)</p>
<p>The upshot is that dual $k$-vectors can be understood as <em>linear forms on $k$-vectors</em>: a dual
bivector is a linear function from bivectors to scalars, and a dual trivector is a linear function
from trivectors to scalars. Let’s see how this works in more detail.</p>
<h2 id="dual-bivectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-bivectors" title="Permalink to this section">Dual Bivectors</a></h2>
<p>In the previous article, we saw how a dual vector can be visualized as a field of parallel, uniformly
spaced planes, representing the level sets of a linear form:</p>
<div class="align-center"><figure >
<img src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/1-form.png" alt="A dual vector in 3D, visualized as a set of parallel planes" class="invert-when-dark" style="height:15em" title="A dual vector in 3D, visualized as a set of parallel planes" />
<figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure></div>
<p>You can think of the discrete planes in this picture as representing intervals of one unit
in the output of the linear form. Keep in mind, though, that there are actually a <em>continuous
infinity</em> of these planes, filling space—one for every possible output value of the linear form.
When you evaluate the linear form—i.e. pair a dual vector with a vector—the result represents <em>how
many planes</em> the vector crosses, from its tail to its tip (in a continuous-measure sense of “how many”).
This will depend on both the length and orientation of the vector: for example, a vector parallel to
the planes will return zero, no matter its length.</p>
<p>A dual <em>bivector</em> can be thought of in a similar way—but instead of planes, we now picture a field
of parallel <em>lines</em>, uniformly spaced over the plane perpendicular to them.</p>
<div class="align-center"><figure >
<img src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/2-form.png" alt="A dual bivector in 3D, visualized as a set of parallel lines, formed as the intersections of the planes of two dual vectors" class="invert-when-dark" style="height:20em" title="A dual bivector in 3D, visualized as a set of parallel lines, formed as the intersections of the planes of two dual vectors" />
<figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure></div>
<p>As suggested by this diagram, when you wedge two dual vectors, the resulting dual bivector consists
of all the <em>lines of intersection</em> of the two dual vectors’ respective planes.</p>
<p>What happens when we pair this dual bivector with a base bivector? As before, the
result is a scalar—this time representing <em>how many lines</em> the bivector crosses! If you visualize
the bivector as a parallelogram, or circle or any other shape, it will have a certain area. It
will therefore intersect some quantity of the continuous mass of lines. This quantity won’t depend on
the <em>shape</em> of the bivector—remember, bivectors don’t actually <em>have</em> any defined shape—only on
its area (magnitude) and orientation. A bivector whose plane runs parallel to the lines will return
zero, no matter its area.</p>
<p>Because dual vectors have units of inverse length, and a dual bivector is a product of dual vectors,
<strong>a dual bivector has units of inverse area</strong>. It represents an oriented areal
density, such as a probability density over a surface! When you pair the dual bivector with a
bivector, the result tells you how much probability (or whatever else) is covered by that bivector’s
area. And as implied by their units, dual bivectors scale as $1/a^2$. (If you scale an object <em>up</em> by
a factor of $a$, the probablity density on its surface goes <em>down</em> by a factor of $a^2$, because the
same total probability is now spread over an $a^2$-larger area.)</p>
<p>How about the transformation rule for dual bivectors? Well, we learned in part 1 that bivectors transform as
$\text{cofactor}(M)$; and in part 2, we found that dual vectors transform as the inverse transpose,
$M^{-T}$. It follows that dual bivectors transform as $\text{cofactor}\bigl(M^{-T}\bigr)$,
or equivalently $\bigl(\text{cofactor}(M)\bigr)^{-T}$. Startlingly, for 3×3 matrices these formulas
reduce to just
$$
\frac{M}{\det M}
$$
So, dual bivectors simply transform using $M$ divided by its own determinant.</p>
<h2 id="dual-trivectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-trivectors" title="Permalink to this section">Dual Trivectors</a></h2>
<p>Follow the pattern: if a dual vector in 3D looks like a stack of parallel planes, and a dual bivector
looks like a field of parallel lines, then a dual <em>trivector</em> looks like a cloud of parallel <em>points</em>.
Well, drop the “parallel”—it doesn’t mean anything. It’s just uniformly spaced points.</p>
<div class="align-center"><figure >
<img src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/3-form.png" alt="A dual trivector in 3D, visualized as a set of points formed as the intersections of the planes of three dual vectors" class="invert-when-dark" style="height:20em" title="A dual trivector in 3D, visualized as a set of points formed as the intersections of the planes of three dual vectors" />
<figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure></div>
<p>As before, the wedge product of three dual vectors—or a dual vector and dual bivector—constructs
the continuous point cloud made of all the intersection points of the wedge factors. This quantity
scales as $1/a^3$ and represents a volume density. When you pair it with a trivector, the result
tells you how much of the point cloud is enclosed in that trivector’s volume.</p>
<p>The transformation rule for this one is easy—dual trivectors in 3D just get multiplied by $1/\det M$.</p>
<h2 id="a-few-more-topics"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#a-few-more-topics" title="Permalink to this section">A Few More Topics</a></h2>
<p>With the introduction of dual bi- and trivectors, our “scaling zoo” is now complete! We’ve got the
full ecosystem of vectorial quantities with scaling powers from −3 to +3, each with its proper units
and matching transformation formula.</p>
<p>In the rest of this section, I’ll quickly touch on a few more mathematical aspects of this extended
Grassmann algebra with dual spaces.</p>
<h3 id="the-interior-product"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-interior-product" title="Permalink to this section">The Interior Product</a></h3>
<p>As we saw in part 2, a vector space and its dual have a “natural pairing” operation, much like
an inner product, between vectors and dual vectors. This pairing
extends to $k$-vectors and their duals, too. In fact, we can further extend the natural pairing to
work between $k$-vectors and duals <em>of different grades</em>. For example, we can define a
way to “pair” a dual vector $w$ with a bivector $B = u \wedge v$, yielding a vector:
$$
\langle w, B \rangle = \langle w, u \rangle v - u \langle w, v \rangle
$$
Geometrically, the resulting vector lies in the plane of $B$, and runs parallel to the level planes
of $w$. In some sense, $w$ is “eating” the dimension of $B$ that lies along the direction of $w$’s
density, and leaving the leftover dimension behind as a vector.</p>
<p>This extended pairing operation is known as the <a href="https://en.wikipedia.org/wiki/Exterior_algebra#Interior_product">interior product</a>
or contraction product, although different references often define it in slightly different ways
(there are various conventions in the literature). I’m not going to go into it too deeply.
The key point is that you can combine a $k$-vector with a dual $\ell$-vector, for any grades
$k$ and $\ell$; the result will be a $(k-\ell)$-vector, interpreting negative grades as duals.</p>
<h3 id="the-hodge-star"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-hodge-star" title="Permalink to this section">The Hodge Star</a></h3>
<p>In addition to the vector-space duality we’ve been talking about, Grassmann algebra contains another,
distinct notion of duality: Hodge duality, represented by the Hodge star operator, $\star$. (Note
that this is a different symbol from the asterisk $*$ used for the dual vector space!)</p>
<p>The vector-space notion of duality relates $k$-vectors to duals of <em>equal grade</em>—vectors to dual
vectors, bivectors to dual bivectors, and so on. Hodge duality, however, connects things to duals of a
complementary grade. Applying the Hodge star to a $k$-vector produces an element of grade $n - k$,
where $n$ is the dimension of space. In 3D, it interchanges vectors (grade 1) with bivectors (grade 2),
and scalars (grade 0) with trivectors (grade 3).</p>
<p>The way I’ll define the Hodge star initially is a bit different than the standard way. In fact,
there are actually <em>two</em> Hodge star operations: one that goes from $k$-vectors to dual $(n-k)$-vectors,
and another that goes the other way. I’ll denote these by $\star$ and $-\star$ respectively. The
two are inverses of each other (in 3D, at least). They’re defined as follows:
$$
\begin{aligned}
\star&: \textstyle\bigwedge^k V \to \textstyle\bigwedge^{n-k}V^* &&:
& v^\star &= \langle {\bf e_{xyz}^*}, v \rangle \\
-\star&: \textstyle\bigwedge^k V^* \to \textstyle\bigwedge^{n-k}V &&:
& v^{-\star} &= \langle v, {\bf e_{xyz}} \rangle
\end{aligned}
$$
The angle brackets on the right here are the interior product. What we’re saying is: to do
the Hodge star on a $k$-vector, we take its interior product with ${\bf e_{xyz}^*}$, the standard
unit dual trivector (or, in $n$ dimensions, the unit dual $n$-vector). This results in a dual
$(n-k)$-vector, which geometrically represents a density over all the dimensions <em>not</em> included in
the original $k$-vector.</p>
<p>Conversely, to do the anti-Hodge-star on a dual $k$-vector, we take its interior product with
${\bf e_{xyz}}$, giving an $(n-k)$-vector containing all the dimensions <em>not</em> represented by the
original dual $k$-vector, i.e. all the dimensions perpendicular to its level sets.</p>
<p>(These two operations are <em>almost</em> defined on disjoint domains, and could therefore be combined into
one “smart” star that automatically knows what to do based on the type of its argument…except for
the $k = 0$ case: when you hodge a scalar, does it go to a trivector, or to a dual trivector? Both
are possible; that’s why we need two distinct operations here.)</p>
<p>For 3D geometry, the interesting cases are vectors interchanging with bivectors:</p>
<ul>
<li>A vector $v$ hodges to a dual bivector whose “field lines” run parallel to $v$.</li>
<li>A bivector $B$ hodges to a dual vector whose level planes are parallel to $B$.</li>
<li>A dual vector $w$ unhodges to a bivector parallel to $w$’s level planes.</li>
<li>A dual bivector $D$ unhodges to a vector parallel to $D$’s field lines.</li>
</ul>
<p>Although the formal definition was somewhat involved, you can see that the geometric result of the
Hodge operations is actually pretty simple. It’s all about swapping between the geometry of a
$k$-vector and the corresponding level-set geometry of a dual $(n-k)$-vector. The Hodge stars are
a very useful tool for working with Grassmann and dual-Grassmann quantities in practice.</p>
<h3 id="the-inner-product-or-forgetting-about-duals"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#the-inner-product-or-forgetting-about-duals" title="Permalink to this section">The Inner Product, or Forgetting About Duals</a></h3>
<p>In most treatments of Grassmann or geometric algebra, dual spaces are hardly mentioned. The more conventional
definition of the Hodge star has it mapping directly between $k$-vectors and $(n-k)$-vectors—no
duals in sight. How does this work?</p>
<p>It turns out that if we have an inner product defined on our vector space, we can use it to
convert back and forth between vectors and dual vectors, or $k$-vectors and their duals.</p>
<p>So far, we haven’t discussed any means of mapping individual vectors back and forth between the base and
dual spaces. Although they’re both vector spaces of the same dimension, there’s no natural isomorphism
that would enable us to map them in a non-arbitrary way. However, the presence of an inner
product does pick out a specific isomorphism with the dual space: that which maps each vector $v$ to
a dual vector $v^*$ that implements <em>dotting with $v$</em>, using the inner product.</p>
<p>Symbolically, for all vectors $u \in V$, we have $\langle v^*, u \rangle = v \cdot u$. This can be
extended to inner products and isomorphisms for all $k$-vectors as well (see
<a href="https://en.wikipedia.org/wiki/Exterior_algebra#Inner_product">Wikipedia</a> for details).</p>
<p>Note, however, that this map is <em>not</em> preserved by scaling, or by transformations in general, because
$v^*$ transforms as $M^{-T}$ while $v$ transforms as $M$.</p>
<p>With this correspondence, it becomes possible to largely ignore the existence of dual spaces and dual
elements altogether—we have the fiction that they’re not distinct from the base elements. In an
orthonormal basis, even the <em>coordinates</em> of a vector and its corresponding dual will be identical.</p>
<p>For an example of “forgetting” about duals: the Hodge star operations can be defined using the inner
product to invisibly dualize their input or output as well as hodging it. Then the two Hodge stars I
defined above collapse into one operation, mapping between $\bigwedge^k V$ and $\bigwedge^{n-k} V$.</p>
<h2 id="whats-the-use-of-all-this"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#whats-the-use-of-all-this" title="Permalink to this section">What’s The Use of All This?</a></h2>
<p>This is kind of a lot. We started with just vectors and normal vectors—two kinds of vector-shaped
things with different rules, which was confusing enough. But now we have <em>four</em>: vectors, dual vectors,
bivectors, and dual bivectors. And on top of that we have three scalar-shaped things, too: true
unitless scalars, trivectors, and dual trivectors.</p>
<p>Evidently, lots of people manage to get along well enough without being totally aware of
all these distinctions! Even texts on Grassmann or geometric algebra may not fully delve into
the “duals” story, instead treating $k$-vectors and their duals as the same thing (implicitly using
the isomorphism defined above). Their differing transformation behavior becomes sort of a curiosity,
an unsystematic ornamental detail. And this comes at the cost of making some aspects of the algebra
require an inner product or a metric, and only work properly in an orthonormal basis. In contrast,
when you’re “cooking with duals”, you can derive formulas that work properly in any basis.</p>
<p>As a quick example of this, let’s look at a concrete problem you might encounter in graphics. Let’s
say you have a triangle mesh and you want to select a random point on it, chosen uniformly over the
surface area. To do this, we must first select a random triangle, with probability
proportional to area. The standard technique is to precompute the areas of all the triangles
and build a prefix-sum table; then, to select a triangle, we take a uniform random value and
binary-search on it in the table.</p>
<p>Let’s throw in another wrinkle, though. What if the triangle mesh is transformed—possibly by a
nonuniform scaling, or a shear? In general, this will alter the areas of all the triangles, in an
orientation-dependent way. A uniform distribution over surface area in the mesh’s <em>local</em> space
will no longer be uniform in world space. We could address this by pre-transforming the whole mesh
into world space and doing the sampling process there—but that’s more expensive than necessary.</p>
<p>We can use bivectors to help. Instead of calculating just a scalar area for each triangle, calculate
the bivector representing its orientation and area. (If the triangle’s vertices are $p_1, p_2, p_3$,
this is $\tfrac{1}{2}(p_2 - p_1) \wedge (p_3 - p_1)$.) Now we can transform all the bivectors into
world space, using their transformation rule, and they will accurately represent the areas of the
transformed triangles. Then we can calculate their magnitudes and build the prefix-sum table, as
before.</p>
<p>Conversely, suppose we have an existing, non-uniform areal probability measure defined over our
triangle mesh. (Maybe it’s a light source with a texture defining its emissive brightness, and we
want to sample with respect to emitted power; or maybe we want to sample with respect to solid angle
subtended at some point, or some sort of visual importance, etc.) We can represent these
probability densities as dual bivectors, and again we can take them back and forth between local and
world space—even in the presence of shear or nonuniform scaling—with confidence that we’re still
representing the same distribution.</p>
<p>Some other examples where dual $k$-vectors show up:</p>
<ul>
<li>The derivative (gradient) of a scalar field, such as an SDF, is naturally a dual vector.</li>
<li>Dual vectors represent spatial frequencies (wavevectors) in Fourier analysis.</li>
<li>The radiance carried by a ray is a density with respect to projected area, and can therefore be
represented, at least in part, as a dual bivector.</li>
</ul>
<p>Like many theoretical math concepts, I think these ideas are mostly useful for enriching your own
mental models of geometry, strengthening your thought process, and deriving results that you can
then use in code in a more “conventional” way. I’m not <em>necessarily</em> suggesting we should
all go off and start implementing $k$-vectors and their duals as classes in our math libraries.
(Frankly, our math libraries are <a href="/blog/on-vector-math-libraries/">enough of a mess already</a>.)</p>
<h2 id="organizing-the-zoo"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#organizing-the-zoo" title="Permalink to this section">Organizing the Zoo</a></h2>
<p>One more thing to muse on before I leave you. We’ve seen that there is a “scaling zoo” of mathematical
elements with different physical, geometric interpretations and behaviors. Different branches of
science and math have distinct ways of conceptually organizing this zoo, and thinking about its
denizens and their relationships.</p>
<p>In computer science, for example, we would probably understand vectors, bivectors, dual vectors, and
so forth as different <em>types</em>. Each might have an internal structure as a composition of more elementary
values (real numbers), and a suite of allowed operations that define what you can do with them and
how they interact with one another.</p>
<p>Physicists, meanwhile, tend to take a more rough-and-ready approach: geometric elements are
thought of as simply matrices of real (or sometimes complex) numbers, together with <em>transformation
laws</em>—rules that define what happens to a given matrix under a change of coordinates. Algebraic
properties such as anticommutativity are obtained by constructing the matrices in such a way that matrix
multiplication implements the desired algebra. For example, a bivector can be represented as an
antisymmetric matrix; wedging two vectors $u, v$ to make a bivector corresponds to calculating the matrix
$$
uv^T - vu^T
$$
which has the same anticommutative property as a wedge product. Multiplying this matrix by a (dual)
vector $w$ then represents the interior product of the bivector with $w$. Meanwhile, a dual
bivector would be structurally similar, but have a different transformation law (“covariant” versus
“contravariant”).</p>
<p>Lastly, mathematicians like to formalize things by saying that different geometric
quantities are elements of different <em>spaces</em> and/or <em>algebras</em>. Both terms ultimately mean a
set (in the mathematical sense), together with some extra structure—such as algebraic operations,
a topology, a norm or metric, and so on—defined on top of the bare set. The exact kind of structures
you need depends on what you’re doing, and there’s a whole menagerie of such structures that
might be invoked in different contexts.</p>
<p>So which structure is behind the scaling zoo? We know we’ve got the vector space structure, and the
Grassmann algebraic structure. But neither of these fully accounts for the different scaling and transformation
behaviors of dual elements: dual spaces are isomorphic to their base spaces (in finite dimensions),
totally identical insofar as the vector and Grassmann structures are concerned.</p>
<p>I don’t have a fully developed answer yet—but I suspect it’s got to do with the <a href="https://en.wikipedia.org/wiki/Representation_of_a_Lie_group">representation theory of Lie groups</a>.
My guess is that the different types of scaling elements we’ve seen can be codified as vector spaces
acted on by different representations of $GL(n)$, the Lie group of all linear maps on $\Bbb R^n$.
But I’m not going to get into that here. (If you’d like to read more on this, here are
a couple web references: <a href="https://sbseminar.wordpress.com/2008/07/08/how-to-write-down-the-representations-of-gl_n/">one</a>,
<a href="http://www.maths.qmul.ac.uk/~whitty/LSBU/MathsStudyGroup/SeligGLn.pdf">two</a>. Also: <a href="http://inference-review.com/article/woits-way">Peter Woit’s book</a> on the role of representation theory in
particle physics.)</p>
<h2 id="conclusion"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-3/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>I hope this has been an entertaining and enlightening tour through some of the layers
beneath the surface of your favorite Euclidean geometry. We started with a seemingly simple
question—why do normal vectors transform using the inverse transpose matrix?—and found
that there was <em>much</em> more rich structure there than meets the eye.</p>
<p>The “scaling zoo” of $k$-vectors and their duals makes a pleasingly complete and symmetrical whole.
Even if I’m not going to be employing these things in practical work every day, I feel that studying
them has helped me understand some things that were vague and foggy in my mind before. It’s worth
appreciating that these subtle distinctions exist. One of my general axioms in life is that
everything is more complicated than it first appears, and nowhere is this more consummately borne
out than mathematics!</p>Normals and the Inverse Transpose, Part 2: Dual Spaces
https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/
http://reedbeta.com/blog/normals-inverse-transpose-part-2/Nathan ReedSat, 19 May 2018 16:13:37 -0700https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#commentsGraphicsMath<p>In the <a href="/blog/normals-inverse-transpose-part-1/">first part</a> of this series, we learned about
Grassmann algebra, and concluded that normal vectors in 3D can be interpreted as bivectors. To
transform bivectors, we need to use a different matrix (in general) than the one that transforms
ordinary vectors. Using a canonical basis for bivectors, we found that the matrix required is the <em>cofactor
matrix</em>, which is proportional to the inverse transpose. This provides at least a partial
explanation of why the inverse transpose is used to transform normal vectors.</p>
<p>However, we also left a few loose ends untied. We found out about the cofactor matrix,
but we didn’t really see how that connects to the
<a href="https://computergraphics.stackexchange.com/a/1506/48">algebraic derivation</a> that transforming a
plane equation $N \cdot x + d = 0$ involves the inverse transpose. I just sort of handwaved the
proportionality between the two.</p>
<!--more-->
<p>Moreover, we saw that Grassmann $k$-vectors provide vectorial geometric objects with a natural
interpretation as carrying units of length, area, and volume, owing to their scaling behavior. But
we didn’t find anything similar for densities—units of <em>inverse</em> length, area, or volume.</p>
<p>As we’ll see in this article, there’s one more geometric concept we need to complete the
picture. Putting this new concept together with the Grassmann algebra we’ve already learned will
turn out to clarify and resolve these remaining issues.</p>
<p>Without further ado, let’s dive in!</p>
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#functions-as-vectors">Functions As Vectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#linear-forms-and-the-dual-space">Linear Forms and the Dual Space</a><ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#the-natural-pairing">The Natural Pairing</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#the-dual-basis">The Dual Basis</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#transforming-dual-vectors">Transforming Dual Vectors</a><ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#uniform-scaling">Uniform Scaling</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#sheared-dual-vectors-and-the-inverse-transpose">Sheared Dual Vectors and the Inverse Transpose</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#so-whats-a-normal-vector-anyway">So What’s a Normal Vector, Anyway?</a></li>
</ul>
</div>
<h2 id="functions-as-vectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#functions-as-vectors" title="Permalink to this section">Functions As Vectors</a></h2>
<p>Most of this article will be concerned with functions taking and returning vectors of various kinds.
To understand what follows, it’s necessary to make a bit of a mental flip, which you
might find quite counterintuitive if you haven’t encountered it before.</p>
<p>The flip is this: <strong>functions that map into a vector space <em>are themselves</em> vectors</strong>.</p>
<p>That statement might not appear to make any sense at first! Vectors and functions are totally
different kinds of things, right, like apples and…chairs? How can a function literally <em>be</em> a
vector?</p>
<p>If you look up the <a href="https://en.wikipedia.org/wiki/Vector_space#Definition">technical definition of a vector space</a>,
you’ll find that it’s quite nonspecific about what the <em>structure</em> of a vector has to be. We often
think of them as arrows with a magnitude and direction, or as ordered lists of numbers (coordinates).
However, all you truly need for a vector space is a set of <em>things</em> that support two basic operations:
being added together, and being multiplied by scalars (here, real numbers). These operations just
need to obey a few reasonable axioms.</p>
<p>Well, functions can be added together! If we have two functions $f$ and $g$, we can add them
<em>pointwise</em> to produce a new function $h$, defined by $h(x) = f(x) + g(x)$ for every point $x$ in
the domain. Likewise, we can multiply a function pointwise by a scalar: $g(x) = a \cdot f(x)$.
These operations do satisfy the vector space axioms, and therefore any set of compatible
functions forms a vector space in its own right: a <em>function space</em>.</p>
<p>To put it a bit more formally: given a domain set $X$ (any kind of set, not necessarily a vector
space itself) and a range vector space $V$, the set of functions $f: X \to V$ forms a vector space
under pointwise addition and scalar multiplication. You need the range to be a vector space so you
can add and multiply the outputs of the functions, but the domain isn’t required to be a vector space—or
even a “space” per se at all; it could be a discrete set.</p>
<p>This realization that functions can be treated as vectors then lets us apply linear-algebra techniques
to understand and work with functions—a large branch of mathematics called
<a href="https://en.wikipedia.org/wiki/Functional_analysis">functional analysis</a>.</p>
<h2 id="linear-forms-and-the-dual-space"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#linear-forms-and-the-dual-space" title="Permalink to this section">Linear Forms and the Dual Space</a></h2>
<p>From this point forward, we’ll be concerned with a specific class of functions known as
<a href="https://en.wikipedia.org/wiki/Linear_form"><strong>linear forms</strong></a>.</p>
<p>If we have some vector space $V$ (such as 3D space $\Bbb R^3$, for instance), then a linear form
on $V$ is defined as a linear function $f: V \to \Bbb R$. That is, it’s a linear
function that takes a vector argument and returns a scalar.</p>
<p><em>(A note for the mathematicians: in this article I’m only talking about finite-dimensional vector
spaces over $\Bbb R$, so I may occasionally make a statement that doesn’t hold for general vector
spaces. Sorry!)</em></p>
<p>I like to visualize a linear form as a set of parallel, uniformly spaced planes (3D) or lines (2D): the
<a href="https://en.wikipedia.org/wiki/Level_set">level sets</a> of
the function at intervals of one unit in the output. Here are some examples:</p>
<p class="image-array"><img alt="A linear form, x + y" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-1.png" style="width:14em" title="A linear form, x + y" />
<img alt="A linear form, −⅓x + ½y" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-2.png" style="width:14em" title="A linear form, −⅓x + ½y" />
<img alt="A linear form, 2x + y" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-3.png" style="width:14em" title="A linear form, 2x + y" /></p>
<p>The gradients here indicate the linear form’s orientation—the function is increasing with the
gradient’s opacity; the discrete lines mark where its output crosses an integer, and the opacity
wraps around to zero. Note that “bigger”
linear forms (in the sense of bigger output values) have more tightly-spaced lines, and vice versa.</p>
<p>As elaborated in the previous section, linear forms on a given vector space can themselves be treated as
vectors, in their own function space. Linear combinations of linear
functions are still linear, so they do form a closed vector space in their own right.</p>
<p>This vector space—the set of all linear forms on $V$—is important enough that
it has its own name: the <a href="https://en.wikipedia.org/wiki/Dual_space"><strong>dual space</strong></a> of $V$. It’s
denoted $V^*$. The elements of the dual space (the linear forms) are then called <strong>dual vectors</strong>,
or sometimes <em>covectors</em>.</p>
<h3 id="the-natural-pairing"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#the-natural-pairing" title="Permalink to this section">The Natural Pairing</a></h3>
<p>The fact that dual vectors are <em>linear</em> functions, and not general functions from $V$ to
$\Bbb R$, strongly restricts their behavior. Linear forms on an $n$-dimensional
vector space have only $n$ degrees of freedom, versus the infinite degrees of freedom that a
general function has. To put it another way, $V^*$ has the same dimensionality as $V$.</p>
<p>To see this more concretely: a linear form on $\Bbb R^n$ can be
fully specified by the values it returns when you evaluate it on the $n$ vectors of a basis.
The result it returns for any <em>other</em> vector can then be derived by linearity. For
example, if $f$ is a linear form on $\Bbb R^3$, and $v = (x, y, z)$ is an arbitrary vector, then:
$$
\begin{aligned}
f(v) &= f(x {\bf e_x} + y {\bf e_y} + z {\bf e_z}) \\
&= x \, f({\bf e_x}) + y \, f({\bf e_y}) + z \, f({\bf e_z})
\end{aligned}
$$
If you’re thinking that the above looks awfully like a dot product between $(x, y, z)$ and
$\bigl(f({\bf e_x}), f({\bf e_y}), f({\bf e_z}) \bigr)$—you’re right!</p>
<p>Indeed, the operation of evaluating a linear form has the properties of a <em>product</em> between the dual space and
the base vector space: $V^* \times V \to \Bbb R$. This product is called the <strong>natural pairing</strong>.</p>
<p>Like the vector dot product, the natural pairing results in a real number, and is bilinear—linear
on both sides. However, here we’re taking a product not of two vectors, but of a dual vector with
a “plain” vector. The linearity on the left side comes from pointwise adding/multiplying linear forms;
that on the right comes from the linear forms being, well, linear in their vector argument.</p>
<p>Going forward, I’ll denote the natural pairing by angle brackets, like this: $\langle w, v \rangle$.
Here $w$ is a dual vector in $V^*$, and $v$ is a vector in $V$. To reiterate, this is simply
<em>evaluating</em> the linear form $w$, as a function, on the vector $v$. But because functions are vectors,
and dual vectors in particular are <em>linear</em> functions, this operation also has the properties of a product.</p>
<p>The above equation looks like this in angle-bracket notation:
$$
\begin{aligned}
\langle w, v \rangle
&= \bigl\langle w, \, x {\bf e_x} + y {\bf e_y} + z {\bf e_z} \bigr\rangle \\
&= x \langle w, {\bf e_x} \rangle + y \langle w, {\bf e_y} \rangle + z \langle w, {\bf e_z} \rangle
\end{aligned}
$$
Note how this now looks like “just” an application of the distributive property—which it is!</p>
<h3 id="the-dual-basis"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#the-dual-basis" title="Permalink to this section">The Dual Basis</a></h3>
<p>The above construction can also be used to define a canonical basis for $V^*$, for a given basis
on $V$. Namely, we want to make the numbers $\langle w, {\bf e_x} \rangle, \langle w, {\bf e_y} \rangle, \langle w, {\bf e_z} \rangle$
be the <em>coordinates</em> of $w$ with respect to this basis, the same way that $x, y, z$ are coordinates
with respect to $V$’s basis. We can do this by defining <strong>dual basis vectors</strong> ${\bf e_x^*}, {\bf e_y^*}, {\bf e_z^*}$,
according to the following constraints:
$$
\begin{aligned}
\langle {\bf e_x^*}, {\bf e_x} \rangle &= 1 \\
\langle {\bf e_x^*}, {\bf e_y} \rangle &= 0 \\
\langle {\bf e_x^*}, {\bf e_z} \rangle &= 0
\end{aligned}
$$
and similarly for ${\bf e_y^*}, {\bf e_z^*}$. The nine total constraints can be summarized as:
$$
\langle {\bf e}_i^*, {\bf e}_j \rangle =
\begin{cases}
1 & \text{if } i = j, \\
0 & \text{if } i \neq j,
\end{cases}
\quad i, j \in \{ {\bf x, y, z} \}
$$
This dual basis always exists and is unique, given a valid basis on $V$ to start from.</p>
<p>Geometrically speaking, the dual basis consists of linear forms that measure the distance along
each axis—but the level sets of those linear forms are parallel to <em>all the other axes</em>. They’re
not necessarily perpendicular to the same axis that they’re measuring, unless the basis happens to be
orthonormal. This feature will be important a bit later!</p>
<p>By way of example, here are a couple of vector bases together with their corresponding dual bases:</p>
<p class="image-array"><img alt="An orthonormal basis and its corresponding dual basis" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-1.png" style="width:20em;padding:0 10pt" title="An orthonormal basis and its corresponding dual basis" />
<img alt="An non-orthonormal basis and its corresponding dual basis" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-2.png" style="width:20em;padding:0 10pt" title="An non-orthonormal basis and its corresponding dual basis" /></p>
<p>Here’s an example of a linear form decomposed into basis components, $w = p {\bf e_x^*} + q {\bf e_y^*}$:</p>
<p><img alt="A linear form as a sum of x and y basis components" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-3.png" style="width:20em" title="A linear form as a sum of x and y basis components" /></p>
<p>With the dual basis defined as above, if we express both a dual vector $w$ and a vector $v$ in terms
of their respective bases,
then the natural pairing $\langle w, v \rangle$ boils down to just the dot product of the
respective coordinates:
$$
\begin{aligned}
\langle w, v \rangle
&= \bigl\langle p {\bf e_x^*} + q {\bf e_y^*} + r {\bf e_z^*}, \; x {\bf e_x} + y {\bf e_y} + z {\bf e_z} \bigr\rangle \\
&= px + qy + rz
\end{aligned}
$$</p>
<h2 id="transforming-dual-vectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#transforming-dual-vectors" title="Permalink to this section">Transforming Dual Vectors</a></h2>
<p>In the preceding article, we learned that although vectors and <em>bivectors</em> may appear
structurally similar (they both have three components, in 3D space), they have different geometric
meanings and different behavior when subject to transformations—in particular, to scaling.</p>
<p>With dual vectors, we have a third example in this class! Dual vectors are again
“vectorial” objects (obeying the vector space axioms), again structurally similar to vectors and
bivectors (having three components, in 3D space), but with a different geometric meaning (linear
forms). This immediately suggests we look into dual vectors’ transformation behavior!</p>
<p>Dual vectors are linear forms, which are functions. So how do we transform a function?</p>
<p>The way I like to think about this is that the function’s output values are carried along with the
points of its domain when they’re transformed. Imagine labeling every point in the domain with the
function’s value at that point. Then apply the transformation to all the points; they move somewhere
else, but carry their label along with them. (Another way of thinking about it is that you’re
transforming the <em>graph</em> of the function, considered as a point-set in a one-higher-dimensional
space.)</p>
<p>To formalize this a bit more: suppose we transform vectors by some matrix $M$, and we want to
apply this transformation also to a function $f(v)$, yielding a new function $g(v)$. What we want is
that $g$ evaluated on a <em>transformed</em> vector should equal $f$ evaluated on the original vector:
$$
g(Mv) = f(v)
$$
Or, equivalently,
$$
g(v) = f(M^{-1}v)
$$
In other words, we can apply a transformation to a function by making a new function that first
applies the <em>inverse</em> transformation to its argument, then passes that to the old function.</p>
<p>Note that this only works if $M$ is invertible. If it isn’t, then our picture of “carrying the output
values along with the domain points” falls apart: a noninvertible $M$ can collapse many distinct domain
points into one, and then how could we decide what the function’s output should be at those points?</p>
<h3 id="uniform-scaling"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#uniform-scaling" title="Permalink to this section">Uniform Scaling</a></h3>
<p>Now that we understand how to apply a transformation to a function, let’s look at uniform scaling
as an example. We’ll scale by a factor $a > 0$, so that vectors transform as $v \mapsto av$. Then
functions will transform as $f(v) \mapsto f(v/a)$, per the previous section.</p>
<p>Let’s switch back to looking at this from a “dual vector” point of view instead of a
“function” point of view.
So, if $f(v) = \langle w, v \rangle$ for some dual vector $w$, then what happens when we scale by $a$?
$$
\begin{aligned}
\langle w, v \rangle \mapsto & \left\langle w, \frac{v}{a} \right\rangle \\
= & \left\langle \frac{w}{a}, v \right\rangle
\end{aligned}
$$
I’ve just moved the $1/a$ factor from one side of the angle brackets to the other, which is allowed
because it’s a bilinear operation. To summarize, we’ve found that the dual vector $w$ transforms as:
$$
w \mapsto \frac{w}{a}
$$</p>
<p>Hmm, interesting! When we scale vectors by $a$, then <strong>dual vectors scale by $\bm{1/a}$</strong>.
If you recall the previous article, we justified assigning units like “area” and “volume” to bivectors
and trivectors on the basis of their scaling behavior. Following that line of reasoning, we can now
conclude that <strong>dual vectors carry units of inverse length!</strong></p>
<p>In fact, dual vectors represent <em>oriented linear densities</em>. They provide a quantitative way
of talking about situations where some kind of scalar “stuff”—such as probability, texel count,
opacity, a change in voltage/temperature/pressure, etc.—is spread out along one dimension in space. When you
pair the dual vector with a vector (i.e. evaluate the linear form on a vector), you’re asking “how
much of that ‘stuff’ does this vector span?”</p>
<p>Under a scaling, we want to preserve the amount of “stuff”. If we’re scaling <em>up</em>, then the density
of “stuff” will need to go <em>down</em>, as the same amount of stuff is now spread over a longer distance;
and vice versa. This property is implemented by the inverse scaling behavior of dual vectors.</p>
<h3 id="sheared-dual-vectors-and-the-inverse-transpose"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#sheared-dual-vectors-and-the-inverse-transpose" title="Permalink to this section">Sheared Dual Vectors and the Inverse Transpose</a></h3>
<p>We’ve seen how uniform scaling applies inversely to dual vectors. We could study nonuniform scaling
now, too, but it turns out that axis-aligned nonuniform scaling isn’t that interesting—it just
applies inversely to each axis, as you might expect. It’ll be more illuminating at this point to
look at what happens with a <em>shear</em>.</p>
<p>I’ll stick to 2D for this one. As an example transformation, we’ll shear the $y$ axis toward $x$ a
little bit:
$$
M = \begin{bmatrix}
1 & \tfrac{1}{2} \\
0 & 1
\end{bmatrix}
$$
Here’s what it looks like:</p>
<p><img alt="The shear applied to a standard vector basis" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-1.png" style="width:28em" title="The shear applied to a standard vector basis" /></p>
<p>When we perform this transformation on a dual vector, what happens? When you look at it
visually, it’s pretty straightforward—the level sets (isolines) of the linear form will tilt to
follow the shear.</p>
<p><img alt="Animation of a linear form shearing" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-2.gif" style="width:20em" title="Animation of a linear form shearing" /></p>
<p>But how do we express this as a matrix acting on the dual vector’s coordinates? Let’s focus on the
$\bf e_x^*$ component. Note that our transformation $M$ doesn’t affect the $x$-axis—it maps
$\bf e_x$ to itself. But what about $\bf e_x^*$?</p>
<p><img alt="Animation of eₓ* shearing" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-3.gif" style="width:20em" title="Animation of eₓ* shearing" /></p>
<p>The $\bf e_x^*$ component of a dual vector <em>does</em> change under this transformation, because
the isolines pick up the shear! Or, to put it another way:
although distances along the $x$ axis (which $\bf e_x^*$ measures) don’t change here,
$\bf e_x^*$ still cares about what the other axes are doing because <em>it has to stay parallel to them</em>.
That’s one of the defining conditions for the dual basis to do its job.</p>
<p>In particular, we have that $\bf e_x^*$ maps to ${\bf e_x^*} - \tfrac{1}{2}{\bf e_y^*}$.
If we work it out the rest of the way, the full matrix that applies to the coordinates of
a dual vector is:
$$
\begin{bmatrix}
1 & 0 \\
-\tfrac{1}{2} & 1
\end{bmatrix}
$$
This is the inverse transpose of $M$!</p>
<p>We can loosely relate the effect of the inverse transpose here to that of the cofactor
matrix for bivectors, as seen in the preceding article. Like a bivector, each dual
basis element cares about what’s happening to the <em>other</em> axes (because it needs to keep parallel
to them)—but it also must scale inversely along its <em>own</em> axis. The determinant of $M$
gives the cumulative scaling along <em>all</em> the axes:
$$
\det M = \text{scaling on my axis} \cdot \text{scaling on other axes}
$$
We can algebraically rearrange this to:
$$
\frac{1}{\text{scaling on my axis}} = \frac{1}{\det M} \cdot \text{scaling on other axes}
$$
This matches the relation between the inverse transpose and the cofactor matrix.
$$
M^{-T} = \frac{1}{\det M} \cdot \text{cofactor}(M)
$$
I’m handwaving a lot here—a detailed geometric demonstration would take us off
into the weeds—but hopefully this gives at least a little bit of intuition for why the inverse
transpose matrix is the right thing to use for dual vectors.</p>
<h2 id="so-whats-a-normal-vector-anyway"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-2/#so-whats-a-normal-vector-anyway" title="Permalink to this section">So What’s a Normal Vector, Anyway?</a></h2>
<p>As we’ve seen, the level sets of a linear form are parallel lines in 2D, or planes in 3D. This implies
that we can define a plane by picking out a specific level set of a given dual vector:
$$
\langle w, v \rangle = d
$$
The dual vector $w$ is acting as a signed distance field for the plane.</p>
<p>We’ve also seen that when expressed in terms of matched basis-and-dual-basis components,
the natural pairing product $\langle w, v \rangle$ reduces to a dot product $w \cdot v$. And then
the above equation looks like the familiar plane equation:
$$
w \cdot v = d
$$
This shows that the dual vector’s coordinates with respect to the dual basis are <em>also</em> the coordinates
of a normal vector to the plane, in the standard vector basis.</p>
<p>So, normal vectors can be interpreted as dual vectors expressed in the dual basis, and that’s
why they transform with the inverse transpose!</p>
<p>But wait—in the last article, didn’t I just say that normal vectors should be interpreted as
bivectors, and therefore they transform with the cofactor matrix? Which one is it?</p>
<p>Ultimately, I don’t think there’s a definitive answer to this question! “Normal vector” as an idea is
a bit too vague—bivectors and dual vectors are <em>both</em> defensible ways to formalize the “normal vector”
concept. As we’ve seen, the way they transform is equivalent as far as <em>orientation</em>:
bivectors and dual vectors both transform to stay perpendicular to the plane they define, by either
$B \wedge v = d$ or $\langle w, v \rangle = d$, respectively. The difference between them is in
the units they carry and their scaling behavior: bivectors are areas, while dual vectors are inverse
lengths.</p>
<p>That’s all I have to say about transforming normal vectors! But we’ve got another question
still dangling. At the end of Part 1, I asked about vectorial quantities with negative scaling powers.
In dual vectors, we’ve now achieved scaling power −1. But what about −2 and −3? To find those,
we’re going to have to combine dual spaces with Grassmann algebra. We’ll do that in the
<a href="/blog/normals-inverse-transpose-part-3/">third and final part</a> of this series.</p>Normals and the Inverse Transpose, Part 1: Grassmann Algebra
https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/
http://reedbeta.com/blog/normals-inverse-transpose-part-1/Nathan ReedSat, 07 Apr 2018 14:36:16 -0700https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#commentsGraphicsMath<p>A mysterious fact about linear transformations is that some of them, namely nonuniform scalings and
shears, make a puzzling distinction between “plain” vectors and normal vectors. When we transform
“plain” vectors with a matrix, we’re required to transform the normals with—for some
reason—the <em>inverse transpose</em> of that matrix. How are we to understand this?</p>
<p>It takes only <a href="https://computergraphics.stackexchange.com/a/1506/48">a bit of algebra</a> to show that
using the inverse transpose ensures that transformed normals will remain perpendicular to the
tangent planes they define. That’s fine as far as it goes, but it misses a deeper and more
interesting story about the geometry behind this—which I’ll explore over the next
few articles.</p>
<!--more-->
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#units-and-scaling">Units and Scaling</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#grassmann-algebra">Grassmann Algebra</a><ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#basis-k-vectors">Basis $k$-Vectors</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#the-wedge-product">The Wedge Product</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#transforming-k-vectors">Transforming $k$-Vectors</a><ul>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#bivectors-and-nonuniform-scaling">Bivectors and Nonuniform Scaling</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#the-cofactor-matrix">The Cofactor Matrix</a></li>
</ul>
</li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#bivectors-and-normals">Bivectors and Normals</a></li>
<li><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#further-questions">Further Questions</a></li>
</ul>
</div>
<h2 id="units-and-scaling"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#units-and-scaling" title="Permalink to this section">Units and Scaling</a></h2>
<p>Before we dig into the meat of this article, though, let’s take a little apéritif. Consider
plain old <em>uniform</em> scaling (by the same factor along all axes). It’s hard to think
of a more innocuous transformation—it’s literally just multiplying all vectors by a scalar
constant.</p>
<p>But if we look carefully, there’s already something not quite trivial going on here.
Some quantities carry physical “dimensions” or “units”, like lengths, areas, and volumes. When we perform
a scaling transformation, these quantities are altered in a way that corresponds to their units.
Meanwhile, other quantities are “unitless” and don’t change under a scaling.</p>
<p>To be really explicit, let’s enumerate the possibilities for scaling behavior in 3D space.
Suppose we scale by a factor $a > 0$. Then:</p>
<ul>
<li><strong>Unitless numbers</strong> do not change—or in other words, they get multiplied by $a^0$.</li>
<li><strong>Lengths</strong> get multiplied by $a$.</li>
<li><strong>Areas</strong> get multiplied by $a^2$.</li>
<li><strong>Volumes</strong> get multiplied by $a^3$.</li>
</ul>
<p>And that’s not all—there are also <em>densities</em>, which vary inversely with the scale factor:</p>
<ul>
<li><strong>Linear densities</strong> get multiplied by $1/a$.</li>
<li><strong>Area densities</strong> get multiplied by $1/a^2$.</li>
<li><strong>Volume densities</strong> get multiplied by $1/a^3$.</li>
</ul>
<p>Think of things like “texels per length”, or “probability per area”, or “particles per volume”. If
you scale <em>up</em> a 3D model while keeping its textures the same size, then its texels-per-length
density goes <em>down</em>, and so on.</p>
<p>So, even restricting ourselves to just uniform scaling and looking at scalar (non-vector) values, we
already have a phenomenon where different quantities—which appear the same <em>structurally</em>, i.e.
they’re all just single-component scalars—are revealed to behave differently when a transformation
is applied, owing to the different units they carry. In particular, they carry different powers of
length, ranging from −3 to +3. A quantity with $k$ powers of length scales as $a^k$.</p>
<p>(We could also invent quantities that have scaling powers of ±4 or more, or even fractional scaling
powers. But I’ll leave those aside, as such things don’t have a strong geometric interpretation in 3D.)</p>
<p>Okay, maybe this is somehow reminiscent of the “plain vectors versus normals” thing. But how
does this work for vector quantities? How do nonuniform scalings affect this picture? And where does
the inverse transpose come into it? To really understand this, we’ll have to range farther into the
domains of math.</p>
<h2 id="grassmann-algebra"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#grassmann-algebra" title="Permalink to this section">Grassmann Algebra</a></h2>
<p>For the rest of this series, we’re going to be making use of
<a href="https://en.wikipedia.org/wiki/Exterior_algebra"><strong>Grassmann algebra</strong></a> (also called “exterior algebra”).
Since this is probably unfamiliar to many of my readers, I’ll give a pretty quick introduction to it.
For more background, see <a href="https://www.youtube.com/watch?v=WZApQkDBr5o">this talk by Eric Lengyel</a>,
or the first few chapters of Dorst et al’s <a href="http://www.geometricalgebra.net/"><em>Geometric Algebra for Computer Science</em></a>;
there are also many other references available around the web.</p>
<p>Grassmann algebra extends linear algebra to operate not just on vectors, but on additional “higher-grade”
geometric entities called <strong>bivectors</strong>, <strong>trivectors</strong>, and so on. These objects are collectively
known as <strong>$\bm k$-vectors</strong>, where $k$ is the <strong>grade</strong> or dimensionality of the object. They obey the same
mathematical rules as vectors do—they can be added together, and multiplied by scalars. However,
their geometric interpretation is different.</p>
<p>We often think of a vector as being sort of an abstract arrow—it has both a direction in space in which
the arrow points, and a magnitude, represented by the arrow’s length. A bivector is a lot like that,
but <em>planar</em> instead of linear. Instead of an arrow, it’s an abstract chunk of a flat surface.</p>
<p>Like vectors, bivectors also have directions, in the sense that a planar surface can face various
directions in space; and they have magnitudes, geometrically represented as the <em>area</em> of the
surface chunk. However, what they don’t have is a notion of <em>shape</em> within their plane. When you
picture a bivector as a piece of a plane, you’re free to imagine it as a square, a circle, a
parallelogram, or any funny shape you want, as long as it has the correct area.</p>
<div class="align-center"><figure >
<img src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/grassmann-examples.png" alt="Diagram of geometric interpretation of vectors, bivectors, and trivectors" class="not-too-wide invert-when-dark" title="Diagram of geometric interpretation of vectors, bivectors, and trivectors" />
<figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N_vector_positive.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure></div>
<p>Similarly, <em>trivectors</em> are three-dimensional vectorial quantities; they represent a chunk of space,
instead of a flat surface or an arrow. Again, they have no defined shape, only a magnitude—which
is now a <em>volume</em> instead of an area or length.</p>
<p>In 3D space, trivectors don’t really have a direction in a useful sense—or rather, there’s only
one possible direction, which is <em>parallel to space</em>. However, trivectors still come in two opposite
orientations, which we can denote as “positive” and “negative”, or alternatively “right-handed” and
“left-handed”. It’s much like how a vector can point either left or right along a 1D line, and we
can label those orientations as positive and negative if we like.</p>
<p>In higher dimensions, trivectors could also face different directions, as vectors and bivectors do.
Higher-dimensional spaces would even allow for quadvectors and higher grades. However, we’ll be
sticking to 3D for this series!</p>
<h3 id="basis-k-vectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#basis-k-vectors" title="Permalink to this section">Basis $k$-Vectors</a></h3>
<p>Just as you can break down a vector into components with respect to a basis, you can do the same with
bivectors and trivectors. When we write a vector $v$ in terms of coordinates, $v = (x, y, z)$,
what we’re really saying is that $v$ can be made up as a linear combination of basis vectors:
$$
v = x \, {\bf e_x} + y \, {\bf e_y} + z \, {\bf e_z}
$$
The basis vectors ${\bf e_x}, {\bf e_y}, {\bf e_z}$ can be taken as defining the direction
and scale of the coordinate $x, y, z$ axes. In the same way, a bivector $B$ can be formed from a
linear combination of <em>basis bivectors</em>:
$$
B = p \, {\bf e_{yz}} + q \, {\bf e_{zx}} + r \, {\bf e_{xy}}
$$
Here, ${\bf e_{xy}}$ would be a bivector of unit area oriented along the $xy$ plane, and
similarly for ${\bf e_{yz}}, {\bf e_{zx}}$. The basis bivectors correspond not to individual
coordinate axes, but to the planes spanned by <em>pairs</em> of axes. This defines “bivector coordinates”
$(p, q, r)$ by which we can identify or create any other bivector in the space.</p>
<p class="image-array"><img alt="A vector and its components along the axes" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/vector-components.png" style="height:20em;width:auto;padding:0 10pt" title="A vector and its components along the axes" />
<img alt="A bivector and its components along the axis planes" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/bivector-components.png" style="height:20em;width:auto;padding:0 10pt" title="A bivector and its components along the axis planes" /></p>
<p>The trivector case is less interesting:
$$
T = t \, {\bf e_{xyz}}
$$
As mentioned before, trivectors in 3D only have one possible direction, so they have only one basis
element: the unit trivector “along the $xyz$ space”, so to speak. All other trivectors are just some
scalar multiple of ${\bf e_{xyz}}$.</p>
<h3 id="the-wedge-product"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#the-wedge-product" title="Permalink to this section">The Wedge Product</a></h3>
<p>So, Grassmann algebra contains all these vector-like entities of different grades: ordinary vectors
(grade 1), bivectors (grade 2), and trivectors (grade 3). You can also think of plain old scalars
as being grade 0. Finally, to allow different grades to interoperate together, Grassmann algebra
defines an operation called the <strong>wedge product</strong>, or exterior product, denoted $\wedge$. This gives
you the ability to create a bivector by multiplying together two vectors. For example:
$$
{\bf e_x} \wedge {\bf e_y} = {\bf e_{xy}}
$$
In general, you can wedge any two vectors, and the result will be a bivector lying in the plane
spanned by those vectors; its magnitude will be the area of the parallelogram formed by the vectors
(like the cross product).</p>
<p>Note, however, that the bivector doesn’t “remember” the <em>specific</em> two vectors it was wedged from.
Any two vectors in the same plane, spanning a parallelogram of the same area (and orientation), will
generate the same bivector. A bivector can also be factored back into two vectors, but not uniquely.</p>
<p>You can also wedge together <em>three</em> vectors, or a bivector with a vector, to form a trivector.
$$
{\bf e_x} \wedge {\bf e_y} \wedge {\bf e_z} = {\bf e_{xy}} \wedge {\bf e_z} = {\bf e_{xyz}}
$$
This turns out to be equivalent to the “scalar triple product”, producing a trivector representing
the oriented volume of the parallelepiped formed by the three vectors.</p>
<p>The wedge product obeys most of the ordinary multiplication rules you know, such as associativity
and the distributive law. Scalar multiplication commutes with wedges—for scalar $a$, we have:
$$
(au) \wedge v = u \wedge (av) = a(u \wedge v)
$$
However, wedging two vectors together is <em>anticommutative</em> (again like the cross product). For
vectors $u, v$, we have:
$$
u \wedge v = -(v \wedge u)
$$
This has a few implications worth noting. First, any vector wedged with itself always gives zero:
$v \wedge v = 0$. Furthermore, any list of <em>linearly dependent</em> vectors, wedged together, will give
zero. For example, $u \wedge v = 0$ whenever $u$ and $v$ are collinear. In the case of three vectors,
$u \wedge v \wedge w = 0$ whenever $u, v, w$ are coplanar.</p>
<p>This also explains why grades beyond 3 don’t exist in 3D space. The wedge product of <em>four</em> 3D
vectors is always zero, because you can’t have four linearly independent vectors in 3D.</p>
<h2 id="transforming-k-vectors"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#transforming-k-vectors" title="Permalink to this section">Transforming $k$-Vectors</a></h2>
<p>Earlier, I asserted that you could think of the magnitude of a vector as a length,
that of a bivector as an area, and that of a trivector as a volume. But what justifies those
assignments of units to these quantities?</p>
<p>Earlier, we saw that lengths, areas, and volumes have distinct scaling behavior. Upon uniformly
scaling 3D space by a factor $a > 0$, lengths, areas, and volumes will scale as $a, a^2, a^3$,
respectively. We have the tools to see, now, that vectors, bivectors, and trivectors behave in the
same way.</p>
<p>Scaling a vector can be done by multiplying with the appropriate matrix:
$$
\begin{gathered}
v \mapsto Mv \\
\begin{bmatrix} x \\ y \\ z \end{bmatrix} \mapsto
\begin{bmatrix}
a & 0 & 0 \\
0 & a & 0 \\
0 & 0 & a
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} ax \\ ay \\ az \end{bmatrix} = av
\end{gathered}
$$
The vector $v$ as a whole, as well as its components $x, y, z$ and its scalar magnitude, all pick
up a factor $a$ upon scaling; so we can safely call them lengths. Hopefully, this is uncontroversial!</p>
<p>What about bivectors? To see how they behave under scaling (or any linear transformation), we can
turn to the wedge product. In 3D, any bivector can be factored as a wedge product of two vectors.
We already know how to transform vectors. Therefore, we can transform a bivector by transforming its
vector factors and re-wedging them:
$$
\begin{aligned}
B &= u \wedge v \\
(u \wedge v) &\mapsto (Mu) \wedge (Mv) \\
&= (au) \wedge (av) \\
&= a^2 (u \wedge v) \\
&= a^2 B
\end{aligned}
$$
Presto! Since the bivector has two vector factors, and each one scales by $a$, the bivector picks up
an overall factor of $a^2$, making it an area.</p>
<p>Trivectors too can be transformed by factoring them into vectors. It comes as no surprise to find
that their three vector factors give them an overall scaling of $a^3$. Just for completeness:
$$
\begin{aligned}
T &= (u \wedge v \wedge w) \\
(u \wedge v \wedge w) &\mapsto (Mu) \wedge (Mv) \wedge (Mw) \\
&= (au) \wedge (av) \wedge (aw) \\
&= a^3 (u \wedge v \wedge w) \\
&= a^3 T
\end{aligned}
$$</p>
<h3 id="bivectors-and-nonuniform-scaling"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#bivectors-and-nonuniform-scaling" title="Permalink to this section">Bivectors and Nonuniform Scaling</a></h3>
<p>Now, we can finally begin to address our original question. What complications come
into play when we start doing <em>nonuniform</em> scaling?</p>
<p>To investigate this, let’s study an example. We’ll scale by a factor of 3 along the $x$ axis,
leaving the other two axes alone. Our scaling matrix will therefore be:
$$
M = \begin{bmatrix}
3 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$
For plain old vectors, this does the obvious thing: the $x$ component gets multiplied by 3, and the
$y, z$ components are unchanged. In general, this alters both the vector’s length and direction
in a way that depends on its initial direction—vectors close to the $x$ axis are going to be
stretched more, while vectors close to the $yz$ plane will be less affected.</p>
<p><img alt="Animation of a vector scaling along the x axis" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/vector-scaling.gif" style="height:20em;width:auto" title="Animation of a vector scaling along the x axis" /></p>
<p>What happens to a bivector when we perform this transformation? First, let’s just think about it
geometrically. A bivector represents a chunk of area with a particular planar facing direction. When
we stretch this out along the $x$ axis, we again expect both its direction and area to change. But
different bivectors will be affected differently: a bivector close to the $yz$ plane will again
be less affected by the scaling, while bivectors whose planes have a significant component along the
$x$ axis will be stretched more.</p>
<p><img alt="Animation of a bivector scaling along the x axis" class="invert-when-dark" src="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/bivector-scaling.gif" style="height:20em;width:auto" title="Animation of a bivector scaling along the x axis" /></p>
<p>Okay, now to the algebra. As we saw before, we can decompose any bivector $B$ into components
along axis-aligned basis bivectors:
$$
B = p \, {\bf e_{yz}} + q \, {\bf e_{zx}} + r \, {\bf e_{xy}}
$$
To apply our scaling $M$ to the bivector, we just need to see how $M$ affects the basis bivectors.
This can be done by factoring them into their component basis vectors and applying $M$ to those:
$$
\begin{aligned}
{\bf e_{yz}} = {\bf e_y} \wedge {\bf e_z}
\quad &\mapsto \quad (M{\bf e_y}) \wedge (M{\bf e_z}) = {\bf e_y} \wedge {\bf e_z} = {\bf e_{yz}} \\
{\bf e_{zx}} = {\bf e_z} \wedge {\bf e_x}
\quad &\mapsto \quad (M{\bf e_z}) \wedge (M{\bf e_x}) = {\bf e_z} \wedge 3{\bf e_x} = 3{\bf e_{zx}} \\
{\bf e_{xy}} = {\bf e_x} \wedge {\bf e_y}
\quad &\mapsto \quad (M{\bf e_x}) \wedge (M{\bf e_y}) = 3{\bf e_x} \wedge {\bf e_y} = 3{\bf e_{xy}}
\end{aligned}
$$
This matches the geometric intuition: $\bf e_{yz}$ didn’t change at all, while $\bf e_{zx}$
and $\bf e_{xy}$ both picked up a factor of 3 because their planes include the $x$ axis.</p>
<p>So, the overall effect of applying $M$ to the bivector $B$ is:
$$
B \mapsto p \, {\bf e_{yz}} + 3q \, {\bf e_{zx}} + 3r \, {\bf e_{xy}}
$$
Now, just as we would for a vector, we can also write out the transformation of $B$ as components
acted on by a matrix:
$$
\begin{bmatrix} p \\ q \\ r \end{bmatrix} \mapsto
\begin{bmatrix}
1 & 0 & 0 \\
0 & 3 & 0 \\
0 & 0 & 3
\end{bmatrix}
\begin{bmatrix} p \\ q \\ r \end{bmatrix}
= \begin{bmatrix} p \\ 3q \\ 3r \end{bmatrix}
$$
This is the same transformation we just derived, only written a different notation. But notice
something here: the matrix appearing in this equation is <em>not</em> the same matrix $M$ used to
transform vectors.</p>
<p>Apropos of nothing, I’m just going to mention that the inverse transpose of $M$ is <em>proportional</em>
to the matrix above:
$$
M^{-T} =
\begin{bmatrix}
\tfrac{1}{3} & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$
HMM. 🤔🤔🤔</p>
<h3 id="the-cofactor-matrix"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#the-cofactor-matrix" title="Permalink to this section">The Cofactor Matrix</a></h3>
<p>In fact, the matrix we need for transforming bivectors is the
<a href="https://en.wikipedia.org/wiki/Minor_(linear_algebra)"><strong>cofactor matrix</strong></a> of $M$.</p>
<p>This is proportional to the inverse transpose by a factor of $\det M$. (The inverse of $M$ can be calculated as the
<a href="https://en.wikipedia.org/wiki/Invertible_matrix#Analytic_solution">transpose of its cofactor matrix</a>
divided by $\det M$.) Actually, the cofactor matrix is defined even when $M$ is noninvertible—a
nice property, since we <em>can</em> transform vectors using a noninvertible matrix, and we should be able
to do the same to bivectors!</p>
<p>Let’s take a closer look at why the cofactor matrix is the right thing. First
of all, what even is a “cofactor” here?</p>
<p>Each element of an $n \times n$ square matrix has a corresponding cofactor. The recipe for
calculating the cofactor of the element at row $i$, column $j$ is as follows:</p>
<ol>
<li>Start with the original $n \times n$ matrix, and delete the $i$th row and the $j$th column.
This reduces it to an $(n - 1) \times (n - 1)$ submatrix of all the remaining elements.</li>
<li>Calculate the determinant of this submatrix.</li>
<li>Multiply the determinant by $(-1)^{i+j}$, i.e. flip the sign if $i + j$ is odd. That’s the cofactor!</li>
</ol>
<p>Then, the cofactor <em>matrix</em> is just sticking all the cofactors into a new $n \times n$ matrix.</p>
<p>So how is it that this construction works to transform a bivector? Let’s look at the bivector’s first basis
component: $p \, {\bf e_{yz}}$. This term represents an area component in the $yz$ plane; as such,
it only cares what the transformation $M$ does to the $y$ and $z$ axes. Well, the recipe for the 1,1
cofactor of $M$ instructs us to extract the 2×2 submatrix that specifies what $M$ does to the $y$
and $z$ axes. Then we take its determinant, which is nothing but the factor by which area in the
$yz$ plane gets scaled!</p>
<p>Because of the way we chose our bivector basis—${\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}}$
<em>in that order</em>—each element of the cofactor matrix automatically calculates a determinant that
tells how $M$ scales area in the appropriate plane. Or, for the off-diagonal elements, how $M$ maps
area from one axis plane to another. In other words, the cofactors work out to be exactly the
coefficients needed to transform the axis components of a bivector.</p>
<p>(The sign factor in step 3 above, by the way, serves to fix up some order issues. Namely, without
the sign factor, we’d have $\bf e_{xz}$ instead of $\bf e_{zx}$. The latter is the
preferred choice of basis element, for various reasons of convention.)</p>
<p>Incidentally, although we’re focusing on the 3D case here, I’ll quickly note that in $n$ dimensions,
the cofactor matrix works to transform $(n-1)$-vectors (in the appropriate basis). In fact, to
transform $k$-vectors in general, you would want a matrix of $(n-k)$th <em>minors</em> (determinants of
submatrices with $n - k$ rows and columns deleted) of $M$.</p>
<h2 id="bivectors-and-normals"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#bivectors-and-normals" title="Permalink to this section">Bivectors and Normals</a></h2>
<p>At this point, I have to make a small confession. I’ve been hiding something up my sleeve for the
past few pages. The trick is this: <strong>bivectors are isomorphic to normal vectors
in 3D</strong>. In fact, the <em>components</em> $(p, q, r)$ of a bivector in our standard basis are exactly (up
to normalization) the $(x, y, z)$ components of a normal to the bivector’s plane!</p>
<p>Let’s see how this comes about. We saw earlier that wedging a set of linearly dependent vectors
together will give zero. This means that the plane of a bivector $B$ can be defined by the following
equation:
$$
B \wedge v = 0
$$
Any vector $v$ that lies in $B$’s plane will satisfy this equation, because it will form a linearly
dependent set with two vectors “inside” $B$ (two vectors that span the plane). Or, to put it another
way, the trivector spanned by $B$ and $v$ will have zero volume.</p>
<p>Suppose we expand this equation using our standard vector and bivector bases, and simplify:
$$
\begin{gathered}
(p \, {\bf e_{yz}} + q \, {\bf e_{zx}} + r \, {\bf e_{xy}})
\wedge (x \, {\bf e_x} + y \, {\bf e_y} + z \, {\bf e_z}) = 0 \\
(px \, {\bf e_{yzx}} + qy \, {\bf e_{zxy}} + rz \, {\bf e_{xyz}}) = 0 \\
(px + qy + rz) {\bf e_{xyz}} = 0 \\
px + qy + rz = 0 \\
\end{gathered}
$$
Let me annotate this a bit in case the steps weren’t clear. In the second line I’ve distributed the
wedge product out over all the basis terms; most of the terms fall out because they have two copies
of the same axis wedged in (for example, ${\bf e_{yz}} \wedge {\bf e_y} = 0$). In the third
line, I reordered the axes in all the trivectors to $\bf e_{xyz}$, which we can do as long as
we keep track of the sign flips—and here, they all have an even number of sign flips. Finally, I
factored $\bf e_{xyz}$ out of the whole thing and discarded it.</p>
<p>Now, the final line looks just like a dot product between vectors $(p, q, r)$ and $(x, y, z)$! Or
in other words, it looks like the usual plane equation $n \cdot v = 0$, with normal vector $n = (p, q, r)$.</p>
<p>This shows that the bivector coordinates $p, q, r$ with respect to the basis
${\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}}$ are <em>also</em> the coordinates of a normal to the
plane, in the standard vector basis ${\bf e_x}, {\bf e_y}, {\bf e_z}$; moreover, the
operations of <em>wedging</em> with a bivector and <em>dotting</em> with its corresponding normal are identical.
Formally, this is an application of <a href="https://en.wikipedia.org/wiki/Hodge_star_operator">Hodge duality</a>,
which (in 3D) interchanges bivectors and their normals—but more on that in a future article.</p>
<h2 id="further-questions"><a href="https://www.reedbeta.com/blog/normals-inverse-transpose-part-1/#further-questions" title="Permalink to this section">Further Questions</a></h2>
<p>We’ve seen that normal vectors in 3D can be thought of as Grassmann bivectors, at least to an
extent. We’ve also seen geometrically why the cofactor matrix is the right thing to use to transform
a bivector. This provides a somewhat more satisfying answer than “the algebra works out that way” to
our original question of why some transformations make a distinction between ordinary vectors and
normal vectors.</p>
<p>However, there’s still a few remaining issues that I’ve glossed over. I said
that bivectors are “isomorphic” to normal vectors, meaning there’s a one-to-one relationship between
them—but what’s that relationship, exactly? Related, why did we end up with the cofactor matrix instead of the
inverse transpose? They’re proportional to each other, and one could make a case that it doesn’t
really matter in practice which you use, as we usually don’t care about the <em>magnitudes</em> of normal
vectors (we typically normalize them anyway). But we (or, well, I) would still like to understand
the origin of this discrepancy.</p>
<p>Another question: in our “apéritif” at the top of this article, we encountered units with both
positive and negative scaling powers, ranging from −3 to +3. We’ve now seen that Grassmann
$k$-vectors have scaling powers of $k$, from 0 to 3. But what about vectorial quantities with
negative scaling powers? Do those exist, and if so, what are they?</p>
<p>In the <a href="/blog/normals-inverse-transpose-part-2/">next part</a> of this series, we’ll dig deeper into
this and complicate our geometric story still further. 🤓</p>Flows Along Conic Sections
https://www.reedbeta.com/blog/flows-along-conic-sections/
http://reedbeta.com/blog/flows-along-conic-sections/Nathan ReedTue, 12 Dec 2017 20:35:26 -0800https://www.reedbeta.com/blog/flows-along-conic-sections/#commentsGraphicsMath<p>Here’s a cute bit of math I figured out recently. It probably doesn’t have much practical
application, at least not for graphics programmers, but I thought it was fun and wanted to share it.</p>
<p>First of all, everyone knows about rotations: they make things go in circles! More formally, given
a plane to rotate in and a center point, rotations of any angle will preserve circles in the same
plane and with the same center. By “preserve circles”, I mean that the rotation will send every
point on the circle to somewhere on the same circle. The individual points move, but the <em>set</em>
of points comprising the circle is invariant under rotation.</p>
<!--more-->
<p>Moreover, rotations with a fixed plane and center form a one-parameter family of transformations:
they can be parameterized by a single degree of freedom, the angle. By varying the angle, you move
the points around on each circle. Another way of saying this is that the family of rotations defines
a <em>flow</em> along circles: if you take derivatives with respect to the rotation angle, you can get a
vector field that shows how each point is pushed around by the rotation—like a velocity field in a
fluid simulation. The family of concentric circles preserved by the rotation show up as the
<a href="https://en.wikipedia.org/wiki/Integral_curve">integral curves</a> of this vector field.</p>
<p>So far, so good. Now for the fun part: it happens that <strong>all conic sections</strong>, not just circles,
have a similar family of linear or affine transformations that preserve them and induce a flow
along them.</p>
<p>Here’s a shadertoy to demonstrate. It cycles through circles, ellipses, parabolas, and hyperbolas,
and in each case animates through transformations that preserve that conic. The background
coordinate grid shows what the transformation is doing to the space, and dots trace individual
points to show how they flow along the conic.</p>
<div class="embed-wrapper-outer invert-when-dark" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/XtXfDS?paused=false&gui=false"></iframe>
</div>
</div>
<p>So, what are these transformations? Let’s look at each type of conic in turn.</p>
<p><strong>Circles</strong>. As we’ve seen, circles are preserved by rotations, which can be parameterized by
their angle $\theta$ and have a matrix of the form:
$$
\begin{bmatrix}
\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta
\end{bmatrix}
$$</p>
<p><strong>Ellipses</strong>. Ellipses are just scaled circles, and the transformations that preserve them can be
derived by: scaling the ellipse to a circle, rotating, then unscaling back to the ellipse. If the
ellipse is axis-aligned and has aspect ratio $\alpha$, then the ellipse-preserving transformations
have the form:
$$
\begin{bmatrix}
\cos\theta & -\alpha\sin\theta \\ \frac{1}{\alpha}\sin\theta & \cos\theta
\end{bmatrix}
$$
Geometrically, this produces a rotation combined with some shear and nonuniform scaling that varies
with angle. Naturally, it reduces to the circle case when $\alpha = 1$.</p>
<p><strong>Parabolas</strong>. This one is different! It turns out there’s no continuous family of <em>linear</em>
transformations that map a parabola to itself. However, there does exist a family of <em>affine</em>
transformations that does so. For the parabolas $y = x^2 + k$, it looks like this:
$$
\begin{bmatrix} x \\ y \end{bmatrix} \mapsto
\begin{bmatrix}
1 & 0 \\ v & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} \frac{1}{2}v \\ \frac{1}{4}v^2 \end{bmatrix}
$$
This consists of a shear along the $y$-axis, which sort of “rolls” the parabola, pushing some points
up and others down so that a different point becomes the vertex. Then the translation puts the
vertex back where it was to begin with. The family of transformations is parameterized by the shear
fraction, $v$.</p>
<p>Note that unlike the previous cases, this family isn’t periodic. It’s unbounded; $v$ can range over
all the real numbers. As $v$ gets farther from zero, the shear will get more and more extreme, and
the original coordinate system more and more distorted—but the parabola stays just where it is,
and its points just keep flowing along it.</p>
<p><strong>Hyperbolas</strong>. We’re back to linear transformations again for these ones. Hyperbolas are preserved
by <a href="https://en.wikipedia.org/wiki/Squeeze_mapping">squeeze mappings</a>, which are nonuniform scalings
that have reciprocal scale factors along two axes: if they scale one axis by a factor $a$, they
scale the other by $1/a$. The axes here must be aligned with the asymptotes of the hyperbola.</p>
<p>It turns out that a convenient way to parameterize these is in terms of the
<a href="https://en.wikipedia.org/wiki/Hyperbolic_angle">hyperbolic angle</a> (somewhat of a misnomer, as it
isn’t an <em>angle</em> in the usual sense at all). The hyperbolas $y^2 - x^2 = k$ are preserved by
transformations of the form:
$$
\begin{bmatrix}
\cosh u & \sinh u \\ \sinh u & \cosh u
\end{bmatrix}
$$
This kind of transformation is also known as a “hyperbolic rotation” or a
<a href="https://en.wikipedia.org/wiki/Lorentz_transformation">Lorentz transformation</a>; it’s central in
special relativity (although there usually parameterized differently). Like the parabolic case,
the family is unbounded; $u$, the hyperbolic angle, can range over all real numbers but induces a
consistent flow along the hyperbolas no matter how positive or negative it gets.</p>
<p>Incidentally, these families of transformations we’ve been discussing are examples of continuous
symmetry groups—<a href="https://en.wikipedia.org/wiki/Lie_group">Lie groups</a>. The flow vector field is the
generator of the <a href="https://en.wikipedia.org/wiki/Lie_algebra">Lie algebra</a> for that Lie group.</p>
<p>And, as an application of <a href="https://en.wikipedia.org/wiki/Noether%27s_theorem">Noether’s theorem</a>,
each family also has a corresponding conserved quantity—a particular function of the coordinates
that’s conserved (does not change) when a transformation is applied. Consequently, these quantities
are also constant along the corresponding conic sections. They can therefore serve to identify a
specific conic section amongst all the ones preserved by the same family of transformations.</p>
<ul>
<li>Circles’ conserved quantity is $x^2 + y^2$, the radius of the circle.</li>
<li>Ellipses (in the axis-aligned case) have conserved quantity $(x/\alpha)^2 + y^2$, an
aspect-ratio-corrected radius.</li>
<li>Parabolas (in the standard orientation and aspect ratio we considered) have conserved quantity
$y - x^2$, the height of the parabola above the origin.</li>
<li>Hyperbolas (in the standard orientation and aspect ratio we considered): the conserved quantity
is $y^2 - x^2$, which is the plus or minus the hyperbola’s distance of closest approach to the
origin.</li>
</ul>Conformal Texture Mapping
https://www.reedbeta.com/blog/conformal-texture-mapping/
http://reedbeta.com/blog/conformal-texture-mapping/Nathan ReedSun, 26 Nov 2017 17:28:39 -0800https://www.reedbeta.com/blog/conformal-texture-mapping/#commentsGraphicsMath<p>In two <a href="/blog/quadrilateral-interpolation-part-1/">previous</a> <a href="/blog/quadrilateral-interpolation-part-2/">articles</a>,
I’ve explored some unusual methods of texture mapping—beyond the conventional
approach of linearly interpolating UV coordinates across triangles. This post is a sort of
continuation-in-spirit of that work, but I’m no longer focusing specifically on quadrilaterals.</p>
<p>A problem that often afflicts texture mapping on smooth, curvy models (such as characters) is distortion:
in some regions, the texture may appear overly squashed,
stretched, or sheared on the 3D model. A related but distinct problem is that of different
regions of the model having different texel density, due to varying scale in the UV mapping.
I wanted to explore these issues mathematically. Are there ways to create
texture mappings that have low distortion by construction?</p>
<p>Ultimately, I didn’t come to an altogether satisfying resolution of this question, but I encountered
plenty of interesting math along the way and I want to share some of it. This post will be on the
more esoteric, less immediately-applicable side—but I hope you’ll find the topic intriguing
nonetheless.</p>
<!--more-->
<div class="toc">
<ul>
<li><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#quantifying-texture-distortion">Quantifying Texture Distortion</a></li>
<li><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#conformal-maps">Conformal Maps</a></li>
<li><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#mobius-transformations">Möbius Transformations</a></li>
<li><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#holomorphic-functions">Holomorphic Functions</a></li>
<li><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#invertibility-and-critical-points">Invertibility And Critical Points</a></li>
<li><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#compulsory-criticality">Compulsory Criticality</a></li>
<li><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="quantifying-texture-distortion"><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#quantifying-texture-distortion" title="Permalink to this section">Quantifying Texture Distortion</a></h2>
<p>How can we mathematically characterize “distortion” in a texture mapping? To begin with, we should
make a distinction between local and global forms of distortion. Local distortion would be visible
when zooming in on a small region of the model—looking at a single triangle, or a small
neighborhood around a point. Conversely, global distortion might only show up when you look at the
whole model and compare texture scale and orientation across widely separated points.</p>
<p>Some amount of global distortion is inevitable in mapping a flat, 2D texture to a non-flat, 3D
object. It’s not necessarily a problem, and it can even be useful in some cases to allow varying
texel density to concentrate more texels in more-important or more-detailed parts of a model. For
example, human head models usually give more texel density to the face region than to the sides, top,
and back of the head.</p>
<p>However, local distortion is usually undesirable. It changes the shapes of features in the texture,
distorts the shape of filter kernels operating in texture space, and gives unequal texel density along
different axes—bad news all around.</p>
<p>To measure local distortion, we can look at the tangent basis implied by the UVs assigned on the
mesh—the same basis we typically compute as part of the setup for normal and parallax mapping.</p>
<p>This basis can be computed per-triangle, and consists of the two 3D vectors within the triangle’s
plane that correspond to the texture’s U and V axes—known as the tangent and bitangent
vectors, respectively. (The triangle normal is usually included as a third basis vector, but we won’t
need that here.) If the triangle’s vertex positions are $p_1, p_2, p_3$ and the corresponding
UVs are $(u_1, v_1) \ldots (u_3, v_3)$, then the tangent and bitangent vectors $T, B$ can be defined by:
$$
\begin{aligned}
p_2 - p_1 &= (u_2 - u_1)T + (v_2 - v_1)B \\
p_3 - p_1 &= (u_3 - u_1)T + (v_3 - v_1)B
\end{aligned}
$$
These equations can be cast in matrix form, and solved as follows:
$$
\begin{bmatrix} T_x & B_x \\ T_y & B_y \\ T_z & B_z \end{bmatrix} =
\begin{bmatrix}
(p_{2x} - p_{1x}) & (p_{3x} - p_{1x}) \\
(p_{2y} - p_{1y}) & (p_{3y} - p_{1y}) \\
(p_{2z} - p_{1z}) & (p_{3z} - p_{1z})
\end{bmatrix}
\begin{bmatrix} (u_2 - u_1) & (u_3 - u_1) \\ (v_2 - v_1) & (v_3 - v_1) \end{bmatrix}^{-1}
$$
In the case of a general parameterized surface given by $p(u, v)$, the tangent basis at a point is
defined as $T = \partial p / \partial u, B = \partial p / \partial v$.</p>
<p>When using a tangent basis for normal mapping, you’d probably orthonormalize it at this point. Here,
we don’t want to do that—the “raw” tangent basis contains the information we want to extract
about texture distortion.</p>
<p>There are a couple of different ways a texture can be distorted locally. One is for it to be
non-uniformly scaled; another is for it to be sheared:</p>
<p class="image-array"><img alt="Non-uniformly scaled texture" src="https://www.reedbeta.com/blog/conformal-texture-mapping/quad_scaled.png" title="Non-uniformly scaled texture" />
<img alt="Sheared texture" src="https://www.reedbeta.com/blog/conformal-texture-mapping/quad_sheared.png" title="Sheared texture" /></p>
<p>Both of these effects can be measured by looking at the tangent basis. Nonuniform scaling can be
measured by comparing the lengths of $T$ and $B$, and shear can be measured by the angle between
them; an unsheared texture mapping should have $T$ and $B$ perpendicular.</p>
<p>A convenient metric for both forms of distortion is the eccentricity of the ellipse created by
transforming a unit circle from tangent space to model space. If the mapping is undistorted, it will
map the unit circle to another circle; if there’s any nonuniform scaling or shear present, the
circle will get elongated into an ellipse (though not necessarily an axis-aligned one):</p>
<div class="embed-wrapper-outer invert-when-dark" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/MlXfDH?paused=false&gui=false"></iframe>
</div>
</div>
<p>How can we compute the eccentricity from the tangent basis? The major and minor radii of the ellipse
are the <a href="https://en.wikipedia.org/wiki/Singular-value_decomposition">singular values</a> of the
tangent-to-model transform (i.e. the $[T, B]$ matrix). I’ll skip the detailed derivation, but I
found that the major and minor radii $a, b$ can be expressed in terms of $T$ and $B$ as follows:
$$
\begin{aligned}
a^2 &= \tfrac{1}{2} \left[ (T^2 + B^2) + \sqrt{(T^2 - B^2)^2 + 4(T \cdot B)^2} \right] \\
b^2 &= \tfrac{1}{2} \left[ (T^2 + B^2) - \sqrt{(T^2 - B^2)^2 + 4(T \cdot B)^2} \right]
\end{aligned}
$$
The eccentricity of the ellipse can then be computed as:
$$
\epsilon = \sqrt{1 - \frac{b^2}{a^2}}
$$
This value equals 0 when the ellipse is a circle, and grows toward 1 as it becomes more elongated.</p>
<h2 id="conformal-maps"><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#conformal-maps" title="Permalink to this section">Conformal Maps</a></h2>
<p>If a texture mapping—either on a triangle mesh or a general parameterized surface—has no
local distortion anywhere, i.e. its eccentricity equals 0 at every point, then it belongs to
a class known as <a href="https://en.wikipedia.org/wiki/Conformal_map"><strong>conformal maps</strong></a>.</p>
<p>Conformal maps are a rich seam of mathematics, with a lot of connections to deep parts of
geometry, analysis, and mathematical physics. In two dimensions, they’re particularly powerful and
flexible (their usefulness falls off in higher dimensions).</p>
<p>Moreover, conformal maps are oddly aesthetically pleasing. 😄 There’s often a rather
soothing quality of smoothness to them, owing to their lack of local distortion.</p>
<p>The key geometric property of these maps is that they preserve angles. To be precise:
if two lines or curves intersect at a certain angle, then their images under a conformal
map will intersect with the same angle. However, <em>distances</em> aren’t preserved in general: a
conformal map may scale things up and down, with different scale factors at different points. As
a result, shapes and sizes of things may be distorted in a global sense.</p>
<p>Another way to express the same idea is that a conformal map can be approximated to first order near
any point as a similarity transformation—a linear transformation with no shear or nonuniform scaling,
only rotation and uniform scaling.</p>
<p>We can also relax the definition of a conformal map by allowing the eccentricity to be bounded by a
constant, $0 \leq \epsilon \leq \epsilon_{\text{max}}$, rather than requiring it to be exactly zero
everywhere. This is called a <a href="https://en.wikipedia.org/wiki/Quasiconformal_mapping"><strong>quasiconformal map</strong></a>.</p>
<h2 id="mobius-transformations"><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#mobius-transformations" title="Permalink to this section">Möbius Transformations</a></h2>
<p>To get some initial intuition for what conformal maps are like, it’s useful to narrow our focus to a
specific type of conformal map that’s easy to analyze and play with. For this, we’ll look at
<a href="https://en.wikipedia.org/wiki/M%C3%B6bius_transformation"><strong>Möbius transformations</strong></a>, which are
just about the simplest conformal maps that are interesting enough to be worth studying. (They’re
named after <a href="https://en.wikipedia.org/wiki/August_Ferdinand_M%C3%B6bius">August Ferdinand Möbius</a>,
the same fellow better-known for the Möbius strip; he also invented homogeneous coordinates.)</p>
<p>In 2D, the most straightforward way to define these maps is with complex numbers. A 2D Möbius
transformation has the form:
$$
f(z) = \frac{az + b}{cz + d}, \qquad z \in \mathbb{C}
$$
where $a, b, c, d \in \mathbb{C}$ are some constants, which should satisfy $ad - bc \neq 0$ (or the
transform will be degenerate).</p>
<div class="embed-wrapper-outer invert-when-dark" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/4tXyWs?paused=false&gui=false"></iframe>
</div>
</div>
<p>Here’s a Shadertoy that applies a Möbius transformation to a coordinate grid, animating the
parameters over time.
While watching this, pay attention to the grid intersections. Though the straight lines become
curved, and the overall shapes may distort quite a bit, wherever two grid lines meet each other they
always remain perpendicular! That’s the conformal property at work.</p>
<p>A few interesting facts about Möbius transformations, in no particular order:</p>
<ul>
<li>They form a mathematical group. The composition of two Möbius transformations is another Möbius,
and the inverse of a Möbius is another Möbius!</li>
<li>Although it has four complex parameters (eight components total), a Möbius has only <em>six</em> degrees
of freedom. That’s because an overall complex factor multiplied into all the parameters has no
effect. In other words, parameters $(a, b, c, d)$ and $(ua, ub, uc, ud)$ specify the same
transformation, for any $u \neq 0 \in \mathbb{C}$.</li>
<li>Möbius transformations generally map lines to circles, and circles to other circles. (And,
occasionally, circles to lines.)</li>
<li>Möbius transformations can be defined in higher dimensions as well, and they behave analogously,
with (hyper)planes mapping to (hyper)spheres.</li>
<li>In 3D and higher, Möbius transformations are the <em>only</em> conformal maps that exist. In 2D, they’re
just a small subset of a much richer collection of conformal maps.</li>
</ul>
<p>The six degrees of freedom of a 2D Möbius are just enough that we can construct a transformation to
map any three chosen points to three others. This makes it tempting to think that we could use them
for texture mapping, by applying a Möbius to each triangle of a 3D model.</p>
<p>Unfortunately, the mapping will not in general be continuous from one triangle to the next: the
shared edge will be mapped to different circles by each triangle’s Möbius, and there are no more
degrees of freedom available to try to fix it.
There have been some papers, <a href="http://www.dmg.tuwien.ac.at/geom/ig/publications/2015/conformal2015/conformal2015.pdf">like this one</a>,
trying to patch together piecewise Möbius transformations with least-squares optimization, to produce
<em>approximate</em> conformal maps.</p>
<p>There’s a good deal more that could be said about Möbius transformations.
However, let’s move on for now and look at a broader set of conformal maps.</p>
<h2 id="holomorphic-functions"><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#holomorphic-functions" title="Permalink to this section">Holomorphic Functions</a></h2>
<p>From this point forward, we’ll restrict ourselves to the 2D case. Given that Möbius transformations
aren’t quite as flexible as we might like, how can we construct other types of conformal maps?</p>
<p>It’s no accident that we used complex numbers to define the Möbius transformation in the previous
section. Complex numbers, in fact, are intimately linked to conformal maps in 2D.</p>
<p>Why is this? If you recall, I mentioned earlier that one way to define a conformal map is that it
can be approximated to first order near any point as a similarity transformation. Well, multiplication
by a (nonzero) complex number implements a 2D similarity transformation: if $z = re^{i\theta}$, then
multiplication by $z$ will scale by $r$ and rotate by $\theta$.</p>
<p>As you may have guessed, “approximated to first order near a point” is a long-winded way of talking
about derivatives. So, what we’re saying is that for a function on $\mathbb{C}$ to be a conformal
map, its derivative at any point should act as a complex number. In other words, it should be
<em>complex differentiable</em>. Functions that satisfy this requirement are called
<a href="https://en.wikipedia.org/wiki/Holomorphic_function"><strong>holomorphic functions</strong></a>.</p>
<p>I should note that being complex-differentiable is different—and much more restrictive—than just being
differentiable as a vector function on $\mathbb{R}^2$. In other words, it’s not enough for the
$x$ and $y$ components of a mapping to <em>individually</em> be differentiable. As seen before, the
derivative at each point must take the form of a similarity transform; formally, the $x$ and $y$
components must satisfy the <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Riemann_equations">Cauchy–Riemann equations</a>,
which state that the mapping’s tangent and bitangent vectors must be orthogonal, and
of equal length, at each point. Only when these conditions are satisfied can you interpret the mapping
as a differentiable complex function of a complex variable.</p>
<p>Fortunately, the basic differentiation rules we learn in school for real-valued functions do
carry over to complex functions! In particular, all the basic arithmetic operations on complex numbers
are differentiable. So, to make a holomorphic function, all we have to do is write down
an algebraic formula—pretty much whatever we like—for a complex function $f(z)$. These functions
will always produce conformal maps, by construction.</p>
<p>We can also use exp, log, and trig functions, as well as many other special functions; they can be
can be extended into the complex domain and are holomorphic too. However, there are a few operations
we <em>can’t</em> use: the complex conjugate, magnitude, argument, or real or imaginary parts of a complex
number. Those <em>aren’t</em> holomorphic, it turns out. As long as we follow these rules, any function we
build will be holomorphic and therefore conformal.</p>
<p>So, okay! We know a lot about how to build functions to accomplish specific things. In fact, we can
try taking functions we’ve already got experience with, and just extending them to the complex domain.
For instance, take a 1D cubic Bézier curve with control points $c_0, c_1, c_2, c_3$:
$$
B(t) = (1-t)^3 c_0 + 3(1-t)^2 t c_1 + 3(1-t) t^2 c_2 + t^3 c_3
$$
We’ll use the same formula, but make everything complex numbers—both the control points and the
input variable.
$$
B(z) = (1-z)^3 c_0 + 3(1-z)^2 z c_1 + 3(1-z) z^2 c_2 + z^3 c_3
$$
Let’s see what it looks like! Here I’ve set it up to produce the identity map with a little bit of
animated (complex) wiggle in the tangents at the endpoints, 0 and 1.</p>
<div class="embed-wrapper-outer invert-when-dark" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/llByzW?paused=false&gui=false"></iframe>
</div>
</div>
<p>Huh. Well, it’s doing…<em>something</em>. The mapping does seem to be conformal, for the most part—right
angles are staying right angles. But why are we seeing the unit square getting duplicated and kinda
merging with itself into curvy 8-sided and 12-sided figures? Why do the grid lines seem to break
and reconnect all the time? This is interesting to look at, but doesn’t seem too useful for texture
mapping. What’s going on?</p>
<h2 id="invertibility-and-critical-points"><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#invertibility-and-critical-points" title="Permalink to this section">Invertibility And Critical Points</a></h2>
<p>The problem comes down to <em>invertibility</em>. When we get this “multiple copies” phenomenon, what we’re
seeing is the complex function mapping multiple regions of its domain (the Shadertoy’s screen space)
to the same region of its range (the coordinate grid being visualized). In other words, the function
isn’t one-to-one—and therefore it fails to be invertible.</p>
<p>Stepping back to real-valued functions for a moment may help clarify. Here’s the graph of the real
function $f(x) = x^2$:</p>
<p><img alt="Graph of x²" class="not-too-wide invert-when-dark" src="https://www.reedbeta.com/blog/conformal-texture-mapping/xsquared.png" title="Graph of x²" /></p>
<p>It’s a parabola, of course. It’s also one of the simplest examples of a non-invertible function.
Why? Because inputs $x$ and $-x$ both map to the same value, $x^2$. If you squint at it a bit, you
can see the graph as being made up of two distorted copies of the <em>positive half</em> of the real line.</p>
<p>The complex extension of this function $f(z) = z^2$, works the same way—but a bit more
dramatically, it gives us two distorted copies of <em>the entire complex plane</em>, squished together!</p>
<p><img alt="Graph of z²" class="not-too-wide invert-when-dark" src="https://www.reedbeta.com/blog/conformal-texture-mapping/zsquared.png" title="Graph of z²" /></p>
<p>Now, a funny thing happens when we look at higher powers. When we restrict ourselves to the real
numbers only, $x^3$ is invertible, $x^4$ is not, $x^5$ is, and so on: odd powers are invertible,
while even ones aren’t. Even powers all map $x$ and $-x$ to the same value, while odd powers maintain
the distinction.</p>
<p>However, when extended to the complex plane, $z^n$ <em>always</em> fails to be invertible unless $n = 1$!
In fact, graphing $z^n$ gives you $n$ copies of the plane, squished together into wedges around the
origin. All of the copies are conformal mappings—but their scale becomes increasingly extreme as
you approach the origin, and the function is not strictly conformal <em>at</em> the origin.</p>
<div class="embed-wrapper-outer invert-when-dark" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/llXBDH?paused=false&gui=false"></iframe>
</div>
</div>
<p>An example for the $n = 3$ case, which you can verify by working out the calculations if you like:
$$
\begin{aligned}
1^3 &= 1 \\
(-\tfrac{1}{2} + \tfrac{\sqrt{3}}{2} i)^3 &= 1 \\
(-\tfrac{1}{2} - \tfrac{\sqrt{3}}{2} i)^3 &= 1 \\
\end{aligned}
$$
Three distinct complex numbers, when cubed, all give the same result of 1.</p>
<p>In general, a holomorphic function will fail to be invertible wherever it has a <em>critical point</em>—a
point where its derivative equals zero. In the vicinity of such a place, the function will locally
behave like $z^n$: it will have $n$ copies of the surrounding region of the complex plane, squished
together into wedges around the critical point. Here, $n$ is one plus the order (aka multiplicity)
of the zero in the derivative.</p>
<p>This is what was going on in the Bézier example from the previous section. Since it was a cubic
polynomial, its derivative is quadratic, and quadratic polynomials have two zeros. So the cubic
Bézier curve has two critical points, which move around the plane as the curve’s parameters change.
When the critical points get too close to the region of interest (the unit square, say), we can see
two or even three copies of that region mushed together.</p>
<h2 id="compulsory-criticality"><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#compulsory-criticality" title="Permalink to this section">Compulsory Criticality</a></h2>
<p>If we want to build holomorphic functions that are guaranteed to be invertible, we need to avoid
critical points, i.e. zeros of the derivative. Unfortunately, this turns out to be more challenging
than you might expect.</p>
<p>It’s easy to make a real polynomial that doesn’t have any zeros, such as $f(x) = x^2 + 1$.
Correspondingly, it’s easy to make a real polynomial that’s everywhere invertible, by taking the
integral of one that doesn’t have any zeros: $\int (x^2 + 1) \, dx = \tfrac{1}{3}x^3 + x$, for
example.</p>
<p>However, a crucial difference between the real and complex domains comes into play here: while zeros are optional
for real polynomials, they are <em>mandatory</em> for complex ones. A complex polynomial of degree $n \geq 1$
always has <em>exactly</em> $n$ zeros (counted with multiplicity). For example, the zeros of $z^2 + 1$ are
at $z = \pm i$. Thus, a complex polynomial of degree $n \geq 2$ always has at least one critical
point, and possibly up to $n - 1$ of them.</p>
<p>In other words, it’s impossible for complex polynomials of degree 2 or higher to be globally invertible.</p>
<p>Polynomials aren’t the only functions out there, though. What about rational functions? They
obey a similar dictum: if a rational function is degree $p$ over degree $q$, then there are
potentially $p + q - 1$ critical points (remember the quotient rule)—not to mention anywhere from
1 to $q$ poles, where the denominator goes to zero and the rational function blasts off to infinity.
Incidentally, poles behave similarly to critical points in some ways: they come in different orders,
and a pole of order $n$ will have $n$ copies of the complex plane around it. So poles are <em>another</em>
way to break invertibility.</p>
<p>This leads to the somewhat depressing conclusion that the <em>only</em> polynomial or rational complex
functions that are everywhere invertible are those that have degree at-most-1 over degree at-most-1.
In other words: Möbius transformations.</p>
<h2 id="conclusion"><a href="https://www.reedbeta.com/blog/conformal-texture-mapping/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>So, polynomial and rational functions aren’t good enough—we’d need to dig deeper if we’re to
find a class of invertible holomorphic functions more powerful than Möbius. One possibility might
be to define $f(z) = \int e^{g(z)} \, dz$, where $g(z)$ is some holomorphic function without poles.
Then $f(z)$ will have neither poles nor critical points, since its derivative is $e^{g(z)}$. (The
complex exponential function, like the real version, is everywhere nonzero.)</p>
<p>Now, if we step back for a moment, we don’t necessarily need <em>global</em> invertibility. If we’re mainly
interested in some bounded region—such as the unit square, for texture mapping—then it may well
be sufficient for our purposes to maintain <em>local</em> invertibility there. This could be done by
keeping critical points and poles far enough from the region of interest that they don’t weird things
out too much. That still seems like a challenging juggling act to perform, though—and moreover,
the more degrees of freedom we have in our function, the more critical points or poles we probably
have to worry about.</p>
<p>In the course of reading up on this subject, I also found <a href="http://www.cs.technion.ac.il/~gotsman/AmendedPubl/Ofir/hilbert.pdf">another paper</a>
that takes a quite different approach—based on <a href="https://en.wikipedia.org/wiki/Cauchy%27s_integral_formula">Cauchy’s integral formula</a>—to
constructing conformal maps. I might write about that in another post sometime—there’s a lot more
to this rabbit hole of math, and it’s interesting stuff, but ultimately it doesn’t seem very practical.</p>
<p>For more reading on the theory of holomorphic functions, see <a href="https://terrytao.wordpress.com/category/teaching/246a-complex-analysis/">Terry Tao’s complex analysis course notes</a>.
(Be warned, it’s a graduate-level course and the notes are pretty dense and formal.)</p>