Nathan Reed’s coding blog
http://reedbeta.com/
Latest posts on Nathan Reed’s coding blog, Sat, 24 Nov 2018 22:44:43 -0800
Python-Like enumerate() In C++17
http://reedbeta.com/blog/python-like-enumerate-in-cpp17/
Nathan Reed, Sat, 24 Nov 2018 22:42:04 -0800, Coding
<p>Python has a handy built-in function called <a href="https://docs.python.org/3/library/functions.html?highlight=enumerate#enumerate"><code>enumerate()</code></a>,
which lets you iterate over an object (e.g. a list) and have access to both the <em>index</em> and the
<em>item</em> in each iteration. You use it in a <code>for</code> loop, like this:</p>
<div class="codehilite"><pre><span></span><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">thing</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">listOfThings</span><span class="p">):</span>
    <span class="k">print</span><span class="p">(</span><span class="s2">"The </span><span class="si">%d</span><span class="s2">th thing is </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">thing</span><span class="p">))</span>
</pre></div>
<p>Iterating over <code>listOfThings</code> directly would give you <code>thing</code>, but not <code>i</code>, and there are plenty of
situations where you’d want both (looking up the index in another data structure, progress reports,
error messages, generating output filenames, etc).</p>
<p>C++ <a href="https://en.cppreference.com/w/cpp/language/range-for">range-based <code>for</code> loops</a> work a lot like
Python’s <code>for</code> loops. Can we implement an analogue of Python’s <code>enumerate()</code> in C++? We can!
<!--more--></p>
<p>C++17 added <a href="https://en.cppreference.com/w/cpp/language/structured_binding">structured bindings</a>
(also known as “destructuring” in other languages), which allow you to pull apart a tuple type and
assign the pieces to different variables, in a single statement. It turns out that this is also
allowed in range-based <code>for</code> loops. If dereferencing the iterator yields a tuple, you can pull it
apart and assign the pieces to different loop variables.</p>
<p>The syntax for this looks like:</p>
<div class="codehilite"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o"><</span><span class="n">ThingA</span><span class="p">,</span> <span class="n">ThingB</span><span class="o">>></span> <span class="n">things</span><span class="p">;</span>
<span class="p">...</span>
<span class="k">for</span> <span class="p">(</span><span class="k">auto</span> <span class="p">[</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">]</span> <span class="o">:</span> <span class="n">things</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// a gets the ThingA and b gets the ThingB from each tuple</span>
<span class="p">}</span>
</pre></div>
<p>So, we can implement <code>enumerate()</code> by creating an iterable object that wraps another iterable and
generates the indices during iteration. Then we can use it like this:</p>
<div class="codehilite"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">Thing</span><span class="o">></span> <span class="n">things</span><span class="p">;</span>
<span class="p">...</span>
<span class="k">for</span> <span class="p">(</span><span class="k">auto</span> <span class="p">[</span><span class="n">i</span><span class="p">,</span> <span class="n">thing</span><span class="p">]</span> <span class="o">:</span> <span class="n">enumerate</span><span class="p">(</span><span class="n">things</span><span class="p">))</span>
<span class="p">{</span>
<span class="c1">// i gets the index and thing gets the Thing in each iteration</span>
<span class="p">}</span>
</pre></div>
<p>The implementation of <code>enumerate()</code> is pretty short, and I present it here for your use:</p>
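<div class="codehilite"><pre><span></span><span class="cp">#include</span> <span class="cpf"><cstddef></span><span class="cp"></span>   <span class="c1">// size_t</span>
<span class="cp">#include</span> <span class="cpf"><iterator></span><span class="cp"></span>  <span class="c1">// std::begin, std::end</span>
<span class="cp">#include</span> <span class="cpf"><tuple></span><span class="cp"></span>     <span class="c1">// std::tie</span>
<span class="cp">#include</span> <span class="cpf"><utility></span><span class="cp"></span>   <span class="c1">// std::declval, std::forward</span>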
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="p">,</span>
<span class="k">typename</span> <span class="n">TIter</span> <span class="o">=</span> <span class="k">decltype</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">begin</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">declval</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">())),</span>
<span class="k">typename</span> <span class="o">=</span> <span class="k">decltype</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">end</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">declval</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">()))</span><span class="o">></span>
<span class="k">constexpr</span> <span class="k">auto</span> <span class="n">enumerate</span><span class="p">(</span><span class="n">T</span> <span class="o">&&</span> <span class="n">iterable</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">iterator</span>
<span class="p">{</span>
<span class="kt">size_t</span> <span class="n">i</span><span class="p">;</span>
<span class="n">TIter</span> <span class="n">iter</span><span class="p">;</span>
<span class="kt">bool</span> <span class="k">operator</span> <span class="o">!=</span> <span class="p">(</span><span class="k">const</span> <span class="n">iterator</span> <span class="o">&</span> <span class="n">other</span><span class="p">)</span> <span class="k">const</span> <span class="p">{</span> <span class="k">return</span> <span class="n">iter</span> <span class="o">!=</span> <span class="n">other</span><span class="p">.</span><span class="n">iter</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">void</span> <span class="k">operator</span> <span class="o">++</span> <span class="p">()</span> <span class="p">{</span> <span class="o">++</span><span class="n">i</span><span class="p">;</span> <span class="o">++</span><span class="n">iter</span><span class="p">;</span> <span class="p">}</span>
<span class="k">auto</span> <span class="k">operator</span> <span class="o">*</span> <span class="p">()</span> <span class="k">const</span> <span class="p">{</span> <span class="k">return</span> <span class="n">std</span><span class="o">::</span><span class="n">tie</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="o">*</span><span class="n">iter</span><span class="p">);</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="n">iterable_wrapper</span>
<span class="p">{</span>
<span class="n">T</span> <span class="n">iterable</span><span class="p">;</span>
<span class="k">auto</span> <span class="nf">begin</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">iterator</span><span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">begin</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span> <span class="p">};</span> <span class="p">}</span>
<span class="k">auto</span> <span class="nf">end</span><span class="p">()</span> <span class="p">{</span> <span class="k">return</span> <span class="n">iterator</span><span class="p">{</span> <span class="mi">0</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">end</span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span> <span class="p">};</span> <span class="p">}</span>
<span class="p">};</span>
<span class="k">return</span> <span class="n">iterable_wrapper</span><span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">forward</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">iterable</span><span class="p">)</span> <span class="p">};</span>
<span class="p">}</span>
</pre></div>
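<p>This uses SFINAE to ensure it can only be applied to iterable types, and will generate readable
error messages if used on something else. It accepts its parameter as a forwarding reference
(<code>T &&</code>), so you can apply it to temporary values (e.g. directly to the return value of a
function call) as well as to variables and members. Given a variable, the wrapper holds a reference
to it; given a temporary, the wrapper takes ownership of it by move, so the temporary stays alive
for the duration of the loop.</p>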
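<p>As a rough illustration of both cases, here is a small self-contained sketch. It repeats the
<code>enumerate()</code> implementation so it compiles on its own; <code>make_names()</code>,
<code>label_names()</code>, <code>scale_by_index()</code> and <code>check_enumerate_examples()</code> are
hypothetical names made up for this example, not part of the original code.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// enumerate() as given above, repeated so the snippet compiles on its own.
template <typename T,
          typename TIter = decltype(std::begin(std::declval<T>())),
          typename = decltype(std::end(std::declval<T>()))>
constexpr auto enumerate(T && iterable)
{
    struct iterator
    {
        std::size_t i;
        TIter iter;
        bool operator != (const iterator & other) const { return iter != other.iter; }
        void operator ++ () { ++i; ++iter; }
        auto operator * () const { return std::tie(i, *iter); }
    };
    struct iterable_wrapper
    {
        T iterable;
        auto begin() { return iterator{ 0, std::begin(iterable) }; }
        auto end() { return iterator{ 0, std::end(iterable) }; }
    };
    return iterable_wrapper{ std::forward<T>(iterable) };
}

// Hypothetical helper: returns a temporary container.
std::vector<std::string> make_names() { return { "ada", "bjarne", "grace" }; }

// Enumerating a temporary: the wrapper owns the moved-in vector for the whole loop.
std::string label_names()
{
    std::string out;
    for (auto [i, name] : enumerate(make_names()))
        out += std::to_string(i) + ":" + name + " ";
    return out;
}

// Enumerating a variable: `value` refers to the element, so writes reach the container.
void scale_by_index(std::vector<int> & values)
{
    for (auto [i, value] : enumerate(values))
        value *= static_cast<int>(i);
}

void check_enumerate_examples()
{
    assert(label_names() == "0:ada 1:bjarne 2:grace ");

    std::vector<int> values{ 5, 5, 5 };
    scale_by_index(values);
    assert((values == std::vector<int>{ 0, 5, 10 }));
}
```

<p>Note that even with plain <code>auto</code> in the structured binding, <code>value</code> aliases the
container element, because the tuple being copied is a tuple of references.</p>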
<p>The <code>enumerate()</code> implementation above compiles without warnings in C++17 mode on gcc 8.2, clang 6.0, and MSVC 15.9. I’ve banged on it
a bit to ensure it doesn’t incur any extra copies, and it works as expected with either const or
non-const containers. It seems to optimize away pretty cleanly, too! 🤘</p>
Using A Custom Toolchain In Visual Studio With MSBuild
http://reedbeta.com/blog/custom-toolchain-with-msbuild/
Nathan Reed, Tue, 20 Nov 2018 13:34:01 -0800, Coding
<p>Like many of you, when I work on a graphics project I sometimes have a need to compile some shaders.
Usually, I’m writing in C++ using Visual Studio, and I’d like to get my shaders built using the
same workflow as the rest of my code. Visual Studio these days has built-in support for HLSL via
<code>fxc</code>, but what if we want to use the next-gen <a href="https://github.com/Microsoft/DirectXShaderCompiler"><code>dxc</code></a>
compiler?</p>
<p>This post is a how-to for adding support for a custom toolchain—such as <code>dxc</code>, or any other
command-line-invokable tool—to a Visual Studio project, by scripting MSBuild (the underlying build
system Visual Studio uses). We won’t quite make it to parity with a natively integrated language,
but we’re going to get as close as we can.
<!--more--></p>
<p>If you don’t want to read all the explanation but just want some working code to look at, jump down
to the <a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project">Example Project</a> section.</p>
<p>This article is written against Visual Studio 2017, but it may also work in some earlier VSes
(I haven’t tested).</p>
<div class="toc">
<ul>
<li><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#msbuild">MSBuild</a></li>
<li><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#adding-a-custom-target">Adding A Custom Target</a></li>
<li><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#invoking-the-tool">Invoking The Tool</a></li>
<li><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#incremental-builds">Incremental Builds</a></li>
<li><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#header-dependencies">Header Dependencies</a></li>
<li><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#errorwarning-parsing">Error/Warning Parsing</a></li>
<li><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project">Example Project</a></li>
<li><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#the-next-level">The Next Level</a></li>
</ul>
</div>
<h2 id="msbuild"><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#msbuild" title="Permalink to this section">MSBuild</a></h2>
<p>Before we begin, it’s important you understand what we’re getting into. Not to mince words, but
MSBuild is a <a href="http://wiki.c2.com/?StringlyTyped">stringly typed</a>, semi-documented, XML-guzzling,
paradigmatically muddled, cursed hellmaze. However, it <em>does</em> ship with Visual Studio, so if you
can use it for your custom build steps, then you don’t need to deal with any extra add-ins or
software installs.</p>
<p>To be fair, MSBuild is <a href="https://github.com/Microsoft/msbuild">open-source on GitHub</a>, so at least
in principle you can dive into it and see what the cursed hellmaze is doing. However, I’ll warn you
up front that many of the most interesting parts vis-à-vis Visual Studio integration are <em>not</em>
included in the Git repo, but are hidden away in VS’s build extension DLLs. (More about that later.)</p>
<p>My jumping-off point for this enterprise was <a href="http://miken-1gam.blogspot.com/2013/01/visual-studio-and-custom-build-rules.html">this blog post by Mike Nicolella</a>.
Mike showed how to set up an MSBuild <code>.targets</code> file to create an association between a specific file
extension in your project, and a build rule (“target”, in MSBuild parlance) to process those files.
We’ll review how that works, then extend it and jazz it up a bit to get some more quality-of-life
features.</p>
<p>MSBuild docs (such as they are) can be found <a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild?view=vs-2017">on MSDN here</a>.
Some more information can be gleaned by looking at the C++ build rules installed with Visual
Studio; on my machine they’re in <code>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets</code>.
For example, the file <code>Microsoft.CppCommon.targets</code> in that directory contains most of the target
definitions for C++ compilation, linking, resources and manifests, and so on.</p>
<h2 id="adding-a-custom-target"><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#adding-a-custom-target" title="Permalink to this section">Adding A Custom Target</a></h2>
<p>As shown in Mike’s blog post, we can define our own build rule using a couple of XML files which
will be imported into the VS project. (I’ll keep using shader compilation with <code>dxc</code> as my running
example, but this approach can be adapted for a lot of other things, too.)</p>
<p>First, create a file <code>dxc.targets</code>—in your project directory, or really anywhere—containing
the following:</p>
<div class="codehilite"><pre><span></span><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><Project</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/developer/msbuild/2003"</span><span class="nt">></span>
<span class="nt"><ItemGroup></span>
<span class="c"><!-- Include definitions from dxc.xml, which defines the DXCShader item. --></span>
<span class="nt"><PropertyPageSchema</span> <span class="na">Include=</span><span class="s">"$(MSBuildThisFileDirectory)dxc.xml"</span> <span class="nt">/></span>
<span class="c"><!-- Hook up DXCShader items to be built by the DXC target. --></span>
<span class="nt"><AvailableItemName</span> <span class="na">Include=</span><span class="s">"DXCShader"</span><span class="nt">></span>
<span class="nt"><Targets></span>DXC<span class="nt"></Targets></span>
<span class="nt"></AvailableItemName></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><Target</span>
<span class="na">Name=</span><span class="s">"DXC"</span>
<span class="na">Condition=</span><span class="s">"'@(DXCShader)' != ''"</span>
<span class="na">BeforeTargets=</span><span class="s">"ClCompile"</span><span class="nt">></span>
<span class="nt"><Message</span> <span class="na">Importance=</span><span class="s">"High"</span> <span class="na">Text=</span><span class="s">"Building shaders!!!"</span> <span class="nt">/></span>
<span class="nt"></Target></span>
<span class="nt"></Project></span>
</pre></div>
<p>And another file <code>dxc.xml</code> containing:</p>
<div class="codehilite"><pre><span></span><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><ProjectSchemaDefinitions</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/build/2009/properties"</span><span class="nt">></span>
<span class="c"><!-- Associate DXCShader item type with .hlsl files --></span>
<span class="nt"><ItemType</span> <span class="na">Name=</span><span class="s">"DXCShader"</span> <span class="na">DisplayName=</span><span class="s">"DXC Shader"</span> <span class="nt">/></span>
<span class="nt"><ContentType</span> <span class="na">Name=</span><span class="s">"DXCShader"</span> <span class="na">ItemType=</span><span class="s">"DXCShader"</span> <span class="na">DisplayName=</span><span class="s">"DXC Shader"</span> <span class="nt">/></span>
<span class="nt"><FileExtension</span> <span class="na">Name=</span><span class="s">".hlsl"</span> <span class="na">ContentType=</span><span class="s">"DXCShader"</span> <span class="nt">/></span>
<span class="nt"></ProjectSchemaDefinitions></span>
</pre></div>
<p>Let’s pause for a moment and take stock of what’s going on here. First, we’re creating a new “item
type”, called <code>DXCShader</code>, and associating it with the extension <code>.hlsl</code>. That way, any files we
add to our project with that extension will automatically have this item type applied.</p>
<p>Second, we’re instructing MSBuild that <code>DXCShader</code> items are to be built with the <code>DXC</code> target, and
we’re defining what that target does. For now, all it does is print a message in the build output,
but we’ll get it doing some actual work shortly.</p>
<p>A few miscellaneous syntax notes:</p>
<ul>
<li>Yes, you need two separate files. No, there’s no way to combine them, AFAICT. This is just the
way MSBuild works.</li>
<li>The syntax <code>@(DXCShader)</code> means “the list of all <code>DXCShader</code> items in the project”. The <code>Condition</code>
attribute on a target says under what conditions that target should execute: if the condition is
false, the target is skipped. Here, we’re executing the target if the list <code>@(DXCShader)</code> is non-empty.</li>
<li><code>BeforeTargets="ClCompile"</code> means this target will run before the <code>ClCompile</code> target, i.e. before
C/C++ source files are compiled with <code>cl.exe</code>. This is because we’re going to output our shader
bytecode to headers which will get included into C++, so the shader compile step needs to run
earlier.</li>
<li><code>Importance="High"</code> is needed on the <code><Message></code> task for it to show up in the VS IDE on the
default verbosity setting. Lower importances will be masked unless you turn up the verbosity.</li>
</ul>
<p>To get this into your project, in the VS IDE right-click the project → Build Dependencies… → Build Customizations,
then click “Find Existing” and point it at <code>dxc.targets</code>. Alternatively, add this line to your <code>.vcxproj</code>
(as a child of the root <code><Project></code> element, doesn’t matter where):</p>
<div class="codehilite"><pre><span></span><span class="nt"><Import</span> <span class="na">Project=</span><span class="s">"dxc.targets"</span> <span class="nt">/></span>
</pre></div>
<p>Now, if you add a <code>.hlsl</code> file to your project it should automatically show up as type “DXC Shader”
in the properties; and when you build, you should see the message <code>Building shaders!!!</code> in the
output.</p>
<p>Incidentally, in <code>dxc.xml</code> you can also set up property pages that will show up in the VS IDE on
<code>DXCShader</code>-type files. This lets you define your own metadata and let users configure it per
file. I haven’t done this, but for example, you could have properties to indicate which shader
stages or profiles the file should be compiled for. The <code><Target></code> element can then have logic that refers
to those properties. Many examples of the XML to define property pages can be found in <code>C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\1033</code>
(or a corresponding location depending on which version of VS you have). For example,
<code>custom_build_tool.xml</code> in that directory defines the properties for the built-in Custom Build
Tool item type.</p>
<h2 id="invoking-the-tool"><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#invoking-the-tool" title="Permalink to this section">Invoking The Tool</a></h2>
<p>Okay, now it’s time to get our custom target to actually do something. Mike’s blog post used the MSBuild
<a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/exec-task?view=vs-2017"><code><Exec></code> task</a> to
run a command on each source file. However, we’re going to take a different tack and use the
Visual Studio <code><CustomBuild></code> task instead.</p>
<p>The <code><CustomBuild></code> task is the same one that ends up getting executed if you manually set your
files to “Custom Build Tool” and fill in the command/inputs/outputs metadata in the property pages.
But instead of putting that in by hand, we’re going to set up our target to <em>generate</em> the metadata
and then pass it in to <code><CustomBuild></code>. Doing it this way is going to let us access a couple of handy
features later that we wouldn’t get with the plain <code><Exec></code> task.</p>
<p>Add this inside the DXC <code><Target></code> element:</p>
<div class="codehilite"><pre><span></span><span class="c"><!-- Setup metadata for custom build tool --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><DXCShader></span>
<span class="nt"><Message></span>%(Filename)%(Extension)<span class="nt"></Message></span>
<span class="nt"><Command></span>
"$(WDKBinRoot)\x86\dxc.exe" -T vs_6_0 -E vs_main "%(Identity)" -Fh "%(Filename).vs.h" -Vn %(Filename)_vs
"$(WDKBinRoot)\x86\dxc.exe" -T ps_6_0 -E ps_main "%(Identity)" -Fh "%(Filename).ps.h" -Vn %(Filename)_ps
<span class="nt"></Command></span>
<span class="nt"><Outputs></span>%(Filename).vs.h;%(Filename).ps.h<span class="nt"></Outputs></span>
<span class="nt"></DXCShader></span>
<span class="nt"></ItemGroup></span>
<span class="c"><!-- Compile by forwarding to the Custom Build Tool infrastructure --></span>
<span class="nt"><CustomBuild</span> <span class="na">Sources=</span><span class="s">"@(DXCShader)"</span> <span class="nt">/></span>
</pre></div>
<p>Now, given some valid HLSL source files in the project, this will invoke <code>dxc.exe</code> twice on each
one—first compiling a vertex shader, then a pixel shader. The bytecode will be output as C arrays in
header files (<code>-Fh</code> option). I’ve just put the output headers in the main project directory, but
in production you’d probably want to put them in a subdirectory somewhere.</p>
<p>Let’s back up and look at the syntax in this snippet. First, the <code><ItemGroup><DXCShader></code> combo
basically says “iterate over the <code>DXCShader</code> items”, i.e. the HLSL source files in the project.
Then what we’re doing is adding metadata: each of the child elements—<code><Message></code>, <code><Command></code>,
and <code><Outputs></code>—becomes a metadata key/value pair attached to a <code>DXCShader</code>.</p>
<p>The <code>%(Foo)</code> syntax accesses item metadata (within a previously established context for “which item”,
which is here created by the iteration over the shaders). All MSBuild items have certain
<a href="https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild-well-known-item-metadata?view=vs-2017">built-in metadata</a>
like path, filename, and extension; we’re building on those to construct additional
metadata, in the format expected by the <code><CustomBuild></code> task. (It matches the metadata that would be
created if you set up the command line etc. manually in the Custom Build Tool property pages.)</p>
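<p>To make the substitution concrete, here is a toy model of it in C++. This is only a sketch of the
idea, not how MSBuild is implemented: it assumes just the three well-known metadata names used in this
post (<code>Identity</code> is the item as written, <code>Filename</code> is the file name without its
extension, <code>Extension</code> includes the dot), and all of the type and function names are
hypothetical.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The three pieces of well-known metadata used in this post.
struct ItemMetadata
{
    std::string identity;   // the item path as written in the project
    std::string filename;   // file name without directory or extension
    std::string extension;  // extension including the leading dot
};

// Derive the metadata from an item path (toy version; handles / and \ separators).
ItemMetadata well_known_metadata(const std::string & itemPath)
{
    const std::size_t slash = itemPath.find_last_of("/\\");
    const std::size_t nameStart = (slash == std::string::npos) ? 0 : slash + 1;
    const std::size_t dot = itemPath.find_last_of('.');
    const bool hasExt = (dot != std::string::npos) && (dot >= nameStart);

    ItemMetadata m;
    m.identity = itemPath;
    m.filename = itemPath.substr(nameStart, hasExt ? dot - nameStart : std::string::npos);
    m.extension = hasExt ? itemPath.substr(dot) : std::string();
    return m;
}

// Replace every occurrence of `from` in `text` with `to`.
std::string replace_all(std::string text, const std::string & from, const std::string & to)
{
    for (std::size_t pos = text.find(from); pos != std::string::npos;
         pos = text.find(from, pos + to.size()))
    {
        text.replace(pos, from.size(), to);
    }
    return text;
}

// Expand %(Identity), %(Filename) and %(Extension) for one item.
std::string expand(const std::string & tmpl, const ItemMetadata & m)
{
    std::string out = replace_all(tmpl, "%(Identity)", m.identity);
    out = replace_all(out, "%(Filename)", m.filename);
    out = replace_all(out, "%(Extension)", m.extension);
    return out;
}

void check_metadata_examples()
{
    const ItemMetadata m = well_known_metadata("shaders\\quad.hlsl");
    assert(expand("%(Filename)%(Extension)", m) == "quad.hlsl");
    assert(expand("%(Filename).vs.h;%(Filename).ps.h", m) == "quad.vs.h;quad.ps.h");
    assert(expand("-E vs_main %(Identity) -Fh %(Filename).vs.h -Vn %(Filename)_vs", m)
           == "-E vs_main shaders\\quad.hlsl -Fh quad.vs.h -Vn quad_vs");
}
```

<p>Running the expansion once per <code>DXCShader</code> item is, roughly, what the iteration described
above amounts to: each item ends up with its own <code>Message</code>, <code>Command</code> and
<code>Outputs</code> strings.</p>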
<p>Incidentally, the <code>$(WDKBinRoot)</code> variable (“property”, in MSBuild-ese) is the path to the Windows
SDK <code>bin</code> folder, where lots of tools like <code>dxc</code> live. It needs to be quoted because it can (and
usually does) contain spaces. You can find out these things by running MSBuild with “diagnostic”
verbosity (in VS, go to Tools → Options → Projects and Solutions → Build and Run → “MSBuild project
build output verbosity”)—this will spit out all the defined properties plus a ton of logging about
which targets are running and what they’re doing.</p>
<p>Finally, after setting up all the required metadata, we simply pass it to the <code><CustomBuild></code> task.
(This task isn’t part of core MSBuild, but is defined in <code>Microsoft.Build.CPPTasks.Common.dll</code>—an
extension plugin to MSBuild that comes with Visual Studio.) Again we see the <code>@(DXCShader)</code> syntax,
meaning to pass in the list of all <code>DXCShader</code> items in the project. Internally, <code><CustomBuild></code>
iterates over it and invokes your specified command lines.</p>
<h2 id="incremental-builds"><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#incremental-builds" title="Permalink to this section">Incremental Builds</a></h2>
<p>At this point, we have a working custom build! We can simply add <code>.hlsl</code> files to our project, and
they’ll automatically be compiled by <code>dxc</code> as part of the build process, without us having to do
anything. <em>Hurrah!</em></p>
<p>However, while working with this setup you will notice a couple of problems.</p>
<ol>
<li>When you modify an HLSL source file, Visual Studio will <em>not</em> reliably detect that it
needs to recompile it. If the project was up-to-date before, hitting Build will do nothing!
However, if you have also modified something else (such as a C++ source file), <em>then</em> the build
will pick up the shaders in addition.</li>
<li>Anytime anything else gets built, <em>all</em> the shaders get built. In other words, MSBuild doesn’t
yet understand that if an individual shader is already up-to-date then it can be skipped.</li>
</ol>
<p>Fortunately, we can easily fix these. But first, why are these problems happening at all?</p>
<p>VS and MSBuild depend on <a href="https://docs.microsoft.com/en-us/visualstudio/extensibility/visual-cpp-project-extensibility?view=vs-2017#tlog-files"><code>.tlog</code> (tracker log) files</a>
to cache information about source file dependencies and efficiently determine whether a build is
up-to-date. Somewhere inside your build output directory there will be a folder full of these logs,
listing what source files have gotten built, what inputs they depended on (e.g. headers), and
what outputs they generated (e.g. object files). The problem is that our custom target isn’t
producing any <code>.tlog</code>s.</p>
<p>Conveniently for us, the <code><CustomBuild></code> task supports <code>.tlog</code> handling right out of the box; we
just have to turn it on! Change the <code><CustomBuild></code> invocation in the targets file to this:</p>
<div class="codehilite"><pre><span></span><span class="c"><!-- Compile by forwarding to the Custom Build Tool infrastructure,</span>
<span class="c"> so it will take care of .tlogs --></span>
<span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span> <span class="nt">/></span>
</pre></div>
<p>That’s all there is to it—now, modified HLSL files will be properly detected and rebuilt, and
<em>unmodified</em> ones will be properly detected and <em>not</em> rebuilt. This also takes care of deleting the
previous output files when you do a clean build. This is one reason to prefer using the <code><CustomBuild></code>
task rather than the simpler <code><Exec></code> task (we’ll see another reason a bit later).</p>
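<p>Conceptually, the up-to-date check that the tracking logs enable boils down to comparing recorded
inputs against recorded outputs. The sketch below is a simplified model of that idea, assuming a plain
timestamp comparison; it is not the actual tracker logic (which lives in the closed build extension
DLLs), and <code>Timestamps</code>, <code>needs_rebuild()</code> and the file names are made up for
illustration.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// File name -> last-modified time in arbitrary ticks.
// A file that is absent from the map is treated as not existing.
using Timestamps = std::map<std::string, long long>;

// An item is stale if any output is missing or older than its newest input.
bool needs_rebuild(const std::vector<std::string> & inputs,
                   const std::vector<std::string> & outputs,
                   const Timestamps & mtimes)
{
    long long newestInput = 0;
    for (const std::string & in : inputs)
    {
        const auto it = mtimes.find(in);
        if (it != mtimes.end())
            newestInput = std::max(newestInput, it->second);
    }
    for (const std::string & out : outputs)
    {
        const auto it = mtimes.find(out);
        if (it == mtimes.end() || it->second < newestInput)
            return true;
    }
    return false;
}

void check_incremental_examples()
{
    const std::vector<std::string> inputs{ "quad.hlsl" };
    const std::vector<std::string> outputs{ "quad.vs.h", "quad.ps.h" };
    Timestamps mtimes{ { "quad.hlsl", 100 }, { "quad.vs.h", 150 }, { "quad.ps.h", 150 } };

    // Outputs are newer than the source: the shader can be skipped.
    assert(!needs_rebuild(inputs, outputs, mtimes));

    // Editing the shader makes it newer than its outputs: rebuild.
    mtimes["quad.hlsl"] = 200;
    assert(needs_rebuild(inputs, outputs, mtimes));

    // After a rebuild the outputs are fresh again, until a clean deletes one.
    mtimes["quad.vs.h"] = 250;
    mtimes["quad.ps.h"] = 250;
    assert(!needs_rebuild(inputs, outputs, mtimes));
    mtimes.erase("quad.ps.h");
    assert(needs_rebuild(inputs, outputs, mtimes));
}
```

<p>Without a record of which outputs belong to which inputs, a check like this has nothing to compare,
which is the situation the custom target was in before the tracking logs were turned on.</p>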
<p><em>Thanks to Olga Arkhipova at Microsoft for helping me figure out this part!</em></p>
<h2 id="header-dependencies"><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#header-dependencies" title="Permalink to this section">Header Dependencies</a></h2>
<p>Now that we have dependencies hooked up for our custom toolchain, a logical next step is to look
into how we can specify extra input dependencies—so that our shaders can have <code>#include</code>s, for
example, and modifications to the headers will automatically trigger rebuilds properly.</p>
<p>The good news is that yes, we can do this by adding an <code><AdditionalInputs></code> metadata key to our
<code>DXCShader</code> items. Files listed there will get registered as inputs in the <code>.tlog</code>, and the build
system will do the rest. The bad news is that there doesn’t seem to be an easy way to detect <em>on
a file-by-file level</em> which additional inputs are needed.</p>
<p>This is frustrating because Visual Studio actually includes a utility for tracking
file accesses in an external tool! It’s called <code>tracker.exe</code> and lives somewhere in your VS
installation. You give it a command line, and it’ll detect all files opened for reading by the
launched process (presumably by injecting a DLL and detouring <code>CreateFile()</code>, or something along
those lines). I believe this is what VS uses internally to track <code>#include</code>s for C++—and it
would be perfect if we could get access to the same functionality for custom toolchains as well.</p>
<p>Unfortunately, the <code><CustomBuild></code> task <em>explicitly disables</em> this tracking functionality. I was
able to find this out by using <a href="https://github.com/icsharpcode/ILSpy">ILSpy</a> to decompile the
<code>Microsoft.Build.CPPTasks.Common.dll</code>. It’s a .NET assembly, so it decompiles pretty cleanly, and
you can examine the innards of the <code>CustomBuild</code> class. It contains this snippet, in the
<code>ExecuteTool()</code> method:</p>
<div class="codehilite"><pre><span></span><span class="kt">bool</span> <span class="n">trackFileAccess</span> <span class="p">=</span> <span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span><span class="p">;</span>
<span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="n">num</span> <span class="p">=</span> <span class="k">base</span><span class="p">.</span><span class="n">TrackerExecuteTool</span><span class="p">(</span><span class="n">pathToTool2</span><span class="p">,</span> <span class="n">responseFileCommands</span><span class="p">,</span> <span class="n">commandLineCommands</span><span class="p">);</span>
<span class="k">base</span><span class="p">.</span><span class="n">TrackFileAccess</span> <span class="p">=</span> <span class="n">trackFileAccess</span><span class="p">;</span>
</pre></div>
<p>That is, it’s turning off file access tracking before calling the base class
method that would otherwise invoke the tracker. I’m sure there’s a reason why they did that, but
sadly it’s stymied my attempts to get automatic <code>#include</code> tracking to work for shaders.</p>
<p>(We could also invoke <code>tracker.exe</code> manually in our command line, but then we face the problem of
merging the tracker-generated <code>.tlog</code> into that of the <code><CustomBuild></code> task. They’re just text files,
so it’s potentially doable…but that is <em>way</em> more programming than I’m prepared to attempt in an
XML-based scripting language.)</p>
<p>Although we can’t get fine-grained file-by-file header dependencies, we can still set up <em>conservative</em>
dependencies by making every HLSL source file depend on every header. This will result in rebuilding
all the shaders whenever any header is modified—but better to rebuild too much than not enough.
We can find all the headers using a wildcard pattern and an <code><ItemGroup></code>. Add this to the DXC
<code><Target></code>, before the “setup metadata” section:</p>
<div class="codehilite"><pre><span></span><span class="c"><!-- Find all shader headers (.hlsli files) --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><ShaderHeader</span> <span class="na">Include=</span><span class="s">"*.hlsli"</span> <span class="nt">/></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><PropertyGroup></span>
<span class="nt"><ShaderHeaders></span>@(ShaderHeader)<span class="nt"></ShaderHeaders></span>
<span class="nt"></PropertyGroup></span>
</pre></div>
<p>You could also set this to find <code>.h</code> files under a <code>Shaders</code> subdirectory, or whatever you prefer.
The <code>**</code> wildcard is available for recursively searching subdirectories, too.</p>
<p>Then add this inside the <code><ItemGroup><DXCShader></code> section:</p>
<div class="codehilite"><pre><span></span><span class="nt"><AdditionalInputs></span>$(ShaderHeaders)<span class="nt"></AdditionalInputs></span>
</pre></div>
<p>We have to do a little dance here, first forming the <code>ShaderHeader</code> item list, then expanding it
into the <code>ShaderHeaders</code> <em>property</em>, and finally referencing that in the metadata. I’m not sure why,
but if I try to use <code>@(ShaderHeader)</code> directly in the metadata it just comes out blank. Perhaps
it’s not allowed to have nested iteration over item lists in MSBuild.</p>
<p>In any case, after making these changes and rebuilding, the build should now pick up any changes to
shader headers. <em>Woohoo!</em></p>
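<p>To make the “conservative” rule concrete, here’s a minimal Python sketch of the up-to-date check it amounts to. This is only an illustration of the logic, not how MSBuild actually implements it (the real thing goes through <code>.tlog</code> files); the function name and the use of plain modification timestamps are my own assumptions.</p>

```python
def needs_rebuild(source_mtime, header_mtimes, output_mtimes):
    """Conservative check: rebuild if ANY input is newer than ANY output.

    All arguments are modification times (e.g. from os.path.getmtime).
    Every header counts as an input of every shader, whether or not
    the shader actually includes it.
    """
    if not output_mtimes:
        return True  # nothing has been built yet
    newest_input = max([source_mtime, *header_mtimes])
    oldest_output = min(output_mtimes)
    return newest_input > oldest_output
```

<p>Touching any one header bumps <code>newest_input</code> for every shader at once, which is exactly the “rebuild too much rather than not enough” behavior described above.</p>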
<h2 id="errorwarning-parsing"><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#errorwarning-parsing" title="Permalink to this section">Error/Warning Parsing</a></h2>
<p>There’s just one more bit of sparkle we can easily add. When you compile C++ and you get an error
or warning, the VS IDE recognizes it and produces a clickable link that takes you to the source
location. If a custom build step emits error messages in the same format, they’ll be picked up as
well—but what if your custom toolchain has a different format?</p>
<p>The <code>dxc</code> compiler emits errors and warnings in gcc/clang format, looking something like this:</p>
<div class="codehilite"><pre><span></span>Shader.hlsl:12:15: error: cannot convert from 'float3' to 'float4'
</pre></div>
<p>It turns out that Visual Studio already does recognize this format (at least as of version 15.9),
which is great! But if it didn’t, or in case you’ve got a tool with some other message format, it turns
out you can provide a regular expression to find errors and warnings in the tool output. The regex
can even supply source file/line information, and the errors will become clickable in the IDE, just
as with C++. (This is all <em>totally undocumented</em> and I only know about it because I spotted the
code while browsing through the decompiled CPPTasks DLL. If you want to take a look for yourself,
the juicy bit is the <code>VCToolTask.ParseLine()</code> method.)</p>
<p>This will use <a href="https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-reference">.NET regex syntax</a>,
and in particular, expects a certain set of <a href="https://docs.microsoft.com/en-us/dotnet/standard/base-types/grouping-constructs-in-regular-expressions#named_matched_subexpression">named captures</a>
to provide metadata. By way of example, here’s the regex I wrote for gcc/clang-format errors:</p>
<div class="codehilite"><pre><span></span>(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)
</pre></div>
<p><code>FILENAME</code>, <code>LINE</code>, etc. are the names the parsing code expects for the metadata. There’s one more
I didn’t use: <code>CODE</code>, for an error code (like <a href="https://docs.microsoft.com/en-us/cpp/error-messages/compiler-errors-1/compiler-error-c2440?view=vs-2017">C2440</a>,
etc.). The only required one is <code>CATEGORY</code>, without which the message won’t be clickable (and it
must be one of the words “error”, “warning”, or “note”); all the others are optional.</p>
<p>To use it, pass the regex to the <code><CustomBuild></code> task like so:</p>
<div class="codehilite"><pre><span></span><span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span>
<span class="na">ErrorListRegex=</span><span class="s">"(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)"</span> <span class="nt">/></span>
</pre></div>
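<p>If you want to sanity-check a regex like this before wiring it into the build, you can try it out in a scripting language first. Here’s a small sketch in Python. One caveat: Python spells named groups <code>(?P&lt;NAME&gt;...)</code> rather than .NET’s <code>(?'NAME'...)</code>, so the pattern is translated, and <code>parse_diagnostic</code> is just a hypothetical helper name, not anything from CPPTasks.</p>

```python
import re

# Same pattern as above, with .NET's (?'NAME'...) rewritten as Python's (?P<NAME>...).
ERROR_LIST_REGEX = re.compile(
    r"(?P<FILENAME>.+):(?P<LINE>\d+):(?P<COLUMN>\d+): "
    r"(?P<CATEGORY>error|warning): (?P<TEXT>.*)"
)

def parse_diagnostic(line):
    """Return the named captures as a dict, or None if the line doesn't match."""
    match = ERROR_LIST_REGEX.match(line)
    return match.groupdict() if match else None

sample = "Shader.hlsl:12:15: error: cannot convert from 'float3' to 'float4'"
fields = parse_diagnostic(sample)
print(fields)
```

<p>Lines that aren’t diagnostics (such as the “Building shaders!!!” message) simply fail to match and come back as <code>None</code>.</p>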
<h2 id="example-project"><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#example-project" title="Permalink to this section">Example Project</a></h2>
<p>Here’s a complete VS2017 project with all the features we’ve discussed, a couple demo shaders, and a
C++ file that includes the compiled bytecode (just to show that works).</p>
<p><a class="biglink" href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/buildcust3.zip">Download Example Project (.zip, 4.3 KB)</a></p>
<p>And for completeness, here’s the final contents of <code>dxc.targets</code>:</p>
<div class="codehilite"><pre><span></span><span class="cp"><?xml version="1.0" encoding="utf-8"?></span>
<span class="nt"><Project</span> <span class="na">xmlns=</span><span class="s">"http://schemas.microsoft.com/developer/msbuild/2003"</span><span class="nt">></span>
<span class="nt"><ItemGroup></span>
<span class="c"><!-- Include definitions from dxc.xml, which defines the DXCShader item. --></span>
<span class="nt"><PropertyPageSchema</span> <span class="na">Include=</span><span class="s">"$(MSBuildThisFileDirectory)dxc.xml"</span> <span class="nt">/></span>
<span class="c"><!-- Hook up DXCShader items to be built by the DXC target. --></span>
<span class="nt"><AvailableItemName</span> <span class="na">Include=</span><span class="s">"DXCShader"</span><span class="nt">></span>
<span class="nt"><Targets></span>DXC<span class="nt"></Targets></span>
<span class="nt"></AvailableItemName></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><Target</span>
<span class="na">Name=</span><span class="s">"DXC"</span>
<span class="na">Condition=</span><span class="s">"'@(DXCShader)' != ''"</span>
<span class="na">BeforeTargets=</span><span class="s">"ClCompile"</span><span class="nt">></span>
<span class="nt"><Message</span> <span class="na">Importance=</span><span class="s">"High"</span> <span class="na">Text=</span><span class="s">"Building shaders!!!"</span> <span class="nt">/></span>
<span class="c"><!-- Find all shader headers (.hlsli files) --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><ShaderHeader</span> <span class="na">Include=</span><span class="s">"*.hlsli"</span> <span class="nt">/></span>
<span class="nt"></ItemGroup></span>
<span class="nt"><PropertyGroup></span>
<span class="nt"><ShaderHeaders></span>@(ShaderHeader)<span class="nt"></ShaderHeaders></span>
<span class="nt"></PropertyGroup></span>
<span class="c"><!-- Setup metadata for custom build tool --></span>
<span class="nt"><ItemGroup></span>
<span class="nt"><DXCShader></span>
<span class="nt"><Message></span>%(Filename)%(Extension)<span class="nt"></Message></span>
<span class="nt"><Command></span>
"$(WDKBinRoot)\x86\dxc.exe" -T vs_6_0 -E vs_main %(Identity) -Fh %(Filename).vs.h -Vn %(Filename)_vs
"$(WDKBinRoot)\x86\dxc.exe" -T ps_6_0 -E ps_main %(Identity) -Fh %(Filename).ps.h -Vn %(Filename)_ps
<span class="nt"></Command></span>
<span class="nt"><AdditionalInputs></span>$(ShaderHeaders)<span class="nt"></AdditionalInputs></span>
<span class="nt"><Outputs></span>%(Filename).vs.h;%(Filename).ps.h<span class="nt"></Outputs></span>
<span class="nt"></DXCShader></span>
<span class="nt"></ItemGroup></span>
<span class="c"><!-- Compile by forwarding to the Custom Build Tool infrastructure,</span>
<span class="c"> so it will take care of .tlogs and error/warning parsing --></span>
<span class="nt"><CustomBuild</span>
<span class="na">Sources=</span><span class="s">"@(DXCShader)"</span>
<span class="na">MinimalRebuildFromTracking=</span><span class="s">"true"</span>
<span class="na">TrackerLogDirectory=</span><span class="s">"$(TLogLocation)"</span>
<span class="na">ErrorListRegex=</span><span class="s">"(?'FILENAME'.+):(?'LINE'\d+):(?'COLUMN'\d+): (?'CATEGORY'error|warning): (?'TEXT'.*)"</span> <span class="nt">/></span>
<span class="nt"></Target></span>
<span class="nt"></Project></span>
</pre></div>
<h2 id="the-next-level"><a href="http://reedbeta.com/blog/custom-toolchain-with-msbuild/#the-next-level" title="Permalink to this section">The Next Level</a></h2>
<p>At this point, we have a pretty usable MSBuild customization for compiling shaders, or using other
kinds of custom toolchains! I’m pretty happy with it. However, there are still a couple of areas for
improvement.</p>
<ul>
<li>As mentioned before, I’d like to get file access tracking to work so we can have exact
dependencies for included files, rather than conservative (overly broad) dependencies.</li>
<li>I haven’t done anything with parallel building. Currently, <code><CustomBuild></code> tasks are run one at a
time. There <em>is</em> a <code><ParallelCustomBuild></code> task in the CPPTasks assembly…unfortunately, it
doesn’t support <code>.tlog</code> updating or the error/warning regex, so it’s not directly usable here.</li>
</ul>
<p>To obtain these features, I think I’d need to write my own build extension in C#, defining a custom
task and calling it in place of <code><CustomBuild></code> in the targets file. It might not be too hard to get
that working, but I haven’t attempted it yet.</p>
<p>In the meantime, now that the hard work of circumventing the weird gotchas and reverse-engineering
the undocumented innards has been done, it should be pretty easy to adapt this <code>.targets</code> setup to
other needs for code generation or external tools, and have them act mostly like first-class
citizens in our Visual Studio builds. Cheers!</p>Mesh Shader Possibilities
http://reedbeta.com/blog/mesh-shader-possibilities/
http://reedbeta.com/blog/mesh-shader-possibilities/Nathan ReedSat, 29 Sep 2018 11:42:26 -0700http://reedbeta.com/blog/mesh-shader-possibilities/#commentsCodingGPUGraphics<p>NVIDIA recently announced their latest GPU architecture, called Turing. Although its headlining feature is
<a href="https://arstechnica.com/gadgets/2018/08/microsoft-announces-the-next-step-in-gaming-graphics-directx-raytracing/">hardware-accelerated ray tracing</a>,
Turing also includes <a href="https://devblogs.nvidia.com/nvidia-turing-architecture-in-depth/">several other developments</a>
that look quite intriguing in their own right.</p>
<p>One of these is the new concept of <a href="https://devblogs.nvidia.com/introduction-turing-mesh-shaders/"><em>mesh shaders</em></a>,
details of which dropped a couple weeks ago—and the graphics programming community was agog, with many
enthusiastic discussions taking place on Twitter and elsewhere. So what are mesh shaders (and their
counterparts, task shaders), why are graphics programmers so excited about them, and what might we
be able to do with them?</p>
<!--more-->
<h2 id="the-gpu-geometry-pipeline-has-gotten-cluttered"><a href="http://reedbeta.com/blog/mesh-shader-possibilities/#the-gpu-geometry-pipeline-has-gotten-cluttered" title="Permalink to this section">The GPU Geometry Pipeline Has Gotten Cluttered</a></h2>
<p>The process of submitting geometry—triangles to be drawn—to the GPU has a simple underlying
paradigm: you put your vertices into a buffer, point the GPU at it, and issue a draw call to say
how many primitives to render. The vertices get slurped linearly out of the buffer, each is
processed by a vertex shader, the triangles are rasterized and shaded, and Bob’s your uncle.</p>
<p>But over decades of GPU development, various extra features have gotten bolted onto this basic pipeline
in the name of greater performance and efficiency. Indexed triangles and vertex caches were created to exploit
vertex reuse. Complex vertex stream format descriptions are needed to prepare data for shading.
Instancing, and later multi-draw, allowed certain sets of draw calls to be combined together;
indirect draws could be generated on the GPU itself. Then came
the extra shader stages: geometry shaders, to allow programmable operations on primitives and even
inserting or deleting primitives on the fly, and then tessellation shaders, letting you submit a
low-res mesh and dynamically subdivide it to a programmable level.</p>
<p>While these features and more were all added for good reasons (or at least what <em>seemed</em> like
good reasons at the time), the compound of all of them has become unwieldy. Which subset of the
many available options do you reach for in a given situation? Will your choice be efficient across
all the GPU architectures your software must run on?</p>
<p>Moreover, this elaborate pipeline is still not as flexible as we would sometimes like—or, where
flexible, it is not performant. Instancing can only draw copies of a single mesh at a time;
multi-draw is still inefficient for large numbers of small draws. Geometry shaders’ programming model is <a href="http://www.joshbarczak.com/blog/?p=667">not
conducive to efficient implementation</a> on wide SIMD cores in
GPUs, and its <a href="https://fgiesen.wordpress.com/2011/07/20/a-trip-through-the-graphics-pipeline-2011-part-10/">input/output buffering presents difficulties too</a>.
Hardware tessellation, though very handy for certain things, is often <a href="https://www.sebastiansylvan.com/post/the-problem-with-tessellation-in-directx-11/">difficult to use well</a>
due to the limited granularity at which you can set tessellation factors, the limited set of baked-in
<a href="/blog/tess-quick-ref/">tessellation modes</a>, and performance issues on some GPU architectures.</p>
<h2 id="simplicity-is-golden"><a href="http://reedbeta.com/blog/mesh-shader-possibilities/#simplicity-is-golden" title="Permalink to this section">Simplicity Is Golden</a></h2>
<p>Mesh shaders represent a radical simplification of the geometry pipeline. With a mesh shader
enabled, all the shader stages and fixed-function features described above are swept away. Instead, we get
a clean, straightforward pipeline using a compute-shader-like programming model. Importantly, this
new pipeline is both highly flexible—enough to handle the existing geometry tasks in a typical game,
plus enable new techniques that are challenging to do on the GPU today—<em>and</em> it looks
like it should be quite performance-friendly, with no apparent architectural barriers to efficient
GPU execution.</p>
<p>Like a compute shader, a mesh shader defines work groups of parallel-running threads, and they can
communicate via on-chip shared memory as well as <a href="http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/07/GDC2017-Wave-Programming-D3D12-Vulkan.pdf">wave intrinsics</a>.
In lieu of a draw call, the app launches some number of mesh shader work groups. Each work group
is responsible for writing out a small, self-contained chunk of geometry, called a
“meshlet”, expressed in arrays of vertex attributes and corresponding indices. These meshlets
then get tossed directly into the rasterizer, and Bob’s your uncle.</p>
<p>(More details can be found in <a href="https://devblogs.nvidia.com/introduction-turing-mesh-shaders/">NVIDIA’s blog post</a>,
a <a href="http://on-demand.gputechconf.com/siggraph/2018/video/sig1811-3-christoph-kubisch-mesh-shaders.html">talk by Christoph Kubisch</a>,
and the <a href="https://www.khronos.org/registry/OpenGL/extensions/NV/NV_mesh_shader.txt">OpenGL extension spec</a>.)</p>
<p>The appealing thing about this model is how data-driven and freeform it is. The mesh shader pipeline
has very relaxed expectations about the shape of your data and the kinds of things you’re going to do.
Everything’s up to the programmer: you can pull the vertex and index data from buffers, generate
them algorithmically, or any combination.</p>
<p>At the same time, the mesh shader model sidesteps the issues that hampered geometry shaders, by explicitly embracing
SIMD execution (in the form of the compute “work group” abstraction). Instead of each shader <em>thread</em>
generating geometry on its own—which leads to divergence, and large input/output data sizes—we
have the whole work group outputting a meshlet cooperatively. This means we can use
compute-style tricks, like: first do some work on the vertices in parallel, then have a barrier, then work on
the triangles in parallel. It also means the input/output bandwidth needs are a lot more reasonable.
And, because meshlets are indexed triangle lists, they don’t break vertex reuse, as geometry shaders often did.</p>
<h2 id="an-upgrade-path"><a href="http://reedbeta.com/blog/mesh-shader-possibilities/#an-upgrade-path" title="Permalink to this section">An Upgrade Path</a></h2>
<p>The other really neat thing about mesh shaders is that they don’t require you to drastically rework
how your game engine handles geometry to take advantage of them. It looks like it should be pretty
easy to convert most common geometry types to mesh shaders, making it an approachable upgrade path for
developers.</p>
<p>(You don’t have to convert <em>everything</em> to mesh shaders straight away, though; it’s possible
to switch between the old geometry pipeline and the new mesh-shader-based one at different points in
the frame.)</p>
<p>Suppose you have an ordinary authored mesh that you want to load and render. You’ll
need to break it up into meshlets, which have a static maximum size declared in the
shader—NVIDIA’s blog post recommends 64 vertices and 126 triangles as a default. How do we do this?</p>
<p>Fortunately, most game engines currently do some form of <a href="https://tomforsyth1000.github.io/papers/fast_vert_cache_opt.html">vertex cache optimization</a>,
which already organizes the primitives by locality—triangles sharing one or two vertices will tend
to be close together in the index buffer. So, a quite viable
strategy for creating meshlets is: just scan the index buffer linearly, accumulating the set of
vertices used, until you hit either 64 vertices or 126 triangles; reset and repeat until you’ve gone
through the whole mesh. This could be done at art build time, or it’s simple enough that you could even do it
in the engine at level load time.</p>
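<p>The linear scan described above might look something like the following Python sketch. It assumes a plain triangle-list index buffer that has already been vertex-cache optimized; the function name and the <code>(vertices, triangles)</code> meshlet representation are my own invention for illustration, not anything from NVIDIA’s API.</p>

```python
def build_meshlets(indices, max_vertices=64, max_triangles=126):
    """Greedily split a triangle-list index buffer into meshlets.

    Returns a list of (vertices, triangles) pairs. `vertices` lists the
    original vertex indices used by the meshlet; `triangles` holds triples
    of local indices into that list. Assumes max_vertices >= 3.
    """
    meshlets = []
    vertices, local_of, triangles = [], {}, []
    for i in range(0, len(indices) - 2, 3):
        tri = indices[i:i + 3]
        new_verts = {v for v in tri if v not in local_of}
        # Close the current meshlet if this triangle would overflow either limit.
        if triangles and (len(vertices) + len(new_verts) > max_vertices
                          or len(triangles) >= max_triangles):
            meshlets.append((vertices, triangles))
            vertices, local_of, triangles = [], {}, []
        # Assign local indices to any vertices this meshlet hasn't seen yet.
        for v in tri:
            if v not in local_of:
                local_of[v] = len(vertices)
                vertices.append(v)
        triangles.append(tuple(local_of[v] for v in tri))
    if triangles:
        meshlets.append((vertices, triangles))
    return meshlets
```

<p>Vertices on the boundary between two meshlets end up duplicated in both, which is the price of making each meshlet self-contained.</p>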
<p>Alternatively, vertex cache optimization algorithms can probably be modified to produce meshlets directly.
For GPUs without mesh shader support, you can concatenate all the meshlet vertex buffers
together, and rapidly generate a traditional index buffer by offsetting and concatenating all the
meshlet index buffers. It’s pretty easy to go back and forth.</p>
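<p>Going in the other direction is even simpler. Here’s a sketch, assuming each meshlet is stored as a pair of its own vertex list and a list of local index triples (a hypothetical layout chosen for illustration):</p>

```python
def meshlets_to_indexed_mesh(meshlets):
    """Concatenate per-meshlet vertex lists and rebase their local indices.

    `meshlets` is a list of (vertices, triangles) pairs, where each triangle
    is a triple of indices local to that meshlet's vertex list.
    """
    vertex_buffer, index_buffer = [], []
    for vertices, triangles in meshlets:
        base = len(vertex_buffer)  # where this meshlet's vertices will start
        vertex_buffer.extend(vertices)
        for tri in triangles:
            index_buffer.extend(base + local for local in tri)
    return vertex_buffer, index_buffer
```

<p>Note that vertices shared between meshlets stay duplicated in the combined vertex buffer; this sketch makes no attempt to weld them back together.</p>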
<p>In either case, the mesh shader would be mostly just acting as a vertex shader, with some extra
code to fetch vertex and index data from their buffers and plug them into the mesh outputs.</p>
<p>What about other kinds of geometry found in games?</p>
<p>Instanced draws are straightforward: multiply the meshlet count and put in a bit of
shader logic to hook up instance parameters. A more interesting case is multi-draw, where we want
to draw a lot of meshes that <em>aren’t</em> all copies of the same thing. For this, we can employ
<em>task shaders</em>—a secondary feature of the mesh shader pipeline. Task shaders
add an extra layer of compute-style work groups, running before the mesh shader, and they control
<em>how many</em> mesh shader work groups to launch. They can also write output variables to be consumed by the
mesh shader. A very efficient multi-draw should be possible by launching task shaders with a thread
per draw, which in turn launch the mesh shaders for all the individual draws.</p>
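<p>For the instanced case, the “bit of shader logic” is mostly index arithmetic. Here it is sketched in Python rather than shader code, assuming work groups are launched in a flat 1D range with all the meshlets of one instance adjacent; both function names are hypothetical.</p>

```python
def work_group_count(instance_count, meshlets_per_mesh):
    """Total mesh shader work groups to launch for an instanced draw."""
    return instance_count * meshlets_per_mesh

def decode_work_group(group_index, meshlets_per_mesh):
    """Map a flat work group index to an (instance, meshlet) pair."""
    return divmod(group_index, meshlets_per_mesh)
```

<p>Each work group would then use the instance number to look up its instance parameters, and the meshlet number to find its vertex and index data.</p>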
<p>If we need to draw a lot of <em>very</em> small meshes, such as quads for particles/imposters/text/point-based rendering,
or boxes for occlusion tests / projected decals and whatnot, then we can pack a bunch of them
into each mesh shader workgroup. The geometry can be generated entirely in-shader rather than relying
on a pre-initialized index buffer from the CPU. (This was one of the original use cases that, it was
hoped, could be done with geometry shaders—e.g. submitting point primitives, and having the GS expand them
into quads.) There’s also a lot of flexibility to do stuff with variable topology, like particle
beams/strips/ribbons, which would otherwise need to be generated either on the CPU or in a separate
compute pre-pass.</p>
<p>(By the way, the <em>other</em> original use case that, it was hoped, could be done with geometry shaders
was multi-view rendering: drawing the same geometry to, say, multiple faces of a cubemap or slices
of a cascaded shadow map within a single draw call. You could do that with mesh shaders, too—but
Turing actually has a separate hardware multi-view capability for these applications.)</p>
<p>What about tessellated meshes?</p>
<p>The two-layer structure of task and mesh shaders is broadly
similar to that of tessellation hull and domain shaders. While it doesn’t appear that mesh shaders
have any kind of access to the fixed-function tessellator unit, it’s also
not too hard to imagine that we could write code in task/mesh shaders to reproduce tessellation
functionality (or at least some of it). Figuring out the details would be a bit of a research project
for sure—maybe someone has already worked on this?—and perf would be a question mark. However,
we’d get the benefit of being able to <em>change</em> how tessellation works, instead of being stuck with
whatever Microsoft decided on in the late 2000s.</p>
<h2 id="new-possibilities"><a href="http://reedbeta.com/blog/mesh-shader-possibilities/#new-possibilities" title="Permalink to this section">New Possibilities</a></h2>
<p>It’s great that mesh shaders can subsume our current geometry tasks, and in some cases make them
more efficient. But mesh shaders also open up possibilities for new kinds of geometry processing
that wouldn’t have been feasible on the GPU before, or would have required expensive compute
pre-passes storing data out to memory and then reading it back in through the traditional geometry
pipeline.</p>
<p>With our meshes already in meshlet form, we can do <a href="https://www.slideshare.net/gwihlidal/optimizing-the-graphics-pipeline-with-compute-gdc-2016">finer-grained culling</a>
at the meshlet level, and even at the triangle level within each meshlet. With task shaders, we can
potentially do mesh LOD selection on the GPU, and if we want to get fancy we could even try dynamically
packing together very small draws (from coarse LODs) to get better meshlet utilization.</p>
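<p>As a rough illustration of meshlet-level culling, here’s a Python sketch of a frustum test against per-meshlet bounding spheres. The assumptions are mine: that a bounding sphere is precomputed for each meshlet, and that frustum planes are given as <code>(nx, ny, nz, d)</code> with unit normals pointing into the frustum. On the GPU this logic would presumably live in a task shader, which could then launch mesh shader work groups only for the survivors.</p>

```python
def sphere_outside_plane(center, radius, plane):
    """True if the sphere lies entirely on the negative side of the plane.

    plane = (nx, ny, nz, d), with a unit normal pointing into the frustum.
    """
    nx, ny, nz, d = plane
    signed_distance = nx * center[0] + ny * center[1] + nz * center[2] + d
    return signed_distance < -radius

def visible_meshlets(bounds, frustum_planes):
    """Indices of meshlets whose bounding sphere is not fully outside any plane.

    bounds is a list of (center, radius) pairs, one per meshlet.
    """
    return [
        i for i, (center, radius) in enumerate(bounds)
        if not any(sphere_outside_plane(center, radius, p) for p in frustum_planes)
    ]
```

<p>The test is conservative: a sphere near a frustum corner can pass even though it’s actually outside, which costs a little wasted work but never drops visible geometry.</p>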
<p>In place of tile-based forward lighting, or as an extension to it, it might be useful to cull
lights (and projected decals, etc.) per meshlet, assuming there’s a good way to pass the variable-size
light list from a mesh shader down to the fragment shader. (This suggestion from <a href="https://twitter.com/sebaaltonen">Seb Aaltonen</a>.)</p>
<p>Having access to the topology in the mesh shader should enable us to calculate dynamic normals,
tangents, and curvatures for a mesh that’s deforming due to complex skinning, displacement mapping,
or procedural vertex animation. We can also do voxel meshing, or isosurface extraction—marching
cubes or tetrahedra, plus generating normals etc. for the isosurface—directly in a mesh shader,
for rendering fluids and volumetric data.</p>
<p>Geometry for hair/fur, foliage, or other surface cover might be feasible to generate on the fly,
with view-dependent detail.</p>
<p>3D modeling and CAD apps may be able to apply mesh shaders to dynamically triangulate quad meshes or
n-gon meshes, as well as things like dynamically insetting/outsetting geometry for
visualizations.</p>
<p>For rendering displacement-mapped terrain, water, and so forth, mesh shaders may be able to assist
us with <a href="https://developer.nvidia.com/gpugems/GPUGems2/gpugems2_chapter02.html">geometry clipmaps</a>
and geomorphing; they might also be interesting for <a href="http://hhoppe.com/proj/vdrpm/">progressive meshing</a>
schemes.</p>
<p>And last but not least, we might be able to render <a href="https://ia601908.us.archive.org/16/items/GDC2014Brainerd/GDC2014-Brainerd.pdf">Catmull–Clark subdivision surfaces</a>,
or other subdivision schemes, more easily and efficiently than it can be done on the GPU today.</p>
<p>To be clear, a great deal of the above is speculation and handwaving on my part—I don’t want to
mislead you that all of these things are <em>for sure</em> doable with the new mesh and task shader
pipeline. There will certainly be algorithmic difficulties and architectural hindrances that will
come up as graphics programmers have a chance to dig into this. Still, I’m quite excited to see what
people will do with this capability over the next few years, and I hope and expect that it won’t be
an NVIDIA-exclusive feature for too long.</p>Normals and the Inverse Transpose, Part 3: Grassmann On Duals
http://reedbeta.com/blog/normals-inverse-transpose-part-3/
http://reedbeta.com/blog/normals-inverse-transpose-part-3/Nathan ReedSun, 22 Jul 2018 22:18:10 -0700http://reedbeta.com/blog/normals-inverse-transpose-part-3/#commentsGraphicsMath<p>Welcome back! In the last couple of articles, we learned about different ways to understand normal
vectors in 3D space—either as bivectors (<a href="/blog/normals-inverse-transpose-part-1/">part 1</a>), or as
dual vectors (<a href="/blog/normals-inverse-transpose-part-2/">part 2</a>). Both can be valid interpretations,
but they carry different units, and react differently to transformations.</p>
<p>In this third and final installment, we’re going to leave behind the focus on normal vectors, and explore
a couple of other unitful vector quantities. We’ve seen how Grassmann bivectors and trivectors act as
oriented areas and volumes, respectively; and we saw how dual vectors act as oriented <em>line densities</em>, with
units of inverse length. Now, we’re going to put these two geometric concepts together, and find out
what they can accomplish with their combined powers. (Get it? Powers? Like powers of a scale factor?
Uh, you know what, never mind.)</p>
<!--more-->
<p>I’m going to dive right in, so if you need a refresher on either Grassmann algebra or dual spaces,
you may want to re-skim the previous articles.</p>
<div class="toc">
<ul>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#wedge-products-of-dual-vectors">Wedge Products of Dual Vectors</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-bivectors">Dual Bivectors</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-trivectors">Dual Trivectors</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#a-few-more-topics">A Few More Topics</a><ul>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#the-interior-product">The Interior Product</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#the-hodge-star">The Hodge Star</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#the-inner-product-or-forgetting-about-duals">The Inner Product, or Forgetting About Duals</a></li>
</ul>
</li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#whats-the-use-of-all-this">What’s The Use of All This?</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#organizing-the-zoo">Organizing the Zoo</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="wedge-products-of-dual-vectors"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#wedge-products-of-dual-vectors" title="Permalink to this section">Wedge Products of Dual Vectors</a></h2>
<p>Grassmann algebra allows us to take wedge products of vectors, producing higher-grade algebraic
entities such as bivectors and trivectors. Just as we can do this with base vectors, we can do the
same thing on dual vectors, producing <em>dual bivectors</em> and <em>dual trivectors</em>.</p>
<p>A dual bivector is formed by wedging two dual vectors, like:
$$
{\bf e_x^*} \wedge {\bf e_y^*} = {\bf e_{xy}^*}
$$
and a dual trivector is the product of three:
$$
{\bf e_x^*} \wedge {\bf e_y^*} \wedge {\bf e_z^*} = {\bf e_{xy}^*} \wedge {\bf e_z^*} = {\bf e_{xyz}^*}
$$
This works exactly the same way that wedge products of ordinary vectors do; in particular, the same
anticommutative law applies.</p>
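<p>As a quick worked example of the anticommutative law (in the same basis notation as above), swapping the two factors flips the sign, and wedging a dual vector with itself gives zero:
$$
{\bf e_y^*} \wedge {\bf e_x^*} = -{\bf e_x^*} \wedge {\bf e_y^*} = -{\bf e_{xy}^*},
\qquad
{\bf e_x^*} \wedge {\bf e_x^*} = 0
$$</p>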
<p>So what’s the geometric meaning of these dual $k$-vectors? Recall that a dual vector is defined as
a linear form—a function from some vector space $V$ to scalars $\Bbb R$. Conveniently, the wedge
products of dual vectors turn out to be isomorphic to the duals of wedge products of vectors.
(Mathematically, we can say, for finite-dimensional $V$:
$$
\textstyle
\bigwedge^k \bigl( V^* \bigr) \cong \bigl(\bigwedge^k V \bigr)^*
$$
where $\bigwedge^k$ is the operation to construct the set of $k$-vectors over a given base
vector space.)</p>
<p>The upshot is that dual $k$-vectors can be understood as <em>linear forms on $k$-vectors</em>: a dual
bivector is a linear function from bivectors to scalars, and a dual trivector is a linear function
from trivectors to scalars. Let’s see how this works in more detail.</p>
<h2 id="dual-bivectors"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-bivectors" title="Permalink to this section">Dual Bivectors</a></h2>
<p>In the previous article, we saw how a dual vector can be visualized as a field of parallel, uniformly
spaced planes, representing the level sets of a linear form:</p>
<div class="align-center"><figure >
<img src="http://reedbeta.com/blog/normals-inverse-transpose-part-3/1-form.png" alt="A dual vector in 3D, visualized as a set of parallel planes" style="height:15em" title="A dual vector in 3D, visualized as a set of parallel planes" />
<figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure></div>
<p>You can think of the discrete planes in this picture as representing intervals of one unit
in the output of the linear form. Keep in mind, though, that there are actually a <em>continuous
infinity</em> of these planes, filling space—one for every possible output value of the linear form.
When you evaluate the linear form—i.e. pair a dual vector with a vector—the result represents <em>how
many planes</em> the vector crosses, from its tail to its tip (in a continuous-measure sense of “how many”).
This will depend on both the length and orientation of the vector: for example, a vector parallel to
the planes will return zero, no matter its length.</p>
<p>A dual <em>bivector</em> can be thought of in a similar way—but instead of planes, we now picture a field
of parallel <em>lines</em>, uniformly spaced over the plane perpendicular to them.</p>
<div class="align-center"><figure >
<img src="http://reedbeta.com/blog/normals-inverse-transpose-part-3/2-form.png" alt="A dual bivector in 3D, visualized as a set of parallel lines, formed as the intersections of the planes of two dual vectors" style="height:20em" title="A dual bivector in 3D, visualized as a set of parallel lines, formed as the intersections of the planes of two dual vectors" />
<figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure></div>
<p>As suggested by this diagram, when you wedge two dual vectors, the resulting dual bivector consists
of all the <em>lines of intersection</em> of the two dual vectors’ respective planes.</p>
<p>What happens when we pair this dual bivector with a base bivector? As before, the
result is a scalar—this time representing <em>how many lines</em> the bivector crosses! If you visualize
the bivector as a parallelogram, a circle, or any other shape, it will have a certain area. It
will therefore intersect some quantity of the continuous mass of lines. This quantity won’t depend on
the <em>shape</em> of the bivector—remember, bivectors don’t actually <em>have</em> any defined shape—only on
its area (magnitude) and orientation. A bivector whose plane runs parallel to the lines will return
zero, no matter its area.</p>
<p>Because dual vectors have units of inverse length, and a dual bivector is a product of dual vectors,
<strong>a dual bivector has units of inverse area</strong>. It represents an oriented areal
density, such as a probability density over a surface! When you pair the dual bivector with a
bivector, the result tells you how much probability (or whatever else) is covered by that bivector’s
area. And as implied by their units, dual bivectors scale as $1/a^2$. (If you scale an object <em>up</em> by
a factor of $a$, the probability density on its surface goes <em>down</em> by a factor of $a^2$, because the
same total probability is now spread over an $a^2$-larger area.)</p>
<p>How about the transformation rule for dual bivectors? Well, we learned in part 1 that bivectors transform as
$\text{cofactor}(M)$; and in part 2, we found that dual vectors transform as the inverse transpose,
$M^{-T}$. It follows that dual bivectors transform as $\text{cofactor}\bigl(M^{-T}\bigr)$,
or equivalently $\bigl(\text{cofactor}(M)\bigr)^{-T}$. Startlingly, for 3×3 matrices these formulas
reduce to just
$$
\frac{M}{\det M}
$$
So, dual bivectors simply transform using $M$ divided by its own determinant.</p>
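<p>As a quick numerical sanity check of that identity, here’s a minimal Python sketch. The helper names (<code>det3</code>, <code>cofactor3</code>, <code>inverse_transpose3</code>) and the test matrix are assumptions made up for this illustration, not part of any library. The sketch uses the fact that $M^{-T} = \text{cofactor}(M) / \det M$.</p>

```python
# Minimal 3x3 helpers in plain Python; matrices are lists of rows.

def det3(m):
    """Determinant of a 3x3 matrix, expanded along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cofactor3(m):
    """Cofactor matrix: entry (i, j) is the signed minor of m at (i, j)."""
    c = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            r0, r1 = [r for r in range(3) if r != i]
            c0, c1 = [k for k in range(3) if k != j]
            minor = m[r0][c0] * m[r1][c1] - m[r0][c1] * m[r1][c0]
            c[i][j] = (-1) ** (i + j) * minor
    return c

def inverse_transpose3(m):
    """M^{-T}, computed as cofactor(M) / det(M)."""
    d = det3(m)
    return [[x / d for x in row] for row in cofactor3(m)]

# An arbitrary invertible matrix with shear and nonuniform scale (example data).
M = [[2.0, 1.0, 0.0],
     [0.0, 3.0, 0.5],
     [0.0, 0.0, 0.5]]

lhs = cofactor3(inverse_transpose3(M))            # cofactor(M^{-T})
rhs = [[x / det3(M) for x in row] for row in M]   # M / det M

# Largest entrywise difference between the two sides; should be ~0.
max_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
```

<p>If the identity holds, <code>max_error</code> should come out at zero, up to floating-point rounding.</p>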
<h2 id="dual-trivectors"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#dual-trivectors" title="Permalink to this section">Dual Trivectors</a></h2>
<p>Follow the pattern: if a dual vector in 3D looks like a stack of parallel planes, and a dual bivector
looks like a field of parallel lines, then a dual <em>trivector</em> looks like a cloud of parallel <em>points</em>.
Well, drop the “parallel”—it doesn’t mean anything. It’s just uniformly spaced points.</p>
<div class="align-center"><figure >
<img src="http://reedbeta.com/blog/normals-inverse-transpose-part-3/3-form.png" alt="A dual trivector in 3D, visualized as a set of points formed as the intersections of the planes of three dual vectors" style="height:20em" title="A dual trivector in 3D, visualized as a set of points formed as the intersections of the planes of three dual vectors" />
<figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N-form.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure></div>
<p>As before, the wedge product of three dual vectors—or a dual vector and dual bivector—constructs
the continuous point cloud made of all the intersection points of the wedge factors. This quantity
scales as $1/a^3$ and represents a volume density. When you pair it with a trivector, the result
tells you how much of the point cloud is enclosed in that trivector’s volume.</p>
<p>The transformation rule for this one is easy—dual trivectors in 3D just get multiplied by $1/\det M$.</p>
<h2 id="a-few-more-topics"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#a-few-more-topics" title="Permalink to this section">A Few More Topics</a></h2>
<p>With the introduction of dual bi- and trivectors, our “scaling zoo” is now complete! We’ve got the
full ecosystem of vectorial quantities with scaling powers from −3 to +3, each with its proper units
and matching transformation formula.</p>
<p>In the rest of this section, I’ll quickly touch on a few more mathematical aspects of this extended
Grassmann algebra with dual spaces.</p>
<h3 id="the-interior-product"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#the-interior-product" title="Permalink to this section">The Interior Product</a></h3>
<p>As we saw in part 2, a vector space and its dual have a “natural pairing” operation, much like
an inner product, between vectors and dual vectors. This pairing
extends to $k$-vectors and their duals, too. In fact, we can further extend the natural pairing to
work between $k$-vectors and duals <em>of different grades</em>. For example, we can define a
way to “pair” a dual vector $w$ with a bivector $B = u \wedge v$, yielding a vector:
$$
\langle w, B \rangle = \langle w, u \rangle v - u \langle w, v \rangle
$$
Geometrically, the resulting vector lies in the plane of $B$, and runs parallel to the level planes
of $w$. In some sense, $w$ is “eating” the dimension of $B$ that lies along the direction of $w$’s
density, and leaving the leftover dimension behind as a vector.</p>
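<p>To make the formula concrete, here’s a small Python sketch in coordinates. It assumes that $w$ is expressed in the dual basis matching the basis used for $u$ and $v$, so that the natural pairing is a plain sum of products; the function names <code>pair</code> and <code>contract</code> are made up for this example.</p>

```python
def pair(w, v):
    """Natural pairing <w, v> in coordinates (matching dual basis assumed)."""
    return sum(wi * vi for wi, vi in zip(w, v))

def contract(w, u, v):
    """<w, u ^ v> = <w, u> v - u <w, v>, returned as a vector."""
    wu, wv = pair(w, u), pair(w, v)
    return [wu * vi - ui * wv for ui, vi in zip(u, v)]

w = [0.0, 0.0, 2.0]   # dual vector whose level planes are z = const
u = [1.0, 0.0, 1.0]
v = [0.0, 1.0, 3.0]

r = contract(w, u, v)          # 2*v - 6*u = [-6.0, 2.0, 0.0]
crossings = pair(w, r)         # 0.0: r crosses none of w's level planes
swapped = contract(w, v, u)    # swapping u and v flips the sign of the result
```

<p>The result is a linear combination of $u$ and $v$, so it lies in the plane of $B$, and pairing it with $w$ gives zero, which matches the geometric description above.</p>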
<p>This extended pairing operation is known as the <a href="https://en.wikipedia.org/wiki/Exterior_algebra#Interior_product">interior product</a>
or contraction product, although different references often define it in slightly different ways
(there are various conventions in the literature). I’m not going to go into it too deeply.
The key point is that you can combine a $k$-vector with a dual $\ell$-vector, for any grades
$k$ and $\ell$; the result will be a $(k-\ell)$-vector, interpreting negative grades as duals.</p>
<h3 id="the-hodge-star"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#the-hodge-star" title="Permalink to this section">The Hodge Star</a></h3>
<p>In addition to the vector-space duality we’ve been talking about, Grassmann algebra contains another,
distinct notion of duality: Hodge duality, represented by the Hodge star operator, $\star$. (Note
that this is a different symbol from the asterisk $*$ used for the dual vector space!)</p>
<p>The vector-space notion of duality relates $k$-vectors to duals of <em>equal grade</em>—vectors to dual
vectors, bivectors to dual bivectors, and so on. Hodge duality, however, connects things to duals of a
complementary grade. Applying the Hodge star to a $k$-vector produces an element of grade $n - k$,
where $n$ is the dimension of space. In 3D, it interchanges vectors (grade 1) with bivectors (grade 2),
and scalars (grade 0) with trivectors (grade 3).</p>
<p>The way I’ll define the Hodge star initially is a bit different than the standard way. In fact,
there are actually <em>two</em> Hodge star operations: one that goes from $k$-vectors to dual $(n-k)$-vectors,
and another that goes the other way. I’ll denote these by $\star$ and $-\star$ respectively. The
two are inverses of each other (in 3D, at least). They’re defined as follows:
$$
\begin{aligned}
\star&: \textstyle\bigwedge^k V \to \textstyle\bigwedge^{n-k}V^* &&:
& v^\star &= \langle {\bf e_{xyz}^*}, v \rangle \\
-\star&: \textstyle\bigwedge^k V^* \to \textstyle\bigwedge^{n-k}V &&:
& v^{-\star} &= \langle v, {\bf e_{xyz}} \rangle
\end{aligned}
$$
The angle brackets on the right here are the interior product. What we’re saying is: to do
the Hodge star on a $k$-vector, we take its interior product with ${\bf e_{xyz}^*}$, the standard
unit dual trivector (or, in $n$ dimensions, the unit dual $n$-vector). This results in a dual
$(n-k)$-vector, which geometrically represents a density over all the dimensions <em>not</em> included in
the original $k$-vector.</p>
<p>Conversely, to do the anti-Hodge-star on a dual $k$-vector, we take its interior product with
${\bf e_{xyz}}$, giving an $(n-k)$-vector containing all the dimensions <em>not</em> represented by the
original dual $k$-vector, i.e. all the dimensions perpendicular to its level sets.</p>
<p>(These two operations are <em>almost</em> defined on disjoint domains, and could therefore be combined into
one “smart” star that automatically knows what to do based on the type of its argument…except for
the $k = 0$ case: when you hodge a scalar, does it go to a trivector, or to a dual trivector? Both
are possible; that’s why we need two distinct operations here.)</p>
<p>For 3D geometry, the interesting cases are vectors interchanging with bivectors:</p>
<ul>
<li>A vector $v$ hodges to a dual bivector whose “field lines” run parallel to $v$.</li>
<li>A bivector $B$ hodges to a dual vector whose level planes are parallel to $B$.</li>
<li>A dual vector $w$ unhodges to a bivector parallel to $w$’s level planes.</li>
<li>A dual bivector $D$ unhodges to a vector parallel to $D$’s field lines.</li>
</ul>
<p>Although the formal definition was somewhat involved, you can see that the geometric result of the
Hodge operations is actually pretty simple. It’s all about swapping between the geometry of a
$k$-vector and the corresponding level-set geometry of a dual $(n-k)$-vector. The Hodge stars are
a very useful tool for working with Grassmann and dual-Grassmann quantities in practice.</p>
<h3 id="the-inner-product-or-forgetting-about-duals"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#the-inner-product-or-forgetting-about-duals" title="Permalink to this section">The Inner Product, or Forgetting About Duals</a></h3>
<p>In most treatments of Grassmann or geometric algebra, dual spaces are hardly mentioned. The more conventional
definition of the Hodge star has it mapping directly between $k$-vectors and $(n-k)$-vectors—no
duals in sight. How does this work?</p>
<p>It turns out that if we have an inner product defined on our vector space, we can use it to
convert back and forth between vectors and dual vectors, or $k$-vectors and their duals.</p>
<p>So far, we haven’t discussed any means of mapping individual vectors back and forth between the base and
dual spaces. Although they’re both vector spaces of the same dimension, there’s no natural isomorphism
that would enable us to map them in a non-arbitrary way. However, the presence of an inner
product does pick out a specific isomorphism with the dual space: that which maps each vector $v$ to
a dual vector $v^*$ that implements <em>dotting with $v$</em>, using the inner product.</p>
<p>Symbolically, for all vectors $u \in V$, we have $\langle v^*, u \rangle = v \cdot u$. This can be
extended to inner products and isomorphisms for all $k$-vectors as well (see
<a href="https://en.wikipedia.org/wiki/Exterior_algebra#Inner_product">Wikipedia</a> for details).</p>
<p>Note, however, that this map is <em>not</em> preserved by scaling, or by transformations in general, because
$v^*$ transforms as $M^{-T}$ while $v$ transforms as $M$; the two matrices coincide only when $M$ is orthogonal.</p>
<p>With this correspondence, it becomes possible to largely ignore the existence of dual spaces and dual
elements altogether—we have the fiction that they’re not distinct from the base elements. In an
orthonormal basis, even the <em>coordinates</em> of a vector and its corresponding dual will be identical.</p>
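<p>Here’s a small 2D Python sketch of where the fiction holds and where it breaks. It assumes an orthonormal basis, so that converting a vector to its dual leaves the coordinates unchanged; the helper names (<code>flat</code>, <code>commutes</code>, and so on) and the example matrices are hypothetical.</p>

```python
def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(2)) for i in range(2)]

def inverse_transpose2(m):
    """M^{-T} for a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    # The inverse is [[d, -b], [-c, a]] / det; transposing swaps -b and -c.
    return [[d / det, -c / det], [-b / det, a / det]]

def flat(v):
    """Dual of v via the dot product; same coordinates in an orthonormal basis."""
    return list(v)

def commutes(m, v):
    """Does 'dualize, then transform' agree with 'transform, then dualize'?"""
    return mat_vec(inverse_transpose2(m), flat(v)) == flat(mat_vec(m, v))

v = [1.0, 2.0]
rotation = [[0.0, -1.0], [1.0, 0.0]]   # 90-degree rotation (orthogonal)
stretch = [[2.0, 0.0], [0.0, 1.0]]     # nonuniform scale (not orthogonal)

ok_for_rotation = commutes(rotation, v)   # True
ok_for_stretch = commutes(stretch, v)     # False
```

<p>Under the rotation, the vector and its dual stay in step. Under the stretch, $v$ goes to $(2, 2)$ while $v^*$ goes to $(0.5, 2)$, so the identification made in local space no longer holds in world space.</p>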
<p>For an example of “forgetting” about duals: the Hodge star operations can be defined using the inner
product to invisibly dualize their input or output as well as hodging it. Then the two Hodge stars I
defined above collapse into one operation, mapping between $\bigwedge^k V$ and $\bigwedge^{n-k} V$.</p>
<h2 id="whats-the-use-of-all-this"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#whats-the-use-of-all-this" title="Permalink to this section">What’s The Use of All This?</a></h2>
<p>This is kind of a lot. We started with just vectors and normal vectors—two kinds of vector-shaped
things with different rules, which was confusing enough. But now we have <em>four</em>: vectors, dual vectors,
bivectors, and dual bivectors. And on top of that we have three scalar-shaped things, too: true
unitless scalars, trivectors, and dual trivectors.</p>
<p>Evidently, lots of people manage to get along well enough without being totally aware of
all these distinctions! Even texts on Grassmann or geometric algebra may not fully delve into
the “duals” story, instead treating $k$-vectors and their duals as the same thing (implicitly using
the isomorphism defined above). Their differing transformation behavior becomes sort of a curiosity,
an unsystematic ornamental detail. And this comes at the cost of making some aspects of the algebra
require an inner product or a metric, and only work properly in an orthonormal basis. In contrast,
when you’re “cooking with duals”, you can derive formulas that work properly in any basis.</p>
<p>As a quick example of this, let’s look at a concrete problem you might encounter in graphics. Let’s
say you have a triangle mesh and you want to select a random point on it, chosen uniformly over the
surface area. To do this, we must first select a random triangle, with probability
proportional to area. The standard technique is to precompute the areas of all the triangles
and build a prefix-sum table; then, to select a triangle, we take a uniform random value and
binary-search on it in the table.</p>
<p>Let’s throw in another wrinkle, though. What if the triangle mesh is transformed—possibly by a
nonuniform scaling, or a shear? In general, this will alter the areas of all the triangles, in an
orientation-dependent way. A uniform distribution over surface area in the mesh’s <em>local</em> space
will no longer be uniform in world space. We could address this by pre-transforming the whole mesh
into world space and doing the sampling process there—but that’s more expensive than necessary.</p>
<p>We can use bivectors to help. Instead of calculating just a scalar area for each triangle, calculate
the bivector representing its orientation and area. (If the triangle’s vertices are $p_1, p_2, p_3$,
this is $\tfrac{1}{2}(p_2 - p_1) \wedge (p_3 - p_1)$.) Now we can transform all the bivectors into
world space, using their transformation rule, and they will accurately represent the areas of the
transformed triangles. Then we can calculate their magnitudes and build the prefix-sum table, as
before.</p>
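<p>The steps above can be sketched in Python roughly as follows. This is a minimal illustration under some assumptions: bivectors are stored as three coordinates in the $(yz, zx, xy)$ basis (the same numbers as a cross product), the local-to-world matrix <code>M</code> and the two triangles are made-up example data, and all function names are hypothetical.</p>

```python
import bisect
import math
import random

def sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def cross(a, b):
    """Coordinates of the bivector a ^ b in the (yz, zx, xy) basis."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cofactor3(m):
    """Cofactor matrix, written with cyclic indices (valid for 3x3 only)."""
    return [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
           - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
             for j in range(3)] for i in range(3)]

def triangle_bivector(p1, p2, p3):
    """(1/2)(p2 - p1) ^ (p3 - p1): orientation and area of the triangle."""
    return [0.5 * c for c in cross(sub(p2, p1), sub(p3, p1))]

def build_area_cdf(local_bivectors, m):
    """Prefix sums of world-space areas, computed from local-space bivectors."""
    cof = cofactor3(m)
    cdf, total = [], 0.0
    for b in local_bivectors:
        total += norm(mat_vec(cof, b))   # magnitude of the transformed bivector
        cdf.append(total)
    return cdf

def pick_triangle(cdf, u):
    """Map a uniform value u in [0, 1) to a triangle index by binary search."""
    return min(bisect.bisect_right(cdf, u * cdf[-1]), len(cdf) - 1)

# Example data: two local-space triangles, and a local-to-world matrix that
# combines a shear with a nonuniform scale.
triangles = [([0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
             ([0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0])]
M = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 4.0]]

bivectors = [triangle_bivector(*t) for t in triangles]   # precomputed once
cdf = build_area_cdf(bivectors, M)                       # rebuilt per transform
index = pick_triangle(cdf, random.random())
```

<p>The per-triangle bivectors only need to be computed once in local space. When the transform changes, only the cofactor matrix, the magnitudes, and the prefix sums are recomputed; the vertices themselves are never transformed.</p>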
<p>Conversely, suppose we have an existing, non-uniform areal probability measure defined over our
triangle mesh. (Maybe it’s a light source with a texture defining its emissive brightness, and we
want to sample with respect to emitted power; or maybe we want to sample with respect to solid angle
subtended at some point, or some sort of visual importance, etc.) We can represent these
probability densities as dual bivectors, and again we can take them back and forth between local and
world space—even in the presence of shear or nonuniform scaling—with confidence that we’re still
representing the same distribution.</p>
<p>Some other examples where dual $k$-vectors show up:</p>
<ul>
<li>The derivative (gradient) of a scalar field, such as an SDF, is naturally a dual vector.</li>
<li>Dual vectors represent spatial frequencies (wavevectors) in Fourier analysis.</li>
<li>The radiance carried by a ray is a density with respect to projected area, and can therefore be
represented, at least in part, as a dual bivector.</li>
</ul>
<p>As with many theoretical math concepts, I think these ideas are mostly useful for enriching your own
mental models of geometry, strengthening your thought process, and deriving results that you can
then use in code in a more “conventional” way. I’m not <em>necessarily</em> suggesting we should
all go off and start implementing $k$-vectors and their duals as classes in our math libraries.
(Frankly, our math libraries are <a href="/blog/on-vector-math-libraries/">enough of a mess already</a>.)</p>
<h2 id="organizing-the-zoo"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#organizing-the-zoo" title="Permalink to this section">Organizing the Zoo</a></h2>
<p>One more thing to muse on before I leave you. We’ve seen that there is a “scaling zoo” of mathematical
elements with different physical, geometric interpretations and behaviors. Different branches of
science and math have distinct ways of conceptually organizing this zoo, and thinking about its
denizens and their relationships.</p>
<p>In computer science, for example, we would probably understand vectors, bivectors, dual vectors, and
so forth as different <em>types</em>. Each might have an internal structure as a composition of more elementary
values (real numbers), and a suite of allowed operations that define what you can do with them and
how they interact with one another.</p>
<p>Physicists, meanwhile, tend to take a more rough-and-ready approach: geometric elements are
thought of as simply matrices of real (or sometimes complex) numbers, together with <em>transformation
laws</em>—rules that define what happens to a given matrix under a change of coordinates. Algebraic
properties such as anticommutativity are obtained by constructing the matrices in such a way that matrix
multiplication implements the desired algebra. For example, a bivector can be represented as an
antisymmetric matrix; wedging two vectors $u, v$ to make a bivector corresponds to calculating the matrix
$$
uv^T - vu^T
$$
which has the same anticommutative property as a wedge product. Multiplying this matrix by a (dual)
vector $w$ then represents the interior product of the bivector with $w$. Meanwhile, a dual
bivector would be structurally similar, but have a different transformation law (“covariant” versus
“contravariant”).</p>
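<p>Here’s a minimal Python sketch of that matrix representation, with made-up example values and hypothetical function names. It builds $uv^T - vu^T$, checks antisymmetry, and multiplies by $w$. The product works out to $u \langle w, v \rangle - v \langle w, u \rangle$, which is the interior-product formula from earlier up to an overall sign; which sign you get depends on the ordering convention chosen.</p>

```python
def wedge_matrix(u, v):
    """The antisymmetric matrix u v^T - v u^T representing u ^ v."""
    return [[u[i] * v[j] - v[i] * u[j] for j in range(3)] for i in range(3)]

def mat_vec(m, w):
    return [sum(m[i][j] * w[j] for j in range(3)) for i in range(3)]

u = [1.0, 0.0, 1.0]
v = [0.0, 1.0, 3.0]
w = [0.0, 0.0, 2.0]

A = wedge_matrix(u, v)
# A == [[0, 1, 3], [-1, 0, -1], [-3, 1, 0]]

is_antisymmetric = all(A[i][j] == -A[j][i] for i in range(3) for j in range(3))
flipped = wedge_matrix(v, u)   # equals -A: swapping the factors negates it
Aw = mat_vec(A, w)             # [6.0, -2.0, 0.0] = 6*u - 2*v
```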
<p>Lastly, mathematicians like to formalize things by saying that different geometric
quantities are elements of different <em>spaces</em> and/or <em>algebras</em>. Both terms ultimately mean a
set (in the mathematical sense), together with some extra structure—such as algebraic operations,
a topology, a norm or metric, and so on—defined on top of the bare set. The exact kind of structures
you need depends on what you’re doing, and there’s a whole menagerie of such structures that
might be invoked in different contexts.</p>
<p>So which structure is behind the scaling zoo? We know we’ve got the vector space structure, and the
Grassmann algebraic structure. But neither of these fully accounts for the different scaling and transformation
behaviors of dual elements: dual spaces are isomorphic to their base spaces (in finite dimensions),
totally identical insofar as the vector and Grassmann structures are concerned.</p>
<p>I don’t have a fully developed answer yet—but I suspect it’s got to do with the <a href="https://en.wikipedia.org/wiki/Representation_of_a_Lie_group">representation theory of Lie groups</a>.
My guess is that the different types of scaling elements we’ve seen can be codified as vector spaces
acted on by different representations of $GL(n)$, the Lie group of all invertible linear maps on $\Bbb R^n$.
But I’m not going to get into that here. (If you’d like to read more on this, here are
a couple web references: <a href="https://sbseminar.wordpress.com/2008/07/08/how-to-write-down-the-representations-of-gl_n/">one</a>,
<a href="http://www.maths.qmul.ac.uk/~whitty/LSBU/MathsStudyGroup/SeligGLn.pdf">two</a>. Also: <a href="http://inference-review.com/article/woits-way">Peter Woit’s book</a> on the role of representation theory in
particle physics.)</p>
<h2 id="conclusion"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-3/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>I hope this has been an entertaining and enlightening tour through some of the layers
beneath the surface of your favorite Euclidean geometry. We started with a seemingly simple
question—why do normal vectors transform using the inverse transpose matrix?—and found
that there was <em>much</em> more rich structure there than meets the eye.</p>
<p>The “scaling zoo” of $k$-vectors and their duals makes a pleasingly complete and symmetrical whole.
Even if I’m not going to be employing these things in practical work every day, I feel that studying
them has helped me understand some things that were vague and foggy in my mind before. It’s worth
appreciating that these subtle distinctions exist. One of my general axioms in life is that
everything is more complicated than it first appears, and nowhere is this more consummately borne
out than in mathematics!</p>Normals and the Inverse Transpose, Part 2: Dual Spaces
http://reedbeta.com/blog/normals-inverse-transpose-part-2/
http://reedbeta.com/blog/normals-inverse-transpose-part-2/Nathan ReedSat, 19 May 2018 16:13:37 -0700http://reedbeta.com/blog/normals-inverse-transpose-part-2/#commentsGraphicsMath<p>In the <a href="/blog/normals-inverse-transpose-part-1/">first part</a> of this series, we learned about
Grassmann algebra, and concluded that normal vectors in 3D can be interpreted as bivectors. To
transform bivectors, we need to use a different matrix (in general) than the one that transforms
ordinary vectors. Using a canonical basis for bivectors, we found that the matrix required is the <em>cofactor
matrix</em>, which is proportional to the inverse transpose. This provides at least a partial
explanation of why the inverse transpose is used to transform normal vectors.</p>
<p>However, we also left a few loose ends untied. We found out about the cofactor matrix,
but we didn’t really see how that connects to the
<a href="https://computergraphics.stackexchange.com/a/1506/48">algebraic derivation</a> that transforming a
plane equation $N \cdot x + d = 0$ involves the inverse transpose. I just sort of handwaved the
proportionality between the two.</p>
<!--more-->
<p>Moreover, we saw that Grassmann $k$-vectors provide vectorial geometric objects with a natural
interpretation as carrying units of length, area, and volume, owing to their scaling behavior. But
we didn’t find anything similar for densities—units of <em>inverse</em> length, area, or volume.</p>
<p>As we’ll see in this article, there’s one more geometric concept we need to complete the
picture. Putting this new concept together with the Grassmann algebra we’ve already learned will
turn out to clarify and resolve these remaining issues.</p>
<p>Without further ado, let’s dive in!</p>
<div class="toc">
<ul>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#functions-as-vectors">Functions As Vectors</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#linear-forms-and-the-dual-space">Linear Forms and the Dual Space</a><ul>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#the-natural-pairing">The Natural Pairing</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#the-dual-basis">The Dual Basis</a></li>
</ul>
</li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#transforming-dual-vectors">Transforming Dual Vectors</a><ul>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#uniform-scaling">Uniform Scaling</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#sheared-dual-vectors-and-the-inverse-transpose">Sheared Dual Vectors and the Inverse Transpose</a></li>
</ul>
</li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#so-whats-a-normal-vector-anyway">So What’s a Normal Vector, Anyway?</a></li>
</ul>
</div>
<h2 id="functions-as-vectors"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#functions-as-vectors" title="Permalink to this section">Functions As Vectors</a></h2>
<p>Most of this article will be concerned with functions taking and returning vectors of various kinds.
To understand what follows, it’s necessary to make a bit of a mental flip, which you
might find quite counterintuitive if you haven’t encountered it before.</p>
<p>The flip is this: <strong>functions that map into a vector space <em>are themselves</em> vectors</strong>.</p>
<p>That statement might not appear to make any sense at first! Vectors and functions are totally
different kinds of things, right, like apples and…chairs? How can a function literally <em>be</em> a
vector?</p>
<p>If you look up the <a href="https://en.wikipedia.org/wiki/Vector_space#Definition">technical definition of a vector space</a>,
you’ll find that it’s quite nonspecific about what the <em>structure</em> of a vector has to be. We often
think of them as arrows with a magnitude and direction, or as ordered lists of numbers (coordinates).
However, all you truly need for a vector space is a set of <em>things</em> that support two basic operations:
being added together, and being multiplied by scalars (here, real numbers). These operations just
need to obey a few reasonable axioms.</p>
<p>Well, functions can be added together! If we have two functions $f$ and $g$, we can add them
<em>pointwise</em> to produce a new function $h$, defined by $h(x) = f(x) + g(x)$ for every point $x$ in
the domain. Likewise, we can multiply a function pointwise by a scalar: $g(x) = a \cdot f(x)$.
These operations do satisfy the vector space axioms, and therefore any set of compatible
functions forms a vector space in its own right: a <em>function space</em>.</p>
<p>To put it a bit more formally: given a domain set $X$ (any kind of set, not necessarily a vector
space itself) and a range vector space $V$, the set of functions $f: X \to V$ forms a vector space
under pointwise addition and scalar multiplication. You need the range to be a vector space so you
can add and multiply the outputs of the functions, but the domain isn’t required to be a vector space—or
even a “space” per se at all; it could be a discrete set.</p>
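<p>If you think in code, pointwise addition and scaling are just higher-order functions. Here’s a minimal Python sketch; the names <code>add</code> and <code>scale</code> are made up for this example, and the domain is deliberately a set of strings rather than a vector space, with the range being the real numbers.</p>

```python
def add(f, g):
    """Pointwise sum: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(a, f):
    """Pointwise scalar multiple: (a f)(x) = a * f(x)."""
    return lambda x: a * f(x)

# Two real-valued functions on the (discrete) set of strings.
length = len
count_a = lambda s: s.count("a")

# A linear combination of the two functions, which is again a function.
h = add(length, scale(2.0, count_a))

value = h("banana")   # 6 + 2.0 * 3 = 12.0
```

<p>Nothing here needs strings to be addable or scalable; only the outputs get added and multiplied.</p>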
<p>This realization that functions can be treated as vectors then lets us apply linear-algebra techniques
to understand and work with functions—a large branch of mathematics called
<a href="https://en.wikipedia.org/wiki/Functional_analysis">functional analysis</a>.</p>
<h2 id="linear-forms-and-the-dual-space"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#linear-forms-and-the-dual-space" title="Permalink to this section">Linear Forms and the Dual Space</a></h2>
<p>From this point forward, we’ll be concerned with a specific class of functions known as
<a href="https://en.wikipedia.org/wiki/Linear_form"><strong>linear forms</strong></a>.</p>
<p>If we have some vector space $V$ (such as 3D space $\Bbb R^3$, for instance), then a linear form
on $V$ is defined as a linear function $f: V \to \Bbb R$. That is, it’s a linear
function that takes a vector argument and returns a scalar.</p>
<p><em>(A note for the mathematicians: in this article I’m only talking about finite-dimensional vector
spaces over $\Bbb R$, so I may occasionally make a statement that doesn’t hold for general vector
spaces. Sorry!)</em></p>
<p>I like to visualize a linear form as a set of parallel, uniformly spaced planes (3D) or lines (2D): the
<a href="https://en.wikipedia.org/wiki/Level_set">level sets</a> of
the function at intervals of one unit in the output. Here are some examples:</p>
<p class="image-array"><img alt="A linear form, x + y" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-1.png" style="width:14em" title="A linear form, x + y" />
<img alt="A linear form, −⅓x + ½y" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-2.png" style="width:14em" title="A linear form, −⅓x + ½y" />
<img alt="A linear form, 2x + y" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/linear-form-3.png" style="width:14em" title="A linear form, 2x + y" /></p>
<p>The gradients here indicate the linear form’s orientation—the function is increasing with the
gradient’s opacity; the discrete lines mark where its output crosses an integer, and the opacity
wraps around to zero. Note that “bigger”
linear forms (in the sense of bigger output values) have more tightly spaced lines, and vice versa.</p>
<p>As elaborated in the previous section, linear forms on a given vector space can themselves be treated as
vectors, in their own function space. Linear combinations of linear
functions are still linear, so they do form a closed vector space in their own right.</p>
<p>This vector space—the set of all linear forms on $V$—is important enough that
it has its own name: the <a href="https://en.wikipedia.org/wiki/Dual_space"><strong>dual space</strong></a> of $V$. It’s
denoted $V^*$. The elements of the dual space (the linear forms) are then called <strong>dual vectors</strong>,
or sometimes <em>covectors</em>.</p>
<h3 id="the-natural-pairing"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#the-natural-pairing" title="Permalink to this section">The Natural Pairing</a></h3>
<p>The fact that dual vectors are <em>linear</em> functions, and not general functions from $V$ to
$\Bbb R$, strongly restricts their behavior. Linear forms on an $n$-dimensional
vector space have only $n$ degrees of freedom, versus the infinite degrees of freedom that a
general function has. To put it another way, $V^*$ has the same dimensionality as $V$.</p>
<p>To see this more concretely: a linear form on $\Bbb R^n$ can be
fully specified by the values it returns when you evaluate it on the $n$ vectors of a basis.
The result it returns for any <em>other</em> vector can then be derived by linearity. For
example, if $f$ is a linear form on $\Bbb R^3$, and $v = (x, y, z)$ is an arbitrary vector, then:
$$
\begin{aligned}
f(v) &= f(x {\bf e_x} + y {\bf e_y} + z {\bf e_z}) \\
&= x \, f({\bf e_x}) + y \, f({\bf e_y}) + z \, f({\bf e_z})
\end{aligned}
$$
If you’re thinking that the above looks awfully like a dot product between $(x, y, z)$ and
$\bigl(f({\bf e_x}), f({\bf e_y}), f({\bf e_z}) \bigr)$—you’re right!</p>
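<p>Here’s a tiny Python sketch of that observation. The particular linear form <code>f</code> is a made-up example, and the names are hypothetical; the point is only that three numbers, the values of <code>f</code> on the basis vectors, are enough to reproduce <code>f</code> everywhere.</p>

```python
def f(v):
    """An example linear form on R^3: f(x, y, z) = 2x - y + 0.5z."""
    x, y, z = v
    return 2.0 * x - y + 0.5 * z

basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

# Evaluate f on each basis vector: these three numbers pin down f completely.
coeffs = [f(e) for e in basis]   # [2.0, -1.0, 0.5]

def evaluate_from_coeffs(coeffs, v):
    """Reconstruct f(v) by linearity: x f(e_x) + y f(e_y) + z f(e_z)."""
    return sum(c * x for c, x in zip(coeffs, v))

v = (3.0, 4.0, 2.0)
direct = f(v)                                   # 6 - 4 + 1 = 3.0
reconstructed = evaluate_from_coeffs(coeffs, v) # same value
```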
<p>Indeed, the operation of evaluating a linear form has the properties of a <em>product</em> between the dual space and
the base vector space: $V^* \times V \to \Bbb R$. This product is called the <strong>natural pairing</strong>.</p>
<p>Like the vector dot product, the natural pairing results in a real number, and is bilinear—linear
on both sides. However, here we’re taking a product not of two vectors, but of a dual vector with
a “plain” vector. The linearity on the left side comes from pointwise adding/multiplying linear forms;
that on the right comes from the linear forms being, well, linear in their vector argument.</p>
<p>Going forward, I’ll denote the natural pairing by angle brackets, like this: $\langle w, v \rangle$.
Here $w$ is a dual vector in $V^*$, and $v$ is a vector in $V$. To reiterate, this is simply
<em>evaluating</em> the linear form $w$, as a function, on the vector $v$. But because functions are vectors,
and dual vectors in particular are <em>linear</em> functions, this operation also has the properties of a product.</p>
<p>The above equation looks like this in angle-bracket notation:
$$
\begin{aligned}
\langle w, v \rangle
&= \bigl\langle w, \, x {\bf e_x} + y {\bf e_y} + z {\bf e_z} \bigr\rangle \\
&= x \langle w, {\bf e_x} \rangle + y \langle w, {\bf e_y} \rangle + z \langle w, {\bf e_z} \rangle
\end{aligned}
$$
Note how this now looks like “just” an application of the distributive property—which it is!</p>
<h3 id="the-dual-basis"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#the-dual-basis" title="Permalink to this section">The Dual Basis</a></h3>
<p>The above construction can also be used to define a canonical basis for $V^*$, for a given basis
on $V$. Namely, we want to make the numbers $\langle w, {\bf e_x} \rangle, \langle w, {\bf e_y} \rangle, \langle w, {\bf e_z} \rangle$
be the <em>coordinates</em> of $w$ with respect to this basis, the same way that $x, y, z$ are coordinates
with respect to $V$’s basis. We can do this by defining <strong>dual basis vectors</strong> ${\bf e_x^*}, {\bf e_y^*}, {\bf e_z^*}$,
according to the following constraints:
$$
\begin{aligned}
\langle {\bf e_x^*}, {\bf e_x} \rangle &= 1 \\
\langle {\bf e_x^*}, {\bf e_y} \rangle &= 0 \\
\langle {\bf e_x^*}, {\bf e_z} \rangle &= 0
\end{aligned}
$$
and similarly for ${\bf e_y^*}, {\bf e_z^*}$. The nine total constraints can be summarized as:
$$
\langle {\bf e}_i^*, {\bf e}_j \rangle =
\begin{cases}
1 & \text{if } i = j, \\
0 & \text{if } i \neq j,
\end{cases}
\quad i, j \in \{ {\bf x, y, z} \}
$$
This dual basis always exists and is unique, given a valid basis on $V$ to start from.</p>
<p>Geometrically speaking, the dual basis consists of linear forms that measure the distance along
each axis—but the level sets of those linear forms are parallel to <em>all the other axes</em>. They’re
not necessarily perpendicular to the same axis that they’re measuring, unless the basis happens to be
orthonormal. This feature will be important a bit later!</p>
<p>By way of example, here are a couple of vector bases together with their corresponding dual bases:</p>
<p class="image-array"><img alt="An orthonormal basis and its corresponding dual basis" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-1.png" style="width:20em;padding:0 10pt" title="An orthonormal basis and its corresponding dual basis" />
<img alt="A non-orthonormal basis and its corresponding dual basis" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-2.png" style="width:20em;padding:0 10pt" title="A non-orthonormal basis and its corresponding dual basis" /></p>
<p>Here’s an example of a linear form decomposed into basis components, $w = p {\bf e_x^*} + q {\bf e_y^*}$:</p>
<p><img alt="A linear form as a sum of x and y basis components" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/dual-basis-3.png" style="width:20em" title="A linear form as a sum of x and y basis components" /></p>
<p>With the dual basis defined as above, if we express both a dual vector $w$ and a vector $v$ in terms
of their respective bases,
then the natural pairing $\langle w, v \rangle$ boils down to just the dot product of the
respective coordinates:
$$
\begin{aligned}
\langle w, v \rangle
&= \bigl\langle p {\bf e_x^*} + q {\bf e_y^*} + r {\bf e_z^*}, \; x {\bf e_x} + y {\bf e_y} + z {\bf e_z} \bigr\rangle \\
&= px + qy + rz
\end{aligned}
$$</p>
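<p>Here’s a minimal sketch of the dual basis in action, in 2D to keep it short. The basis vectors and coordinate values are made-up example numbers, and the helper names (<code>inverse_2x2</code>, <code>pair</code>, <code>combine</code>) are my own for illustration. It relies on the standard fact that if the basis vectors are the columns of a matrix $E$, then the rows of $E^{-1}$ are the dual basis forms.</p>

```python
from fractions import Fraction as F

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pair(w, v):
    """Natural pairing <w, v>: evaluate the linear form w on the vector v.
    Both are given by their coefficients in standard coordinates."""
    return sum(wi * vi for wi, vi in zip(w, v))

def combine(coeffs, vectors):
    """Linear combination sum_i coeffs[i] * vectors[i]."""
    return [sum(c * vec[k] for c, vec in zip(coeffs, vectors))
            for k in range(len(vectors[0]))]

# A deliberately non-orthonormal basis (example values, not from the article).
e_x, e_y = [F(2), F(0)], [F(1), F(1)]

# Basis vectors go in the columns of E; the rows of E^-1 are the dual basis,
# because E^-1 E = I is exactly the "1 if i = j, else 0" condition.
E = [[e_x[0], e_y[0]], [e_x[1], e_y[1]]]
e_x_dual, e_y_dual = inverse_2x2(E)

# Coordinates (p, q) of w in the dual basis, and (x, y) of v in the basis.
p, q = F(3), F(-2)
x, y = F(5), F(4)
w = combine([p, q], [e_x_dual, e_y_dual])
v = combine([x, y], [e_x, e_y])

pairing = pair(w, v)            # evaluated in standard coordinates
dot_of_coords = p * x + q * y   # dot product of the matched coordinates
```

<p>Both <code>pairing</code> and <code>dot_of_coords</code> come out the same, even though the basis isn’t orthonormal; the dual basis absorbs the skew.</p>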
<h2 id="transforming-dual-vectors"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#transforming-dual-vectors" title="Permalink to this section">Transforming Dual Vectors</a></h2>
<p>In the preceding article, we learned that although vectors and <em>bivectors</em> may appear
structurally similar (they both have three components, in 3D space), they have different geometric
meanings and different behavior when subject to transformations—in particular, to scaling.</p>
<p>With dual vectors, we have a third example in this class! Dual vectors are again
“vectorial” objects (obeying the vector space axioms), again structurally similar to vectors and
bivectors (having three components, in 3D space), but with a different geometric meaning (linear
forms). This immediately suggests we look into dual vectors’ transformation behavior!</p>
<p>Dual vectors are linear forms, which are functions. So how do we transform a function?</p>
<p>The way I like to think about this is that the function’s output values are carried along with the
points of its domain when they’re transformed. Imagine labeling every point in the domain with the
function’s value at that point. Then apply the transformation to all the points; they move somewhere
else, but carry their label along with them. (Another way of thinking about it is that you’re
transforming the <em>graph</em> of the function, considered as a point-set in a one-higher-dimensional
space.)</p>
<p>To formalize this a bit more: suppose we transform vectors by some matrix $M$, and we want to
apply this transformation also to a function $f(v)$, yielding a new function $g(v)$. What we want is
that $g$ evaluated on a <em>transformed</em> vector should equal $f$ evaluated on the original vector:
$$
g(Mv) = f(v)
$$
Or, equivalently,
$$
g(v) = f(M^{-1}v)
$$
In other words, we can apply a transformation to a function by making a new function that first
applies the <em>inverse</em> transformation to its argument, then passes that to the old function.</p>
<p>Note that this only works if $M$ is invertible. If it isn’t, then our picture of “carrying the output
values along with the domain points” falls apart: a noninvertible $M$ can collapse many distinct domain
points into one, and then how could we decide what the function’s output should be at those points?</p>
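<p>As a rough sketch of the rule $g(v) = f(M^{-1}v)$, here’s a tiny 2D version. The matrix, the test function <code>f</code>, and the helper names are arbitrary choices of mine for illustration; note that <code>f</code> doesn’t even have to be linear for this to work.</p>

```python
from fractions import Fraction as F

def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def inverse_2x2(m):
    """Inverse of a 2x2 matrix; fails if the matrix is noninvertible."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transform_function(f, m):
    """Return g with g(v) = f(M^-1 v), so that g(M v) = f(v)."""
    m_inv = inverse_2x2(m)
    return lambda v: f(mat_vec(m_inv, v))

# Example transformation and an arbitrary (nonlinear) function on the plane.
M = [[F(2), F(1)], [F(0), F(3)]]
f = lambda v: v[0] ** 2 + 5 * v[1]

g = transform_function(f, M)
v = [F(1), F(-2)]
original_value = f(v)                 # the "label" attached to the point v
carried_value = g(mat_vec(M, v))      # the label found at the moved point
```

<p>The value that <code>f</code> assigns to <code>v</code> is the same value that <code>g</code> assigns to the transformed point, which is the “labels ride along with the points” picture.</p>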
<h3 id="uniform-scaling"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#uniform-scaling" title="Permalink to this section">Uniform Scaling</a></h3>
<p>Now that we understand how to apply a transformation to a function, let’s look at uniform scaling
as an example. We’ll scale by a factor $a > 0$, so that vectors transform as $v \mapsto av$. Then
functions will transform as $f(v) \mapsto f(v/a)$, per the previous section.</p>
<p>Let’s switch back to looking at this from a “dual vector” point of view instead of a
“function” point of view.
So, if $f(v) = \langle w, v \rangle$ for some dual vector $w$, then what happens when we scale by $a$?
$$
\begin{aligned}
\langle w, v \rangle \mapsto & \left\langle w, \frac{v}{a} \right\rangle \\
= & \left\langle \frac{w}{a}, v \right\rangle
\end{aligned}
$$
I’ve just moved the $1/a$ factor from one side of the angle brackets to the other, which is allowed
because it’s a bilinear operation. To summarize, we’ve found that the dual vector $w$ transforms as:
$$
w \mapsto \frac{w}{a}
$$</p>
<p>Hmm, interesting! When we scale vectors by $a$, then <strong>dual vectors scale by $\bm{1/a}$</strong>.
If you recall the previous article, we justified assigning units like “area” and “volume” to bivectors
and trivectors on the basis of their scaling behavior. Following that line of reasoning, we can now
conclude that <strong>dual vectors carry units of inverse length!</strong> </p>
<p>In fact, dual vectors represent <em>oriented linear densities</em>. They provide a quantitative way
of talking about situations where some kind of scalar “stuff”—such as probability, texel count,
opacity, a change in voltage/temperature/pressure, etc.—is spread out along one dimension in space. When you
pair the dual vector with a vector (i.e. evaluate the linear form on a vector), you’re asking “how
much of that ‘stuff’ does this vector span?”</p>
<p>Under a scaling, we want to preserve the amount of “stuff”. If we’re scaling <em>up</em>, then the density
of “stuff” will need to go <em>down</em>, as the same amount of stuff is now spread over a longer distance;
and vice versa. This property is implemented by the inverse scaling behavior of dual vectors.</p>
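<p>A quick numeric check of the inverse scaling, using made-up components for $w$ and $v$ in an orthonormal basis (so that the pairing is just a dot product of coordinates):</p>

```python
from fractions import Fraction as F

def pair(w, v):
    """Natural pairing in matched basis / dual-basis coordinates."""
    return sum(wi * vi for wi, vi in zip(w, v))

a = F(3)                              # uniform scale factor
w = [F(6), F(-3), F(9)]               # example dual vector (a linear density)
v = [F(1), F(2), F(-1)]               # example vector

scaled_v = [a * vi for vi in v]       # vectors transform as v -> a v
scaled_w = [wi / a for wi in w]       # dual vectors transform as w -> w / a

stuff_before = pair(w, v)
stuff_after = pair(scaled_w, scaled_v)
```

<p>The amount of “stuff” spanned by the vector is the same before and after, because the factor $a$ picked up by the vector cancels the $1/a$ picked up by the dual vector.</p>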
<h3 id="sheared-dual-vectors-and-the-inverse-transpose"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#sheared-dual-vectors-and-the-inverse-transpose" title="Permalink to this section">Sheared Dual Vectors and the Inverse Transpose</a></h3>
<p>We’ve seen how uniform scaling applies inversely to dual vectors. We could study nonuniform scaling
now, too, but it turns out that axis-aligned nonuniform scaling isn’t that interesting—it just
applies inversely to each axis, as you might expect. It’ll be more illuminating at this point to
look at what happens with a <em>shear</em>.</p>
<p>I’ll stick to 2D for this one. As an example transformation, we’ll shear the $y$ axis toward $x$ a
little bit:
$$
M = \begin{bmatrix}
1 & \tfrac{1}{2} \\
0 & 1
\end{bmatrix}
$$
Here’s what it looks like:</p>
<p><img alt="The shear applied to a standard vector basis" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-1.png" style="width:28em" title="The shear applied to a standard vector basis" /></p>
<p>When we perform this transformation on a dual vector, what happens? When you look at it
visually, it’s pretty straightforward—the level sets (isolines) of the linear form will tilt to
follow the shear.</p>
<p><img alt="Animation of a linear form shearing" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-2.gif" style="width:20em" title="Animation of a linear form shearing" /></p>
<p>But how do we express this as a matrix acting on the dual vector’s coordinates? Let’s focus on the
$\bf e_x^*$ component. Note that our transformation $M$ doesn’t affect the $x$-axis—it maps
$\bf e_x$ to itself. But what about $\bf e_x^*$?</p>
<p><img alt="Animation of eₓ* shearing" src="http://reedbeta.com/blog/normals-inverse-transpose-part-2/dual-scaling-3.gif" style="width:20em" title="Animation of eₓ* shearing" /></p>
<p>The $\bf e_x^*$ component of a dual vector <em>does</em> change under this transformation, because
the isolines pick up the shear! Or, to put it another way:
although distances along the $x$ axis (which $\bf e_x^*$ measures) don’t change here,
$\bf e_x^*$ still cares about what the other axes are doing because <em>it has to stay parallel to them</em>.
That’s one of the defining conditions for the dual basis to do its job.</p>
<p>In particular, we have that $\bf e_x^*$ maps to ${\bf e_x^*} - \tfrac{1}{2}{\bf e_y^*}$.
If we work it out the rest of the way, the full matrix that applies to the coordinates of
a dual vector is:
$$
\begin{bmatrix}
1 & 0 \\
-\tfrac{1}{2} & 1
\end{bmatrix}
$$
This is the inverse transpose of $M$!</p>
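<p>If you’d like to check this numerically, here’s a small sketch using exact fractions. The helper names are mine; the matrix is the shear $M$ from above, and the dual basis is the standard one, with coordinates $(1, 0)$ and $(0, 1)$.</p>

```python
from fractions import Fraction as F

def mat_vec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def transpose(m):
    return [list(col) for col in zip(*m)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def pair(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

M = [[F(1), F(1, 2)],
     [F(0), F(1)]]
M_inv_T = transpose(inverse_2x2(M))

e_x, e_y = [F(1), F(0)], [F(0), F(1)]
e_x_dual = [F(1), F(0)]

# The sheared e_x*: coordinates (1, -1/2), i.e. e_x* - (1/2) e_y*.
new_e_x_dual = mat_vec(M_inv_T, e_x_dual)

# It still reads 1 on the transformed x axis and 0 on the transformed y axis.
on_new_x = pair(new_e_x_dual, mat_vec(M, e_x))
on_new_y = pair(new_e_x_dual, mat_vec(M, e_y))
```

<p>So the transformed $\bf e_x^*$ keeps doing its job as a dual basis element for the sheared basis: it measures 1 along the new $x$ axis and stays parallel to the new $y$ axis.</p>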
<p>We can loosely relate the effect of the inverse transpose here to that of the cofactor
matrix for bivectors, as seen in the preceding article. Like a bivector, each dual
basis element cares about what’s happening to the <em>other</em> axes (because it needs to keep parallel
to them)—but it also must scale inversely along its <em>own</em> axis. The determinant of $M$
gives the cumulative scaling along <em>all</em> the axes:
$$
\det M = \text{scaling on my axis} \cdot \text{scaling on other axes}
$$
We can algebraically rearrange this to:
$$
\frac{1}{\text{scaling on my axis}} = \frac{1}{\det M} \cdot \text{scaling on other axes}
$$
This matches the relation between the inverse transpose and the cofactor matrix:
$$
M^{-T} = \frac{1}{\det M} \cdot \text{cofactor}(M)
$$
I’m handwaving a lot here—a detailed geometric demonstration would take us off
into the weeds—but hopefully this gives at least a little bit of intuition for why the inverse
transpose matrix is the right thing to use for dual vectors.</p>
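<p>As a sanity check on that last formula, we can work it out for the 2D shear from this section. For a 2×2 matrix, the cofactor matrix just swaps the diagonal entries, and swaps and negates the off-diagonal ones:
$$
\text{cofactor}\begin{bmatrix} a &amp; b \\ c &amp; d \end{bmatrix}
= \begin{bmatrix} d &amp; -c \\ -b &amp; a \end{bmatrix}
$$
Applying this to the shear, and noting that $\det M = 1 \cdot 1 - \tfrac{1}{2} \cdot 0 = 1$:
$$
\frac{1}{\det M} \cdot \text{cofactor}(M)
= \frac{1}{1} \begin{bmatrix} 1 &amp; 0 \\ -\tfrac{1}{2} &amp; 1 \end{bmatrix}
= M^{-T}
$$
which agrees with the dual-vector matrix found above.</p>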
<h2 id="so-whats-a-normal-vector-anyway"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-2/#so-whats-a-normal-vector-anyway" title="Permalink to this section">So What’s a Normal Vector, Anyway?</a></h2>
<p>As we’ve seen, the level sets of a linear form are parallel lines in 2D, or planes in 3D. This implies
that we can define a plane by picking out a specific level set of a given dual vector:
$$
\langle w, v \rangle = d
$$
The dual vector $w$ is acting as a signed distance field for the plane.</p>
<p>We’ve also seen that when expressed in terms of matched basis-and-dual-basis components,
the natural pairing product $\langle w, v \rangle$ reduces to a dot product $w \cdot v$. And then
the above equation looks like the familiar plane equation:
$$
w \cdot v = d
$$
This shows that the dual vector’s coordinates with respect to the dual basis are <em>also</em> the coordinates
of a normal vector to the plane, in the standard vector basis.</p>
<p>So, normal vectors can be interpreted as dual vectors expressed in the dual basis, and that’s
why they transform with the inverse transpose!</p>
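<p>Here’s a sketch that checks this on a concrete plane. The plane, the points on it, and the matrix are example values I picked; the helper <code>inverse_transpose_3x3</code> is my own, and it builds $M^{-T}$ from cross products of the columns of $M$ (a standard identity for 3×3 matrices), which is just one convenient way to compute it.</p>

```python
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def mat_vec(m, v):
    return [dot(row, v) for row in m]

def inverse_transpose_3x3(m):
    """M^-T for an invertible 3x3 matrix. Its columns are
    c2 x c3, c3 x c1, c1 x c2 (the cofactor matrix), divided by det M."""
    c1, c2, c3 = [list(col) for col in zip(*m)]
    det = dot(c1, cross(c2, c3))
    cols = [cross(c2, c3), cross(c3, c1), cross(c1, c2)]
    return [[cols[j][i] / det for j in range(3)] for i in range(3)]

# Example plane <w, v> = d, with three points on it.
w, d = [F(1), F(2), F(-1)], F(4)
points = [[F(4), F(0), F(0)], [F(0), F(2), F(0)], [F(1), F(1), F(-1)]]

# Example transformation: stretch x by 3, and shear y along z.
M = [[F(3), F(0), F(0)],
     [F(0), F(1), F(1)],
     [F(0), F(0), F(1)]]

new_points = [mat_vec(M, p) for p in points]
new_w = mat_vec(inverse_transpose_3x3(M), w)   # dual-vector rule
naive_w = mat_vec(M, w)                        # treating w as a plain vector

# A tangent direction lying in the transformed plane.
tangent = [q - p for p, q in zip(new_points[0], new_points[1])]
```

<p>With the inverse transpose, every transformed point still satisfies $\langle w, v \rangle = d$ with the <em>same</em> $d$, and <code>new_w</code> is perpendicular to <code>tangent</code>. The naively transformed <code>naive_w</code> is not.</p>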
<p>But wait—in the last article, didn’t I just say that normal vectors should be interpreted as
bivectors, and therefore they transform with the cofactor matrix? Which one is it?</p>
<p>Ultimately, I don’t think there’s a definitive answer to this question! “Normal vector” as an idea is
a bit too vague—bivectors and dual vectors are <em>both</em> defensible ways to formalize the “normal vector”
concept. As we’ve seen, the way they transform is equivalent as far as <em>orientation</em>:
bivectors and dual vectors both transform to stay perpendicular to the plane they define, by either
$B \wedge v = d$ or $\langle w, v \rangle = d$, respectively. The difference between them is in
the units they carry and their scaling behavior: bivectors are areas, while dual vectors are inverse
lengths.</p>
<p>That’s all I have to say about transforming normal vectors! But we’ve got another question
still dangling. At the end of Part 1, I asked about vectorial quantities with negative scaling powers.
In dual vectors, we’ve now achieved scaling power −1. But what about −2 and −3? To find those,
we’re going to have to combine dual spaces with Grassmann algebra. We’ll do that in the
<a href="/blog/normals-inverse-transpose-part-3/">third and final part</a> of this series.</p>Normals and the Inverse Transpose, Part 1: Grassmann Algebra
http://reedbeta.com/blog/normals-inverse-transpose-part-1/
http://reedbeta.com/blog/normals-inverse-transpose-part-1/Nathan ReedSat, 07 Apr 2018 14:36:16 -0700http://reedbeta.com/blog/normals-inverse-transpose-part-1/#commentsGraphicsMath<p>A mysterious fact about linear transformations is that some of them, namely nonuniform scalings and
shears, make a puzzling distinction between “plain” vectors and normal vectors. When we transform
“plain” vectors with a matrix, we’re required to transform the normals with—for some
reason—the <em>inverse transpose</em> of that matrix. How are we to understand this?</p>
<p>It takes only <a href="https://computergraphics.stackexchange.com/a/1506/48">a bit of algebra</a> to show that
using the inverse transpose ensures that transformed normals will remain perpendicular to the
tangent planes they define. That’s fine as far as it goes, but it misses a deeper and more
interesting story about the geometry behind this—which I’ll explore over the next
few articles.</p>
<!--more-->
<div class="toc">
<ul>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#units-and-scaling">Units and Scaling</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#grassmann-algebra">Grassmann Algebra</a><ul>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#basis-k-vectors">Basis $k$-Vectors</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#the-wedge-product">The Wedge Product</a></li>
</ul>
</li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#transforming-k-vectors">Transforming $k$-Vectors</a><ul>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#bivectors-and-nonuniform-scaling">Bivectors and Nonuniform Scaling</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#the-cofactor-matrix">The Cofactor Matrix</a></li>
</ul>
</li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#bivectors-and-normals">Bivectors and Normals</a></li>
<li><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#further-questions">Further Questions</a></li>
</ul>
</div>
<h2 id="units-and-scaling"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#units-and-scaling" title="Permalink to this section">Units and Scaling</a></h2>
<p>Before we dig into the meat of this article, though, let’s take a little apéritif. Consider
plain old <em>uniform</em> scaling (by the same factor along all axes). It’s hard to think
of a more innocuous transformation—it’s literally just multiplying all vectors by a scalar
constant.</p>
<p>But if we look carefully, there’s already something not quite trivial going on here.
Some quantities carry physical “dimensions” or “units”, like lengths, areas, and volumes. When we perform
a scaling transformation, these quantities are altered in a way that corresponds to their units.
Meanwhile, other quantities are “unitless” and don’t change under a scaling.</p>
<p>To be really explicit, let’s enumerate the possibilities for scaling behavior in 3D space.
Suppose we scale by a factor $a > 0$. Then:</p>
<ul>
<li><strong>Unitless numbers</strong> do not change—or in other words, they get multiplied by $a^0$.</li>
<li><strong>Lengths</strong> get multiplied by $a$.</li>
<li><strong>Areas</strong> get multiplied by $a^2$.</li>
<li><strong>Volumes</strong> get multiplied by $a^3$.</li>
</ul>
<p>And that’s not all—there are also <em>densities</em>, which vary inversely with the scale factor:</p>
<ul>
<li><strong>Linear densities</strong> get multiplied by $1/a$.</li>
<li><strong>Area densities</strong> get multiplied by $1/a^2$.</li>
<li><strong>Volume densities</strong> get multiplied by $1/a^3$.</li>
</ul>
<p>Think of things like “texels per length”, or “probability per area”, or “particles per volume”. If
you scale <em>up</em> a 3D model while keeping its textures the same size, then its texels-per-length
density goes <em>down</em>, and so on.</p>
<p>So, even restricting ourselves to just uniform scaling and looking at scalar (non-vector) values, we
already have a phenomenon where different quantities—which appear the same <em>structurally</em>, i.e.
they’re all just single-component scalars—are revealed to behave differently when a transformation
is applied, owing to the different units they carry. In particular, they carry different powers of
length, ranging from −3 to +3. A quantity with $k$ powers of length scales as $a^k$.</p>
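<p>To make the bookkeeping concrete, here’s a tiny sketch of the $a^k$ rule. The cube, the texel numbers, and the function name <code>rescale</code> are all invented for illustration.</p>

```python
from fractions import Fraction as F

def rescale(value, power, a):
    """Scale a quantity carrying `power` powers of length by the factor a."""
    return value * F(a) ** power

a = 4  # uniform scale factor

# A cube with edge length 2, textured with 64 texels along each edge.
edge_length = rescale(2, 1, a)          # length:         a^1
face_area = rescale(4, 2, a)            # area:           a^2
volume = rescale(8, 3, a)               # volume:         a^3
texels_per_length = rescale(32, -1, a)  # linear density: a^-1
texel_count = rescale(64, 0, a)         # unitless:       a^0
```

<p>The texel count along an edge is the density times the length, and it comes out unchanged by the scaling, as a unitless number should.</p>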
<p>(We could also invent quantities that have scaling powers of ±4 or more, or even fractional scaling
powers. But I’ll leave those aside, as such things don’t have a strong geometric interpretation in 3D.)</p>
<p>Okay, maybe this is somehow reminiscent of the “plain vectors versus normals” thing. But how
does this work for vector quantities? How do nonuniform scalings affect this picture? And where does
the inverse transpose come into it? To really understand this, we’ll have to range farther into the
domains of math.</p>
<h2 id="grassmann-algebra"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#grassmann-algebra" title="Permalink to this section">Grassmann Algebra</a></h2>
<p>For the rest of this series, we’re going to be making use of
<a href="https://en.wikipedia.org/wiki/Exterior_algebra"><strong>Grassmann algebra</strong></a> (also called “exterior algebra”).
Since this is probably unfamiliar to many of my readers, I’ll give a pretty quick introduction to it.
For more background, see <a href="https://www.youtube.com/watch?v=WZApQkDBr5o">this talk by Eric Lengyel</a>,
or the first few chapters of Dorst et al’s <a href="http://www.geometricalgebra.net/"><em>Geometric Algebra for Computer Science</em></a>;
there are also many other references available around the web.</p>
<p>Grassmann algebra extends linear algebra to operate not just on vectors, but on additional “higher-grade”
geometric entities called <strong>bivectors</strong>, <strong>trivectors</strong>, and so on. These objects are collectively
known as <strong>$\bm k$-vectors</strong>, where $k$ is the <strong>grade</strong> or dimensionality of the object. They obey the same
mathematical rules as vectors do—they can be added together, and multiplied by scalars. However,
their geometric interpretation is different.</p>
<p>We often think of a vector as being sort of an abstract arrow—it has both a direction in space in which
the arrow points, and a magnitude, represented by the arrow’s length. A bivector is a lot like that,
but <em>planar</em> instead of linear. Instead of an arrow, it’s an abstract chunk of a flat surface.</p>
<p>Like vectors, bivectors also have directions, in the sense that a planar surface can face various
directions in space; and they have magnitudes, geometrically represented as the <em>area</em> of the
surface chunk. However, what they don’t have is a notion of <em>shape</em> within their plane. When you
picture a bivector as a piece of a plane, you’re free to imagine it as a square, a circle, a
parallelogram, or any funny shape you want, as long as it has the correct area.</p>
<div class="align-center"><figure >
<img src="http://reedbeta.com/blog/normals-inverse-transpose-part-1/grassmann-examples.png" alt="Diagram of geometric interpretation of vectors, bivectors, and trivectors" class="not-too-wide" title="Diagram of geometric interpretation of vectors, bivectors, and trivectors" />
<figcaption><p class="attribution"><a href="https://commons.wikimedia.org/wiki/File:N_vector_positive.svg">Maschen (Wikipedia)</a></p></figcaption>
</figure></div>
<p>Similarly, <em>trivectors</em> are three-dimensional vectorial quantities; they represent a chunk of space,
instead of a flat surface or an arrow. Again, they have no defined shape, only a magnitude—which
is now a <em>volume</em> instead of an area or length.</p>
<p>In 3D space, trivectors don’t really have a direction in a useful sense—or rather, there’s only
one possible direction, which is <em>parallel to space</em>. However, trivectors still come in two opposite
orientations, which we can denote as “positive” and “negative”, or alternatively “right-handed” and
“left-handed”. It’s much like how a vector can point either left or right along a 1D line, and we
can label those orientations as positive and negative if we like.</p>
<p>In higher dimensions, trivectors could also face different directions, as vectors and bivectors do.
Higher-dimensional spaces would even allow for quadvectors and higher grades. However, we’ll be
sticking to 3D for this series!</p>
<h3 id="basis-k-vectors"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#basis-k-vectors" title="Permalink to this section">Basis $k$-Vectors</a></h3>
<p>Just as you can break down a vector into components with respect to a basis, you can do the same with
bivectors and trivectors. When we write a vector $v$ in terms of coordinates, $v = (x, y, z)$,
what we’re really saying is that $v$ can be made up as a linear combination of basis vectors:
$$
v = x \, {\bf e_x} + y \, {\bf e_y} + z \, {\bf e_z}
$$
The basis vectors ${\bf e_x}, {\bf e_y}, {\bf e_z}$ can be taken as defining the direction
and scale of the $x, y, z$ coordinate axes. In the same way, a bivector $B$ can be formed from a
linear combination of <em>basis bivectors</em>:
$$
B = p \, {\bf e_{yz}} + q \, {\bf e_{zx}} + r \, {\bf e_{xy}}
$$
Here, ${\bf e_{xy}}$ would be a bivector of unit area oriented along the $xy$ plane, and
similarly for ${\bf e_{yz}}, {\bf e_{zx}}$. The basis bivectors correspond not to individual
coordinate axes, but to the planes spanned by <em>pairs</em> of axes. This defines “bivector coordinates”
$(p, q, r)$ by which we can identify or create any other bivector in the space.</p>
<p class="image-array"><img alt="A vector and its components along the axes" src="http://reedbeta.com/blog/normals-inverse-transpose-part-1/vector-components.png" style="height:20em;width:auto;padding:0 10pt" title="A vector and its components along the axes" />
<img alt="A bivector and its components along the axis planes" src="http://reedbeta.com/blog/normals-inverse-transpose-part-1/bivector-components.png" style="height:20em;width:auto;padding:0 10pt" title="A bivector and its components along the axis planes" /></p>
<p>The trivector case is less interesting:
$$
T = t \, {\bf e_{xyz}}
$$
As mentioned before, trivectors in 3D only have one possible direction, so they have only one basis
element: the unit trivector “along the $xyz$ space”, so to speak. All other trivectors are just some
scalar multiple of ${\bf e_{xyz}}$.</p>
<h3 id="the-wedge-product"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#the-wedge-product" title="Permalink to this section">The Wedge Product</a></h3>
<p>So, Grassmann algebra contains all these vector-like entities of different grades: ordinary vectors
(grade 1), bivectors (grade 2), and trivectors (grade 3). You can also think of plain old scalars
as being grade 0. Finally, to allow different grades to interoperate, Grassmann algebra
defines an operation called the <strong>wedge product</strong>, or exterior product, denoted $\wedge$. This gives
you the ability to create a bivector by multiplying together two vectors. For example:
$$
{\bf e_x} \wedge {\bf e_y} = {\bf e_{xy}}
$$
In general, you can wedge any two vectors, and the result will be a bivector lying in the plane
spanned by those vectors; its magnitude will be the area of the parallelogram formed by the vectors
(like the cross product).</p>
<p>Note, however, that the bivector doesn’t “remember” the <em>specific</em> two vectors it was wedged from.
Any two vectors in the same plane, spanning a parallelogram of the same area (and orientation), will
generate the same bivector. A bivector can also be factored back into two vectors, but not uniquely.</p>
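<p>In coordinates, wedging two 3D vectors is a short computation. The sketch below (function name and example vectors are mine) returns the components $(p, q, r)$ with respect to ${\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}}$; numerically these are the same three numbers as the cross product’s components, although they mean something different.</p>

```python
def wedge(u, v):
    """Components (p, q, r) of u wedge v in the basis e_yz, e_zx, e_xy."""
    ux, uy, uz = u
    vx, vy, vz = v
    return (uy * vz - uz * vy,   # e_yz component
            uz * vx - ux * vz,   # e_zx component
            ux * vy - uy * vx)   # e_xy component

e_x, e_y, e_z = (1, 0, 0), (0, 1, 0), (0, 0, 1)

# Three different pairs of vectors in the xy plane, each spanning a
# parallelogram of area 6 with the same orientation.
b1 = wedge((2, 0, 0), (0, 3, 0))
b2 = wedge((1, 0, 0), (0, 6, 0))
b3 = wedge((2, 0, 0), (5, 3, 0))
```

<p>All three results are the same bivector, $6 \, {\bf e_{xy}}$, which is the “doesn’t remember its factors” property in action.</p>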
<p>You can also wedge together <em>three</em> vectors, or a bivector with a vector, to form a trivector.
$$
{\bf e_x} \wedge {\bf e_y} \wedge {\bf e_z} = {\bf e_{xy}} \wedge {\bf e_z} = {\bf e_{xyz}}
$$
This turns out to be equivalent to the “scalar triple product”, producing a trivector representing
the oriented volume of the parallelepiped formed by the three vectors.</p>
<p>The wedge product obeys most of the ordinary multiplication rules you know, such as associativity
and the distributive law. Scalar multiplication commutes with wedges—for scalar $a$, we have:
$$
(au) \wedge v = u \wedge (av) = a(u \wedge v)
$$
However, wedging two vectors together is <em>anticommutative</em> (again like the cross product). For
vectors $u, v$, we have:
$$
u \wedge v = -(v \wedge u)
$$
This has a few implications worth noting. First, any vector wedged with itself always gives zero:
$v \wedge v = 0$. Furthermore, any list of <em>linearly dependent</em> vectors, wedged together, will give
zero. For example, $u \wedge v = 0$ whenever $u$ and $v$ are collinear. In the case of three vectors,
$u \wedge v \wedge w = 0$ whenever $u, v, w$ are coplanar.</p>
<p>This also explains why grades beyond 3 don’t exist in 3D space. The wedge product of <em>four</em> 3D
vectors is always zero, because you can’t have four linearly independent vectors in 3D.</p>
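<p>These rules are easy to check in coordinates. In this sketch (names and example vectors are my own), <code>wedge2</code> gives bivector components in the ${\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}}$ basis, and <code>wedge3</code> gives the single coefficient of ${\bf e_{xyz}}$, computed the same way as a scalar triple product.</p>

```python
def wedge2(u, v):
    """Components of u wedge v in the basis e_yz, e_zx, e_xy."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def wedge3(u, v, w):
    """Coefficient t of u wedge v wedge w in the basis e_xyz."""
    p, q, r = wedge2(u, v)
    # e_yz ^ e_x, e_zx ^ e_y and e_xy ^ e_z all equal +e_xyz.
    return p * w[0] + q * w[1] + r * w[2]

u, v = (1, 2, 3), (4, 5, 6)

swapped = wedge2(v, u)                 # anticommutativity: -(u ^ v)
self_wedge = wedge2(u, u)              # any vector wedged with itself
collinear = wedge2(u, (2, 4, 6))       # (2, 4, 6) is 2u

# w lies in the plane of u and v, so the three vectors are coplanar.
w = tuple(ui + 2 * vi for ui, vi in zip(u, v))
coplanar = wedge3(u, v, w)
```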
<h2 id="transforming-k-vectors"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#transforming-k-vectors" title="Permalink to this section">Transforming $k$-Vectors</a></h2>
<p>Earlier, I asserted that you could think of the magnitude of a vector as a length,
that of a bivector as an area, and that of a trivector as a volume. But what justifies those
assignments of units to these quantities?</p>
<p>We also saw that lengths, areas, and volumes have distinct scaling behavior. Upon uniformly
scaling 3D space by a factor $a > 0$, lengths, areas, and volumes will scale as $a, a^2, a^3$,
respectively. We have the tools to see, now, that vectors, bivectors, and trivectors behave in the
same way.</p>
<p>Scaling a vector can be done by multiplying with the appropriate matrix:
$$
\begin{gathered}
v \mapsto Mv \\
\begin{bmatrix} x \\ y \\ z \end{bmatrix} \mapsto
\begin{bmatrix}
a & 0 & 0 \\
0 & a & 0 \\
0 & 0 & a
\end{bmatrix}
\begin{bmatrix} x \\ y \\ z \end{bmatrix}
= \begin{bmatrix} ax \\ ay \\ az \end{bmatrix} = av
\end{gathered}
$$
The vector $v$ as a whole, as well as its components $x, y, z$ and its scalar magnitude, all pick
up a factor $a$ upon scaling; so we can safely call them lengths. Hopefully, this is uncontroversial!</p>
<p>What about bivectors? To see how they behave under scaling (or any linear transformation), we can
turn to the wedge product. In 3D, any bivector can be factored as a wedge product of two vectors.
We already know how to transform vectors. Therefore, we can transform a bivector by transforming its
vector factors and re-wedging them:
$$
\begin{aligned}
B &= u \wedge v \\
(u \wedge v) &\mapsto (Mu) \wedge (Mv) \\
&= (au) \wedge (av) \\
&= a^2 (u \wedge v) \\
&= a^2 B
\end{aligned}
$$
Presto! Since the bivector has two vector factors, and each one scales by $a$, the bivector picks up
an overall factor of $a^2$, making it an area.</p>
<p>Trivectors too can be transformed by factoring them into vectors. It comes as no surprise to find
that their three vector factors give them an overall scaling of $a^3$. Just for completeness:
$$
\begin{aligned}
T &= (u \wedge v \wedge w) \\
(u \wedge v \wedge w) &\mapsto (Mu) \wedge (Mv) \wedge (Mw) \\
&= (au) \wedge (av) \wedge (aw) \\
&= a^3 (u \wedge v \wedge w) \\
&= a^3 T
\end{aligned}
$$</p>
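<p>The same derivations can be spot-checked numerically. This is only a sketch with arbitrary example vectors and a scale factor of my choosing; <code>wedge2</code> and <code>wedge3</code> are hypothetical helper names that compute components in the ${\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}}$ and ${\bf e_{xyz}}$ bases.</p>

```python
def wedge2(u, v):
    """Components of u wedge v in the basis e_yz, e_zx, e_xy."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def wedge3(u, v, w):
    """Coefficient of u wedge v wedge w in the basis e_xyz."""
    p, q, r = wedge2(u, v)
    return p * w[0] + q * w[1] + r * w[2]

def scale(vec, a):
    """Uniform scaling, v -> a v."""
    return tuple(a * c for c in vec)

a = 2
u, v, w = (1, 2, 3), (4, 5, 6), (0, 1, -1)

B = wedge2(u, v)
T = wedge3(u, v, w)

# Transform the vector factors, then re-wedge them.
B_scaled = wedge2(scale(u, a), scale(v, a))
T_scaled = wedge3(scale(u, a), scale(v, a), scale(w, a))
```

<p>Here <code>B_scaled</code> equals $a^2$ times <code>B</code>, and <code>T_scaled</code> equals $a^3$ times <code>T</code>.</p>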
<h3 id="bivectors-and-nonuniform-scaling"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#bivectors-and-nonuniform-scaling" title="Permalink to this section">Bivectors and Nonuniform Scaling</a></h3>
<p>Now, we can finally begin to address our original question. What complications come
into play when we start doing <em>nonuniform</em> scaling?</p>
<p>To investigate this, let’s study an example. We’ll scale by a factor of 3 along the $x$ axis,
leaving the other two axes alone. Our scaling matrix will therefore be:
$$
M = \begin{bmatrix}
3 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$
For plain old vectors, this does the obvious thing: the $x$ component gets multiplied by 3, and the
$y, z$ components are unchanged. In general, this alters both the vector’s length and direction
in a way that depends on its initial direction—vectors close to the $x$ axis are going to be
stretched more, while vectors close to the $yz$ plane will be less affected.</p>
<p><img alt="Animation of a vector scaling along the x axis" src="http://reedbeta.com/blog/normals-inverse-transpose-part-1/vector-scaling.gif" style="height:20em;width:auto" title="Animation of a vector scaling along the x axis" /></p>
<p>What happens to a bivector when we perform this transformation? First, let’s just think about it
geometrically. A bivector represents a chunk of area with a particular planar facing direction. When
we stretch this out along the $x$ axis, we again expect both its direction and area to change. But
different bivectors will be affected differently: a bivector close to the $yz$ plane will again
be less affected by the scaling, while bivectors whose planes have a significant component along the
$x$ axis will be stretched more.</p>
<p><img alt="Animation of a bivector scaling along the x axis" src="http://reedbeta.com/blog/normals-inverse-transpose-part-1/bivector-scaling.gif" style="height:20em;width:auto" title="Animation of a bivector scaling along the x axis" /></p>
<p>Okay, now to the algebra. As we saw before, we can decompose any bivector $B$ into components
along axis-aligned basis bivectors:
$$
B = p \, {\bf e_{yz}} + q \, {\bf e_{zx}} + r \, {\bf e_{xy}}
$$
To apply our scaling $M$ to the bivector, we just need to see how $M$ affects the basis bivectors.
This can be done by factoring them into their component basis vectors and applying $M$ to those:
$$
\begin{aligned}
{\bf e_{yz}} = {\bf e_y} \wedge {\bf e_z}
\quad &\mapsto \quad (M{\bf e_y}) \wedge (M{\bf e_z}) = {\bf e_y} \wedge {\bf e_z} = {\bf e_{yz}} \\
{\bf e_{zx}} = {\bf e_z} \wedge {\bf e_x}
\quad &\mapsto \quad (M{\bf e_z}) \wedge (M{\bf e_x}) = {\bf e_z} \wedge 3{\bf e_x} = 3{\bf e_{zx}} \\
{\bf e_{xy}} = {\bf e_x} \wedge {\bf e_y}
\quad &\mapsto \quad (M{\bf e_x}) \wedge (M{\bf e_y}) = 3{\bf e_x} \wedge {\bf e_y} = 3{\bf e_{xy}}
\end{aligned}
$$
This matches the geometric intuition: $\bf e_{yz}$ didn’t change at all, while $\bf e_{zx}$
and $\bf e_{xy}$ both picked up a factor of 3 because their planes include the $x$ axis.</p>
<p>So, the overall effect of applying $M$ to the bivector $B$ is:
$$
B \mapsto p \, {\bf e_{yz}} + 3q \, {\bf e_{zx}} + 3r \, {\bf e_{xy}}
$$
Now, just as we would for a vector, we can also write out the transformation of $B$ as components
acted on by a matrix:
$$
\begin{bmatrix} p \\ q \\ r \end{bmatrix} \mapsto
\begin{bmatrix}
1 & 0 & 0 \\
0 & 3 & 0 \\
0 & 0 & 3
\end{bmatrix}
\begin{bmatrix} p \\ q \\ r \end{bmatrix}
= \begin{bmatrix} p \\ 3q \\ 3r \end{bmatrix}
$$
This is the same transformation we just derived, only written in a different notation. But notice
something here: the matrix appearing in this equation is <em>not</em> the same matrix $M$ used to
transform vectors.</p>
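<p>We can confirm this with a quick sketch: transform two example vectors by $M$, wedge the results, and compare against applying the diagonal matrix above directly to the bivector’s components. The vectors and helper names here are my own choices for illustration.</p>

```python
def wedge(u, v):
    """Components (p, q, r) of u wedge v in the basis e_yz, e_zx, e_xy."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def mat_vec(m, v):
    return tuple(sum(mij * vj for mij, vj in zip(row, v)) for row in m)

# The vector matrix M, and the matrix acting on bivector components.
M = [[3, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
BIVECTOR_MATRIX = [[1, 0, 0],
                   [0, 3, 0],
                   [0, 0, 3]]

u, v = (1, 2, 3), (4, 5, 6)
B = wedge(u, v)

via_vectors = wedge(mat_vec(M, u), mat_vec(M, v))  # transform, then wedge
via_matrix = mat_vec(BIVECTOR_MATRIX, B)           # act on (p, q, r) directly
naive = mat_vec(M, B)                              # wrongly reusing M itself
```

<p>The first two agree, giving $(p, 3q, 3r)$, while applying $M$ itself to the bivector components gives a different (wrong) answer.</p>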
<p>Apropos of nothing, I’m just going to mention that the inverse transpose of $M$ is <em>proportional</em>
to the matrix above:
$$
M^{-T} =
\begin{bmatrix}
\tfrac{1}{3} & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$
HMM. 🤔🤔🤔</p>
<h3 id="the-cofactor-matrix"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#the-cofactor-matrix" title="Permalink to this section">The Cofactor Matrix</a></h3>
<p>In fact, the matrix we need for transforming bivectors is the
<a href="https://en.wikipedia.org/wiki/Minor_(linear_algebra)"><strong>cofactor matrix</strong></a> of $M$.</p>
<p>This is proportional to the inverse transpose by a factor of $\det M$. (The inverse of $M$ can be calculated as the
<a href="https://en.wikipedia.org/wiki/Invertible_matrix#Analytic_solution">transpose of its cofactor matrix</a>
divided by $\det M$.) Actually, the cofactor matrix is defined even when $M$ is noninvertible—a
nice property, since we <em>can</em> transform vectors using a noninvertible matrix, and we should be able
to do the same to bivectors!</p>
<p>Let’s take a closer look at why the cofactor matrix is the right thing. First
of all, what even is a “cofactor” here?</p>
<p>Each element of an $n \times n$ square matrix has a corresponding cofactor. The recipe for
calculating the cofactor of the element at row $i$, column $j$ is as follows:</p>
<ol>
<li>Start with the original $n \times n$ matrix, and delete the $i$th row and the $j$th column.
This reduces it to an $(n - 1) \times (n - 1)$ submatrix of all the remaining elements.</li>
<li>Calculate the determinant of this submatrix.</li>
<li>Multiply the determinant by $(-1)^{i+j}$, i.e. flip the sign if $i + j$ is odd. That’s the cofactor!</li>
</ol>
<p>Then, the cofactor <em>matrix</em> is just what you get by sticking all the cofactors into a new $n \times n$ matrix, each at the position of its element.</p>
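<p>As a rough sketch of the recipe above, here’s a small pure-Python version. The function names are my own for illustration, and the determinant uses a naive Laplace expansion that’s only sensible for small matrices. Indices are 0-based, which doesn’t change the sign factor, since shifting both $i$ and $j$ by one leaves the parity of $i + j$ alone.</p>

```python
def submatrix(m, i, j):
    """Step 1: delete row i and column j (0-based) from a square matrix."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Step 2 helper: determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(submatrix(m, 0, j))
               for j in range(len(m)))


def cofactor_matrix(m):
    """Steps 1-3 applied to every element, collected into a new matrix."""
    n = len(m)
    return [[(-1) ** (i + j) * det(submatrix(m, i, j)) for j in range(n)]
            for i in range(n)]


# The nonuniform scaling from the running example: x is stretched by 3.
M = [[3, 0, 0],
     [0, 1, 0],
     [0, 0, 1]]
C = cofactor_matrix(M)

# A noninvertible matrix (it flattens z) still has a cofactor matrix.
FLATTEN_Z = [[1, 0, 0],
             [0, 1, 0],
             [0, 0, 0]]
C_FLAT = cofactor_matrix(FLATTEN_Z)
```

<p>For the scaling matrix, <code>C</code> comes out as the diagonal matrix with entries 1, 3, 3, matching the bivector transformation derived earlier. For the flattening matrix, only the $xy$-plane entry survives.</p>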
<p>So how is it that this construction works to transform a bivector? Let’s look at the bivector’s first basis
component: $p \, {\bf e_{yz}}$. This term represents an area component in the $yz$ plane; as such,
it only cares what the transformation $M$ does to the $y$ and $z$ axes. Well, the recipe for the 1,1
cofactor of $M$ instructs us to extract the 2×2 submatrix that specifies what $M$ does to the $y$
and $z$ axes. Then we take its determinant, which is nothing but the factor by which area in the
$yz$ plane gets scaled!</p>
<p>Because of the way we chose our bivector basis—${\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}}$
<em>in that order</em>—each element of the cofactor matrix automatically calculates a determinant that
tells how $M$ scales area in the appropriate plane. Or, for the off-diagonal elements, how $M$ maps
area from one axis plane to another. In other words, the cofactors work out to be exactly the
coefficients needed to transform the axis components of a bivector.</p>
<p>(The sign factor in step 3 above, by the way, serves to fix up some order issues. Namely, without
the sign factor, we’d have $\bf e_{xz}$ instead of $\bf e_{zx}$. The latter is the
preferred choice of basis element, for various reasons of convention.)</p>
<p>Incidentally, although we’re focusing on the 3D case here, I’ll quickly note that in $n$ dimensions,
the cofactor matrix works to transform $(n-1)$-vectors (in the appropriate basis). In fact, to
transform $k$-vectors in general, you would want a matrix of $(n-k)$th <em>minors</em> (determinants of
submatrices with $n - k$ rows and columns deleted) of $M$.</p>
<h2 id="bivectors-and-normals"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#bivectors-and-normals" title="Permalink to this section">Bivectors and Normals</a></h2>
<p>At this point, I have to make a small confession. I’ve been hiding something up my sleeve for the
past few pages. The trick is this: <strong>bivectors are isomorphic to normal vectors
in 3D</strong>. In fact, the <em>components</em> $(p, q, r)$ of a bivector in our standard basis are exactly (up
to normalization) the $(x, y, z)$ components of a normal to the bivector’s plane!</p>
<p>Let’s see how this comes about. We saw earlier that wedging a set of linearly dependent vectors
together will give zero. This means that the plane of a bivector $B$ can be defined by the following
equation:
$$
B \wedge v = 0
$$
Any vector $v$ that lies in $B$’s plane will satisfy this equation, because it will form a linearly
dependent set with two vectors “inside” $B$ (two vectors that span the plane). Or, to put it another
way, the trivector spanned by $B$ and $v$ will have zero volume.</p>
<p>Suppose we expand this equation using our standard vector and bivector bases, and simplify:
$$
\begin{gathered}
(p \, {\bf e_{yz}} + q \, {\bf e_{zx}} + r \, {\bf e_{xy}})
\wedge (x \, {\bf e_x} + y \, {\bf e_y} + z \, {\bf e_z}) = 0 \\
(px \, {\bf e_{yzx}} + qy \, {\bf e_{zxy}} + rz \, {\bf e_{xyz}}) = 0 \\
(px + qy + rz) {\bf e_{xyz}} = 0 \\
px + qy + rz = 0 \\
\end{gathered}
$$
Let me annotate this a bit in case the steps weren’t clear. In the second line I’ve distributed the
wedge product out over all the basis terms; most of the terms fall out because they have two copies
of the same axis wedged in (for example, ${\bf e_{yz}} \wedge {\bf e_y} = 0$). In the third
line, I reordered the axes in all the trivectors to $\bf e_{xyz}$, which we can do as long as
we keep track of the sign flips—and here, they all have an even number of sign flips. Finally, I
factored $\bf e_{xyz}$ out of the whole thing and discarded it.</p>
<p>Now, the final line looks just like a dot product between vectors $(p, q, r)$ and $(x, y, z)$! Or
in other words, it looks like the usual plane equation $n \cdot v = 0$, with normal vector $n = (p, q, r)$.</p>
<p>This shows that the bivector coordinates $p, q, r$ with respect to the basis
${\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}}$ are <em>also</em> the coordinates of a normal to the
plane, in the standard vector basis ${\bf e_x}, {\bf e_y}, {\bf e_z}$; moreover, the
operations of <em>wedging</em> with a bivector and <em>dotting</em> with its corresponding normal are identical.
Formally, this is an application of <a href="https://en.wikipedia.org/wiki/Hodge_star_operator">Hodge duality</a>,
which (in 3D) interchanges bivectors and their normals—but more on that in a future article.</p>
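<p>Here’s a quick numeric check of these claims, as a sketch rather than anything rigorous. It relies on the standard fact that the components of $a \wedge b$ in the basis ${\bf e_{yz}}, {\bf e_{zx}}, {\bf e_{xy}}$ are the same numbers as the components of the cross product $a \times b$. The matrix and vectors are arbitrary values I picked for illustration, and the function names are hypothetical.</p>

```python
def wedge(a, b):
    """Components (p, q, r) of a ^ b in the basis (e_yz, e_zx, e_xy)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-component vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cofactor_3x3(m):
    """Cofactor matrix of a 3x3 matrix, using cyclic index arithmetic."""
    return [[m[(i + 1) % 3][(j + 1) % 3] * m[(i + 2) % 3][(j + 2) % 3]
             - m[(i + 1) % 3][(j + 2) % 3] * m[(i + 2) % 3][(j + 1) % 3]
             for j in range(3)] for i in range(3)]


# Arbitrary (non-diagonal) test data.
M = [[2, 1, 0],
     [0, 1, 3],
     [1, 0, 1]]
a, b = (1, 2, 3), (4, 5, 6)

B = wedge(a, b)                                   # bivector spanned by a and b
in_plane = tuple(2 * x - y for x, y in zip(a, b))  # a vector in B's plane

transformed_directly = wedge(mat_vec(M, a), mat_vec(M, b))
transformed_by_cofactor = mat_vec(cofactor_3x3(M), B)
```

<p>With these inputs, <code>dot(B, in_plane)</code> is zero (the plane equation holds), and the two transformed bivectors agree: transforming the spanning vectors by $M$ and then wedging gives the same components as applying the cofactor matrix to $(p, q, r)$.</p>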
<h2 id="further-questions"><a href="http://reedbeta.com/blog/normals-inverse-transpose-part-1/#further-questions" title="Permalink to this section">Further Questions</a></h2>
<p>We’ve seen that normal vectors in 3D can be thought of as Grassmann bivectors, at least to an
extent. We’ve also seen geometrically why the cofactor matrix is the right thing to use to transform
a bivector. This provides a somewhat more satisfying answer than “the algebra works out that way” to
our original question of why some transformations make a distinction between ordinary vectors and
normal vectors.</p>
<p>However, there are still a few remaining issues that I’ve glossed over. I said
that bivectors are “isomorphic” to normal vectors, meaning there’s a one-to-one relationship between
them, but what’s that relationship, exactly? Relatedly, why did we end up with the cofactor matrix instead of the
inverse transpose? They’re proportional to each other, and one could make a case that it doesn’t
really matter in practice which you use, as we usually don’t care about the <em>magnitudes</em> of normal
vectors (we typically normalize them anyway). But we (or, well, I) would still like to understand
the origin of this discrepancy.</p>
<p>Another question: in our “apéritif” at the top of this article, we encountered units with both
positive and negative scaling powers, ranging from −3 to +3. We’ve now seen that Grassmann
$k$-vectors have scaling powers of $k$, from 0 to 3. But what about vectorial quantities with
negative scaling powers? Do those exist, and if so, what are they?</p>
<p>In the <a href="/blog/normals-inverse-transpose-part-2/">next part</a> of this series, we’ll dig deeper into
this and complicate our geometric story still further. 🤓</p>Flows Along Conic Sections
http://reedbeta.com/blog/flows-along-conic-sections/
http://reedbeta.com/blog/flows-along-conic-sections/Nathan ReedTue, 12 Dec 2017 20:35:26 -0800http://reedbeta.com/blog/flows-along-conic-sections/#commentsGraphicsMath<p>Here’s a cute bit of math I figured out recently. It probably doesn’t have much practical
application, at least not for graphics programmers, but I thought it was fun and wanted to share it.</p>
<p>First of all, everyone knows about rotations: they make things go in circles! More formally, given
a plane to rotate in and a center point, rotations of any angle will preserve circles in the same
plane and with the same center. By “preserve circles”, I mean that the rotation will send every
point on the circle to somewhere on the same circle. The individual points move, but the <em>set</em>
of points comprising the circle is invariant under rotation.</p>
<!--more-->
<p>Moreover, rotations with a fixed plane and center form a one-parameter family of transformations:
they can be parameterized by a single degree of freedom, the angle. By varying the angle, you move
the points around on each circle. Another way of saying this is that the family of rotations defines
a <em>flow</em> along circles: if you take derivatives with respect to the rotation angle, you can get a
vector field that shows how each point is pushed around by the rotation—like a velocity field in a
fluid simulation. The family of concentric circles preserved by the rotation shows up as the
<a href="https://en.wikipedia.org/wiki/Integral_curve">integral curves</a> of this vector field.</p>
<p>So far, so good. Now for the fun part: it happens that <strong>all conic sections</strong>, not just circles,
have a similar family of linear or affine transformations that preserve them and induce a flow
along them.</p>
<p>Here’s a shadertoy to demonstrate. It cycles through circles, ellipses, parabolas, and hyperbolas,
and in each case animates through transformations that preserve that conic. The background
coordinate grid shows what the transformation is doing to the space, and dots trace individual
points to show how they flow along the conic.</p>
<div class="embed-wrapper-outer" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/XtXfDS?paused=false&gui=false"></iframe>
</div>
</div>
<p>So, what are these transformations? Let’s look at each type of conic in turn.</p>
<p><strong>Circles</strong>. As we’ve seen, circles are preserved by rotations, which can be parameterized by
their angle $\theta$ and have a matrix of the form:
$$
\begin{bmatrix}
\cos\theta & -\sin\theta \\ \sin\theta & \cos\theta
\end{bmatrix}
$$</p>
<p><strong>Ellipses</strong>. Ellipses are just scaled circles, and the transformations that preserve them can be
derived by: scaling the ellipse to a circle, rotating, then unscaling back to the ellipse. If the
ellipse is axis-aligned and has aspect ratio $\alpha$ (the ratio of its $x$ radius to its $y$ radius), then the ellipse-preserving transformations
have the form:
$$
\begin{bmatrix}
\cos\theta & -\alpha\sin\theta \\ \frac{1}{\alpha}\sin\theta & \cos\theta
\end{bmatrix}
$$
Geometrically, this produces a rotation combined with some shear and nonuniform scaling that varies
with angle. Naturally, it reduces to the circle case when $\alpha = 1$.</p>
<p><strong>Parabolas</strong>. This one is different! It turns out there’s no continuous family of <em>linear</em>
transformations that map a parabola to itself. However, there does exist a family of <em>affine</em>
transformations that does so. For the parabolas $y = x^2 + k$, it looks like this:
$$
\begin{bmatrix} x \\ y \end{bmatrix} \mapsto
\begin{bmatrix}
1 & 0 \\ v & 1
\end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix} +
\begin{bmatrix} \frac{1}{2}v \\ \frac{1}{4}v^2 \end{bmatrix}
$$
This consists of a shear along the $y$-axis, which sort of “rolls” the parabola, pushing some points
up and others down so that a different point becomes the vertex. Then the translation puts the
vertex back where it was to begin with. The family of transformations is parameterized by the shear
fraction, $v$.</p>
<p>Note that unlike the previous cases, this family isn’t periodic. It’s unbounded; $v$ can range over
all the real numbers. As $v$ gets farther from zero, the shear will get more and more extreme, and
the original coordinate system more and more distorted—but the parabola stays just where it is,
and its points just keep flowing along it.</p>
<p><strong>Hyperbolas</strong>. We’re back to linear transformations again for these ones. Hyperbolas are preserved
by <a href="https://en.wikipedia.org/wiki/Squeeze_mapping">squeeze mappings</a>, which are nonuniform scalings
that have reciprocal scale factors along two axes: if they scale one axis by a factor $a$, they
scale the other by $1/a$. The axes here must be aligned with the asymptotes of the hyperbola.</p>
<p>It turns out that a convenient way to parameterize these is in terms of the
<a href="https://en.wikipedia.org/wiki/Hyperbolic_angle">hyperbolic angle</a> (somewhat of a misnomer, as it
isn’t an <em>angle</em> in the usual sense at all). The hyperbolas $y^2 - x^2 = k$ are preserved by
transformations of the form:
$$
\begin{bmatrix}
\cosh u & \sinh u \\ \sinh u & \cosh u
\end{bmatrix}
$$
This kind of transformation is also known as a “hyperbolic rotation” or a
<a href="https://en.wikipedia.org/wiki/Lorentz_transformation">Lorentz transformation</a>; it’s central in
special relativity (although it’s usually parameterized differently there). Like the parabolic case,
the family is unbounded; $u$, the hyperbolic angle, can range over all real numbers but induces a
consistent flow along the hyperbolas no matter how positive or negative it gets.</p>
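<p>To sanity-check the four families above, here’s a small sketch that applies each transformation to a sample point and measures how much the left-hand side of the corresponding conic equation changes. The function names, the aspect ratio and the sample parameters are all my own choices for illustration. I’m taking $\alpha$ to be the ratio of the ellipse’s $x$ radius to its $y$ radius, which is what the matrix above implies.</p>

```python
import math


def circle_flow(theta, x, y):
    """Rotation by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - s * y, s * x + c * y


def ellipse_flow(alpha, theta, x, y):
    """Scale the ellipse to a circle, rotate by theta, then unscale."""
    c, s = math.cos(theta), math.sin(theta)
    return c * x - alpha * s * y, (s / alpha) * x + c * y


def parabola_flow(v, x, y):
    """Shear by v along the y-axis, then translate by (v/2, v^2/4)."""
    return x + 0.5 * v, v * x + y + 0.25 * v * v


def hyperbola_flow(u, x, y):
    """Hyperbolic rotation by hyperbolic angle u."""
    c, s = math.cosh(u), math.sinh(u)
    return c * x + s * y, s * x + c * y


ALPHA = 2.5  # hypothetical aspect ratio for the ellipse case

# Each entry pairs a one-parameter flow with the expression it should preserve.
CASES = [
    (circle_flow, lambda x, y: x * x + y * y),
    (lambda t, x, y: ellipse_flow(ALPHA, t, x, y),
     lambda x, y: (x / ALPHA) ** 2 + y * y),
    (parabola_flow, lambda x, y: y - x * x),
    (hyperbola_flow, lambda x, y: y * y - x * x),
]


def max_drift(x, y, params=(-2.0, -0.3, 0.0, 0.9, 1.7)):
    """Largest change in any conic expression over the sample parameters."""
    worst = 0.0
    for flow, expr in CASES:
        before = expr(x, y)
        for t in params:
            worst = max(worst, abs(expr(*flow(t, x, y)) - before))
    return worst
```

<p>Calling <code>max_drift(0.7, -1.3)</code> should give something on the order of floating-point rounding error, meaning each point stays on its own conic no matter how far the parameter is pushed.</p>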
<p>Incidentally, these families of transformations we’ve been discussing are examples of continuous
symmetry groups—<a href="https://en.wikipedia.org/wiki/Lie_group">Lie groups</a>. The flow vector field is the
generator of the <a href="https://en.wikipedia.org/wiki/Lie_algebra">Lie algebra</a> for that Lie group.</p>
<p>And, as an application of <a href="https://en.wikipedia.org/wiki/Noether%27s_theorem">Noether’s theorem</a>,
each family also has a corresponding conserved quantity—a particular function of the coordinates
that’s conserved (does not change) when a transformation is applied. Consequently, these quantities
are also constant along the corresponding conic sections. They can therefore serve to identify a
specific conic section amongst all the ones preserved by the same family of transformations.</p>
<ul>
<li>Circles’ conserved quantity is $x^2 + y^2$, the squared radius of the circle.</li>
<li>Ellipses (in the axis-aligned case) have conserved quantity $(x/\alpha)^2 + y^2$, an
aspect-ratio-corrected squared radius.</li>
<li>Parabolas (in the standard orientation and aspect ratio we considered) have conserved quantity
$y - x^2$, the height of the parabola’s vertex above the origin.</li>
<li>Hyperbolas (in the standard orientation and aspect ratio we considered) have conserved quantity
$y^2 - x^2$, which is plus or minus the square of the hyperbola’s distance of closest approach to the
origin.</li>
</ul>Conformal Texture Mapping
http://reedbeta.com/blog/conformal-texture-mapping/
http://reedbeta.com/blog/conformal-texture-mapping/Nathan ReedSun, 26 Nov 2017 17:28:39 -0800http://reedbeta.com/blog/conformal-texture-mapping/#commentsGraphicsMath<p>In two <a href="/blog/quadrilateral-interpolation-part-1/">previous</a> <a href="/blog/quadrilateral-interpolation-part-2/">articles</a>,
I’ve explored some unusual methods of texture mapping—beyond the conventional
approach of linearly interpolating UV coordinates across triangles. This post is a sort of
continuation-in-spirit of that work, but I’m no longer focusing specifically on quadrilaterals.</p>
<p>A problem that often afflicts texture mapping on smooth, curvy models (such as characters) is distortion:
in some regions, the texture may appear overly squashed,
stretched, or sheared on the 3D model. A related but distinct problem is that of different
regions of the model having different texel density, due to varying scale in the UV mapping.
I wanted to explore these issues mathematically. Are there ways to create
texture mappings that have low distortion by construction?</p>
<p>Ultimately, I didn’t come to an altogether satisfying resolution of this question, but I encountered
plenty of interesting math along the way and I want to share some of it. This post will be on the
more esoteric, less immediately-applicable side—but I hope you’ll find the topic intriguing
nonetheless.</p>
<!--more-->
<div class="toc">
<ul>
<li><a href="http://reedbeta.com/blog/conformal-texture-mapping/#quantifying-texture-distortion">Quantifying Texture Distortion</a></li>
<li><a href="http://reedbeta.com/blog/conformal-texture-mapping/#conformal-maps">Conformal Maps</a></li>
<li><a href="http://reedbeta.com/blog/conformal-texture-mapping/#mobius-transformations">Möbius Transformations</a></li>
<li><a href="http://reedbeta.com/blog/conformal-texture-mapping/#holomorphic-functions">Holomorphic Functions</a></li>
<li><a href="http://reedbeta.com/blog/conformal-texture-mapping/#invertibility-and-critical-points">Invertibility And Critical Points</a></li>
<li><a href="http://reedbeta.com/blog/conformal-texture-mapping/#compulsory-criticality">Compulsory Criticality</a></li>
<li><a href="http://reedbeta.com/blog/conformal-texture-mapping/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="quantifying-texture-distortion"><a href="http://reedbeta.com/blog/conformal-texture-mapping/#quantifying-texture-distortion" title="Permalink to this section">Quantifying Texture Distortion</a></h2>
<p>How can we mathematically characterize “distortion” in a texture mapping? To begin with, we should
make a distinction between local and global forms of distortion. Local distortion would be visible
when zooming in on a small region of the model—looking at a single triangle, or a small
neighborhood around a point. Conversely, global distortion might only show up when you look at the
whole model and compare texture scale and orientation across widely separated points.</p>
<p>Some amount of global distortion is inevitable in mapping a flat, 2D texture to a non-flat, 3D
object. It’s not necessarily a problem, and it can even be useful in some cases to allow varying
texel density to concentrate more texels in more-important or more-detailed parts of a model. For
example, human head models usually give more texel density to the face region than to the sides, top,
and back of the head.</p>
<p>However, local distortion is usually undesirable. It changes the shapes of features in the texture,
distorts the shape of filter kernels operating in texture space, and gives unequal texel density along
different axes—bad news all around.</p>
<p>To measure local distortion, we can look at the tangent basis implied by the UVs assigned on the
mesh—the same basis we typically compute as part of the setup for normal and parallax mapping.</p>
<p>This basis can be computed per-triangle, and consists of the two 3D vectors within the triangle’s
plane that correspond to the texture’s U and V axes—known as the tangent and bitangent
vectors, respectively. (The triangle normal is usually included as a third basis vector, but we won’t
need that here.) If the triangle’s vertex positions are $p_1, p_2, p_3$ and the corresponding
UVs are $(u_1, v_1) \ldots (u_3, v_3)$, then the tangent and bitangent vectors $T, B$ can be defined by:
$$
\begin{aligned}
p_2 - p_1 &= (u_2 - u_1)T + (v_2 - v_1)B \\
p_3 - p_1 &= (u_3 - u_1)T + (v_3 - v_1)B
\end{aligned}
$$
These equations can be cast in matrix form, and solved as follows:
$$
\begin{bmatrix} T_x & B_x \\ T_y & B_y \\ T_z & B_z \end{bmatrix} =
\begin{bmatrix}
(p_{2x} - p_{1x}) & (p_{3x} - p_{1x}) \\
(p_{2y} - p_{1y}) & (p_{3y} - p_{1y}) \\
(p_{2z} - p_{1z}) & (p_{3z} - p_{1z})
\end{bmatrix}
\begin{bmatrix} (u_2 - u_1) & (u_3 - u_1) \\ (v_2 - v_1) & (v_3 - v_1) \end{bmatrix}^{-1}
$$
In the case of a general parameterized surface given by $p(u, v)$, the tangent basis at a point is
defined as $T = \partial p / \partial u, B = \partial p / \partial v$.</p>
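<p>Here’s a minimal sketch of the per-triangle solve, written out by hand so it needs no matrix library. The function name and argument layout are assumptions for illustration; the 2×2 inverse is expanded explicitly.</p>

```python
def tangent_basis(p1, p2, p3, uv1, uv2, uv3):
    """Return (T, B) for a triangle, without any orthonormalization.

    p1..p3 are 3D positions, uv1..uv3 the matching (u, v) pairs.
    """
    e1 = [b - a for a, b in zip(p1, p2)]          # p2 - p1
    e2 = [b - a for a, b in zip(p1, p3)]          # p3 - p1
    du1, dv1 = uv2[0] - uv1[0], uv2[1] - uv1[1]
    du2, dv2 = uv3[0] - uv1[0], uv3[1] - uv1[1]

    # Determinant of the UV-delta matrix [[du1, du2], [dv1, dv2]].
    det = du1 * dv2 - du2 * dv1
    if det == 0:
        raise ValueError("degenerate UV mapping: zero area in texture space")

    # [T B] = [e1 e2] * inverse([[du1, du2], [dv1, dv2]])
    T = [(e1[k] * dv2 - e2[k] * dv1) / det for k in range(3)]
    B = [(e2[k] * du1 - e1[k] * du2) / det for k in range(3)]
    return T, B


# A right triangle stretched 2x along x, with an unstretched unit UV triangle.
T, B = tangent_basis((0, 0, 0), (2, 0, 0), (0, 1, 0),
                     (0, 0), (1, 0), (0, 1))
```

<p>In this example the texture’s U axis covers two model-space units while V covers one, so <code>T</code> comes out twice as long as <code>B</code>. That’s exactly the sort of information orthonormalizing would throw away.</p>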
<p>When using a tangent basis for normal mapping, you’d probably orthonormalize it at this point. Here,
we don’t want to do that—the “raw” tangent basis contains the information we want to extract
about texture distortion.</p>
<p>There are a couple of different ways a texture can be distorted locally. One is for it to be
non-uniformly scaled; another is for it to be sheared:</p>
<p class="image-array"><img alt="Non-uniformly scaled texture" src="http://reedbeta.com/blog/conformal-texture-mapping/quad_scaled.png" title="Non-uniformly scaled texture" />
<img alt="Sheared texture" src="http://reedbeta.com/blog/conformal-texture-mapping/quad_sheared.png" title="Sheared texture" /></p>
<p>Both of these effects can be measured by looking at the tangent basis. Nonuniform scaling can be
measured by comparing the lengths of $T$ and $B$, and shear can be measured by the angle between
them; an unsheared texture mapping should have $T$ and $B$ perpendicular.</p>
<p>A convenient metric for both forms of distortion is the eccentricity of the ellipse created by
transforming a unit circle from tangent space to model space. If the mapping is undistorted, it will
map the unit circle to another circle; if there’s any nonuniform scaling or shear present, the
circle will get elongated into an ellipse (though not necessarily an axis-aligned one):</p>
<div class="embed-wrapper-outer" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/MlXfDH?paused=false&gui=false"></iframe>
</div>
</div>
<p>How can we compute the eccentricity from the tangent basis? The major and minor radii of the ellipse
are the <a href="https://en.wikipedia.org/wiki/Singular-value_decomposition">singular values</a> of the
tangent-to-model transform (i.e. the $[T, B]$ matrix). I’ll skip the detailed derivation, but I
found that the major and minor radii $a, b$ can be expressed in terms of $T$ and $B$ as follows:
$$
\begin{aligned}
a^2 &= \tfrac{1}{2} \left[ (T^2 + B^2) + \sqrt{(T^2 - B^2)^2 + 4(T \cdot B)^2} \right] \\
b^2 &= \tfrac{1}{2} \left[ (T^2 + B^2) - \sqrt{(T^2 - B^2)^2 + 4(T \cdot B)^2} \right]
\end{aligned}
$$
The eccentricity of the ellipse can then be computed as:
$$
\epsilon = \sqrt{1 - \frac{b^2}{a^2}}
$$
This value equals 0 when the ellipse is a circle, and grows toward 1 as it becomes more elongated.</p>
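<p>The two formulas above translate almost directly into code. This is a sketch with names of my own choosing; $T^2$ and $B^2$ are read as squared lengths, and the <code>max</code> calls only guard against tiny negative values from rounding.</p>

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_radii(T, B):
    """Squared major and minor radii (a^2, b^2) of the distortion ellipse."""
    tt, bb, tb = dot(T, T), dot(B, B), dot(T, B)
    root = math.sqrt((tt - bb) ** 2 + 4.0 * tb * tb)
    a2 = 0.5 * ((tt + bb) + root)
    b2 = max(0.5 * ((tt + bb) - root), 0.0)
    return a2, b2


def eccentricity(T, B):
    """0 for an undistorted mapping, approaching 1 as it gets more elongated."""
    a2, b2 = squared_radii(T, B)
    if a2 == 0.0:
        raise ValueError("T and B are both zero")
    return math.sqrt(max(1.0 - b2 / a2, 0.0))


undistorted = eccentricity((1, 0, 0), (0, 1, 0))   # perpendicular, equal length
stretched = eccentricity((2, 0, 0), (0, 1, 0))     # nonuniform scale only
sheared = eccentricity((1, 0, 0), (1, 1, 0))       # shear only
```

<p>The undistorted case gives exactly 0, the 2:1 stretch gives $\sqrt{3}/2$, and the sheared case lands strictly between 0 and 1.</p>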
<h2 id="conformal-maps"><a href="http://reedbeta.com/blog/conformal-texture-mapping/#conformal-maps" title="Permalink to this section">Conformal Maps</a></h2>
<p>If a texture mapping—either on a triangle mesh or a general parameterized surface—has no
local distortion anywhere, i.e. its eccentricity equals 0 at every point, then it belongs to
a class known as <a href="https://en.wikipedia.org/wiki/Conformal_map"><strong>conformal maps</strong></a>.</p>
<p>Conformal maps are a rich seam of mathematics, with a lot of connections to deep parts of
geometry, analysis, and mathematical physics. In two dimensions, they’re particularly powerful and
flexible (their usefulness falls off in higher dimensions).</p>
<p>Moreover, conformal maps are oddly aesthetically pleasing. 😄 There’s often a rather
soothing quality of smoothness to them, owing to their lack of local distortion.</p>
<p>The key geometric property of these maps is that they preserve angles. To be precise:
if two lines or curves intersect at a certain angle, then their images under a conformal
map will intersect with the same angle. However, <em>distances</em> aren’t preserved in general: a
conformal map may scale things up and down, with different scale factors at different points. As
a result, shapes and sizes of things may be distorted in a global sense.</p>
<p>Another way to express the same idea is that a conformal map can be approximated to first order near
any point as a similarity transformation—a linear transformation with no shear or nonuniform scaling,
only rotation and uniform scaling.</p>
<p>We can also relax the definition of a conformal map by allowing the eccentricity to be bounded by a
constant, $0 \leq \epsilon \leq \epsilon_{\text{max}}$, rather than requiring it to be exactly zero
everywhere. This is called a <a href="https://en.wikipedia.org/wiki/Quasiconformal_mapping"><strong>quasiconformal map</strong></a>.</p>
<h2 id="mobius-transformations"><a href="http://reedbeta.com/blog/conformal-texture-mapping/#mobius-transformations" title="Permalink to this section">Möbius Transformations</a></h2>
<p>To get some initial intuition for what conformal maps are like, it’s useful to narrow our focus to a
specific type of conformal map that’s easy to analyze and play with. For this, we’ll look at
<a href="https://en.wikipedia.org/wiki/M%C3%B6bius_transformation"><strong>Möbius transformations</strong></a>, which are
just about the simplest conformal maps that are interesting enough to be worth studying. (They’re
named after <a href="https://en.wikipedia.org/wiki/August_Ferdinand_M%C3%B6bius">August Ferdinand Möbius</a>,
the same fellow better-known for the Möbius strip; he also invented homogeneous coordinates.)</p>
<p>In 2D, the most straightforward way to define these maps is with complex numbers. A 2D Möbius
transformation has the form:
$$
f(z) = \frac{az + b}{cz + d}, \qquad z \in \mathbb{C}
$$
where $a, b, c, d \in \mathbb{C}$ are some constants, which should satisfy $ad - bc \neq 0$ (or the
transform will be degenerate).</p>
<div class="embed-wrapper-outer" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/4tXyWs?paused=false&gui=false"></iframe>
</div>
</div>
<p>Here’s a Shadertoy that applies a Möbius transformation to a coordinate grid, animating the
parameters over time.
While watching this, pay attention to the grid intersections. Though the straight lines become
curved, and the overall shapes may distort quite a bit, wherever two grid lines meet each other they
always remain perpendicular! That’s the conformal property at work.</p>
<p>A few interesting facts about Möbius transformations, in no particular order:</p>
<ul>
<li>They form a mathematical group. The composition of two Möbius transformations is another Möbius,
and the inverse of a Möbius is another Möbius!</li>
<li>Although it has four complex parameters (eight components total), a Möbius has only <em>six</em> degrees
of freedom. That’s because an overall complex factor multiplied into all the parameters has no
effect. In other words, parameters $(a, b, c, d)$ and $(ua, ub, uc, ud)$ specify the same
transformation, for any $u \neq 0 \in \mathbb{C}$.</li>
<li>Möbius transformations generally map lines to circles, and circles to other circles. (And,
occasionally, circles to lines.)</li>
<li>Möbius transformations can be defined in higher dimensions as well, and they behave analogously,
with (hyper)planes mapping to (hyper)spheres.</li>
<li>In 3D and higher, Möbius transformations are the <em>only</em> conformal maps that exist. In 2D, they’re
just a small subset of a much richer collection of conformal maps.</li>
</ul>
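<p>The first two facts are easy to poke at numerically. In this sketch a Möbius transformation is stored as a tuple <code>(a, b, c, d)</code>, and composition uses the standard correspondence with 2×2 matrix multiplication. The helper names and the sample parameters are my own.</p>

```python
def mobius(params, z):
    """Evaluate f(z) = (a z + b) / (c z + d)."""
    a, b, c, d = params
    return (a * z + b) / (c * z + d)


def compose(f, g):
    """Parameters of z -> f(g(z)): the 2x2 matrix product of f and g."""
    a1, b1, c1, d1 = f
    a2, b2, c2, d2 = g
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2)


def inverse(f):
    """Parameters of the inverse map (the adjugate; overall scale is irrelevant)."""
    a, b, c, d = f
    if a * d - b * c == 0:
        raise ValueError("degenerate transformation: ad - bc = 0")
    return (d, -b, -c, a)


def scaled(f, u):
    """Multiply all four parameters by the same nonzero complex factor."""
    return tuple(u * p for p in f)


# Arbitrary sample transformations and test point.
f = (1 + 2j, 0.5, 0.25j, 1)
g = (2, -1j, 1, 3)
z = 0.3 + 0.4j

composition_error = abs(mobius(compose(f, g), z) - mobius(f, mobius(g, z)))
inverse_error = abs(mobius(inverse(f), mobius(f, z)) - z)
scaling_error = abs(mobius(scaled(f, 2 - 3j), z) - mobius(f, z))
```

<p>All three errors should be at rounding level: composing stays within the family, the inverse undoes the map, and a common factor on the parameters changes nothing.</p>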
<p>The six degrees of freedom of a 2D Möbius are just enough that we can construct a transformation to
map any three chosen points to three others. This makes it tempting to think that we could use them
for texture mapping, by applying a Möbius to each triangle of a 3D model.</p>
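<p>One standard way to build such a transformation (not necessarily the only one) is to send the three source points to $0, 1, \infty$ with a cross-ratio, and then undo the same construction for the three target points. Here’s a sketch under that assumption; the names are hypothetical, and the three points in each set must be distinct.</p>

```python
def mobius(params, z):
    """Evaluate f(z) = (a z + b) / (c z + d)."""
    a, b, c, d = params
    return (a * z + b) / (c * z + d)


def compose(f, g):
    """Parameters of z -> f(g(z)), via 2x2 matrix multiplication."""
    a1, b1, c1, d1 = f
    a2, b2, c2, d2 = g
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2)


def inverse(f):
    """Parameters of the inverse map, up to an irrelevant overall scale."""
    a, b, c, d = f
    return (d, -b, -c, a)


def to_zero_one_infinity(z1, z2, z3):
    """Mobius sending z1 -> 0, z2 -> 1, z3 -> infinity (a cross-ratio)."""
    if z1 == z2 or z2 == z3 or z1 == z3:
        raise ValueError("the three points must be distinct")
    # f(z) = (z - z1)(z2 - z3) / ((z - z3)(z2 - z1))
    return (z2 - z3, -z1 * (z2 - z3), z2 - z1, -z3 * (z2 - z1))


def three_point_mobius(sources, targets):
    """Mobius taking each source point to the corresponding target point."""
    return compose(inverse(to_zero_one_infinity(*targets)),
                   to_zero_one_infinity(*sources))


# Arbitrary example: corners of a "triangle" in the complex plane.
sources = (0, 1, 1j)
targets = (2, 3 + 1j, -1j)
h = three_point_mobius(sources, targets)
errors = [abs(mobius(h, z) - w) for z, w in zip(sources, targets)]
```

<p>All six degrees of freedom get used up by the three point constraints, which is why nothing is left over to control what happens along the edges in between.</p>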
<p>Unfortunately, the mapping will not in general be continuous from one triangle to the next: the
shared edge will be mapped to different circles by each triangle’s Möbius, and there are no more
degrees of freedom available to try to fix it.
There have been some papers, <a href="http://www.dmg.tuwien.ac.at/geom/ig/publications/2015/conformal2015/conformal2015.pdf">like this one</a>,
trying to patch together piecewise Möbius transformations with least-squares optimization, to produce
<em>approximate</em> conformal maps.</p>
<p>There’s a good deal more that could be said about Möbius transformations.
However, let’s move on for now and look at a broader set of conformal maps.</p>
<h2 id="holomorphic-functions"><a href="http://reedbeta.com/blog/conformal-texture-mapping/#holomorphic-functions" title="Permalink to this section">Holomorphic Functions</a></h2>
<p>From this point forward, we’ll restrict ourselves to the 2D case. Given that Möbius transformations
aren’t quite as flexible as we might like, how can we construct other types of conformal maps?</p>
<p>It’s no accident that we used complex numbers to define the Möbius transformation in the previous
section. Complex numbers, in fact, are intimately linked to conformal maps in 2D.</p>
<p>Why is this? If you recall, I mentioned earlier that one way to define a conformal map is that it
can be approximated to first order near any point as a similarity transformation. Well, multiplication
by a (nonzero) complex number implements a 2D similarity transformation: if $z = re^{i\theta}$, then
multiplication by $z$ will scale by $r$ and rotate by $\theta$.</p>
<p>As you may have guessed, “approximated to first order near a point” is a long-winded way of talking
about derivatives. So, what we’re saying is that for a function on $\mathbb{C}$ to be a conformal
map, its derivative at any point should act as a complex number. In other words, it should be
<em>complex differentiable</em>. Functions that satisfy this requirement are called
<a href="https://en.wikipedia.org/wiki/Holomorphic_function"><strong>holomorphic functions</strong></a>.</p>
<p>I should note that being complex-differentiable is different—and much more restrictive—than just being
differentiable as a vector function on $\mathbb{R}^2$. In other words, it’s not enough for the
$x$ and $y$ components of a mapping to <em>individually</em> be differentiable. As seen before, the
derivative at each point must take the form of a similarity transform; formally, the $x$ and $y$
components must satisfy the <a href="https://en.wikipedia.org/wiki/Cauchy%E2%80%93Riemann_equations">Cauchy–Riemann equations</a>,
which state that the mapping’s tangent and bitangent vectors must be orthogonal, and
of equal length, at each point. Only when these conditions are satisfied can you interpret the mapping
as a differentiable complex function of a complex variable.</p>
<p>Fortunately, the basic differentiation rules we learn in school for real-valued functions do
carry over to complex functions! In particular, all the basic arithmetic operations on complex numbers
are differentiable. So, to make a holomorphic function, all we have to do is write down
an algebraic formula—pretty much whatever we like—for a complex function $f(z)$. These functions
will always produce conformal maps, by construction.</p>
<p>We can also use exp, log, and trig functions, as well as many other special functions; they
can be extended into the complex domain and are holomorphic too. However, there are a few operations
we <em>can’t</em> use: the complex conjugate, magnitude, argument, or real or imaginary parts of a complex
number. Those <em>aren’t</em> holomorphic, it turns out. As long as we follow these rules, any function we
build will be holomorphic and therefore conformal.</p>
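<p>These rules can be checked numerically. Writing $z = x + iy$, the Cauchy–Riemann equations in complex form say $\partial f / \partial y = i \, \partial f / \partial x$: the “bitangent” is the “tangent” rotated by 90°. The sketch below estimates both partial derivatives with central differences and reports how badly that condition is violated. The function name, step size and sample point are my own choices.</p>

```python
import cmath


def cauchy_riemann_residual(f, z, h=1e-5):
    """|df/dy - i * df/dx| at z, estimated with central differences.

    Close to zero where f is complex differentiable; order 1 where it isn't.
    """
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return abs(df_dy - 1j * df_dx)


z0 = 0.3 + 0.7j

# Built from arithmetic and exp: these should pass.
square_residual = cauchy_riemann_residual(lambda z: z * z, z0)
exp_residual = cauchy_riemann_residual(cmath.exp, z0)
rational_residual = cauchy_riemann_residual(lambda z: (2 * z + 1) / (z - 3), z0)

# The forbidden operations: these should fail.
conjugate_residual = cauchy_riemann_residual(lambda z: z.conjugate(), z0)
magnitude_residual = cauchy_riemann_residual(abs, z0)
```

<p>The first three residuals come out tiny, limited only by the finite-difference approximation, while the conjugate gives a residual of about 2 and the magnitude about 1. This is only a spot check at one point, of course, not a proof.</p>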
<p>So, okay! We know a lot about how to build functions to accomplish specific things. In fact, we can
try taking functions we’ve already got experience with, and just extending them to the complex domain.
For instance, take a 1D cubic Bézier curve with control points $c_0, c_1, c_2, c_3$:
$$
B(t) = (1-t)^3 c_0 + 3(1-t)^2 t c_1 + 3(1-t) t^2 c_2 + t^3 c_3
$$
We’ll use the same formula, but make everything complex numbers—both the control points and the
input variable.
$$
B(z) = (1-z)^3 c_0 + 3(1-z)^2 z c_1 + 3(1-z) z^2 c_2 + z^3 c_3
$$
Let’s see what it looks like! Here I’ve set it up to produce the identity map with a little bit of
animated (complex) wiggle in the tangents at the endpoints, 0 and 1.</p>
<div class="embed-wrapper-outer" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/llByzW?paused=false&gui=false"></iframe>
</div>
</div>
<p>Huh. Well, it’s doing…<em>something</em>. The mapping does seem to be conformal, for the most part—right
angles are staying right angles. But why are we seeing the unit square getting duplicated and kinda
merging with itself into curvy 8-sided and 12-sided figures? Why do the grid lines seem to break
and reconnect all the time? This is interesting to look at, but doesn’t seem too useful for texture
mapping. What’s going on?</p>
<h2 id="invertibility-and-critical-points"><a href="http://reedbeta.com/blog/conformal-texture-mapping/#invertibility-and-critical-points" title="Permalink to this section">Invertibility And Critical Points</a></h2>
<p>The problem comes down to <em>invertibility</em>. When we get this “multiple copies” phenomenon, what we’re
seeing is the complex function mapping multiple regions of its domain (the Shadertoy’s screen space)
to the same region of its range (the coordinate grid being visualized). In other words, the function
isn’t one-to-one—and therefore it fails to be invertible.</p>
<p>Stepping back to real-valued functions for a moment may help clarify. Here’s the graph of the real
function $f(x) = x^2$:</p>
<p><img alt="Graph of x²" class="not-too-wide" src="http://reedbeta.com/blog/conformal-texture-mapping/xsquared.png" title="Graph of x²" /></p>
<p>It’s a parabola, of course. It’s also one of the simplest examples of a non-invertible function.
Why? Because inputs $x$ and $-x$ both map to the same value, $x^2$. If you squint at it a bit, you
can see the graph as being made up of two distorted copies of the <em>positive half</em> of the real line.</p>
<p>The complex extension of this function, $f(z) = z^2$, works the same way, but a bit more
dramatically: it gives us two distorted copies of <em>the entire complex plane</em>, squished together!</p>
<p><img alt="Graph of z²" class="not-too-wide" src="http://reedbeta.com/blog/conformal-texture-mapping/zsquared.png" title="Graph of z²" /></p>
<p>Now, a funny thing happens when we look at higher powers. When we restrict ourselves to the real
numbers only, $x^3$ is invertible, $x^4$ is not, $x^5$ is, and so on: odd powers are invertible,
while even ones aren’t. Even powers all map $x$ and $-x$ to the same value, while odd powers maintain
the distinction.</p>
<p>However, when extended to the complex plane, $z^n$ <em>always</em> fails to be invertible unless $n = 1$!
In fact, graphing $z^n$ gives you $n$ copies of the plane, squished together into wedges around the
origin. All of the copies are conformal mappings—but their scale becomes increasingly extreme as
you approach the origin, and the function is not strictly conformal <em>at</em> the origin.</p>
<div class="embed-wrapper-outer" >
<div class="embed-wrapper-inner">
<iframe class="embed" type="text/html" allowfullscreen frameborder="0" src="https://www.shadertoy.com/embed/llXBDH?paused=false&gui=false"></iframe>
</div>
</div>
<p>An example for the $n = 3$ case, which you can verify by working out the calculations if you like:
$$
\begin{aligned}
1^3 &= 1 \\
(-\tfrac{1}{2} + \tfrac{\sqrt{3}}{2} i)^3 &= 1 \\
(-\tfrac{1}{2} - \tfrac{\sqrt{3}}{2} i)^3 &= 1 \\
\end{aligned}
$$
Three distinct complex numbers, when cubed, all give the same result of 1.</p>
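<p>The same check can be done numerically for any $n$. This is a small Python sketch (the function name is mine) that builds the $n$ points evenly spaced around the unit circle and raises each to the $n$th power:</p>

```python
import cmath

def roots_of_unity(n):
    """The n complex numbers exp(2*pi*i*k/n), k = 0..n-1, each of which satisfies z**n == 1."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

cube_roots = roots_of_unity(3)
cubed = [w**3 for w in cube_roots]  # each entry should be 1, up to rounding error
```

<p>Here <code>cube_roots</code> reproduces the three numbers listed above, and the same function gives $n$ distinct inputs that $z^n$ sends to 1 for any other $n$.</p>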
<p>In general, a holomorphic function will fail to be invertible wherever it has a <em>critical point</em>—a
point where its derivative equals zero. In the vicinity of such a place, the function will locally
behave like $z^n$: it will have $n$ copies of the surrounding region of the complex plane, squished
together into wedges around the critical point. Here, $n$ is one plus the order (aka multiplicity)
of the zero in the derivative.</p>
<p>This is what was going on in the Bézier example from the previous section. Since it was a cubic
polynomial, its derivative is quadratic, and quadratic polynomials have two zeros. So the cubic
Bézier curve has two critical points, which move around the plane as the curve’s parameters change.
When the critical points get too close to the region of interest (the unit square, say), we can see
two or even three copies of that region mushed together.</p>
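<p>To make this concrete, the critical points of the complex Bézier can be found by expanding its derivative into a quadratic in $z$ and applying the quadratic formula with a complex square root. The following Python sketch does that; the control points are hypothetical values I picked for illustration, not the ones used in the Shadertoy.</p>

```python
import cmath

def bezier(z, c0, c1, c2, c3):
    """Cubic Bezier formula evaluated at a complex parameter z."""
    w = 1 - z
    return w**3 * c0 + 3 * w**2 * z * c1 + 3 * w * z**2 * c2 + z**3 * c3

def bezier_derivative(z, c0, c1, c2, c3):
    """Derivative of the cubic Bezier with respect to z."""
    w = 1 - z
    return 3 * (w**2 * (c1 - c0) + 2 * w * z * (c2 - c1) + z**2 * (c3 - c2))

def bezier_critical_points(c0, c1, c2, c3):
    """Zeros of the derivative, from B'(z)/3 = a z^2 + b z + c."""
    d0, d1, d2 = c1 - c0, c2 - c1, c3 - c2
    a = d0 - 2 * d1 + d2
    b = 2 * (d1 - d0)
    c = d0
    if abs(a) < 1e-12:
        # Derivative is linear or constant: at most one critical point.
        return [] if abs(b) < 1e-12 else [-c / b]
    s = cmath.sqrt(b * b - 4 * a * c)
    return [(-b + s) / (2 * a), (-b - s) / (2 * a)]

# Hypothetical control points: endpoints 0 and 1, with complex-valued tangents.
ctrl = (0j, 0.2 + 0.3j, 0.8 - 0.3j, 1 + 0j)
crit = bezier_critical_points(*ctrl)

# Evenly spaced control points give the identity map, which has no critical points.
identity_crit = bezier_critical_points(0, 1 / 3, 2 / 3, 1)
```

<p>Comparing the positions in <code>crit</code> against the unit square is one way to judge how badly a given set of control points will fold the region of interest.</p>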
<h2 id="compulsory-criticality"><a href="http://reedbeta.com/blog/conformal-texture-mapping/#compulsory-criticality" title="Permalink to this section">Compulsory Criticality</a></h2>
<p>If we want to build holomorphic functions that are guaranteed to be invertible, we need to avoid
critical points, i.e. zeros of the derivative. Unfortunately, this turns out to be more challenging
than you might expect.</p>
<p>It’s easy to make a real polynomial that doesn’t have any zeros, such as $f(x) = x^2 + 1$.
Correspondingly, it’s easy to make a real polynomial that’s everywhere invertible, by taking the
integral of one that doesn’t have any zeros: $\int (x^2 + 1) \, dx = \tfrac{1}{3}x^3 + x$, for
example.</p>
<p>However, a crucial difference between the real and complex domains comes into play here: while zeros are optional
for real polynomials, they are <em>mandatory</em> for complex ones. A complex polynomial of degree $n \geq 1$
always has <em>exactly</em> $n$ zeros (counted with multiplicity). For example, the zeros of $z^2 + 1$ are
at $z = \pm i$. Thus, a complex polynomial of degree $n \geq 2$ always has at least one critical
point, and possibly up to $n - 1$ of them.</p>
<p>In other words, it’s impossible for complex polynomials of degree 2 or higher to be globally invertible.</p>
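<p>The cubic from the real example above shows this nicely. On the real line $\tfrac{1}{3}x^3 + x$ is strictly increasing, but its complex extension has critical points at $z = \pm i$, and three different inputs all land on zero. A short Python check (variable names are mine):</p>

```python
def f(z):
    """Complex extension of the real cubic x^3/3 + x."""
    return z**3 / 3 + z

def f_prime(z):
    """Its derivative, z^2 + 1, which never vanishes for real z."""
    return z**2 + 1

critical_points = [1j, -1j]

# f(z) = z * (z^2/3 + 1), so these three distinct inputs all map to 0.
same_output = [0j, (3**0.5) * 1j, -(3**0.5) * 1j]
values = [f(z) for z in same_output]
```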
<p>Polynomials aren’t the only functions out there, though. What about rational functions? They
obey a similar dictum: if a rational function is degree $p$ over degree $q$, then there are
potentially $p + q - 1$ critical points (remember the quotient rule)—not to mention anywhere from
1 to $q$ poles, where the denominator goes to zero and the rational function blasts off to infinity.
Incidentally, poles behave similarly to critical points in some ways: they come in different orders,
and a pole of order $n$ will have $n$ copies of the complex plane around it. So poles are <em>another</em>
way to break invertibility.</p>
<p>This leads to the somewhat depressing conclusion that the <em>only</em> polynomial or rational complex
functions that are everywhere invertible are those that have degree at-most-1 over degree at-most-1.
In other words: Möbius transformations.</p>
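<p>Invertibility of a Möbius transformation is easy to see in code, because the inverse is another Möbius transformation with the coefficients shuffled. A minimal Python sketch, assuming the usual form $(az + b)/(cz + d)$ with $ad - bc \neq 0$; the coefficient values below are arbitrary examples of mine:</p>

```python
def mobius(z, a, b, c, d):
    """Mobius transformation (a z + b) / (c z + d); requires a*d - b*c != 0."""
    return (a * z + b) / (c * z + d)

def mobius_inverse(w, a, b, c, d):
    """Inverse of the map above: (d w - b) / (-c w + a)."""
    return (d * w - b) / (-c * w + a)

# Arbitrary example coefficients with a*d - b*c = 1.
coeffs = (1 + 1j, 2 + 0j, 0.5j, 1 + 0j)
z = 0.3 - 0.8j
w = mobius(z, *coeffs)
z_back = mobius_inverse(w, *coeffs)  # should recover z up to rounding error
```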
<h2 id="conclusion"><a href="http://reedbeta.com/blog/conformal-texture-mapping/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>So, polynomial and rational functions aren’t good enough—we’d need to dig deeper if we’re to
find a class of invertible holomorphic functions more powerful than Möbius. One possibility might
be to define $f(z) = \int e^{g(z)} \, dz$, where $g(z)$ is some holomorphic function without poles.
Then $f(z)$ will have neither poles nor critical points, since its derivative is $e^{g(z)}$. (The
complex exponential function, like the real version, is everywhere nonzero.)</p>
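<p>One caveat worth keeping in mind with this construction: having no critical points rules out the local folding described earlier, but on its own it does not make a function one-to-one over the whole plane. The simplest case, $g(z) = z$, gives $f(z) = e^z$, which repeats every $2\pi i$. A quick Python check of that particular case (the sample point is arbitrary):</p>

```python
import cmath

def f(z):
    """f(z) = exp(z), i.e. the integral of exp(g(z)) for g(z) = z (constant of integration 0)."""
    return cmath.exp(z)

z = 0.25 + 0.5j
z_shifted = z + 2j * cmath.pi  # a different input, one period away

same_value = abs(f(z) - f(z_shifted)) < 1e-12
derivative_nonzero = abs(f(z)) > 0  # f'(z) = exp(z) as well
```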
<p>Now, if we step back for a moment, we don’t necessarily need <em>global</em> invertibility. If we’re mainly
interested in some bounded region—such as the unit square, for texture mapping—then it may well
be sufficient for our purposes to maintain <em>local</em> invertibility there. This could be done by
keeping critical points and poles far enough from the region of interest that they don’t weird things
out too much. That still seems like a challenging juggling act to perform, though—and moreover,
the more degrees of freedom we have in our function, the more critical points or poles we probably
have to worry about.</p>
<p>In the course of reading up on this subject, I also found <a href="http://www.cs.technion.ac.il/~gotsman/AmendedPubl/Ofir/hilbert.pdf">another paper</a>
that takes a quite different approach—based on <a href="https://en.wikipedia.org/wiki/Cauchy%27s_integral_formula">Cauchy’s integral formula</a>—to
constructing conformal maps. I might write about that in another post sometime—there’s a lot more
to this rabbit hole of math, and it’s interesting stuff, but ultimately it doesn’t seem very practical.</p>
<p>For more reading on the theory of holomorphic functions, see <a href="https://terrytao.wordpress.com/category/teaching/246a-complex-analysis/">Terry Tao’s complex analysis course notes</a>.
(Be warned, it’s a graduate-level course and the notes are pretty dense and formal.)</p>Quadrilateral Interpolation, Part 2
http://reedbeta.com/blog/quadrilateral-interpolation-part-2/
http://reedbeta.com/blog/quadrilateral-interpolation-part-2/Nathan ReedThu, 18 May 2017 21:15:44 -0700http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#commentsGraphicsGPUMath<p>It’s been quite a while since the <a href="/blog/quadrilateral-interpolation-part-1/">first entry in this series</a>!
I apologize for the long delay—at the time, I’d intended to write at least one more entry, but I
couldn’t get the math to work and lost interest. However, I recently had occasion to
revisit this topic, and this time was able to make progress.</p>
<p>In this article, I’ll cover <strong>bilinear interpolation</strong> on quadrilaterals. Unlike the projective
interpolation covered in Part 1, this method will allow us to maintain regular UV spacing along
all four of the quad’s edges, regardless of its shape; but we’ll see that to achieve this, we’ll
have to accept a different kind of distortion to the texture.<!--more--></p>
<div class="toc">
<ul>
<li><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#the-story-so-far">The Story So Far</a></li>
<li><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#bilinear-interpolation">Bilinear Interpolation</a></li>
<li><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#properties">Properties</a></li>
<li><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#inversion">Inversion</a></li>
<li><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#implementation">Implementation</a></li>
<li><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#conclusion">Conclusion</a></li>
</ul>
</div>
<h2 id="the-story-so-far"><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#the-story-so-far" title="Permalink to this section">The Story So Far</a></h2>
<p>The central problem of this series is: how can we map a rectangular texture image
onto an arbitrary convex quadrilateral?</p>
<p>If we model the quad as two triangles, and apply ordinary (linear) texture mapping to the mesh,
we get something like this:</p>
<p><img alt="Brick texture on arbitrary quad, showing linear interpolation seam" src="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/quad_arbitrary_seam.png" title="Brick texture on arbitrary quad, showing linear interpolation seam" /></p>
<p>There’s a visible seam at the edge between the two triangles, where the derivatives of the mapping
change abruptly. We could improve the situation by subdividing the quad more finely, and assigning
appropriate UVs to the interior vertices; but perhaps we can’t or don’t wish to do that. Instead,
we’re looking at alternative methods for interpolating the UVs to avoid this problem altogether.</p>
<p>In <a href="/blog/quadrilateral-interpolation-part-1/">part 1</a>, I looked at <em>projective interpolation</em>,
which produces results like this:</p>
<p><img alt="Two projectively-interpolated quads with a seam visible between them" src="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/quad_arbitrary_proj_2.png" title="Two projectively-interpolated quads with a seam visible between them" /></p>
<p>This method, based on perspective projection, succeeds in removing the visible seam between
triangles in a quad. Unfortunately, it has a couple of other issues. First, the perspective-like
transformation tends to produce an unwanted “3D” effect, where a 2D quad that’s flat on the screen
comes to look like a 3D rectangle stretching off into the distance. Second, the UV spacing becomes
nonuniform along the edges of the quad—which introduces a $C^0$ seam between adjacent quads,
even worse than the original $C^1$ seams we were trying to fix!</p>
<h2 id="bilinear-interpolation"><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#bilinear-interpolation" title="Permalink to this section">Bilinear Interpolation</a></h2>
<p>If you’re reading this, you’re probably familiar with bilinear interpolation in the context of
texture sampling. At a point between texel centers, the sampling result is a blend of all four of
the nearest texels.</p>
<p><img alt="Bilinear interpolation of four neighboring texels" src="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/bilinear-texels.png" title="Bilinear interpolation of four neighboring texels" /></p>
<p>This can be expressed mathematically as follows. If $t_0 \ldots t_3$ are the four texel colors,
the interpolated result is:
$$\begin{aligned}
t(u, v) &= \text{lerp}\bigl(\text{lerp}(t_0, t_1, u), \text{lerp}(t_2, t_3, u), v\bigr) \\
&= (1-u)(1-v) t_0 + u(1-v) t_1 + (1-u)v t_2 + uv t_3
\end{aligned}$$</p>
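<p>As a quick sanity check that the nested-lerp form and the expanded form really are the same thing, here is a small Python sketch using scalar values in place of texel colors (the function names are mine):</p>

```python
def lerp(a, b, t):
    """Linear interpolation from a (t = 0) to b (t = 1)."""
    return (1 - t) * a + t * b

def bilerp_nested(t0, t1, t2, t3, u, v):
    """Lerp along u on the bottom and top rows, then lerp between the rows along v."""
    return lerp(lerp(t0, t1, u), lerp(t2, t3, u), v)

def bilerp_expanded(t0, t1, t2, t3, u, v):
    """The same blend written as a weighted sum of the four values."""
    return ((1 - u) * (1 - v) * t0 + u * (1 - v) * t1
            + (1 - u) * v * t2 + u * v * t3)

texels = (10.0, 20.0, 30.0, 40.0)
nested = bilerp_nested(*texels, 0.25, 0.75)
expanded = bilerp_expanded(*texels, 0.25, 0.75)
```

<p>The four weights in the expanded form always sum to 1, and at the corners of the unit square exactly one of them is nonzero, so the blend reproduces each texel exactly there.</p>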
<p>Let’s now define bilinear interpolation for a quadrilateral exactly the same way, except that
instead of four texel colors, we’ll have the four vertices of the quad. If the vertex positions are
$p_0 \ldots p_3$, then the position corresponding to a given UV on the quad is
$$
p(u, v) = \text{lerp}\bigl(\text{lerp}(p_0, p_1, u), \text{lerp}(p_2, p_3, u), v\bigr)
$$</p>
<p><img alt="Bilinear interpolation of vertices in a quad" src="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/bilinear-verts.png" title="Bilinear interpolation of vertices in a quad" /></p>
<p>This defines the forward UV mapping—from UV to position on the quad’s surface. To actually
implement this technique, we’re going to need to invert this equation, so we can write a pixel
shader that maps the pixel’s position back to the UV at which to sample the texture. I’ll show how
to do that a bit later; but first, let’s have a look at the results.</p>
<p><img alt="Two bilinearly-interpolated quads" src="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/quad_bilinear_2.png" title="Two bilinearly-interpolated quads" /></p>
<p>As hoped, bilinear interpolation both hides the join between the two triangles in each quad, and
keeps uniform spacing along the edges so that the texture will match between adjacent quads. (There’s
still a $C^1$ seam between the quads, where the mapping derivatives jump; but that’s unavoidable as
long as we insist that the texture completely fill the quad.)</p>
<h2 id="properties"><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#properties" title="Permalink to this section">Properties</a></h2>
<p>Despite the downsides of projective interpolation, it does have one nice feature: it preserves
straight lines. Any line in the original texture space will be mapped to another line by an arbitrary
projective transform. In the transformed grid below, note how all the horizontal, vertical, and
diagonal lines are still straight:</p>
<p><img alt="Two projectively-interpolated grids" src="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/grid_arbitrary_proj_2.png" title="Two projectively-interpolated grids" /></p>
<p>In bilinear interpolation, this is no longer the case—some lines in the original texture space
now come out curved:</p>
<p><img alt="Two bilinearly-interpolated grids" src="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/grid_bilinear_2.png" title="Two bilinearly-interpolated grids" /></p>
<p>The curvature introduced by the mapping is orientation-dependent. All the horizontal and vertical
lines from the original grid are still straight; this is because the bilinear interpolation formula
reduces to linear interpolation when either $u$ or $v$ is held fixed. However, most <em>diagonal</em> lines
will be mapped to curves. In particular, they’ll be mapped to quadratic splines, since the bilinear
interpolation formula becomes quadratic (in the general case) when $u$ and $v$ are held proportional
to each other.</p>
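<p>To see the quadratic explicitly, set $u = v = t$ in the bilinear formula and collect powers of $t$: the result is $p_0 + \bigl((p_1 - p_0) + (p_2 - p_0)\bigr) t + (p_0 - p_1 - p_2 + p_3) t^2$. The Python sketch below compares that expression against the bilinear map along the main diagonal; the quad coordinates are a made-up example.</p>

```python
def bilerp_point(quad, u, v):
    """Forward bilinear map from (u, v) to a 2D position on the quad."""
    p0, p1, p2, p3 = quad
    return tuple(
        (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def diagonal_point(quad, t):
    """The same map restricted to u = v = t, written as an explicit quadratic in t."""
    p0, p1, p2, p3 = quad
    return tuple(
        a + ((b - a) + (c - a)) * t + (a - b - c + d) * t * t
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

# Made-up convex quad, listed in the order p0, p1, p2, p3.
quad = ((0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (3.0, 2.0))
samples = [0.0, 0.25, 0.5, 0.75, 1.0]
max_error = max(
    abs(x - y)
    for t in samples
    for x, y in zip(bilerp_point(quad, t, t), diagonal_point(quad, t))
)
```

<p>The $t^2$ coefficient vanishes exactly when the quad is a parallelogram, which is the one case where diagonals stay straight.</p>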
<p>Depending on the texture, this effect may not be very noticeable. In textures that don’t have
a lot of line-like features to begin with, or whose line-like features are mostly vertical
and horizontal (e.g. bricks), the distortion of diagonals is pretty hard to see.</p>
<p>By the way, the fact that bilinear interpolation creates quadratic splines along diagonals
<a href="http://blog.demofox.org/2016/12/08/evaluating-polynomials-with-the-gpu-texture-sampler/">can be exploited to evaluate splines in a GPU texture unit</a>.</p>
<h2 id="inversion"><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#inversion" title="Permalink to this section">Inversion</a></h2>
<p>(If you prefer not to wade through the mathy details and just want to see the code, feel free
to jump down to the <a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#implementation">Implementation</a> section. In my derivation here, I’m indebted
to <a href="http://iquilezles.org/www/articles/ibilinear/ibilinear.htm">this article by Íñigo Quílez</a>.)</p>
<p>As seen above, the bilinear interpolation setup gives us an expression for the position of a point
in terms of its $u, v$ coordinates within the quad. Let $p_0 \ldots p_3$ be the quad’s vertices (in whatever
space the quad is defined—model space, world space, screen space). Then the position corresponding
to a given $u, v$ is:
$$\begin{aligned}
p(u, v) &= \text{lerp}\bigl(\text{lerp}(p_0, p_1, u), \text{lerp}(p_2, p_3, u), v\bigr) \\
&= (1-u)(1-v) p_0 + u(1-v) p_1 + (1-u)v p_2 + uv p_3
\end{aligned}$$
However, we’ll need to invert this equation in order to apply it in a pixel shader: we need to
calculate the UV at which to sample the texture (we can’t pass it down from the vertex shader
because that would give us linear interpolation, not bilinear). So we need to solve for $u, v$ in
terms of $p$, the pixel’s position within the quad.</p>
<p>First, let’s multiply out and regroup terms:
$$
0 = (p_0 - p) + (p_1 - p_0) u + (p_2 - p_0) v + (p_0 - p_1 - p_2 + p_3) uv
$$
The four vectors in parentheses here can readily be interpreted geometrically. We have the pixel’s
position relative to the origin of UV space (negated, since it appears as $p_0 - p$); the two UV basis vectors; and one more,
$p_0 - p_1 - p_2 + p_3$. This vector expresses how the quad deviates from being a parallelogram.
It is the vector difference between the quad’s final point $p_3$ and where that point <em>would</em> be to
complete the parallelogram spanned by the UV basis vectors:</p>
<p><img alt="Vectors involved in inverse bilinear interpolation" src="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/invbilin-vectors.png" title="Vectors involved in inverse bilinear interpolation" /></p>
<p>For convenience, let’s give names to these vectors:
$$\begin{aligned}
q &\equiv p - p_0 \\
b_1 &\equiv p_1 - p_0 \\
b_2 &\equiv p_2 - p_0 \\
b_3 &\equiv p_0 - p_1 - p_2 + p_3
\end{aligned}$$
Then the equation we’re solving becomes:
$$
0 = -q + b_1 u + b_2 v + b_3 uv \qquad (*)
$$
Now let’s solve for $u$ in terms of $v$:
$$\begin{aligned}
q - b_2 v &= (b_1 + b_3 v) u \\
u &= \frac{q - b_2 v}{b_1 + b_3 v}
\end{aligned}$$
<em>But wait! These quantities are vectors! Didn’t your mother ever teach you that you cannot divide
vectors?</em> Don’t worry, folks; I know geometric algebra. 😄</p>
<p>In seriousness: when we’ve correctly solved for $v$, then the numerator and denominator of
this fraction must be parallel, so we can divide them to recover $u$ (and it will be a scalar). To
implement this in practice, we’ll just pick one coordinate component of the vectors to do the
calculation with.</p>
<p>Onward to solving for $v$. We can eliminate $u$ from equation $(*)$ by wedging both sides with
$b_1 + b_3 v$. (In case you’re not familiar with the wedge product, it’s a tool from
<a href="http://www.terathon.com/gdc12_lengyel.pdf">Grassmann algebra</a>, part of geometric algebra. For our
purposes here, you can treat it as the signed area of the parallelogram spanned by two vectors.
When you wedge a vector with itself, you get zero.)
$$\begin{aligned}
0 &= (-q + b_1 u + b_2 v + b_3 uv) \wedge (b_1 + b_3 v) \\
&= (b_1 \wedge q) + (b_3 \wedge q - b_1 \wedge b_2) v + (b_2 \wedge b_3) v^2
\end{aligned}$$
We now have a quadratic equation in $v$ and we can apply the usual quadratic formula. (<em>Wait!
These quantities are bivectors! One cannot simply apply the quadratic formula to bivectors!</em>
Again, it’s okay, since these bivectors all lie in the plane of the quad and are therefore
proportional to one another. Again, in practice we’ll just look at one coordinate component of
the bivectors.)</p>
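<p>For readers who want the intermediate steps of the wedge expansion, here is one way to write them out. It uses only two facts about the wedge product: a vector wedged with itself gives zero, and swapping the operands flips the sign.
$$\begin{aligned}
(b_1 u + b_3 uv) \wedge (b_1 + b_3 v) &= u \, (b_1 + b_3 v) \wedge (b_1 + b_3 v) = 0 \\
(-q + b_2 v) \wedge (b_1 + b_3 v) &= -q \wedge b_1 - (q \wedge b_3) v + (b_2 \wedge b_1) v + (b_2 \wedge b_3) v^2 \\
&= (b_1 \wedge q) + (b_3 \wedge q - b_1 \wedge b_2) v + (b_2 \wedge b_3) v^2
\end{aligned}$$
So the terms containing $u$ drop out entirely, which is what leaves a quadratic in $v$ alone.</p>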
<p>The two possible solutions to $v$ are:
$$
v = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}, \qquad
\begin{aligned}
A &\equiv b_2 \wedge b_3 \\
B &\equiv b_3 \wedge q - b_1 \wedge b_2 \\
C &\equiv b_1 \wedge q
\end{aligned}
$$
In practice, the discriminant is always positive inside the quad; also, only one of the two roots is
needed—which one depends on the winding of the quad (and on the coordinate system conventions).
Once you have the correct $v$, plug it into the formula for $u$ and you’re done.</p>
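<p>Before moving to shader code, the derivation can be checked on the CPU with a round trip: map a known $u, v$ forward, invert, and compare. The Python sketch below follows the formulas above with one change that is my own choice rather than something from the derivation: instead of hard-coding which root to take, it picks the sign from $b_1 \wedge b_2$, so that $v = 0$ is recovered at $p = p_0$ for either winding. The quad coordinates and the tolerance are made-up example values.</p>

```python
import math

def wedge(a, b):
    """2D wedge product: signed area of the parallelogram spanned by a and b."""
    return a[0] * b[1] - a[1] * b[0]

def bilerp_point(quad, u, v):
    """Forward bilinear map from (u, v) to a 2D position on the quad."""
    p0, p1, p2, p3 = quad
    return tuple(
        (1 - u) * (1 - v) * a + u * (1 - v) * b + (1 - u) * v * c + u * v * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )

def inverse_bilerp(quad, p, eps=1e-9):
    """Recover (u, v) for a point p inside a convex quad (p0, p1, p2, p3)."""
    p0, p1, p2, p3 = quad
    q = (p[0] - p0[0], p[1] - p0[1])
    b1 = (p1[0] - p0[0], p1[1] - p0[1])
    b2 = (p2[0] - p0[0], p2[1] - p0[1])
    b3 = (p0[0] - p1[0] - p2[0] + p3[0], p0[1] - p1[1] - p2[1] + p3[1])

    # Coefficients of the quadratic A v^2 + B v + C = 0.
    A = wedge(b2, b3)
    B = wedge(b3, q) - wedge(b1, b2)
    C = wedge(b1, q)

    if abs(A) < eps:
        # Quadratic term vanishes, so the equation is linear in v.
        v = -C / B
    else:
        s = math.sqrt(max(B * B - 4 * A * C, 0.0))
        # Root choice by winding (an assumption of this sketch, see lead-in).
        sign = -1.0 if wedge(b1, b2) > 0 else 1.0
        v = (-B + sign * s) / (2 * A)

    # Solve for u using the larger-magnitude component of the denominator.
    denom = (b1[0] + b3[0] * v, b1[1] + b3[1] * v)
    k = 0 if abs(denom[0]) > abs(denom[1]) else 1
    u = (q[k] - b2[k] * v) / denom[k]
    return u, v

quad = ((0.0, 0.0), (2.0, 0.0), (0.0, 1.0), (3.0, 2.0))
p = bilerp_point(quad, 0.25, 0.8)
u, v = inverse_bilerp(quad, p)  # should recover (0.25, 0.8) up to rounding
```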
<h2 id="implementation"><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#implementation" title="Permalink to this section">Implementation</a></h2>
<p>Translating all this into shader code is fairly straightforward. We set up the $q, b_1, b_2, b_3$
vectors in the vertex shader, then solve for $u, v$ in the pixel shader. </p>
<p>One complication is that these vectors need to be calculated per quad, so you can’t have vertices
shared between quads. If you’re applying this to a mesh, it will need to be “unwelded” so that each
quad has distinct vertices. (You can still share vertices between the two triangles in each quad.)</p>
<p>Each vertex shader invocation also needs to know all four vertices of the quad it belongs to. To
avoid duplicating all the vertex positions many times in memory, we can use instancing: one instance
per quad, with the per-quad parameters stored in the instance vertex buffer (very similar to
rendering billboard particles).</p>
<p>Here’s what the shader might look like in pseudo-HLSL, for a 2D case where the quad is always in
the $xy$ plane:</p>
<div class="codehilite"><pre><span></span><span class="k">struct</span> <span class="n">InstData</span>
<span class="p">{</span>
<span class="kt">float2</span> <span class="n">p</span><span class="p">[</span><span class="mi">4</span><span class="p">];</span> <span class="c1">// Quad vertices</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="n">V2P</span>
<span class="p">{</span>
<span class="kt">float4</span> <span class="n">pos</span> <span class="o">:</span> <span class="nd">SV_Position</span><span class="p">;</span>
<span class="kt">float2</span> <span class="n">q</span><span class="p">,</span> <span class="n">b1</span><span class="p">,</span> <span class="n">b2</span><span class="p">,</span> <span class="n">b3</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="n">Vs</span><span class="p">(</span>
<span class="k">in</span> <span class="kt">uint</span> <span class="n">iVtx</span> <span class="o">:</span> <span class="nd">SV_VertexID</span><span class="p">,</span>
<span class="k">in</span> <span class="n">InstData</span> <span class="n">inst</span><span class="p">,</span>
<span class="k">out</span> <span class="n">V2P</span> <span class="n">o</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">float2</span> <span class="n">p</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="n">iVtx</span><span class="p">];</span>
<span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="kt">float4</span><span class="p">(</span><span class="n">p</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span> <span class="c1">// Assumes p is already in clip space</span>
<span class="c1">// Set up inverse bilinear interpolation</span>
<span class="n">o</span><span class="p">.</span><span class="n">q</span> <span class="o">=</span> <span class="n">p</span> <span class="o">-</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mo">0</span><span class="p">];</span>
<span class="n">o</span><span class="p">.</span><span class="n">b1</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mo">0</span><span class="p">];</span>
<span class="n">o</span><span class="p">.</span><span class="n">b2</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">-</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mo">0</span><span class="p">];</span>
<span class="n">o</span><span class="p">.</span><span class="n">b3</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mo">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">-</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">+</span> <span class="n">inst</span><span class="p">.</span><span class="n">p</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span>
<span class="p">}</span>
<span class="kt">float</span> <span class="n">Wedge2D</span><span class="p">(</span><span class="kt">float2</span> <span class="n">v</span><span class="p">,</span> <span class="kt">float2</span> <span class="n">w</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">v</span><span class="p">.</span><span class="n">x</span><span class="o">*</span><span class="n">w</span><span class="p">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">v</span><span class="p">.</span><span class="n">y</span><span class="o">*</span><span class="n">w</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">Ps</span><span class="p">(</span>
<span class="k">in</span> <span class="n">V2P</span> <span class="n">i</span><span class="p">,</span>
<span class="k">out</span> <span class="kt">float4</span> <span class="n">color</span> <span class="o">:</span> <span class="nd">SV_Target</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Set up quadratic formula</span>
<span class="kt">float</span> <span class="n">A</span> <span class="o">=</span> <span class="n">Wedge2D</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">b2</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">b3</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">B</span> <span class="o">=</span> <span class="n">Wedge2D</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">b3</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">q</span><span class="p">)</span> <span class="o">-</span> <span class="n">Wedge2D</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">b1</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">b2</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">C</span> <span class="o">=</span> <span class="n">Wedge2D</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">b1</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">q</span><span class="p">);</span>
<span class="c1">// Solve for v</span>
<span class="kt">float2</span> <span class="n">uv</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o"><</span> <span class="mf">0.001</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Linear form</span>
<span class="n">uv</span><span class="p">.</span><span class="n">y</span> <span class="o">=</span> <span class="o">-</span><span class="n">C</span><span class="o">/</span><span class="n">B</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="c1">// Quadratic form. Take positive root for CCW winding with V-up</span>
<span class="kt">float</span> <span class="n">discrim</span> <span class="o">=</span> <span class="n">B</span><span class="o">*</span><span class="n">B</span> <span class="o">-</span> <span class="mi">4</span><span class="o">*</span><span class="n">A</span><span class="o">*</span><span class="n">C</span><span class="p">;</span>
<span class="n">uv</span><span class="p">.</span><span class="n">y</span> <span class="o">=</span> <span class="mf">0.5</span> <span class="o">*</span> <span class="p">(</span><span class="o">-</span><span class="n">B</span> <span class="o">+</span> <span class="nb">sqrt</span><span class="p">(</span><span class="n">discrim</span><span class="p">))</span> <span class="o">/</span> <span class="n">A</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// Solve for u, using largest-magnitude component</span>
<span class="kt">float2</span> <span class="n">denom</span> <span class="o">=</span> <span class="n">i</span><span class="p">.</span><span class="n">b1</span> <span class="o">+</span> <span class="n">uv</span><span class="p">.</span><span class="n">y</span> <span class="o">*</span> <span class="n">i</span><span class="p">.</span><span class="n">b3</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="nb">abs</span><span class="p">(</span><span class="n">denom</span><span class="p">.</span><span class="n">x</span><span class="p">)</span> <span class="o">></span> <span class="nb">abs</span><span class="p">(</span><span class="n">denom</span><span class="p">.</span><span class="n">y</span><span class="p">))</span>
<span class="n">uv</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">q</span><span class="p">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">b2</span><span class="p">.</span><span class="n">x</span> <span class="o">*</span> <span class="n">uv</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="o">/</span> <span class="n">denom</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="k">else</span>
<span class="n">uv</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">q</span><span class="p">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">b2</span><span class="p">.</span><span class="n">y</span> <span class="o">*</span> <span class="n">uv</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="o">/</span> <span class="n">denom</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="n">color</span> <span class="o">=</span> <span class="n">tex</span><span class="p">.</span><span class="n">Sample</span><span class="p">(</span><span class="n">samp</span><span class="p">,</span> <span class="n">uv</span><span class="p">);</span>
<span class="p">}</span>
</pre></div>
<h2 id="conclusion"><a href="http://reedbeta.com/blog/quadrilateral-interpolation-part-2/#conclusion" title="Permalink to this section">Conclusion</a></h2>
<p>Bilinear interpolation solves the problem of mapping a rectangular texture to an arbitrary quad,
with a different set of trade-offs from the projective mapping we saw previously. On the plus side,
bilinear interpolation doesn’t produce as much of a faux-3D effect, and it always maintains uniform
UV spacing along the quad’s edges. On the other hand, it introduces curved diagonals, and it also
has a more complicated (and more expensive) pixel shader than projective interpolation.</p>
<p>We’ve eliminated the seam between the two triangles in a quad, but one lingering issue is the seam
between adjacent quads in a mesh. It would be nice if we could have some control over the tangents
of the UV mapping along the edge, so we could force them to match across that join. But that’s a
topic for another day! </p>A Programmer’s Introduction to Unicode
http://reedbeta.com/blog/programmers-intro-to-unicode/
http://reedbeta.com/blog/programmers-intro-to-unicode/Nathan ReedFri, 03 Mar 2017 22:56:16 -0800http://reedbeta.com/blog/programmers-intro-to-unicode/#commentsCoding<p>Ｕｎｉｃｏｄｅ! 🅤🅝🅘🅒🅞🅓🅔‽ 🇺🇳🇮🇨🇴🇩🇪! 😄 The very name strikes fear and awe into the hearts of programmers
worldwide. We all know we ought to “support Unicode” in our software (whatever that means—like
using <code>wchar_t</code> for all the strings, right?). But Unicode can be abstruse, and diving into the
thousand-page <a href="http://www.unicode.org/versions/latest/">Unicode Standard</a> plus its dozens of
supplementary <a href="http://www.unicode.org/reports/">annexes, reports</a>, and <a href="http://www.unicode.org/notes/">notes</a>
can be more than a little intimidating. I don’t blame programmers for still finding the whole thing
mysterious, even 30 years after Unicode’s inception.</p>
<p>A few months ago, I got interested in Unicode and decided to spend some time learning more about it
in detail. In this article, I’ll give an introduction to it from a programmer’s point of view.</p>
<!--more-->
<p>I’m going to focus on the character set and what’s involved in working with strings and files of Unicode text.
However, in this article I’m not going to talk about fonts, text layout/shaping/rendering, or
localization in detail—those are separate issues, beyond my scope (and knowledge) here.</p>
<div class="toc">
<ul>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#diversity-and-inherent-complexity">Diversity and Inherent Complexity</a></li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#the-unicode-codespace">The Unicode Codespace</a><ul>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#codespace-allocation">Codespace Allocation</a></li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#scripts">Scripts</a></li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#usage-frequency">Usage Frequency</a></li>
</ul>
</li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#encodings">Encodings</a><ul>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#utf-8">UTF-8</a></li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#utf-16">UTF-16</a></li>
</ul>
</li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#combining-marks">Combining Marks</a><ul>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#canonical-equivalence">Canonical Equivalence</a></li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#normalization-forms">Normalization Forms</a></li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#grapheme-clusters">Grapheme Clusters</a></li>
</ul>
</li>
<li><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#and-more">And More…</a></li>
</ul>
</div>
<h2 id="diversity-and-inherent-complexity"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#diversity-and-inherent-complexity" title="Permalink to this section">Diversity and Inherent Complexity</a></h2>
<p>As soon as you start to study Unicode, it becomes clear that it represents a large jump in complexity
over character sets like ASCII that you may be more familiar with. It’s not just that Unicode
contains a much larger number of characters, although that’s part of it. Unicode also has a great
deal of internal structure, features, and special cases, making it much more than what one might
expect a mere “character set” to be. We’ll see some of that later in this article.</p>
<p>When confronting all this complexity, especially as an engineer, it’s hard not to find oneself asking,
“Why do we need all this? Is this really necessary? Couldn’t it be simplified?”</p>
<p>However, Unicode aims to faithfully represent the <em>entire world’s</em> writing systems. The Unicode
Consortium’s stated goal is “enabling people around the world to use computers in any language”.
And as you might imagine, the diversity of written languages is immense! To date, Unicode supports
135 different scripts, covering some 1100 languages, and there’s still a long tail of
<a href="http://linguistics.berkeley.edu/sei/">over 100 unsupported scripts</a>, both modern and historical,
which people are still working to add.</p>
<p>Given this enormous diversity, it’s inevitable that representing it is a complicated project.
Unicode embraces that diversity, and accepts the complexity inherent in its mission to include all
human writing systems. It doesn’t make a lot of trade-offs in the name of simplification, and it
makes exceptions to its own rules where necessary to further its mission.</p>
<p>Moreover, Unicode is committed not just to supporting texts in any <em>single</em> language, but also to
letting multiple languages coexist within one text—which introduces even more complexity.</p>
<p>Most programming languages have libraries available to handle the gory low-level details of text
manipulation, but as a programmer, you’ll still need to know about certain Unicode features in order
to know when and how to apply them. It may take some time to wrap your head around it all, but
don’t be discouraged—think about the billions of people for whom your software will be more
accessible through supporting text in their language. Embrace the complexity!</p>
<h2 id="the-unicode-codespace"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#the-unicode-codespace" title="Permalink to this section">The Unicode Codespace</a></h2>
<p>Let’s start with some general orientation. The basic elements of Unicode—its “characters”, although
that term isn’t quite right—are called <em>code points</em>. Code points are identified by number,
customarily written in hexadecimal with the prefix “U+”, such as
<a href="http://unicode.org/cldr/utility/character.jsp?a=A">U+0041 “A” <span class="smallcaps">latin capital letter a</span></a> or
<a href="http://unicode.org/cldr/utility/character.jsp?a=θ">U+03B8 “θ” <span class="smallcaps">greek small letter theta</span></a>. Each
code point also has a short name, and quite a few other properties, specified in the
<a href="http://www.unicode.org/reports/tr44/">Unicode Character Database</a>.</p>
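<p>As a small illustration, here is a sketch in Python, using the standard <code>unicodedata</code> module to look up a code point's number, name, and general category. The helper name <code>describe</code> is made up for this example.</p>

```python
import unicodedata

def describe(ch: str) -> str:
    """Format one code point in the customary U+XXXX style, followed by its name."""
    cp = ord(ch)  # the code point as a plain integer
    return f"U+{cp:04X} {unicodedata.name(ch)}"

print(describe("A"))        # U+0041 LATIN CAPITAL LETTER A
print(describe("\u03b8"))   # U+03B8 GREEK SMALL LETTER THETA

# The general category is one of the many other per-code-point properties.
print(unicodedata.category("A"))  # Lu (uppercase letter)
```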
<p>The set of all possible code points is called the <em>codespace</em>. The Unicode codespace consists of
1,114,112 code points. However, only 128,237 of them—about 12% of the codespace—are actually
assigned, to date. There’s plenty of room for growth! Unicode also reserves an additional 137,468
code points as “private use” areas, which have no standardized meaning and are available for
individual applications to define for their own purposes.</p>
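<p>The figures above can be checked with a little arithmetic. This is a rough sketch: the three private-use ranges are written out by hand here (one in the BMP, plus almost all of the last two planes), and the constant names are my own.</p>

```python
import sys
import unicodedata

# Python strings can hold any code point from U+0000 through U+10FFFF.
CODESPACE_SIZE = sys.maxunicode + 1
print(CODESPACE_SIZE)  # 1114112

# The private-use ranges, as inclusive (first, last) pairs.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # private use area in the BMP
    (0xF0000, 0xFFFFD),    # plane 15
    (0x100000, 0x10FFFD),  # plane 16
]
private_use_count = sum(last - first + 1 for first, last in PRIVATE_USE_RANGES)
print(private_use_count)  # 137468

# Private-use code points carry the general category "Co".
print(unicodedata.category("\uE000"))  # Co
```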
<h3 id="codespace-allocation"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#codespace-allocation" title="Permalink to this section">Codespace Allocation</a></h3>
<p>To get a feel for how the codespace is laid out, it’s helpful to visualize it. Below is a map of the
entire codespace, with one pixel per code point. It’s arranged in tiles for visual coherence;
each small square is 16×16 = 256 code points, and each large square is a “plane” of 65,536 code
points. There are 17 planes altogether.</p>
<p><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/codespace-map.png"><img alt="Map of the Unicode codespace (click to zoom)" src="http://reedbeta.com/blog/programmers-intro-to-unicode/codespace-map.png" title="Map of the Unicode codespace (click to zoom)" /></a></p>
<p>White represents unassigned space. Blue is assigned code points, green is private-use areas, and
the small red area is surrogates (more about those later).
As you can see, the assigned code points are distributed somewhat sparsely, but concentrated in the
first three planes.</p>
<p>Plane 0 is also known as the “Basic Multilingual Plane”, or BMP. The BMP contains essentially all
the characters needed for modern text in any script, including Latin, Cyrillic, Greek, Han (Chinese),
Japanese, Korean, Arabic, Hebrew, Devanagari (Indian), and many more.</p>
<p>(In the past, the codespace was just the BMP and no more—Unicode was originally conceived as a
straightforward 16-bit encoding, with only 65,536 code points. It was expanded to its current size
in 1996. However, the vast majority of code points in modern text belong to the BMP.)</p>
<p>Plane 1 contains historical scripts, such as Sumerian cuneiform and Egyptian hieroglyphs, as well as
emoji and various other symbols. Plane 2 contains a large block of less-common and historical Han
characters. The remaining planes are empty, except for a small number of rarely-used formatting
characters in Plane 14; planes 15–16 are reserved entirely for private use.</p>
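<p>Since each plane holds 65,536 = 2<sup>16</sup> code points, the plane number is just the code point index shifted right by 16 bits. A minimal sketch (the function name <code>plane_of</code> is hypothetical):</p>

```python
PLANE_SIZE = 0x10000   # 65,536 code points per plane
NUM_PLANES = 17

def plane_of(ch: str) -> int:
    """Return the plane (0 through 16) that a single code point lives in."""
    return ord(ch) >> 16

print(NUM_PLANES * PLANE_SIZE)  # 1114112, the size of the whole codespace

print(plane_of("A"))            # 0: the BMP
print(plane_of("\U0001F604"))   # 1: an emoji
print(plane_of("\U00020000"))   # 2: a less-common Han character
print(plane_of("\U000E0001"))   # 14: a formatting character
```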
<h3 id="scripts"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#scripts" title="Permalink to this section">Scripts</a></h3>
<p>Let’s zoom in on the first three planes, since that’s where the action is:</p>
<p><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/script-map.png"><img alt="Map of scripts in Unicode planes 0–2 (click to zoom)" src="http://reedbeta.com/blog/programmers-intro-to-unicode/script-map.png" title="Map of scripts in Unicode planes 0–2 (click to zoom)" /></a></p>
<p>This map color-codes the 135 different scripts in Unicode. You can see how Han
<nobr>(<span class="swatch" style="background-color:#6bd8d3"></span>)</nobr> and Korean
<nobr>(<span class="swatch" style="background-color:#ce996a"></span>)</nobr> take up
most of the range of the BMP (the left large square). By contrast, all of the European, Middle
Eastern, and South Asian scripts fit into the first row of the BMP in this diagram.</p>
<p>Many areas of the codespace are adapted or copied from earlier encodings. For
example, the first 128 code points of Unicode are just a copy of ASCII. This has clear benefits
for compatibility—it’s easy to losslessly convert texts from smaller encodings into Unicode (and
the other direction too, as long as no characters outside the smaller encoding are used).</p>
<h3 id="usage-frequency"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#usage-frequency" title="Permalink to this section">Usage Frequency</a></h3>
<p>One more interesting way to visualize the codespace is to look at the distribution of usage—in
other words, how often each code point is actually used in real-world texts. Below
is a heat map of planes 0–2 based on a large sample of text from Wikipedia and Twitter (all
languages). Frequency increases from black (never seen) through red and yellow to white.</p>
<p><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/heatmap-wiki+tweets.png"><img alt="Heat map of code point usage frequency in Unicode planes 0–2 (click to zoom)" src="http://reedbeta.com/blog/programmers-intro-to-unicode/heatmap-wiki+tweets.png" title="Heat map of code point usage frequency in Unicode planes 0–2 (click to zoom)" /></a></p>
<p>You can see that the vast majority of this text sample lies in the BMP, with only scattered
usage of code points from planes 1–2. The biggest exception is emoji, which show up here as the
several bright squares in the bottom row of plane 1.</p>
<h2 id="encodings"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#encodings" title="Permalink to this section">Encodings</a></h2>
<p>We’ve seen that Unicode code points are abstractly identified by their index in the codespace,
ranging from U+0000 to U+10FFFF. But how do code points get represented as bytes, in memory or in
a file?</p>
<p>The most convenient, computer-friendliest (and programmer-friendliest) thing to do would be to just
store the code point index as a 32-bit integer. This works, but it consumes 4 bytes per code point,
which is sort of a lot. Using 32-bit ints for Unicode will cost you a bunch of extra storage,
memory, and performance in bandwidth-bound scenarios, if you work with a lot of text.</p>
<p>Consequently, there are several more-compact encodings for Unicode. The 32-bit integer encoding is
officially called UTF-32 (UTF = “Unicode Transformation Format”), but it’s rarely used for storage.
At most, it comes up sometimes as a temporary internal representation, for examining or operating on
the code points in a string.</p>
<p>Much more commonly, you’ll see Unicode text encoded as either UTF-8 or UTF-16. These are both
<em>variable-length</em> encodings, made up of 8-bit or 16-bit units, respectively. In these schemes,
code points with smaller index values take up fewer bytes, which saves a lot of memory for
typical texts. The trade-off is that processing UTF-8/16 texts is more programmatically involved,
and likely slower.</p>
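<p>To make the size trade-off concrete, here is a quick sketch that measures the same strings in all three encodings, using Python's built-in codecs. The <code>-le</code> codec variants are used so that no byte-order mark is added to the counts; <code>encoded_sizes</code> is a name invented for this example.</p>

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes the text occupies in each encoding."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}

print(encoded_sizes("hello"))       # {'utf-8': 5, 'utf-16-le': 10, 'utf-32-le': 20}
print(encoded_sizes("\u03b8"))      # {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("\ud55c"))      # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
print(encoded_sizes("\U0001F604"))  # {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}
```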
<h3 id="utf-8"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#utf-8" title="Permalink to this section">UTF-8</a></h3>
<p>In UTF-8, each code point is stored using 1 to 4 bytes, based on its index value.</p>
<p>UTF-8 uses a system of binary prefixes, in which the high bits of each byte mark whether it’s a
single byte, the beginning of a multi-byte sequence, or a continuation byte; the remaining bits,
concatenated, give the code point index. This table shows how it works:</p>
<table>
<thead>
<tr>
<th>UTF-8 (binary)</th>
<th>Code point (binary)</th>
<th>Range</th>
</tr>
</thead>
<tbody>
<tr>
<td class="mono">0xxxxxxx</td>
<td class="mono">xxxxxxx</td>
<td>U+0000–U+007F</td>
</tr>
<tr>
<td class="mono">110xxxxx 10yyyyyy</td>
<td class="mono">xxxxxyyyyyy</td>
<td>U+0080–U+07FF</td>
</tr>
<tr>
<td class="mono">1110xxxx 10yyyyyy 10zzzzzz</td>
<td class="mono">xxxxyyyyyyzzzzzz</td>
<td>U+0800–U+FFFF</td>
</tr>
<tr>
<td class="mono">11110xxx 10yyyyyy 10zzzzzz 10wwwwww</td>
<td class="mono">xxxyyyyyyzzzzzzwwwwww</td>
<td>U+10000–U+10FFFF</td>
</tr>
</tbody>
</table>
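<p>The table translates fairly directly into code. Below is a hand-rolled encoder, written only to illustrate the bit layout; in practice you'd use your language's built-in codec (in Python, <code>str.encode("utf-8")</code>). The function name <code>utf8_encode</code> is hypothetical.</p>

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single code point as UTF-8, following the prefix table row by row."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogates are not allowed in UTF-8")
    if cp <= 0x7F:       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:      # 110xxxxx 10yyyyyy
        return bytes([0b11000000 | (cp >> 6),
                      0b10000000 | (cp & 0x3F)])
    if cp <= 0xFFFF:     # 1110xxxx 10yyyyyy 10zzzzzz
        return bytes([0b11100000 | (cp >> 12),
                      0b10000000 | ((cp >> 6) & 0x3F),
                      0b10000000 | (cp & 0x3F)])
    # 11110xxx 10yyyyyy 10zzzzzz 10wwwwww
    return bytes([0b11110000 | (cp >> 18),
                  0b10000000 | ((cp >> 12) & 0x3F),
                  0b10000000 | ((cp >> 6) & 0x3F),
                  0b10000000 | (cp & 0x3F)])

print(utf8_encode(0x41).hex())     # 41
print(utf8_encode(0x3B8).hex())    # ceb8
print(utf8_encode(0x1F604).hex())  # f09f9884
```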
<p>A handy property of UTF-8 is that code points below 128 (ASCII characters) are encoded as single
bytes, and all non-ASCII code points are encoded using sequences of bytes 128–255. This has a couple
of nice consequences. First, any strings or files out there that are already in ASCII can also be
interpreted as UTF-8 without any conversion. Second, lots of widely-used string programming
idioms—such as null termination, or delimiters (newlines, tabs, commas, slashes, etc.)—will
just work on UTF-8 strings. ASCII bytes never occur inside
the encoding of non-ASCII code points, so searching byte-wise for a null terminator or a delimiter
will do the right thing.</p>
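<p>Here's a small sketch of that property in action: splitting a UTF-8 byte string on an ASCII delimiter, without decoding it first. The sample path is made up for the example.</p>

```python
path = "m\u00fasica/\u65e5\u672c\u8a9e/\u0444\u0430\u0439\u043b.txt"
raw = path.encode("utf-8")

# Byte-wise split on the ASCII slash; no decoding needed to find the delimiters.
parts = raw.split(b"/")
print(len(parts))  # 3

# Each piece is still valid UTF-8 on its own.
decoded = [p.decode("utf-8") for p in parts]
print(decoded == path.split("/"))  # True

# Every byte in the encoding of a non-ASCII code point is in the range 128-255.
print(all(b >= 0x80 for b in "\u65e5".encode("utf-8")))  # True
```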
<p>Thanks to this convenience, it’s relatively simple to extend legacy ASCII programs and APIs to handle
UTF-8 strings. UTF-8 is very widely used in the Unix/Linux and Web worlds, and many programmers
argue <a href="http://utf8everywhere.org/">UTF-8 should be the default encoding everywhere</a>.</p>
<p>However, UTF-8 isn’t a drop-in replacement for ASCII strings in all respects. For instance,
code that iterates over the “characters” in a string will need to decode UTF-8 and iterate over
code points (or maybe grapheme clusters—more about those later), not bytes. When you measure the
“length” of a string, you’ll need to think about whether you want the length in bytes, the length
in code points, the width of the text when rendered, or something else.</p>
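<p>For example, the following sketch measures one short string three different ways. (Python's <code>len()</code> on a <code>str</code> counts code points; other languages count other things.)</p>

```python
s = "na\u00efve \U0001F604"   # "naive" with a precomposed i-diaeresis, a space, and an emoji

code_points = len(s)
utf8_bytes = len(s.encode("utf-8"))
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points)   # 7
print(utf8_bytes)    # 11
print(utf16_units)   # 8

# Iterating over the encoded bytes is not the same as iterating over code points:
print(len(list(s.encode("utf-8"))) == len(list(s)))  # False
```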
<h3 id="utf-16"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#utf-16" title="Permalink to this section">UTF-16</a></h3>
<p>The other encoding that you’re likely to encounter is UTF-16. It uses 16-bit words, with
each code point stored as either 1 or 2 words.</p>
<p>As with UTF-8, we can express the UTF-16 encoding rules in the form of binary prefixes:</p>
<table>
<thead>
<tr>
<th>UTF-16 (binary)</th>
<th>Code point (binary)</th>
<th>Range</th>
</tr>
</thead>
<tbody>
<tr>
<td class="mono">xxxxxxxxxxxxxxxx</td>
<td class="mono">xxxxxxxxxxxxxxxx</td>
<td>U+0000–U+FFFF</td>
</tr>
<tr>
<td class="mono">110110xxxxxxxxxx 110111yyyyyyyyyy</td>
<td class="mono">xxxxxxxxxxyyyyyyyyyy + 0x10000</td>
<td>U+10000–U+10FFFF</td>
</tr>
</tbody>
</table>
<p>A more common way that people talk about UTF-16 encoding, though, is in terms of code points called
“surrogates”. All the code points in the range U+D800–U+DFFF—or in other words, the code points
that match the binary prefixes <code>110110</code> and <code>110111</code> in the table above—are reserved specifically
for UTF-16 encoding, and don’t represent any valid characters on their own. They’re only meant
to occur in the 2-word encoding pattern above, which is called a “surrogate pair”. Surrogate code
points are illegal in any other context! They’re not allowed in UTF-8 or UTF-32 at all.</p>
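<p>The surrogate-pair arithmetic is short enough to sketch by hand. The two helpers below (<code>utf16_units</code> and <code>from_surrogate_pair</code>, both names invented for this example) follow the table above: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves.</p>

```python
def utf16_units(cp: int) -> list:
    """Encode one code point as a list of one or two 16-bit words."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point")
    if cp < 0x10000:
        return [cp]                      # BMP: stored as-is in one word
    v = cp - 0x10000                     # 20 bits remain
    return [0xD800 | (v >> 10),          # high surrogate: 110110 + top 10 bits
            0xDC00 | (v & 0x3FF)]        # low surrogate:  110111 + bottom 10 bits

def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a surrogate pair into the code point it stands for."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

print([hex(u) for u in utf16_units(0x1F604)])    # ['0xd83d', '0xde04']
print(hex(from_surrogate_pair(0xD83D, 0xDE04)))  # 0x1f604
```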
<p>Historically, UTF-16 is a descendant of the original, pre-1996 versions of Unicode, in which there
were only 65,536 code points. The original intention was that there would be no different “encodings”;
Unicode was supposed to be a straightforward 16-bit character set. Later, the codespace was expanded
to make room for a long tail of less-common (but still important) Han characters, which the Unicode
designers didn’t originally plan for. Surrogates were then introduced, as—to put it bluntly—a
kludge, allowing 16-bit encodings to access the new code points.</p>
<p>Today, JavaScript uses UTF-16 as its standard string representation: if you ask for the length of a
string, or iterate over it, etc., the result will be in UTF-16 words, with any
code points outside the BMP expressed as surrogate pairs. UTF-16 is also used by the Microsoft Win32 APIs;
though Win32 supports either 8-bit or 16-bit strings, the 8-bit version unaccountably
still doesn’t support UTF-8—only legacy code-page encodings, like ANSI. This leaves UTF-16 as the
only way to get proper Unicode support in Windows.</p>
<p>By the way, UTF-16’s words can be stored either little-endian or big-endian. Unicode has no opinion
on that issue, though it does encourage the convention of putting
<a href="http://unicode.org/cldr/utility/character.jsp?a=FEFF">U+FEFF <span class="smallcaps">zero width no-break space</span></a>
at the top of a UTF-16 file as a <a href="https://en.wikipedia.org/wiki/Byte_order_mark">byte-order mark</a>,
to disambiguate the endianness. (If the file doesn’t match the system’s endianness, the BOM will be
decoded as U+FFFE, which is a noncharacter that should never appear in valid text.)</p>
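<p>A brief sketch of the byte-order mark at work, using the BOM constants from Python's standard <code>codecs</code> module. Python's generic <code>"utf-16"</code> decoder reads the BOM, picks the matching byte order, and drops the BOM from the result.</p>

```python
import codecs

payload = "hi"
little = codecs.BOM_UTF16_LE + payload.encode("utf-16-le")
big = codecs.BOM_UTF16_BE + payload.encode("utf-16-be")

print(little.hex())  # fffe68006900
print(big.hex())     # feff00680069

# Both byte orders decode to the same text once the BOM is taken into account.
print(little.decode("utf-16") == big.decode("utf-16") == payload)  # True

# Reading the BOM with the wrong byte order yields 0xFFFE instead of 0xFEFF.
print(hex(int.from_bytes(codecs.BOM_UTF16_BE, "big")))     # 0xfeff
print(hex(int.from_bytes(codecs.BOM_UTF16_BE, "little")))  # 0xfffe
```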
<h2 id="combining-marks"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#combining-marks" title="Permalink to this section">Combining Marks</a></h2>
<p>In the story so far, we’ve been focusing on code points. But in Unicode, a “character” can be more
complicated than just an individual code point!</p>
<p>Unicode includes a system for <em>dynamically composing</em> characters, by combining multiple code points
together. This is used in various ways to gain flexibility without causing a huge combinatorial
explosion in the number of code points.</p>
<p>In European languages, for example, this shows up in the application of diacritics to letters. Unicode supports
a wide range of diacritics, including acute and grave accents, umlauts, cedillas, and many more.
All these diacritics can be applied to any letter of any alphabet—and in fact, <em>multiple</em>
diacritics can be used on a single letter.</p>
<p>If Unicode tried to assign a distinct code point to every possible combination of letter and
diacritics, things would rapidly get out of hand. Instead, the dynamic composition system enables you to construct the
character you want, by starting with a base code point (the letter) and appending additional code
points, called “combining marks”, to specify the diacritics. When a text renderer sees a sequence
like this in a string, it automatically stacks the diacritics over or under the base
letter to create a composed character.</p>
<p>For example, the accented character “Á” can be expressed as a string of two code points:
<a href="http://unicode.org/cldr/utility/character.jsp?a=A">U+0041 “A” <span class="smallcaps">latin capital letter a</span></a>
plus <a href="http://unicode.org/cldr/utility/character.jsp?a=0301">U+0301 “◌́” <span class="smallcaps">combining acute accent</span></a>.
This string automatically gets rendered as a single character: “Á”.</p>
<p>Now, Unicode does also include many “precomposed” code points, each representing a letter with some
combination of diacritics already applied, such as <a href="http://unicode.org/cldr/utility/character.jsp?a=Á">U+00C1 “Á” <span class="smallcaps">latin capital letter a with acute</span></a>
or <a href="http://unicode.org/cldr/utility/character.jsp?a=ệ">U+1EC7 “ệ” <span class="smallcaps">latin small letter e with circumflex and dot below</span></a>.
I suspect these are mostly inherited from older encodings that were assimilated into Unicode, and
kept around for compatibility. In practice, there are precomposed code points for most of the common
letter-with-diacritic combinations in European-script languages, so they don’t use dynamic
composition that much in typical text.</p>
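<p>In code, the two spellings of “Á” look like this. It's only a sketch, but it shows that they are different strings as far as a naive comparison is concerned, even though they render identically.</p>

```python
import unicodedata

dynamic = "A\u0301"      # base letter + combining acute accent
precomposed = "\u00C1"   # single precomposed code point

print(len(dynamic), len(precomposed))  # 2 1
print([unicodedata.name(c) for c in dynamic])
# ['LATIN CAPITAL LETTER A', 'COMBINING ACUTE ACCENT']
print(unicodedata.name(precomposed))   # LATIN CAPITAL LETTER A WITH ACUTE

# Combining marks have a nonzero "canonical combining class"; base letters have 0.
print(unicodedata.combining("\u0301"), unicodedata.combining("A"))  # 230 0

print(dynamic == precomposed)  # False
```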
<p>Still, the system of combining marks does allow for an <em>arbitrary number</em> of diacritics to be
stacked on any base character. The reductio-ad-absurdum of this is <a href="https://eeemo.net/">Zalgo text</a>,
which works by ͖͟ͅr͞aṋ̫̠̖͈̗d͖̻̹óm̪͙͕̗̝ļ͇̰͓̳̫ý͓̥̟͍ ̕s̫t̫̱͕̗̰̼̘͜a̼̩͖͇̠͈̣͝c̙͍k̖̱̹͍͘i̢n̨̺̝͇͇̟͙ģ̫̮͎̻̟ͅ ̕n̼̺͈͞u̮͙m̺̭̟̗͞e̞͓̰̤͓̫r̵o̖ṷs҉̪͍̭̬̝̤ ̮͉̝̞̗̟͠d̴̟̜̱͕͚i͇̫̼̯̭̜͡ḁ͙̻̼c̲̲̹r̨̠̹̣̰̦i̱t̤̻̤͍͙̘̕i̵̜̭̤̱͎c̵s ͘o̱̲͈̙͖͇̲͢n͘ ̜͈e̬̲̠̩ac͕̺̠͉h̷̪ ̺̣͖̱ḻ̫̬̝̹ḙ̙̺͙̭͓̲t̞̞͇̲͉͍t̷͔̪͉̲̻̠͙e̦̻͈͉͇r͇̭̭̬͖,̖́ ̜͙͓̣̭s̘̘͈o̱̰̤̲ͅ ̛̬̜̙t̼̦͕̱̹͕̥h̳̲͈͝ͅa̦t̻̲ ̻̟̭̦̖t̛̰̩h̠͕̳̝̫͕e͈̤̘͖̞͘y҉̝͙ ̷͉͔̰̠o̞̰v͈͈̳̘͜er̶f̰͈͔ḻ͕̘̫̺̲o̲̭͙͠ͅw̱̳̺
͜t̸h͇̭͕̳͍e̖̯̟̠ ͍̞̜͔̩̪͜ļ͎̪̲͚i̝̲̹̙̩̹n̨̦̩̖ḙ̼̲̼͢ͅ ̬͝s̼͚̘̞͝p͙̘̻a̙c҉͉̜̤͈̯̖i̥͡n̦̠̱͟g̸̗̻̦̭̮̟ͅ ̳̪̠͖̳̯̕a̫͜n͝d͡ ̣̦̙ͅc̪̗r̴͙̮̦̹̳e͇͚̞͔̹̫͟a̙̺̙ț͔͎̘̹ͅe̥̩͍ a͖̪̜̮͙̹n̢͉̝ ͇͉͓̦̼́a̳͖̪̤̱p̖͔͔̟͇͎͠p̱͍̺ę̲͎͈̰̲̤̫a̯͜r̨̮̫̣̘a̩̯͖n̹̦̰͎̣̞̞c̨̦̱͔͎͍͖e̬͓͘ ̤̰̩͙̤̬͙o̵̼̻̬̻͇̮̪f̴ ̡̙̭͓͖̪̤“̸͙̠̼c̳̗͜o͏̼͙͔̮r̞̫̺̞̥̬ru̺̻̯͉̭̻̯p̰̥͓̣̫̙̤͢t̳͍̳̖ͅi̶͈̝͙̼̙̹o̡͔n̙̺̹̖̩͝ͅ”̨̗͖͚̩.̯͓</p>
<p>A few other places where dynamic character composition shows up in Unicode:</p>
<ul>
<li>
<p><a href="https://en.wikipedia.org/wiki/Vowel_pointing">Vowel-pointing notation</a> in Arabic and Hebrew.
In these languages, words are normally spelled with some of their vowels left out. They then have
diacritic notation to indicate the vowels (used in dictionaries, language-teaching
materials, children’s books, and such). These diacritics are expressed with combining marks.
<table class="borderless">
<tr><td>A Hebrew example, with <a href="https://en.wikipedia.org/wiki/Niqqud">niqqud</a>:</td><td>אֶת דַלְתִּי הֵזִיז הֵנִיעַ, קֶטֶב לִשְׁכַּתִּי יָשׁוֹד</td></tr>
<tr><td>Normal writing (no niqqud):</td><td>את דלתי הזיז הניע, קטב לשכתי ישוד</td></tr>
</table></p>
</li>
<li>
<p><a href="https://en.wikipedia.org/wiki/Devanagari">Devanagari</a>, the script used to write Hindi, Sanskrit,
and many other South Asian languages, expresses certain vowels as combining marks attached
to consonant letters. For example, “ह” + “ि” = “हि” (“h” + “i” = “hi”).</p>
</li>
<li>
<p>Korean characters stand for syllables, but they are composed of letters called <a href="https://en.wikipedia.org/wiki/Hangul#Letters">jamo</a>
that stand for the vowels and consonants in the syllable. While there are code points for precomposed Korean
syllables, it’s also possible to dynamically compose them by concatenating their jamo.
For example, “ᄒ” + “ᅡ” + “ᆫ” = “한” (“h” + “a” + “n” = “han”).</p>
</li>
</ul>
<h3 id="canonical-equivalence"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#canonical-equivalence" title="Permalink to this section">Canonical Equivalence</a></h3>
<p>In Unicode, precomposed characters exist alongside the dynamic composition system. A consequence of
this is that there are multiple ways to express “the same” string—different sequences of code
points that result in the same user-perceived characters. For example, as we saw earlier, we can
express the character “Á” either as the single code point U+00C1, <em>or</em> as the string of two code
points U+0041 U+0301.</p>
<p>Another source of ambiguity is the ordering of multiple diacritics in a single character.
Diacritic order matters visually when two diacritics apply to the same side of the base character,
e.g. both above: “ǡ” (dot, then macron) is different from “ā̇” (macron, then dot). However, when
diacritics apply to different sides of the character, e.g. one above and one below, then the order
doesn’t affect rendering. Moreover, a character with multiple diacritics might have one of the
diacritics precomposed and others expressed as combining marks.</p>
<p>For example, the Vietnamese letter “ệ” can be expressed in <em>five</em> different ways:</p>
<ul>
<li>Fully precomposed: U+1EC7 “ệ”</li>
<li>Partially precomposed: U+1EB9 “ẹ” + U+0302 “◌̂”</li>
<li>Partially precomposed: U+00EA “ê” + U+0323 “◌̣”</li>
<li>Fully decomposed: U+0065 “e” + U+0323 “◌̣” + U+0302 “◌̂”</li>
<li>Fully decomposed: U+0065 “e” + U+0302 “◌̂” + U+0323 “◌̣”</li>
</ul>
<p>Unicode refers to a set of strings like this as “canonically equivalent”. Canonically equivalent
strings are supposed to be treated as identical for purposes of searching, sorting, rendering,
text selection, and so on. This has implications for how you implement operations on text.
For example, if an app has a “find in file” operation and the user searches for “ệ”, it should, by
default, find occurrences of <em>any</em> of the five versions of “ệ” above!</p>
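<p>One way to sketch such a comparison in Python is to normalize both sides before comparing, using the normalization forms described in the next section. The helper name <code>canonically_equal</code> is my own, and a real search feature would likely need more than this (case folding, for instance).</p>

```python
import unicodedata

# The five spellings of the Vietnamese letter listed above.
forms = [
    "\u1EC7",          # fully precomposed
    "\u1EB9\u0302",    # e-with-dot-below + circumflex
    "\u00EA\u0323",    # e-with-circumflex + dot below
    "e\u0323\u0302",   # e + dot below + circumflex
    "e\u0302\u0323",   # e + circumflex + dot below
]

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(len(set(forms)))  # 5: all different as raw code point sequences
print(all(canonically_equal(forms[0], f) for f in forms))  # True
```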
<h3 id="normalization-forms"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#normalization-forms" title="Permalink to this section">Normalization Forms</a></h3>
<p>To address the problem of “how to handle canonically equivalent strings”, Unicode defines several
<em>normalization forms</em>: ways of converting strings into a canonical form so that they can be
compared code-point-by-code-point (or byte-by-byte).</p>
<p>The “NFD” normalization form fully <em>decomposes</em> every character down to its component base and
combining marks, taking apart any precomposed code points in the string. It also sorts the combining
marks in each character according to their rendered position, so e.g. diacritics that go below the
character come before the ones that go above the character. (It doesn’t reorder diacritics in the
same rendered position, since their order matters visually, as previously mentioned.)</p>
<p>The “NFC” form, conversely, puts things back together into precomposed code points as much as
possible. If an unusual combination of diacritics is called for, there may not be any precomposed
code point for it, in which case NFC still precomposes what it can and leaves any remaining
combining marks in place (again ordered by rendered position, as in NFD).</p>
<p>There are also forms called NFKD and NFKC. The “K” here refers to <em>compatibility</em> decompositions,
which cover characters that are “similar” in some sense but not visually identical. However, I’m not
going to cover that here.</p>
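<p>Python exposes these forms through <code>unicodedata.normalize</code>. The sketch below prints the code points that NFD and NFC produce for a couple of the examples from earlier; <code>code_points</code> is a throwaway helper defined just for this example.</p>

```python
import unicodedata

def code_points(s: str) -> str:
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)

# NFD takes the precomposed letter apart, with the below-mark before the above-mark.
print(code_points(unicodedata.normalize("NFD", "\u1EC7")))         # U+0065 U+0323 U+0302

# NFC reassembles a decomposed sequence, whatever order the marks arrived in.
print(code_points(unicodedata.normalize("NFC", "e\u0302\u0323")))  # U+1EC7

# Two marks in the same rendered position (both above) are not reordered,
# so dot-then-macron and macron-then-dot stay distinct after normalization.
dot_macron = unicodedata.normalize("NFD", "a\u0307\u0304")
macron_dot = unicodedata.normalize("NFD", "a\u0304\u0307")
print(dot_macron == macron_dot)  # False
```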
<h3 id="grapheme-clusters"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#grapheme-clusters" title="Permalink to this section">Grapheme Clusters</a></h3>
<p>As we’ve seen, Unicode contains various cases where a thing that a user thinks of
as a single “character” might actually be made up of multiple code points under the hood. Unicode
formalizes this using the notion of a <em>grapheme cluster</em>: a string of one or more code points that
constitute a single “user-perceived character”.</p>
<p><a href="http://www.unicode.org/reports/tr29/">UAX #29</a> defines the rules for what, precisely, qualifies
as a grapheme cluster. It’s approximately “a base code point followed by any number of combining
marks”, but the actual definition is a bit more complicated; it accounts for things like Korean
jamo, and <a href="http://blog.emojipedia.org/emoji-zwj-sequences-three-letters-many-possibilities/">emoji ZWJ sequences</a>.</p>
<p>The main thing grapheme clusters are used for is text <em>editing</em>: they’re often the most sensible
unit for cursor placement and text selection boundaries. Using grapheme clusters for these purposes
ensures that you can’t accidentally chop off some diacritics when you copy-and-paste text, that
left/right arrow keys always move the cursor by one visible character, and so on.</p>
<p>Another place where grapheme clusters are useful is in enforcing a string length limit—say, on a
database field. While the true, underlying limit might be something like the byte length of the string
in UTF-8, you wouldn’t want to enforce that by just truncating bytes. At a minimum, you’d want to
“round down” to the nearest code point boundary; but even better, round down to the nearest <em>grapheme
cluster boundary</em>. Otherwise, you might be corrupting the last character by cutting off a diacritic,
or interrupting a jamo sequence or ZWJ sequence.</p>
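<p>Here is a rough sketch of that “rounding down”, using only Python's standard library. The first function finds a code point boundary by backing up over UTF-8 continuation bytes. The second goes one step further and avoids separating a base letter from its combining marks, which is only an approximation of grapheme clusters: it knows nothing about jamo or ZWJ sequences, so for real use you'd want a library that implements UAX #29 (such as ICU). Both function names are hypothetical.</p>

```python
import unicodedata

def truncate_utf8(data: bytes, max_bytes: int) -> bytes:
    """Cut valid UTF-8 down to at most max_bytes, ending on a code point boundary."""
    if len(data) <= max_bytes:
        return data
    end = max_bytes
    # data[end] is the first byte to be dropped. If it is a continuation byte
    # (10xxxxxx), the cut falls mid-sequence, so back up to the lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]

def truncate_text(text: str, max_bytes: int) -> str:
    """Like truncate_utf8, but also avoid cutting between a base and its combining marks."""
    n = len(truncate_utf8(text.encode("utf-8"), max_bytes).decode("utf-8"))
    if n < len(text):
        # If the first dropped code point is a combining mark, the last kept
        # cluster is incomplete: back up until the base letter is dropped too.
        while n > 0 and unicodedata.combining(text[n]) != 0:
            n -= 1
    return text[:n]

text = "xe\u0323\u0302"   # "x", then "e" with dot below and circumflex: 6 bytes in UTF-8
print(truncate_utf8(text.encode("utf-8"), 5))  # b'xe\xcc\xa3' (circumflex dropped)
print(truncate_text(text, 5))                  # x (whole accented letter dropped)
```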
<h2 id="and-more"><a href="http://reedbeta.com/blog/programmers-intro-to-unicode/#and-more" title="Permalink to this section">And More…</a></h2>
<p>There’s much more that could be said about Unicode from a programmer’s perspective! I haven’t gotten
into such fun topics as case mapping, collation, compatibility decompositions and confusables,
Unicode-aware regexes, or bidirectional text. Nor have I said anything yet about implementation
issues: how to efficiently store and look up data about the sparsely-assigned code points, or how
to optimize UTF-8 decoding, string comparison, or NFC normalization. Perhaps I’ll return to some of
those things in future posts.</p>
<p>Unicode is a fascinating and complex system. It has a many-to-one mapping between bytes and
code points, and on top of that a many-to-one (or, under some circumstances, many-to-many) mapping
between code points and “characters”. It has oddball special cases in every corner. But no one ever
claimed that representing <em>all written languages</em> was going to be <em>easy</em>, and it’s clear that
we’re never going back to the bad old days of a patchwork of incompatible encodings.</p>
<p>Further reading:</p>
<ul>
<li><a href="http://www.unicode.org/versions/latest/">The Unicode Standard</a></li>
<li><a href="http://utf8everywhere.org/">UTF-8 Everywhere Manifesto</a></li>
<li><a href="https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/">Dark corners of Unicode</a> by Eevee</li>
<li><a href="http://site.icu-project.org/">ICU (International Components for Unicode)</a>—C/C++/Java libraries
implementing many Unicode algorithms and related things</li>
<li><a href="https://docs.python.org/3/howto/unicode.html">Python 3 Unicode Howto</a></li>
<li><a href="https://www.google.com/get/noto/">Google Noto Fonts</a>—set of fonts intended to cover all
assigned code points</li>
</ul>