Implicit coordinate transforms are weird

There’s a wide class of coordinate transforms that are typically given backwards. Witness spherical polar coordinates:

$x = r \cos \phi \sin \theta \\ y = r \sin \phi \sin \theta \\ z = r \cos \theta \\$

Typically we already know what our cartesian coordinates $(x,y,z)$ are, and we want to express them in this fancy new coordinate system $(r,\phi,\theta)$. That is, we want a map

$\Phi : (x, y, z) \mapsto (r, \phi, \theta),$

but it looks like we’ve only been given the inverse map

$\Phi^{-1} : (r, \phi, \theta) \mapsto (x, y, z) = (r \cos \phi \sin \theta, r \sin \phi \sin \theta, r \cos \theta).$

Now, really we know how to invert these expressions. But doing calculus with inverse functions like $\tan^{-1}(y/x)$ is no fun at all, and besides we can imagine situations where no inverse exists.

What we’re interested in is what becomes of the basis vectors $(\partial_x, \partial_y, \partial_z)$ and covectors $(dx, dy, dz)$ when we change to spherical polar coordinates.

Let’s imagine that the manifold $\mathcal{M}$ is “$\mathbb{R}^3$ with cartesian lines drawn on”, and the manifold $\mathcal{N}$ is “$\mathbb{R}^3$ with spherical lines drawn on”. Obviously these are both $\mathbb{R}^3$, but our reasoning will be completely general.

Recall that a map $\Phi : \mathcal{M} \longrightarrow \mathcal{N}$ induces a ‘pullback’ $\Phi^*$ that takes functions/covectors on $\mathcal{N}$ to functions/covectors on $\mathcal{M}$; and a ‘pushforward’ $\Phi_*$ that takes curves/vectors on $\mathcal{M}$ to curves/vectors on $\mathcal{N}$. That is, the pullback $\Phi^*$ operates ‘backwards’ to the direction of the original map $\Phi$.

But this is exactly the same as saying that the pullback induced by the inverse map $\Phi^{-1}$ will operate in the expected ‘forwards’ direction. So, $\Phi^{-1*}$ takes functions/covectors on $\mathcal{M}$ to functions/covectors on $\mathcal{N}$. So, given that we only have access to $\Phi^{-1}$ right now, it looks like we can successfully work out what our covectors will look like in spherical coordinates.

Another way of phrasing this is that the exterior derivative commutes with pullbacks. Let $f$ be a function on $\mathcal{M}$ and $v$ a vector field on $\mathcal{N}$. Then

$(\Phi^{-1*} df)(v) = df(\Phi^{-1}_*v) = (\Phi^{-1}_*v)(f) \\ = v(\Phi^{-1*}f) = d(\Phi^{-1*}f)(v) \\ \Rightarrow \Phi^{-1*}df = d(\Phi^{-1*}f).$

A correct method for covectors

But now let $f$ be the coordinate function for the coordinate $x$, i.e. $f(x,y,z) = x$. Then

$\Phi^{-1*}dx = d(\Phi^{-1*}x) = d(r \cos \phi \sin \theta) \\ = \cos \phi \sin \theta \, dr - r \sin \phi \sin \theta \, d\phi + r \cos \phi \cos \theta \, d\theta,$

using the fact that we know $(\Phi^{-1*}f)(r,\phi,\theta) = f(\Phi^{-1}(r,\phi,\theta))$ from above, and standard facts about the exterior derivative $d$ (Leibniz rule over multiplication etc.).

Rinse and repeat for the other basis covectors:

$\Phi^{-1*}dy = \sin\phi \sin\theta \, dr + r \cos\phi \sin\theta \, d\phi + r \sin\phi \cos\theta \, d\theta \\ \Phi^{-1*}dz = \cos\theta \, dr - r \sin \theta \, d\theta.$

So given a covector $\eta$ in cartesian coordinates, $\eta = \eta_x dx + \eta_y dy + \eta_z dz$, we now know how to substitute for $(dx,dy,dz)$, writing $\eta$ in spherical coordinates.
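These substitutions are easy to check mechanically. Here is a small sympy sketch (the symbol names are my own) that computes the coefficients of each pulled-back covector in the $(dr, d\phi, d\theta)$ basis:

```python
import sympy as sp

r, phi, theta = sp.symbols('r phi theta', positive=True)

# The inverse map Phi^{-1} : (r, phi, theta) -> (x, y, z)
x = r * sp.cos(phi) * sp.sin(theta)
y = r * sp.sin(phi) * sp.sin(theta)
z = r * sp.cos(theta)

def d(f):
    """Exterior derivative of f, as coefficients in the (dr, dphi, dtheta) basis."""
    return [sp.diff(f, v) for v in (r, phi, theta)]

# Phi^{-1*} dx = cos(phi) sin(theta) dr - r sin(phi) sin(theta) dphi
#                + r cos(phi) cos(theta) dtheta
assert d(x) == [sp.cos(phi) * sp.sin(theta),
                -r * sp.sin(phi) * sp.sin(theta),
                r * sp.cos(phi) * sp.cos(theta)]

# Phi^{-1*} dz = cos(theta) dr - r sin(theta) dtheta
assert d(z) == [sp.cos(theta), 0, -r * sp.sin(theta)]
```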

An incorrect method for vectors

Let’s naively apply the calculus we already know, and try the following (for the $z$ basis vector):

$\partial_z = \frac{\partial}{\partial z} = \frac{\partial r}{\partial z} \frac{\partial}{\partial r} + \frac{\partial \theta}{\partial z} \frac{\partial}{\partial \theta} \\ = \left( \frac{\partial (r \cos \theta)}{\partial r} \right)^{-1} \frac{\partial}{\partial r} + \left( \frac{\partial (r \cos \theta)}{\partial \theta} \right)^{-1} \frac{\partial}{\partial \theta} \\ = \frac{1}{\cos \theta} \partial_r - \frac{1}{r \sin \theta} \partial_\theta.$

Now when we contract this with our earlier expression for $dz$, we should get

$dz(\partial_z) = 1.$

But instead we find

$(\Phi^{-1*}dz)(\Phi_* \partial_z) = (\cos\theta \, dr - r\sin\theta \, d\theta) \left(\frac{1}{\cos \theta} \partial_r - \frac{1}{r \sin \theta} \partial_\theta\right) \\ = 2 \:\: (!)$

What went wrong? We treated $\frac{\partial r}{\partial z}$ as though it were $\left(\frac{\partial z}{\partial r}\right)^{-1}$, but these are not reciprocals: the first derivative holds $x$ and $y$ fixed, while the second holds $\phi$ and $\theta$ fixed. Under a coordinate change the basis vectors mix, so the partial derivatives cannot be inverted one at a time.

A correct method for vectors

Write out completely general expressions for $(\partial_x, \partial_y, \partial_z)$:

$\Phi_* \partial_x = A \partial_r + B \partial_\phi + C \partial_\theta \\ \Phi_* \partial_y = D \partial_r + E \partial_\phi + F \partial_\theta \\ \Phi_* \partial_z = G \partial_r + H \partial_\phi + I \partial_\theta$

All we know about these basis vectors is that, when contracted with the basis covectors, we should obtain the identity matrix, even when they’ve been written out in spherical coordinates:

$(\Phi^{-1*}dx^i)(\Phi_* \partial_{x^j}) = dx^i(\Phi^{-1}_* \Phi_* \partial_{x^j}) \\ = dx^i(\mathrm{id}_* \partial_{x^j}) = dx^i(\partial_{x^j}) = \delta^i_j.$

($\mathrm{id}$ is just the identity map)

So we repeatedly apply this property to the expression above, essentially inverting the 3-by-3 matrix that has components $A, B, \ldots$.

For example, for $\partial_z$ we get

$\Phi_*(\partial_z) = \cos \theta \partial_r - \frac{\sin \theta}{r} \partial_\theta,$

which gives the correct result when contracted with $dz$.
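The matrix inversion can be checked with sympy. This sketch (using the same coordinate conventions as above) inverts the full Jacobian of $\Phi^{-1}$, rather than its entries one at a time, and reads off the components of $\Phi_*\partial_z$:

```python
import sympy as sp

r, phi, theta = sp.symbols('r phi theta', positive=True)
x = r * sp.cos(phi) * sp.sin(theta)
y = r * sp.sin(phi) * sp.sin(theta)
z = r * sp.cos(theta)

# J[i][k] = d x^i / d u^k  for u = (r, phi, theta)
J = sp.Matrix([[sp.diff(c, u) for u in (r, phi, theta)] for c in (x, y, z)])

# By the chain rule, partial_{x^j} = (J^{-1})_{kj} partial_{u^k}: it is the
# whole Jacobian that must be inverted, not each partial derivative.
Jinv = J.inv()

# Components of partial_z in the (partial_r, partial_phi, partial_theta) basis
dz_components = [sp.simplify(Jinv[k, 2]) for k in range(3)]
expected = [sp.cos(theta), 0, -sp.sin(theta) / r]
assert all(sp.simplify(c - e) == 0 for c, e in zip(dz_components, expected))
```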

Conclusion

The essential difference between vectors and covectors is that, under maps, one of them moves one way and the other one moves the other way. Hopefully the little parable in this blogpost has illustrated this fact.

When you have a metric you can talk about them having indices in different places, but that allows you to forget about the difference between them altogether! The interesting differences between vectors and covectors come into play when:

• You don’t necessarily know what the metric is.
• You’re using maps between manifolds/coordinate systems whose inverses don’t necessarily exist (for example, the projection onto a submanifold has no inverse).

The fact that the exterior derivative commutes with pullbacks also explains why it’s covectors that show up in integrals, thanks to the ‘change of variables’ formula

$\int_V \eta = \int_{\Phi(V)} \Phi^{-1*}(\eta).$

It also explains why it’s so easy to find the form of the metric in new coordinates, because the metric is a rank (0,2)-tensor, i.e. a sum of pairs of covectors, tensor-producted together:

$g = g_{ij} dx^i \otimes dx^j,$

and we can just substitute for $dx^i$ in the new coordinates and we’re done!
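As a concrete check, here is a short sympy sketch that performs exactly this substitution for the Euclidean metric in spherical polars; substituting $dx^i = J^i{}_k \, du^k$ into $g = \delta_{ij}\, dx^i \otimes dx^j$ reduces to computing $J^T J$ for the Jacobian $J$ of $\Phi^{-1}$:

```python
import sympy as sp

r, phi, theta = sp.symbols('r phi theta', positive=True)
x = r * sp.cos(phi) * sp.sin(theta)
y = r * sp.sin(phi) * sp.sin(theta)
z = r * sp.cos(theta)

# Jacobian of Phi^{-1} with respect to u = (r, phi, theta)
J = sp.Matrix([[sp.diff(c, u) for u in (r, phi, theta)] for c in (x, y, z)])

# g'_{kl} = J^i_k J^j_l delta_{ij}, i.e. g' = J^T J for the Euclidean metric
g = sp.simplify(J.T * J)

# The familiar spherical-polar line element: dr^2 + r^2 sin^2(theta) dphi^2
# + r^2 dtheta^2
assert g == sp.diag(1, r**2 * sp.sin(theta)**2, r**2)
```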

What’s the deal with tautological 1-forms?

Epistemic status: All pretty standard derivations, except the last section on mechanics which is a bit hand-wavy.

When formulating mechanics on cotangent bundles, one comes across an object called the ‘tautological 1-form’ (often denoted $\theta$) which is supposedly key to the whole process. Here I will attempt to describe what this 1-form is, why it is useful, and the role it plays in the fundamentals of classical mechanics.

Pullbacks and Pushforwards

First a word about smooth maps between manifolds, and the operations derived from them. Let $\mathcal{M}$ and $\mathcal{N}$ be smooth manifolds, and $\Phi : \mathcal{M} \longrightarrow \mathcal{N}$ be a smooth map, not necessarily invertible. Furthermore, let $f : \mathcal{N} \longrightarrow \mathbb{R}$ be a smooth function, let $X : \mathcal{M} \longrightarrow \mathcal{TM}$ be a vector field on $\mathcal{M}$, and let $\eta : \mathcal{N} \longrightarrow \mathcal{T^*N}$ be a covector field on $\mathcal{N}$.

We can use $\Phi$ to ‘pullback’ functions on $\mathcal{N}$ into functions on $\mathcal{M}$, like so:

$\Phi f : \mathcal{M} \longrightarrow \mathbb{R}, \:\:\:\:\:\:\:\: (\Phi f)(x) = f(\Phi(x)),$

and taking advantage of that, we now have a way to ‘pushforward’ vector fields on $\mathcal{M}$ into vector fields on $\mathcal{N}$:

$\Phi_* X : \mathcal{N} \longrightarrow \mathcal{TN}, \:\:\:\:\:\:\:\: \left.(\Phi_* X)(f)\right|_{\Phi(p)} = \left.X(\Phi f)\right|_{p}$

which then also gives a way to ‘pullback’ covector fields on $\mathcal{N}$ into covector fields on $\mathcal{M}$:

$\Phi^*\eta : \mathcal{M} \longrightarrow \mathcal{T^*M}, \:\:\:\:\:\:\:\: \left.(\Phi^*\eta)(X)\right|_{p} = \left.\eta(\Phi_* X)\right|_{\Phi(p)}.$

I have written a bunch of vertical “evaluate-here” bars for clarification. It is common to be rather casual about the difference between a vector (lives in $\mathcal{TM}$) and a vector field (a function $\mathcal{M} \longrightarrow \mathcal{TM}$), and similarly for covectors. Typically, the various kinds of product are evaluated pointwise, e.g. if $f, g, h$ are functions then $\left.fgh\right|_x = f(x)g(x)h(x)$.

The tautological 1-form itself

Now let $\mathcal{Q}$ be a smooth manifold, and specialise the above discussion to the case $\mathcal{N} = \mathcal{Q}$, $\mathcal{M} = \mathcal{T^*Q}$. Let $q$ be coordinates on $\mathcal{Q}$, and $(p,q)$ be coordinates on $\mathcal{T^*Q}$; that is, points on $\mathcal{T^*Q}$ are 1-forms associated to a particular point in $\mathcal{Q}$: $(p, q) \equiv \left.p_i dq^i\right|_q$. Having these two equivalent ways to look at points on a cotangent bundle is an important point which we shall return to later.

For the current purpose we will study the map

$\pi : \mathcal{T^*Q} \longrightarrow \mathcal{Q}, \:\:\:\:\:\:\:\: \pi(p, q) = q,$

that is, simply the projection map from $\mathcal{T^*Q}$ ‘down’ to $\mathcal{Q}$ – it just tells us the point on $\mathcal{Q}$ that the covector was living at.

Now for a mystical statement: the tautological 1-form is the pullback $\pi^*$ interpreted as a 1-form, and it also has the coordinate expression $\theta = p_i dq^i$. How on earth can both these things be true, and besides, how can one ‘interpret a pullback as a 1-form’?!

That last claim is actually not too bad: a map $\mathcal{T^*Q} \longrightarrow \mathcal{Q}$ induces a pullback $\mathcal{T^*Q} \longrightarrow \mathcal{T^*T^*Q}$, but this map has exactly the domain and range of a covector field on $\mathcal{T^*Q}$! Of course, this requires swapping between the perspectives of $p$ as a coordinate on $\mathcal{T^*Q}$ and $p$ as a 1-form in its own right.

We can use $(\eta,p,q)$ for coordinates on $\mathcal{T^*T^*Q}$, equivalently writing $\eta = \eta^i dp_i + \eta_i dq^i$ where in a slight abuse of notation we’ve written $\eta^i$ for the first $n$ coordinates (the $dp$ components) and $\eta_i$ for the second $n$ coordinates (the $dq$ components).

Now to investigate $\pi$ and its induced pullbacks and pushforwards. Let $f : \mathcal{Q} \longrightarrow \mathbb{R}$ be a function on $\mathcal{Q}$, and let $X : \mathcal{T^*Q} \longrightarrow \mathcal{TT^*Q}$ be a vector field on $\mathcal{T^*Q}$, which in coordinates (using the same abuse of notation as before) we will write $X = X_i \partial_{p_i} + X^i \partial_{q^i}$.

Recalling $\pi(p,q) = q$, the pushforward of $X$ under $\pi$ is then

$\left.(\pi_* X)(f)\right|_{\pi(p,q)} = \left.X(f(\pi(p,q)))\right|_{(p,q)} \\ = \underbrace{\left.X_i \partial_{p_i}(f(q))\right|_{(p,q)}}_{=0} + \left.X^i \partial_{q^i}(f(q))\right|_{(p,q)},$

so the components of our pushed-forward vector field on $\mathcal{Q}$ are $(\pi_* X)^i = X^i$, i.e. just the $q$ components of $X$.

Now we can look at how $\pi^*$ acts on covector fields $\eta : \mathcal{Q} \longrightarrow \mathcal{T^*Q}$ (in coordinates $\eta = \eta_i dq^i$):

$\left.(\pi^* \eta)(X)\right|_{(p,q)} = \left.\eta(\pi_* X)\right|_{\pi(p,q)} \\ = \left.\eta_i dq^i\right|_q \left(\left.X^j \partial_{q^j}\right|_{(p,q)}\right) = \left.\eta_i\right|_q\left.X^i\right|_{(p,q)}.$

This means that the action of $\pi^*$ is basically to place $\eta$ straight into $\mathcal{T^*T^*Q}$ unchanged, with all $dp$ components set to zero:

$\pi^* : \mathcal{T^*Q} \longrightarrow \mathcal{T^*T^*Q}, \\ \pi^* : (p,q) \longmapsto (p, p, q), \\ \left.\pi^*\eta\right|_{(p,q)} = \left.\eta_i\right|_q \left.dq^i\right|_{(p,q)}.$

And this is the source of the coordinate expression for $\pi^*$ that I quoted above – it means that if we take $\pi^*$ to be a covector field on $\mathcal{T^*Q}$, denoted $\theta$, then

$\left.\theta\right|_{(p,q)} = \left.p_i\right|_q \left.dq^i\right|_{(p,q)},$

or $\theta = p_i dq^i$ for short. If you like you can think of the action of $\pi^*$ as stripping off the $dq$ components that belong to $\mathcal{T^*Q}$ and replacing them with $dq$ components that belong to $\mathcal{T^*T^*Q}$.

How does it ‘cancel’ pullbacks?

Now look at a general covector field $\alpha : \mathcal{Q} \longrightarrow \mathcal{T^*Q}$. Treating $\alpha$ as a map (a similar trick to above) means it induces a pushforward $\alpha_* : \mathcal{TQ} \longrightarrow \mathcal{TT^*Q}$ and a pullback $\alpha^* : \mathcal{T^*T^*Q} \longrightarrow \mathcal{T^*Q}$.

Let $F : \mathcal{T^*Q} \longrightarrow \mathbb{R}$ be a function, and $Y : \mathcal{Q} \longrightarrow \mathcal{TQ}$ a vector field with coordinate expression $Y = Y^i \partial_{q^i}$, which is pushed-forward like so:

$(\alpha_* Y)(F) = Y(\alpha F) = Y^i \partial_{q^i} \big(F(\alpha(q), q)\big) \\ = \left. Y^i \frac{\partial \alpha_j}{\partial q^i} \frac{\partial F}{\partial p_j} \right|_{(\alpha, q)} + \left. Y^i \frac{\partial F}{\partial q^i} \right|_{(\alpha, q)}.$

Note that the pushed-forward vector picks up components in the $\partial_{p_j}$ directions, because $\alpha$ itself varies with $q$. We can use this to find how $\alpha^*$ acts on covector fields $\beta : \mathcal{T^*Q} \longrightarrow \mathcal{T^*T^*Q}$, writing $\beta = \beta^i dp_i + \beta_i dq^i$ as before:

$\left. (\alpha^* \beta)(Y) \right|_q = \left. \beta(\alpha_* Y) \right|_{(\alpha,q)} = \left. \beta^j \frac{\partial \alpha_j}{\partial q^i} \right|_{(\alpha,q)} \left. Y^i \right|_q + \left. \beta_i \right|_{(\alpha,q)} \left. Y^i \right|_q.$

But what if we specialise to $\beta = \theta = p_i dq^i$, the interesting 1-form we were looking at above? We get

$\left. \alpha^* \theta \right|_q = \left. p_i dq^i \right|_{(\alpha,q)} = \left. \alpha_i dq^i \right|_q = \left. \alpha \right|_q,$

which is exactly the 1-form $\alpha$ that we started off with! This is the reason that $\theta$ is said to ‘cancel’ a pullback, as it gives us back the 1-form that we were using to create the pullback in the first place.
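For a one-dimensional $\mathcal{Q}$ this ‘cancelling’ property can be verified by brute force. A sympy sketch (the names $a$, $B$, $C$ are mine), treating the pullback along $\alpha$ as the substitution $p \to a(q)$, $dp \to a'(q)\,dq$:

```python
import sympy as sp

q, p = sp.symbols('q p')
a = sp.Function('a')   # an arbitrary 1-form alpha = a(q) dq on Q = R

# As a map Q -> T*Q, alpha sends q to (p, q) = (a(q), q).  Pulling back a
# 1-form beta = B(p,q) dp + C(p,q) dq on T*Q along alpha therefore means
# substituting p -> a(q) and dp -> a'(q) dq.
def pullback_along_alpha(B, C):
    """Return the dq-coefficient of alpha^*(B dp + C dq)."""
    return B.subs(p, a(q)) * sp.diff(a(q), q) + C.subs(p, a(q))

# The tautological 1-form theta = p dq has B = 0, C = p:
assert pullback_along_alpha(sp.S.Zero, p) == a(q)
# alpha^* theta = a(q) dq = alpha: theta 'cancels' the pullback.
```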

The basics of mechanics

How does this all link into physics? For mechanics you need a symplectic manifold $\mathcal{M}$ along with a 2-form called $\omega$ that satisfies various properties; notably that $d\omega = 0$, so that at least locally we can find an $\alpha$ such that $d\alpha = \omega$. Abstractly, we want to find paths $\Gamma : \mathbb{R} \longrightarrow \mathcal{M}$ such that the action integral $I$ is minimised:

$I(\Gamma) \equiv \int_\Gamma \alpha.$

Now, there are various ways to come up with symplectic manifolds, but the relevant one for physicists is ‘phase space’, i.e. $\mathcal{M} = \mathcal{T^*Q}$, the cotangent bundle of some manifold $\mathcal{Q}$ (where we think of $\mathcal{Q}$ as being the ‘real’ physical space that we see around us, perhaps 4D space-time or similar). And it turns out that the logical choice of $\omega$ is to take $\omega \equiv -d\theta$ (the minus sign being a mere convention) where $\theta = p_i dq^i$. Thanks to the discussion above we now know exactly what this object is (spoiler: it’s the tautological 1-form!).

Traditionally a physicist would have something called an action functional $S$ that takes curves $\gamma : \mathbb{R} \longrightarrow \mathcal{Q}$ and gives a real number, and they would then find the curve $\gamma$ that minimises that number ($S$ is called a functional because it also depends on the first derivatives of $\gamma$). By parametrising $\gamma$ by the time coordinate the normal Euler-Lagrange equations are derived. However, we wish to stay agnostic about which coordinate represents time! So we will keep our paths parametrised by arc length, i.e. $\gamma : [0,1] \longrightarrow \mathcal{Q}$.

Let $\gamma_q$ be a class of curves in $\mathcal{Q}$ parametrised by arc length with fixed starting point $\gamma_q(0) = q_0$ and ending point $\gamma_q(1) = q$. Now define $W(q) \equiv S[\gamma_q] + W_0$ for some constant $W_0 \equiv W(q_0)$, where $\gamma_q$ is defined to be the curve with endpoints $(q_0,q)$ that minimises $S[\gamma_q]$. This function $W : \mathcal{Q} \longrightarrow \mathbb{R}$ is called Hamilton’s principal function, and note that it depends only on positions, and not the momenta! We now calculate

$S[\gamma_q] = W(q) - W(q_0) \\ \:\:\:\: = \int_{\partial \gamma_q} W \\ \:\:\:\: = \int_{\gamma_q} dW \:\:\:\:\:\:\:\: (1) \\ \:\:\:\: = \int_{\gamma_q} (dW)^* \theta \:\:\:\:\:\:\:\: (2) \\ \:\:\:\: = \int_{dW(\gamma_q)} \theta \:\:\:\:\:\:\:\: (3) \\ \:\:\:\: = \int_\Gamma \theta.$

The numbering refers to the following results:

1. Generalised Stokes’ theorem. Here the ‘boundary’ of $\gamma_q$ is just its two endpoints.
2. The ‘cancelling’ property described above: $\alpha^*(\theta) = \alpha$.
3. A standard property of integrals of pullbacks: $\int_U \Phi^*(\eta) = \int_{\Phi(U)} \eta$.

So we see that the process of minimising $S$ is just a special case of the general theory of minimisation problems on symplectic manifolds. To lift the path $\gamma$ from $\mathcal{Q}$ to the symplectic phase space $\mathcal{T^*Q}$, we used the 1-form $dW$ as a pullback, similarly to the trick we pulled above with $\pi^*$. That means that the momenta along $\gamma$ are

$p_i = \frac{\partial W}{\partial q^i}.$

Ignoring any details of the expression for the action $S$, how do we derive a more familiar set of differential equations that determine $\gamma$? We can pick out one of the coordinates, say $q^0$, and call it time $t$, and similarly one of the momenta, say $p_0$, and call it energy $-H$ (again, minus sign by convention), so that

$H = -\frac{\partial W}{\partial t}.$

Our aim is to eliminate $W$ in favour of the other coordinates (we will still write $q^i$ and $p_i$ for the remaining $(n-1)$ coordinates).

Recall that $d^2W = 0$, since the exterior derivative satisfies $d^2 = 0$; so we can immediately write down the first of Hamilton’s equations:

$\frac{\partial}{\partial t}\left(\frac{\partial W}{\partial q^i}\right) = \frac{\partial}{\partial q^i}\left(\frac{\partial W}{\partial t}\right) \\ \frac{\partial p_i}{\partial t} = -\frac{\partial H}{\partial q^i}.$

And since subtracting a total differential from $dW$ retains the $d^2 = 0$ property, we can define a quantity that gives us the second of Hamilton’s equations:

$dA \equiv dW - d(p_iq^i) \\ \frac{\partial}{\partial t}\left(\frac{\partial A}{\partial p_i}\right) = \frac{\partial}{\partial p_i}\left(\frac{\partial A}{\partial t}\right) \\ \frac{\partial q^i}{\partial t} = \frac{\partial H}{\partial p_i}.$

Note that we now have explicitly

$\theta = p_i dq^i - Hdt \\ \omega = dq^i \wedge dp_i - dt \wedge dH$

for our symplectic structure, and we ended up with the familiar Hamilton’s equations

$\frac{\partial p_i}{\partial t} = -\frac{\partial H}{\partial q^i} \\ \frac{\partial q^i}{\partial t} = \frac{\partial H}{\partial p_i}.$

And so, as if by magic, we’ve recovered the traditional formalism of Hamiltonian mechanics as a special case of minimisation procedures on symplectic manifolds.
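As a sanity check, we can feed a known principal function through these relations. For a free particle moving from the origin at time $0$ to $q$ at time $t$, a standard closed form (used here purely as an illustration) is $W = mq^2/2t$; a sympy sketch confirms that $p = \partial W/\partial q$ and $H = -\partial W/\partial t$ then satisfy the familiar $H = p^2/2m$:

```python
import sympy as sp

m, q, t = sp.symbols('m q t', positive=True)

# Hamilton's principal function for a free particle from the origin at
# time 0 to position q at time t (standard result, assumed here):
W = m * q**2 / (2 * t)

p = sp.diff(W, q)    # momentum p = dW/dq
H = -sp.diff(W, t)   # energy   H = -dW/dt

assert sp.simplify(p - m * q / t) == 0
assert sp.simplify(H - p**2 / (2 * m)) == 0   # the free-particle Hamiltonian
```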

Of course, we didn’t have to use the 0th coordinate to represent time/energy. Really, time and position are distinguished from each other by the form of the Lorentzian metric, which has not yet entered into our method. It’s true that non-relativistic mechanics will inevitably privilege a time variable, but the action for a free relativistic point particle is nicely Lorentz-invariant:

$S[\gamma] = \int_\gamma \left.g(X,X)\right|_{\gamma(s)}ds$

where $s$ is arc length, $g$ the metric, and $X$ the 4-vector tangent to $\gamma$.

This blog post was inspired by

• John Baez’s two posts on parallels between thermodynamics and mechanics
• The fact that the Wikipedia page on the tautological 1-form is so abstruse

You may also be interested in

• A 1-page summary of ‘Abstract Hamiltonian mechanics’, which describes an approach that is agnostic about time coordinates (it does not discuss minimisation procedures though)
• An article on how to do geometric Hamilton-Jacobi mechanics properly. It’s possible to derive a single nonlinear differential equation for $W$ (called the ‘Hamilton-Jacobi equation’) which depends explicitly on the time coordinate and the Hamiltonian $H$. The associated time-agnostic (‘non-autonomous’) method is fairly difficult, and this article discusses all the details.

Volume forms on the mass-shell

The setting for dynamics is the cotangent bundle $\mathcal{T}^*\mathcal{M}$ of a manifold $\mathcal{M}$ with pseudo-Riemannian metric $g_{\mu\nu}$; relevant observables can be functions of both position and momentum. For example, the distribution function $f(x,p)$, which is the number density of particles in phase space ($x^\mu$ and $p^\mu$ are coordinates on $\mathcal{M}$ and $\mathcal{T}^*_x\mathcal{M}$ respectively).

The spatial volume form (integral measure) is $\mathrm{vol}(\mathcal{M}) \equiv \sqrt{g(x)}\, dx^1 \wedge \ldots \wedge dx^n$, where $g(x)$ is the determinant of the metric evaluated at $x \in \mathcal{M}$; and the volume form for the full phase space is $dx^1 \wedge \ldots \wedge dx^n \wedge dp_1 \wedge \ldots \wedge dp_n$. If we want to integrate out the momentum-dependence of some observable, we need just the momentum part of the volume form. From the two expressions above we can see that this is $\mathrm{vol}(\mathcal{T}^*_x\mathcal{M}) = \frac{1}{\sqrt{g(x)}}\, dp_1 \wedge \ldots \wedge dp_n$.

However, paths that obey the classical equations of motion are constrained to lie on the mass-shell: in flat Lorentzian spacetime this is a hyperboloid in the momentum cotangent space $\left(\mathbb{R}^{1,3}\right)^*$, given by $-p_0^2 + p_i^2 = -m^2$, where $p_0$ is the energy, $p_i$ the spatial 3-momentum and $m$ the rest mass of a particular particle. In fully general curved spacetime the corresponding condition gives a hypersurface $\mathcal{N} \equiv \{(x,p) \in \mathcal{T}^*\mathcal{M} \: | \: p^\mu p_\mu = -m^2 \}$.

Because the permitted region of phase space has been restricted, we can eliminate one momentum component from the integral, treating it as a function of the other coordinates. Conventionally we pick $p_0$ to be this unwanted component, writing $p_0 = p_0(p_i)$, and note that the equation for $\mathcal{N}$ can be written $p_\mu p^\mu = p_0 p^0 + p_i p^i = -m^2$, which allows us to solve for $p_0$.

We then find the volume form induced on $\mathcal{N}$. Let $n$ be the unit vector field normal to $\mathcal{N}$:

$n = \left.\frac{d(p_\mu p^\mu)}{\left\|d(p_\mu p^\mu)\right\|}\right|_\mathcal{N}$

The denominator works out to be $-4m^2$. Noting that the metric is a function of $x$ alone, we perform the derivative on the numerator and so find that

$n = \frac{2p_\mu dp^\mu}{-4m^2} = \frac{-p_\mu dp^\mu}{2m^2}$

(the normal index convention for vectors and covectors is swapped round, as we are working on $\mathcal{T}^*\mathcal{M}$, so the coordinates $p_\mu$ are covariant to begin with).

In general, the volume form induced from a manifold $\mathcal{M}$ onto a submanifold $\mathcal{N}$ with normal VF $n$ is

$\mathrm{vol}(\mathcal{N}) = \left. n \:\lrcorner\: \mathrm{vol}(\mathcal{M}) \right|_\mathcal{N}.$

For the present purposes we therefore have

$\sqrt{g(x)} \mathrm{vol}(\mathcal{N}) = n \:\lrcorner\: \left( dp_0 \wedge dp_1 \wedge dp_2 \wedge dp_3 \right) \\ \:\: = (n \:\lrcorner\: dp_0) dp_1 \wedge dp_2 \wedge dp_3 - (n \:\lrcorner\: dp_1) dp_0 \wedge dp_2 \wedge dp_3 \\ \:\:\:\:\:\: + (n \:\lrcorner\: dp_2) dp_0 \wedge dp_1 \wedge dp_3 - (n \:\lrcorner\: dp_3) dp_0 \wedge dp_1 \wedge dp_2$

We differentiate the condition $p^\mu p_\mu = -m^2$ with respect to a spatial component $p_i$ and rearrange, giving (note the positions of the indices):

$\frac{\partial p_0}{\partial p_i} = \frac{-p^i}{p^0} \\ dp_0 = \frac{\partial p_0}{\partial p_i}dp_i = \frac{-p^i}{p^0}dp_i.$

Expressions of the form $dp_0 \wedge dp_i \wedge dp_j$ can be simplified to involve only the $k$-component of $dp_0$ ( $i \neq k \neq j$ ), as repetition of a form in a wedge product sets the entire expression to zero. We also have $n \:\lrcorner\: dp_\mu = n_\mu = \frac{-p_\mu}{2m^2}$.

So, putting it all together,

$\sqrt{g(x)} \mathrm{vol}(\mathcal{N}) = \\ \:\: \left( \frac{-p_0}{2m^2} \right) dp_1 \wedge dp_2 \wedge dp_3 - \left( \frac{-p_1}{2m^2} \frac{-p^1}{p^0} \right) dp_1 \wedge dp_2 \wedge dp_3 \\ \:\:\:\:\:\: + \left( \frac{-p_2}{2m^2} \frac{-p^2}{p^0} \right) dp_2 \wedge dp_1 \wedge dp_3 - \left( \frac{-p_3}{2m^2} \frac{-p^3}{p^0} \right) dp_3 \wedge dp_1 \wedge dp_2 \\ \:\: = \frac{-1}{2 m^2 p^0}\left(p_0p^0 + p_ip^i\right) dp_1 \wedge dp_2 \wedge dp_3 \\ \:\: = \frac{-(-m^2)}{2 m^2 p^0} dp_1 \wedge dp_2 \wedge dp_3$

And so the final result is

$\mathrm{vol}(\mathcal{N}) = \frac{dp_1 \wedge dp_2 \wedge dp_3}{2 p^0 \sqrt{g(x)}}$

which has the expected form. Integrals over momentum space therefore look like

$\int_\mathcal{N} f(p_i) \frac{dp_1 \wedge dp_2 \wedge dp_3}{2 E(p_i) \sqrt{g(x)}}.$

Everything we wrote down was manifestly covariant, so this volume form transforms in the correct way under general coordinate transformations. The rest mass $m$ does not appear in the final volume form, so we are free to set $m = 0$ if we choose, as is the case with photons.
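In flat spacetime the standard check on this measure is its behaviour under boosts. Here is a sympy sketch (working in flat space with signature $-+++$, matching the mass-shell condition above) verifying that $d^3p/2E$ is invariant under a boost along $z$: the Jacobian $\partial p_z'/\partial p_z$ equals $E'/E$, so the factors cancel.

```python
import sympy as sp

m, px, py, pz, beta = sp.symbols('m p_x p_y p_z beta', positive=True)
gamma = 1 / sp.sqrt(1 - beta**2)

E = sp.sqrt(m**2 + px**2 + py**2 + pz**2)   # on-shell energy p^0

# Boost along z: E and p_z mix, p_x and p_y are untouched.
Ep  = gamma * (E + beta * pz)
pzp = gamma * (pz + beta * E)

# d(p_z')/d(p_z) = E'/E, hence d^3p'/(2E') = d^3p/(2E):
jac = sp.diff(pzp, pz)
assert sp.simplify(jac - Ep / E) == 0
```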

Let’s say you have an array $P$, containing $N$ (~millions) points $P_i \in \mathbb{R}^3$. Perhaps it’s the output of an n-body simulation or something more complicated. Anyway, suppose you also have several other arrays of size $N$, each listing some quantity that is associated with each point $P_i$ – for example $M_i$, the mass at each point.

The task: find some small contiguous region $\Omega \subset \mathbb{R}^3$, and calculate some function at every point within it. For example, approximate the integral of the density by summing over the mass at each point:

$\displaystyle \int_{\Omega} \rho(x) d^3x \approx \sum_{P_i \in \Omega} M_i$

Let us introduce an operation $A \star B = C$, where $|A| = |B|$, $B$ contains only zeroes or ones, and $C$ contains $A_i$ iff $B_i = 1$; also $|C| \leq |A|$, but $C$ maintains the order of elements from $A$. Assume that this operation can be calculated efficiently (in parallel), as opposed to sequentially traversing the elements of an array, which is much slower (this is the case in the IDL programming language, for example).

The solution to the problem at hand is to produce a mask $S$ containing $N$ elements, whose values are given by the characteristic function $\chi_\Omega$ at each point – that is, $S_i = \chi_\Omega(P_i) = 1$ if $P_i \in \Omega$ and $0$ otherwise. Then the final calculation is

$\displaystyle \sum_{P_i \in \Omega} M_i = \sum_i \left(M \star S\right)_i$

and the advantage is that we can reuse the mask $S$ for additional calculations.

Now suppose that $\Omega$ is itself a very large set, and that we have another, smaller region $\Omega' \subset \Omega$ on which we would also like to perform calculations. Let $P' = P \star S$ and $S' = \chi_{\Omega'}(P')$. Then we can compose our masks to find the $M$ values that lie within $\Omega'$.

$\displaystyle \sum_{P_i \in \Omega'} M_i = \sum_i \left((M \star S) \star S'\right)_i.$

What are the $P$-indices of points that are in the region $\Omega'$? To find the answer we do:

$\displaystyle A = [0,1,2,\ldots,|S|-1] \\ B = (A \star S) \star S'$

and $B$ now contains the desired $P$-indices.

In the IDL programming language, $A = B \star C$ is written as

A = B[where(C)]
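The same pattern translates directly to NumPy, where $\star$ is boolean indexing. A sketch, with a hypothetical region $\Omega$ (the unit ball) and $\Omega'$ (the ball of radius $1/2$):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
P = rng.uniform(-1.0, 1.0, size=(N, 3))   # point positions
M = rng.uniform(0.0, 1.0, size=N)         # mass associated with each point

# The mask S = chi_Omega(P); here Omega is taken to be the unit ball.
S = (P**2).sum(axis=1) < 1.0              # boolean array of length N

# M star S is boolean indexing: order-preserving and reusable.
mass_in_Omega = M[S].sum()

# Compose masks for the smaller region Omega' inside Omega:
P_prime = P[S]
S_prime = (P_prime**2).sum(axis=1) < 0.25
mass_in_Omega_prime = M[S][S_prime].sum()

# The original P-indices of the points inside Omega':
A = np.arange(N)
B = A[S][S_prime]
assert np.isclose(M[B].sum(), mass_in_Omega_prime)
```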


A more general (and also efficient) way of dealing with large, unstructured lists of coordinates is to use an oct-tree.

From notated music to audible sounds

This is the second post in a series devoted to music from a mathematical point of view. The first post dealt with written intervals and notes; the moral of that post was that there is some structure (a vector space) hidden inside the way we talk about intervals and notes, which we can (and should) take advantage of.

In this post, I will make the transition from notated music to audible noises, still in a way that is aimed at my hypothetical musically-ignorant mathematician.

Revision of previous ideas

Notated intervals form a two-dimensional vector space. Pitches form a two-dimensional affine space, with intervals as the ‘difference’ vectors. See the previous post for details.

I take audible sounds to be the space of frequencies as measured in units of hertz (cycles per second). However, what we’re really interested in are the ratios between these frequencies. The absolute values only come into play when we choose an arbitrary reference point on which to base all our absolute pitches. (Choosing a reference point is different from using a non-standard tuning system – you can have equal temperament, but at Baroque pitch (A = 415 Hz), for example.)

Pitch ratios are, of course, combined by multiplication, but we can still write the operation as addition provided we understand that they are being ‘added’ in log-space:

$f_1 = f_2f_3 \:\:\: \Longleftrightarrow \:\:\: \log{f_1} = \log{f_2} + \log{f_3}.$

In practice, these interval ratios will always be formed by taking rational numbers to rational powers.

Constraints on a tuning system

Different musical instruments are suited to different methods of tuning notes. For example, the human voice can trivially produce pitches at any frequency in a certain range – and the same for string instruments. Wind instruments have a fixed number of ‘holes’, plus some standardised ways of shifting the basic pitches around. Brass instruments are even more restricted, and the notes they can play are closely related to the harmonic series.

Keyboard instruments are a somewhat different beast – in theory you could associate a button/key with any note imaginable, but due to practical limitations a one-dimensional array of keys is used. This obviously causes issues when we try to match up a notation system based on a two-dimensional system of intervals to the keys available. Therefore we’ll need to come up with some way of reducing (“projecting”) our two dimensions down to a single dimension. This is the Fundamental Keyboard Problem.

Intervals with rational coefficients

When defining a tuning system, what is typically given are particular ratios for certain intervals. Suppose we have a tuning system $t : \mathcal{I} \longrightarrow \mathbb{R}$, i.e. a map that takes intervals to pitch ratios. We fix two intervals, $t(i_1) = f_1$ and $t(i_2) = f_2$. Assuming it is not the case that $i_1 \propto i_2$, these two intervals span $\mathcal{I}$, so $t(i)$ is now fixed for all $i \in \mathcal{I}$. This is because any $i \in \mathcal{I}$ can be written in the $i_1, i_2$ basis,

$i = \alpha\cdot i_1 + \beta\cdot i_2$

and hence, remembering that adding intervals corresponds to multiplying ratios,

$t(i) = f_1^\alpha f_2^\beta.$

Many well-known tuning systems can be specified this way. They are called syntonic tuning systems, or rank-2 tuning systems. However, in practice there is only one interval ratio that is free to be specified arbitrarily, because the other fixed interval is always $t(\mathsf{P8}) = 2$, otherwise octaves aren’t pure!

This gives rise to the main problem: two non-octave intervals can’t be simultaneously pure. This is distinct from the problem of designing keyboard instruments. The diatonic scale of Ptolemy specifies pure intervals for all eight steps of the major scale:

| degree | ratio |
|--------|-------|
| P1 | 1 |
| M2 | 9/8 |
| M3 | 5/4 |
| P4 | 4/3 |
| P5 | 3/2 |
| M6 | 5/3 |
| M7 | 15/8 |
| P8 | 2 |

(There exist numerous slight variations of the Ptolemaic scale, as well as the minor scale etc.)

With a syntonic temperament, we can only get a few of these ‘correct’, unless we happen to get lucky with our ratios. P1 and P8 are correct by definition; then $i$ (e.g. P5) can be specified freely; then, perhaps $\mathsf{P8} - i$ (e.g. P4) will come out correct too. After that you’re out of luck.

Syntonic tuning systems

In Pythagorean tuning, the given intervals are $3/2$ for the perfect fifth, and $2$ for the octave. As indicated above, this completely specifies the tuning. The procedure for general intervals is then as follows:

• Define a map $t : \mathcal{I} \longrightarrow \mathbb{R}$ that takes intervals to pitch ratios, and define it for the two chosen basis intervals, e.g.

$t(\mathsf{P5}) = \frac{3}{2}\\ t(\mathsf{P8}) = 2$

• Write your chosen interval in terms of the new basis and calculate the appropriate ratio, e.g.

$\mathsf{M6} = 3\cdot \mathsf{P5} - 1\cdot\mathsf{P8} \\ t(\mathsf{M6}) = \left(\frac{3}{2}\right)^3 \left(2\right)^{-1} = \frac{27}{16}$

• Then, for notes, define a new map $T : \mathcal{P} \longrightarrow \mathbb{R}$
• Fix the origin under $T$, i.e. $T(p_0) = f_0$ for some note $p_0$ and pitch $f_0$; the common choice is $p_0 = \mathsf{A}$, and $f_0 = 440\:\mathrm{Hz}$
• Extend $T$ to all notes by

$T(p) = t(p - p_0)\times T(p_0)$

For example,

$T(\mathsf{F\sharp}) = t(\mathsf{M6}) \times T(\mathsf{A}) = \frac{27}{16} \times 440\:\mathrm{Hz} = 742.5\:\mathrm{Hz}$
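The procedure above can be sketched in a few lines of Python (a minimal sketch, using exact rational arithmetic via the standard `fractions` module; intervals are represented only by their coefficients in the (P5, P8) basis, and the function names `t` and `T` mirror the maps above):

```python
from fractions import Fraction

# Pythagorean tuning: fixed ratios t(P5) = 3/2 and t(P8) = 2.
F1, F2 = Fraction(3, 2), Fraction(2)

def t(alpha, beta):
    """Ratio of the interval alpha*P5 + beta*P8; t turns interval
    addition into multiplication, so t(i) = f1**alpha * f2**beta."""
    return F1**alpha * F2**beta

def T(alpha, beta, f0=Fraction(440)):
    """Pitch of the note lying alpha*P5 + beta*P8 above the origin
    note p0, with T(p0) = f0 (in Hz)."""
    return t(alpha, beta) * f0

# M6 = 3*P5 - 1*P8, so t(M6) = (3/2)**3 / 2 = 27/16,
# and F-sharp, a major sixth above A = 440 Hz, is 742.5 Hz:
print(t(3, -1))         # 27/16
print(float(T(3, -1)))  # 742.5
```

Negative coefficients fall out for free, since `Fraction` handles negative integer exponents exactly.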

Here is a table of some common syntonic tuning systems, in each case assuming that the second constrained interval is $\mathsf{P8} \longrightarrow 2$:

Tuning system Fixed interval
Pythagorean $\mathsf{P5} \longrightarrow \frac{3}{2}$
Quarter-comma meantone $\mathsf{M3} \longrightarrow \frac{5}{4}$
Sixth-comma meantone $\mathsf{A4} \longrightarrow \frac{45}{32}$
Third-comma meantone $\mathsf{m3} \longrightarrow \frac{6}{5}$
Schismatic $8\cdot\mathsf{P4} \longrightarrow 10$

Note that we quickly enter the realm of irrational numbers: for example, under quarter-comma meantone, $\mathsf{P5} \longrightarrow \left(\frac{5}{4}\right)^\frac{1}{4}\left(2\right)^\frac{1}{2} \approx 1.495$.
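That value can be checked in one line (a sketch; the decomposition $\mathsf{P5} = \frac{1}{4}\cdot\mathsf{M3} + \frac{1}{2}\cdot\mathsf{P8}$ is assumed here, obtainable with the change-of-basis method covered in the intervals post):

```python
# Quarter-comma meantone fixes t(M3) = 5/4 and t(P8) = 2.
# Since P5 = (1/4)*M3 + (1/2)*P8, its ratio is (5/4)^(1/4) * 2^(1/2):
fifth = (5 / 4) ** 0.25 * 2 ** 0.5
print(round(fifth, 4))  # 1.4953 -- slightly narrower than the pure 3/2
```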

You can immediately see that different tuning systems make different trade-offs: quarter-comma meantone provides you with sweet-sounding (and narrow) major thirds, while abandoning the pure fifths of Pythagorean tuning.

There is a link here between theory and practice: in Medieval music, for which Pythagorean tuning was used, phrase-endings rarely feature major thirds – normally open fifths and octaves are the only intervals considered ‘pure’ enough to end a phrase. In Renaissance and Baroque music, major thirds are used much more often, and this coincides with the use of quarter-comma meantone tuning.

Keyboard instruments with syntonic temperaments

Let us design a keyboard that will use notes from a syntonic temperament $t : \mathcal{I} \longrightarrow \mathbb{R}$ (with fixed interval $i$, origin note $b$, and note-mapping $T : \mathcal{P} \longrightarrow \mathbb{R}$ ); we know that octaves will be pure, so we make our one-dimensional keyboard periodic at the octave, and then place $n$ keys in each octave. Each key (attached to a physical string or pipe) will be tuned to some definite frequency $f \in \{T(p) \: | \: p \in \mathcal{P} \}$.

Now we’ll attempt to distribute notes from our temperament to the physical keys on the keyboard. Starting at note $b$ (with frequency $T(b)$ ), assign the notes $(b\pm k\cdot i) \: \mathrm{mod} \: \mathsf{P8}$ to successive keys (with frequencies $T\left((b\pm k\cdot i) \: \mathrm{mod} \: \mathsf{P8}\right)$ ), increasing $k$ until all $n$ keys have been filled. Unfortunately, in general the cycle does not close, since $n\cdot i \: \mathrm{mod} \: \mathsf{P8} \neq \mathsf{P1}$: the leftover gap is called a wolf interval, and its existence limits the usefulness of syntonic tuning systems for keyboards.

To minimise disruption, the wolf interval is normally chosen to be one that is little used if playing in keys with a low number of sharps and flats; for example $\mathsf{G\sharp} - \mathsf{E\flat} = \mathsf{A3}$: under Pythagorean tuning, the A3 is about $1.35$, to contrast with the pure P4 which is exactly $\frac{4}{3}$.
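To see how bad the wolf is, here is a small sketch (the decomposition $\mathsf{A3} = 11\cdot\mathsf{P5} - 6\cdot\mathsf{P8}$ is assumed, again obtainable by the change-of-basis method from the intervals post):

```python
from fractions import Fraction

# Pythagorean tuning: pure fifths (3/2) and octaves (2).
# The interval from Eb up to G# decomposes as A3 = 11*P5 - 6*P8:
a3 = Fraction(3, 2) ** 11 / Fraction(2) ** 6
p4 = Fraction(4, 3)  # the pure fourth occupying the same pair of keys

print(float(a3))  # ~1.3515
print(float(p4))  # ~1.3333
```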

Keyboard instruments with equal temperaments

Returning to the Fundamental Keyboard Problem, we see that the solution is to project the two dimensions of notated intervals down to a one-dimensional subspace. This necessarily involves one interval being set to zero (or to one, multiplicatively speaking). Our search therefore is effectively for syntonic tuning systems where the fixed ratios are $\mathsf{P8} \longrightarrow 2$ and $i \longrightarrow 1$ for some interval $i$.

Before we know what $i$ is, can we say what such a tuning system would look like? Well, if we pick an interval $j \in \mathcal{I}, j \neq i$, and use $i, j$ as our new basis, then because $i \longrightarrow 1$, we can generate all intervals with $\alpha\cdot j$ for some rational $\alpha$. Furthermore, we can pick $j$ carefully so that all intervals can actually be represented by $\alpha\cdot j$ for integral $\alpha$. Then we can use $j$ as a convenient “unit” with which to construct our notation system or keyboard. If $\mathsf{P8} = n\cdot j$, then the tuning system is called $n$-equal temperament.

A bit of experimentation (or suitably clever calculation) results in some promising-looking candidates for $i$:

$i$ $j$ $n$
$\mathsf{A1}$ $\mathsf{M2}$ 7
$\mathsf{d2}$ $\mathsf{A1}, \mathsf{m2}$ 12
$\mathsf{dd2}$ $\mathsf{d2}$ 19
$\mathsf{d^4 3}$ $\mathsf{d2}$ 31
$\mathsf{d^7 6}$ $\mathsf{d2}$ 53

(The $j$ interval is non-unique, as various intervals become identified under equal temperaments.)

As you may have guessed already, the favourite choice here is $n = 12$ and $i = \mathsf{d2} \longrightarrow 1$. This means that $\mathsf{A1} \longrightarrow 2^\frac{1}{12}$, and $\mathsf{m2} \longrightarrow 2^\frac{1}{12}$. So A1 and m2 are identified, and are used as the generator $j$. They are referred to interchangeably as a “semitone”. The other useful property of 12-equal temperament is that $\mathsf{P5} \longrightarrow 2^\frac{7}{12} \approx 1.498$, which is extremely close to the Pythagorean value!
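The $n$ column of the table above can be verified mechanically. A sketch, assuming the (A1, d2) interval coordinates $\phi$ from the intervals post: once $i \longrightarrow 1$, the number of $j$-steps in an octave is the absolute determinant of the matrix whose rows are $\phi(i)$ and $\phi(\mathsf{P8})$, because setting $i \longrightarrow 1$ quotients the rank-2 group by $\phi(i)$, leaving a single copy of ℤ in which the octave lands at index $\pm\det$:

```python
# (A1, d2) coordinates of each candidate i, and of the octave.
candidates = {
    "A1":    (1, 0),   # -> 7-equal
    "d2":    (0, 1),   # -> 12-equal
    "dd2":   (-1, 1),  # -> 19-equal
    "d^4 3": (-1, 2),  # -> 31-equal
    "d^7 6": (1, 5),   # -> 53-equal
}
P8 = (12, 7)

def edo(i):
    """Steps per octave once i -> 1: |det| of rows phi(i), phi(P8)."""
    return abs(i[0] * P8[1] - i[1] * P8[0])

print([edo(i) for i in candidates.values()])  # [7, 12, 19, 31, 53]

# And the 12-equal fifth really is close to pure:
print(round(2 ** (7 / 12), 4))  # 1.4983
```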

Thus the use of 12-equal temperament to resolve the Fundamental Keyboard Problem leads directly to keyboards with 12 keys per octave: seven “white” notes $\{\mathsf{A},\mathsf{B},\mathsf{C},\mathsf{D},\mathsf{E},\mathsf{F},\mathsf{G}\}$, and five “black” notes $\{\mathsf{A\sharp},\mathsf{C\sharp},\mathsf{D\sharp},\mathsf{F\sharp},\mathsf{G\sharp}\}$. There are no more notes to account for, because the equivalence of A1 and m2 means that notes differing by these intervals are identified, e.g. $\mathsf{B\sharp} \equiv \mathsf{C}$ and $\mathsf{F\sharp} \equiv \mathsf{G\flat}$.

Twelve notes per octave is also fairly convenient given the size of human hands, and keeps the resulting instrument manageable to play.

Other instruments

Consider an ensemble of dynamically-tunable instruments (string instruments, human voices, etc.). If this ensemble plays a major chord, there’s no reason why the players can’t all agree to tune it totally purely – with ratios of $1, \frac{5}{4}, \frac{3}{2}$.

As a general strategy, the ensemble could choose to fix just a few notes overall, and then tweak any chord slightly to maximise harmonicity. Or, locally fix any note that is constant between successive chords, and change all the other notes around it.

These systems of constant readjustment have one big advantage – much nicer-sounding intervals – and several major annoyances, which are:

• There’s no longer an unambiguous mapping between written notes and sounding frequencies. This may or may not offend you greatly, depending on how you axiomatise musical notation (you can probably guess my position…)
• A tendency for the pitch of the entire ensemble to drift over time (particularly with the second system).
• Cannot include certain instruments in the ensemble (any keyboard instruments, certain wind instruments).

Nevertheless, it is hypothesised that certain ensembles (string quartets, unaccompanied choirs) do in fact adjust their intonation in this way.

Cheap & Easy differential forms

There’s a way of motivating the notions of tangent vectors and covectors that’s hinted at but generally glossed over – at least in the physics courses that I take. This post is a quick overview, serving mostly as a reminder to myself of the topic. Please excuse the lack of rigour.

I will use the Einstein summation convention throughout,

$x^iy_iz^j \equiv \sum\limits_i \: x^iy_iz^j,$

and hopefully by the end I’ll even have explained why it makes sense.

Tangent vectors

We have an $n$-dimensional manifold $M$, which contains points, but not vectors. You cannot subtract two points on a manifold and expect to get something useful; imagine a line drawn between two points on the Earth’s surface. It would go awkwardly underground, and wouldn’t measure any sort of quantity that’s appreciable by inhabitants on the surface.

Let $\gamma \: : \: \mathbb{R} \longrightarrow M$ be a curve on $M$. It takes some real parameter (let’s call it $t$ ) and spits out points in $M$ along a line, as you evolve $t$. Let’s call the coordinates of these points $p^i(t)$ in some coordinate system, and $p'^i(t)$ in some other coordinate system. Then we can find a ‘velocity’ vector $\dot{\gamma}$, tangent to the curve, whose coordinates are $\left(\frac{dp^1}{dt}, \frac{dp^2}{dt}, \ldots, \frac{dp^n}{dt}\right)$. The coordinates of $\dot{\gamma}$ in the primed coordinate system are then given by the chain rule,

$\frac{dp'^i}{dt} = \frac{dp'^i}{dp^j}\frac{dp^j}{dt}.$

This motivates the study of all objects that transform this way, and they are called contravariant vectors, or contravectors, or just vectors.

Now, so far the vectors are just $n$-tuples of numbers, with no particular geometric significance. I will however write down a vector $\mathbf{v}$ with a basis by pairing up its components $v^i$ with a basis $\mathbf{e}_i$, as well as the same in the primed coordinate system:

$\mathbf{v} = v^i\mathbf{e}_i = v'^i\mathbf{e'}_i,$

and for now these basis vectors $\mathbf{e}_i$ are formal placeholders. All we can say is, whatever the choice of $\mathbf{e}_i$, they will have to transform using the inverse of the transformation matrix used by $v^i$, in order that the expression above remains true in any coordinate system.

A vector lives at a single point $p$ of $M$, in a space called ‘the tangent space to $M$ at $p$ ‘, or $T_pM$ for short – imagine a flat plane ( $T_pM$ ) balancing on top of a lumpy surface ( $M$ ), touching it at a point ( $p$ ). If $\mathbf{v}$ varies from point to point, it is strictly a vector field: a function assigning to each point $p \in M$ a vector in $T_pM$. In this case we can just say that it lives in the tangent bundle $TM$, and we have to be careful not to forget about its position-dependence even if we suppress it occasionally for notational convenience.

Differential operators

Let $f \: : \: M \longrightarrow \mathbb{R}$ be a scalar field (just a real-valued function) on our manifold $M$. We can write differentiation of $f$ along a vector (the directional derivative along $\mathbf{v}$ ) in three ways, all defined to be equivalent

$\left( \nabla_\mathbf{v} f \right) (p) \equiv \frac{df(p + t\mathbf{v})}{dt}\Big|_{t = 0} \equiv v^i\frac{\partial f}{\partial x^i},$

and note that we’re not really adding the vector $\mathbf{v}$ to the point $p$, because we’re evaluating the expression at $t = 0$. The $p$-dependence is casually suppressed in the last expression.

We might worry that this is coordinate system-dependent, so lets try to write the same quantity down in the primed coordinate system, using the transformation properties of $\mathbf{v}$ that we already know, and the chain rule:

$v'^i\frac{\partial f}{\partial x'^i} = \frac{\partial x'^i}{\partial x^j} v^j \frac{\partial f}{\partial x^k}\frac{\partial x^k}{\partial x'^i} = v^j \frac{\partial f}{\partial x^k} \delta^k_j = v^k \frac{\partial f}{\partial x^k},$

so our directional derivative is coordinate-invariant after all! Note that multiplying the coordinates of a matrix with those of its inverse (and summing according to the Einstein convention) gives the Kronecker delta, which is why we can swap out $j$ for $k$ in the last expression.

Coordinate-invariance shouldn’t surprise us too much, because the first two ways of writing the directional derivative made no mention of any coordinate system for $\mathbf{v}$.
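This invariance is easy to check numerically. A small sketch (all the specific choices – the function $f$, the point, the vector components, and polar coordinates as the primed system – are arbitrary):

```python
import math

f = lambda x, y: x**2 * y          # an arbitrary scalar field
x, y = 1.0, 2.0                    # an arbitrary point
vx, vy = 0.3, -0.2                 # components of v, cartesian system

# Directional derivative in cartesian coordinates: v^i df/dx^i
cart = vx * (2 * x * y) + vy * x**2

# Primed system: polar coordinates (r, theta).
r, th = math.hypot(x, y), math.atan2(y, x)
# Transform the vector components with the Jacobian dx'^i/dx^j:
vr = (x * vx + y * vy) / r         # dr/dx * vx + dr/dy * vy
vth = (-y * vx + x * vy) / r**2    # dth/dx * vx + dth/dy * vy

# df/dr and df/dtheta by central finite differences
g = lambda rr, tt: f(rr * math.cos(tt), rr * math.sin(tt))
h = 1e-6
df_dr = (g(r + h, th) - g(r - h, th)) / (2 * h)
df_dth = (g(r, th + h) - g(r, th - h)) / (2 * h)

polar = vr * df_dr + vth * df_dth
print(abs(cart - polar) < 1e-6)    # True: the same number either way
```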

Now, recall that first-order differential operators on real functions of $n$ variables all take the form

$D f = a^i\frac{\partial f}{\partial x^i},$

and so if we just interpret the values $a^i$ as components of a vector, we’ve found a one-to-one correspondence between vectors and first-order differential operators (strictly speaking it’s between vector components and operators, but all the transformation matrices between coordinate systems are one-to-one too so it doesn’t matter).

This correspondence with differential operators hints strongly at what quantities to use as our basis vectors – the individual derivative operators $\frac{\partial}{\partial x^i}$ certainly transform in the correct way. We now make the formal identification

$\frac{\partial}{\partial x^i} \equiv \mathbf{e}_i.$

I say formal because we will not treat these basis vectors like ‘proper’ derivative symbols, as their ‘true’ meaning will only come into play in certain carefully-defined situations.

Let’s make the following abbreviations: $\frac{\partial}{\partial x^i} \equiv \partial_i$ and $\frac{\partial}{\partial x'^i} \equiv \partial'_i$ when talking about operators; and $\frac{\mathbf{\partial}}{\mathbf{\partial} x^i} \equiv \partial_i \equiv \mathbf{e}_i$ and $\frac{\mathbf{\partial}}{\mathbf{\partial} x'^i} \equiv \partial'_i \equiv \mathbf{e'}_i$ when talking about basis vectors.

Linear functionals

A linear functional is a function $\alpha \: : \: TM \longrightarrow \mathbb{R}$ that is linear in its argument, i.e.

$\alpha(\mathbf{u}) \in \mathbb{R} \\ \alpha(c\,\mathbf{u}) = c\,\alpha(\mathbf{u}) \\ \alpha(\mathbf{u} + \mathbf{v}) = \alpha(\mathbf{u}) + \alpha(\mathbf{v}),$

and functionals themselves add pointwise, $(\alpha + \beta)(\mathbf{u}) = \alpha(\mathbf{u}) + \beta(\mathbf{u})$.

Linear functionals are ‘vector-like’, but live in a space called $T^*M$, rather than the $TM$ that contains vectors. They are totally determined by their action on the basis vectors of $TM$, so can be written down in components:

$\alpha_i = \alpha(\mathbf{e}_i) \\ \alpha(\mathbf{v}) = \alpha_i v^i \\ \alpha = \alpha_i\eta^i \\ \eta^i(\mathbf{v}) = v^i,$

where the $\eta^i$ are some as-yet-mysterious basis for our linear functionals. Note the position of the indices on each quantity: the Einstein summation convention is working correctly, even if we don’t necessarily know yet what sort of quantities we’re dealing with.

The expression $\alpha(\mathbf{v}) = \alpha_i v^i$ must be coordinate independent, as the left hand side makes no reference to any coordinate system; and we already know how to transform $v^i$. Therefore the components $\alpha_i$ must use the opposite transformation, $\alpha'_i = \alpha_j \frac{\partial x^j}{\partial x'^i}$. So we have

$\alpha'_i v'^i = \alpha_j \frac{\partial x^j}{\partial x'^i} \frac{\partial x'^i}{\partial x^k} v^k = \alpha_j \delta^j_k v^k = \alpha_j v^j = \alpha(\mathbf{v}).$

These linear functionals are also called covariant vectors, covectors, differential one-forms, or one-forms. Remember that both $\alpha$ and $\mathbf{v}$ can have $p$-dependence in general, making them covector fields and vector fields respectively.

Total differential of a function

The following formula for the ‘total’ differential of a function should be familiar:

$df = \partial_i f dx^i,$

where $p$-dependence has been suppressed on both sides. However, we don’t currently have a way to make geometric sense of the individual coordinate differentials $dx^i$. This expression must be coordinate-independent (no mention of coordinates is made on the left side), so the coordinate differentials must transform as

$dx'^i = \frac{\partial x'^i}{\partial x^j} dx^j.$

This is exactly how the basis for our covectors $\eta^i$ transforms! So we can make the formal identification $\eta^i \equiv dx^i$, much like how we earlier decided that $\mathbf{e}_i \equiv \partial_i$.

The full component-wise expression for the action of our covectors on our vectors is

$\alpha(\mathbf{v}) = \alpha_i v^j dx^i(\partial_j) = \alpha_i v^j \delta^i_j = \alpha_i v^i.$

The only trick here is $dx^i(\partial_j) = \delta^i_j$, which we have defined to be true.

Our expression for $df$ now comes in handy as a way to generate new covectors. In fact, covectors generated in this way have the following useful property:

$df(\mathbf{v}) = \partial_i f v^i = \nabla_\mathbf{v} f.$
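A quick numerical sketch of this property (again with arbitrary choices of $f$, point and vector; the derivative along $t \mapsto f(p + t\mathbf{v})$ is approximated by central finite differences):

```python
import math

f = lambda x, y, z: x * y + math.sin(z)   # arbitrary scalar field
p = (1.0, 2.0, 0.5)                       # arbitrary point
v = (0.4, -1.0, 2.0)                      # arbitrary vector components

# Components of the covector df at p: (df)_i = partial f / partial x^i
x, y, z = p
df = (y, x, math.cos(z))

# The pairing df(v) = (df)_i v^i
pairing = sum(a * b for a, b in zip(df, v))

# Compare with d/dt f(p + t*v) at t = 0
h = 1e-6
fplus = f(*(pi + h * vi for pi, vi in zip(p, v)))
fminus = f(*(pi - h * vi for pi, vi in zip(p, v)))
fd = (fplus - fminus) / (2 * h)

print(abs(pairing - fd) < 1e-6)  # True
```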

You may have spotted by now the value of the Einstein summation convention – as long as you keep your up-indices on vector components (and covector bases), and down-indices on covector components (and vector bases), any scalar you end up with will be coordinate-independent. This is a useful ‘type-check’ on any expression; if the indices don’t match, something must be wrong (or you’ve violated relativity by finding a preferred coordinate system).

I finish with three warnings:

• Covectors generated from functions (like $df$ above) are not the only kind! Any linear combination of the basis covectors $dx^i$ is a covector, and in general the arbitrary covector $a_i dx^i$ will not be the differential of any function at all.
• The components of vectors transform in the opposite way to the components of covectors. The basis vectors transform oppositely to the vector components, and to the basis covectors. This is confusing! Hence physicists like to pretend that basis vectors don’t exist, and only work with components. This is a convenient way to work for many computations, but you can end up getting confused when your basis vectors change from point-to-point (as they do on most non-trivial manifolds and coordinate systems):

$\frac{d\mathbf{v}}{dt} = \frac{d}{dt}\left(v^i \mathbf{e}_i\right) = \dot{v}^i\mathbf{e}_i + v^i\mathbf{\dot{e}}_i.$

Mathematicians never write down any coordinates, say they are working in a ‘coordinate-free’ way, and act all clever about it.

• There is one more way to write the directional derivative, which is

$\mathbf{v}\left(df\right) = v^i \frac{\partial f}{\partial x^j} \partial_i(dx^j) = v^i \frac{\partial f}{\partial x^j} \delta^j_i = v^i \frac{\partial f}{\partial x^i} = \nabla_\mathbf{v} f,$

treating $\mathbf{v}$ as a function $T^*M \longrightarrow \mathbb{R}$. Unfortunately you also see people write the above as

$\mathbf{v}\left(f\right) = v^i \partial_i f = \nabla_\mathbf{v} f,$

which is very confusing, as it conflicts with our careful definitions of what the basis vectors and covectors mean – such is life.

Algebraic structure of musical intervals and pitches

Here’s the first in what will hopefully be a series of related posts about one particular (limited) aspect of the interaction between music and mathematics. In my mind, I’ll be explaining things to a hypothetical musically uneducated mathematician, who should nevertheless end up with an understanding as good as any bona-fide musician’s – in the tradition of certain physics books of which I am a fan.

I begin by revising, in unnecessarily rigorous detail, what you already knew about musical pitches and intervals.

Musical intervals are the signed distance between musical notes, as written on a traditional (Western) five-lined musical stave. For completeness, I will first summarise traditional pitch and interval notation.

Pitch syntax

Pitches are a pair, consisting of a letter and an accidental: $P = (N, a)$, $P \in \mathcal{P}$, where

$N \in \{A,B,C,\ldots,A',\ldots\}, \\ a \in \{\natural,\sharp,\flat,\sharp\sharp,\flat\flat,\ldots,\sharp^n,\flat^n,\ldots\}$

leading to constructions such as C♮, F♯, B♭ etc. These pitches correspond in a slightly irregular way to the horizontal lines (and gaps between them) on the stave, but I will not go into the details here. All pairs $(N, a)$ correspond to valid pitches. The set of pitches is actually extended upwards beyond that written above with super-prime symbols ($A', B'$ ), and downwards with sub-primes ($C_{'},D_{'}$ ).

The accidentals are pronounced as follows: ♮ is a natural, ♭ is a flat, ♯ is a sharp.

Pitches form an affine space, with intervals as the difference type (subtraction of two pitches). I will not define this subtraction until we have a clearer idea of the algebra of intervals.

Interval syntax

Intervals are also a pair, consisting of a quality and a number: $I = (q, n)$, $I \in \mathcal{I}$, where

$q \in \{\mathsf{P,M,m,A,d,AA,dd},\ldots,\mathsf{A}^n,\mathsf{d}^n,\ldots\}, \\ n \in \{\ldots,\mathsf{-3,-2,1,2,3},\ldots\}$

leading to interval names such as P5, M3, m6 etc. Note that the set which $n$ belongs to is not ℤ – the $n$ are simply arbitrary labels (nominal numbers), and their arithmetic is tied up with the overall algebra of musical intervals in a complex way that does not correspond to the conventional notion of integers. Intervals form an abelian group (generally written additively), together with the four additional operations of augmentation, diminution, inversion and negation.

The interval qualities listed above are pronounced, respectively, perfect, major, minor, augmented, diminished, doubly augmented etc. The interval numbers are pronounced as ordinal numbers, with the special case that $8$ is an octave, and $1$ a unison.

Note also that not all combinations $(q, n)$ are permitted; in particular, there are certain rules that allow one to construct valid intervals. Start with one of the eleven valid base intervals:

$\mathcal{I}' = \{\mathsf{P1}, \mathsf{m2}, \mathsf{M2}, \mathsf{m3}, \mathsf{M3}, \mathsf{P4}, \mathsf{P5}, \mathsf{m6}, \mathsf{M6}, \mathsf{m7}, \mathsf{M7}\}.$

The total set $\mathcal{I}$ of valid intervals is built from $\mathcal{I}'$ in two steps: first extend $\mathcal{I}'$ periodically in the positive $n$ direction with period 7, giving an intermediate set $\mathcal{I}''$; then $\mathcal{I}$ also contains the entire inverted set (the result of the map $n \longrightarrow -n$ ), i.e.

$(q,n) \in \mathcal{I}' \implies (q,n \pm 7m) \in \mathcal{I}^{\prime\prime},$

and then also

$(q,n) \in \mathcal{I}^{\prime\prime} \implies (q, -n) \in \mathcal{I} \:\: \mathrm{and} \:\: (q, n) \in \mathcal{I}.$

Operations on intervals

I will freely interchange two forms of notation for the same interval: $\mathsf{P5}$ and $(P,5)$ for convenience. I will use $+$ for interval vector addition and addition of intervals to pitches, and $-$ for interval vector subtraction and subtraction of pitches. I will use a dot $\cdot$ for scalar multiplication of interval vectors by integers.

Before I get to the complete decision procedure for intervallic addition below, a limited form of interval addition can be defined on $\mathcal{I}'$,

$\mathsf{P1} + I = I\\ (\mathsf{P},n+1) - (\mathsf{M},n) = \mathsf{m2}\\ (\mathsf{m},n+1) - (\mathsf{M},n) = \mathsf{m2}\\ (\mathsf{m},n+1) - (\mathsf{P},n) = \mathsf{m2},\\$

which can be extended in the obvious way to $\mathcal{I}$ (of course, each case is defined only for $(q,n)$ that actually exist in $\mathcal{I}$ ).

Then, you can augment intervals:

$aug(q,n) = \begin{cases} (\mathsf{A},n) & \mbox{if } q = \mathsf{P} \\ (\mathsf{M},n) & \mbox{if } q = \mathsf{m} \\ (\mathsf{A},n) & \mbox{if } q = \mathsf{M} \\ (\mathsf{m},n) & \mbox{if } q = \mathsf{d} \mbox{ and } (\mathsf{m},n) \in \mathcal{I} \\ (\mathsf{P},n) & \mbox{if } q = \mathsf{d} \mbox{ and } (\mathsf{P},n) \in \mathcal{I} \\ (\mathsf{A}^{i+1},n) & \mbox{if } q = \mathsf{A}^i \\ (\mathsf{d}^{i-1},n) & \mbox{if } q = \mathsf{d}^i,\\ \end{cases}$

and diminish them:

$dim(q,n) = \begin{cases} (\mathsf{d},n) & \mbox{if } q = \mathsf{P} \\ (\mathsf{m},n) & \mbox{if } q = \mathsf{M} \\ (\mathsf{d},n) & \mbox{if } q = \mathsf{m} \\ (\mathsf{M},n) & \mbox{if } q = \mathsf{A} \mbox{ and } (\mathsf{M},n) \in \mathcal{I} \\ (\mathsf{P},n) & \mbox{if } q = \mathsf{A} \mbox{ and } (\mathsf{P},n) \in \mathcal{I} \\ (\mathsf{A}^{i-1},n) & \mbox{if } q = \mathsf{A}^i \\ (\mathsf{d}^{i+1},n) & \mbox{if } q = \mathsf{d}^i, \\ \end{cases}$

Note that $aug(dim(I)) = dim(aug(I)) = I$, for all $I$.

We can now define addition on intervals. Let $I_1 = (q_1,n_1)$ and $I_2 = (q_2,n_2)$. Then,

$I_1 + I_2 = I_3 = (q_3, n_3),$

finding $q_3$ and $n_3$ according to the following procedure: diminish or augment $I_1$ until you reach an interval $I'_1$ with quality $\mathsf{P}$, $\mathsf{M}$ or $\mathsf{m}$, along the way calculating an augmentation index $j_1$: increment $j_1$ once for each diminution performed, and decrement it once for each augmentation (so that $I_1 = aug^{j_1}(I'_1)$ ). Repeat for $I_2$, giving $I'_2$ and $j_2$. Then perform the following interval addition, which is possible because addition is already defined on such intervals:

$I'_1 + I'_2 = I'_3.$

Then, perform the appropriate number ($j_1 + j_2$ ) of augmentations on $I'_3$, to give $I_3$:

$aug^{j_1+j_2}(I'_3) = (q_3,n_3).$

Incidentally, it is always true that $n_3 = n_1 + n_2 - 1$.

Free abelian groups

Quick revision of free abelian groups. A free abelian group $G$ of rank $n$ satisfies the following: there exists at least one set $B \subset G$, with $|B| = n$, such that every element of $G$ can be written uniquely as a linear combination of the elements of $B$ with integer coefficients – in multiplicative notation, $g = b_1^{i_1}b_2^{i_2}\ldots b_n^{i_n}$.

Free abelian groups can be thought of as vector spaces over the integers (ℤ playing the role of the scalars, $G$ the group of vectors) – strictly speaking they are ℤ-modules, since ℤ is a ring rather than a field.

A rank-$n$ free abelian group is isomorphic to the group of $n$-tuples of integers, with the group operation pairwise addition:

$(a_1,a_2,\ldots) + (b_1,b_2,\ldots) = (a_1 + b_1, a_2 + b_2, \ldots),$

or in other words, any rank-$n$ free abelian group is isomorphic to the direct sum of $n$ copies of ℤ:

$G \cong \mathbb{Z} \oplus \mathbb{Z} \oplus \ldots \oplus \mathbb{Z}.$

There may be many different choices of the basis set $B$, much like a vector space with no preferred basis.

Intervals as a free abelian group

As you can probably tell, our previous method of interval addition is a horrible mess. Luckily we can prove that intervals form a rank-2 free abelian group (said proof consists of a pile of tedious case analysis). Hence we can find a two-element basis; after which interval addition proceeds easily, being reduced to element-wise addition of pairs of integers.

We need to find a pair of linearly independent intervals to use as a basis, so that we can decompose any arbitrary interval into a linear combination of the two basis intervals. Once we have one basis, we can use it to easily generate all the other bases. However, it’s not immediately obvious how to find our first pair of linearly independent intervals. Luckily, I have such a pair up my sleeve already: (A1, d2). They must be linearly independent, because

$n\cdot \mathsf{A1} = (\mathsf{A}^n, n - n + 1) = (\mathsf{A}^n,1) \neq \mathsf{d2} \:\:\: \forall n.$

We will now take the opportunity to simplify our augmentation and diminution operations,

$aug(I) = I + \mathsf{A1} \\ dim(I) = I - \mathsf{A1}.$

Decomposing arbitrary intervals into a basis requires some more tedious case analysis, so here are just a few examples:

$\mathsf{m2} = \mathsf{A1} + \mathsf{d2}\\ \mathsf{P5} = 7\cdot \mathsf{A1} + 4\cdot \mathsf{d2}\\ \mathsf{M6} = 9\cdot \mathsf{A1} + 5\cdot \mathsf{d2}\\ \mathsf{P8} = 12\cdot \mathsf{A1} + 7\cdot \mathsf{d2}.$

If we define the map $\phi : \mathcal{I} \longrightarrow \mathbb{Z}\times\mathbb{Z}$ to be the map that gives the decomposition into the (A1, d2) basis, then the above could be written as

$\phi(\mathsf{m2}) = (1,1)\\ \phi(\mathsf{P5}) = (7,4)\\ \phi(\mathsf{M6}) = (9,5)\\ \phi(\mathsf{P8}) = (12,7).$

The basis (A1, d2) is convenient insofar as A1 matches up with what we think of as semitones (chromatic semitones, to be precise), and the d2 coefficient is simply one less than an interval’s number.

Now that we can add and subtract intervals easily, I can concisely define the two remaining special operations on intervals, inversion and negation:

$inv(I) = \mathsf{P8} - I\\ neg(I) = \mathsf{P1} - I.$
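In coordinates, all of these operations become one-liners. A minimal Python sketch (the $\phi$ table transcribes the examples above, plus a few entries derived the same way; the interval names as strings are purely illustrative):

```python
# phi-coordinates in the (A1, d2) basis
A1, d2 = (1, 0), (0, 1)
phi = {
    "P1": (0, 0), "m2": (1, 1), "M2": (2, 1), "m3": (3, 2), "M3": (4, 2),
    "P4": (5, 3), "P5": (7, 4), "m6": (8, 5), "M6": (9, 5), "P8": (12, 7),
}

def add(i, j): return (i[0] + j[0], i[1] + j[1])
def sub(i, j): return (i[0] - j[0], i[1] - j[1])
def aug(i): return add(i, A1)
def dim(i): return sub(i, A1)
def inv(i): return sub(phi["P8"], i)
def neg(i): return sub(phi["P1"], i)

print(add(phi["P5"], phi["P4"]) == phi["P8"])  # True: P5 + P4 = P8
print(inv(phi["M3"]) == phi["m6"])             # True: inv(M3) = m6
print(aug(dim(phi["M6"])) == phi["M6"])        # True: aug and dim cancel
```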

Pitch space

Of course, merely being able to add and subtract intervals is pretty useless on its own. What we really want to do is use intervals to hop around the space of pitches. The rules for pitch arithmetic are only barely less irregular than for intervals.

Note: adding an interval to a pitch is called transposition, and it is technically a separate operation from interval addition (it has a different type signature), but we shall use the $+$ symbol for it anyway.

In our notation from the first section, adding an octave (P8) adds a prime symbol to a letter name

$(N,a) + \mathsf{P8} = (N',a)\\ (N_{'},a) + \mathsf{P8} = (N,a)$

with the obvious extension to multiple octave subtraction, and multiple primes and sub-primes.

Adding and subtracting A1 corresponds to adding and subtracting sharps and flats:

$(N,\natural) + \mathsf{A1} = (N,\sharp)\\ (N,\flat) + \mathsf{A1} = (N,\natural)$

with the obvious extension to double sharps (♯♯) and double flats (♭♭) and arbitrary numbers of accidentals.

All that remains is to give the intervals between the natural (♮) pitches:

$\mathsf{A'\natural} - \mathsf{G\natural} = \mathsf{M2}\\ \mathsf{G\natural} - \mathsf{F\natural} = \mathsf{M2}\\ \mathsf{F\natural} - \mathsf{E\natural} = \mathsf{m2}\\ \mathsf{E\natural} - \mathsf{D\natural} = \mathsf{M2}\\ \mathsf{D\natural} - \mathsf{C\natural} = \mathsf{M2}\\ \mathsf{C\natural} - \mathsf{B\natural} = \mathsf{m2}\\ \mathsf{B\natural} - \mathsf{A\natural} = \mathsf{M2}.$

The equivalent of choosing a basis for our intervals is finding a coordinate system for our pitches. To do this we must convert the pitch affine space $\mathcal{P}$ into a vector space (Wikipedia suggests the name Pointed space) by choosing an origin. At this point, any choice of origin is arbitrary and meaningless, but it becomes important when we get to tuning systems, which will be the subject of the next post in this series.

For example, let us define a map $\psi : \mathcal{P} \longrightarrow \mathbb{Z}\times\mathbb{Z}$ with origin

$\psi(\mathsf{A\natural}) = O.$

Then, to find the coordinates of an arbitrary pitch $P$, we compute

$\psi(P) = O + \phi(P - \mathsf{A\natural}).$

Of course, it is easiest to simply define $O = (0,0)$.
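As a sketch (with the convenient choice $O = (0,0)$; pitches are passed in as already-decomposed intervals from A♮, so e.g. F♯ corresponds to $\phi(\mathsf{M6}) = (9,5)$ ):

```python
O = (0, 0)  # coordinates chosen for the origin pitch, A-natural

def psi(interval_from_A):
    """psi(P) = O + phi(P - A natural), where the argument is the
    (A1, d2) decomposition of the interval from A-natural up to P."""
    return (O[0] + interval_from_A[0], O[1] + interval_from_A[1])

print(psi((9, 5)))  # (9, 5): F#, a major sixth above A
print(psi((3, 2)))  # (3, 2): C-natural, a minor third above A
```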

We now no longer need to worry about how to represent pitches, and will focus on intervals for the purposes of basis changes.

Change of interval basis

Given that we now have one valid basis, the problem of further changes of basis reduces to linear algebra in two dimensions.

Let $\phi(I) = (m,n)$ be our interval in the (A1,d2) representation, and let $\phi(I_1) = (a,b)$ and $\phi(I_2) = (c,d)$ be a different pair of linearly independent basis intervals. Then

$x\cdot(a,b) + y\cdot(c,d) = (m,n),$

which is simply a system of two linear equations, to be solved by determinants in the usual way:

$x = \frac{dm - cn}{ad - bc}, \:\:\:\: y = \frac{an - bm}{ad - bc}.$

Clearly the solution will not always be in the integers, so we may sometimes choose to extend our scalar field to the rationals (particularly when we come to tuning systems). Here are the examples from the previous-but-one section, but demonstrating the (P5,P8) basis:

$\mathsf{m2} = -5\cdot\mathsf{P5} + 3\cdot\mathsf{P8}\\ \mathsf{P5} = 1\cdot\mathsf{P5} + 0\cdot \mathsf{P8}\\ \mathsf{M6} = 3\cdot \mathsf{P5} - 1\cdot\mathsf{P8}\\ \mathsf{P8} = 0\cdot \mathsf{P5} + 1\cdot\mathsf{P8},$

and again with the (M2,m2) basis:

$\mathsf{m2} = 0\cdot\mathsf{M2} + 1\cdot\mathsf{m2}\\ \mathsf{P5} = 3\cdot\mathsf{M2} + 1\cdot\mathsf{m2}\\ \mathsf{M6} = 4\cdot \mathsf{M2} + 1\cdot\mathsf{m2}\\ \mathsf{P8} = 5\cdot \mathsf{M2} + 2\cdot \mathsf{m2}.$
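Both example sets can be checked mechanically against the determinant formulas. A minimal sketch (exact arithmetic via `fractions`, since in general $x$ and $y$ need not be integers; the $\phi$-coordinates are those from the earlier examples):

```python
from fractions import Fraction

def change_basis(i1, i2, target):
    """Solve x*i1 + y*i2 == target by Cramer's rule; all three
    intervals are given in (A1, d2) coordinates, and i1, i2 are
    assumed linearly independent (det != 0)."""
    (a, b), (c, d), (m, n) = i1, i2, target
    det = a * d - b * c
    return (Fraction(d * m - c * n, det), Fraction(a * n - b * m, det))

P5, P8, M2, m2, M6 = (7, 4), (12, 7), (2, 1), (1, 1), (9, 5)

print(change_basis(P5, P8, m2) == (-5, 3))  # True
print(change_basis(P5, P8, M6) == (3, -1))  # True
print(change_basis(M2, m2, P5) == (3, 1))   # True
print(change_basis(M2, m2, P8) == (5, 2))   # True
```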

Finally, here is a diagram showing pitch space, with arrows representing 3 choices of interval basis.

The ideas in this post are implemented concretely in two software projects: the Haskell Music Suite (also available on Hackage), and in AbstractMusic (the latter being my own personal research project).
