From notated music to audible sounds

This is the second post in a series devoted to music from a mathematical point of view. The first post dealt with written intervals and notes; the moral of that post was that there is some structure (a vector space) hidden inside the way we talk about intervals and notes, which we can (and should) take advantage of.

In this post, I will make the transition from notated music to audible noises, still in a way that is aimed at my hypothetical musically-ignorant mathematician.

Revision of previous ideas

Notated intervals form a two-dimensional vector space. Pitches form a two-dimensional affine space, with intervals as the ‘difference’ vectors. See the previous post for details.

Addition in log-space

I take audible sounds to be the space of frequencies as measured in units of hertz (cycles per second). However, what we’re really interested in are the ratios between these frequencies. The absolute values only come into play when we choose an arbitrary reference point on which to base all our absolute pitches. (Choosing a reference point is different from using a non-standard tuning system – you can have equal temperament at Baroque pitch (A = 415 Hz), for example.)

Pitch ratios are, of course, combined by multiplication, but we can still write the operation as addition provided we understand that they are being ‘added’ in log-space:

$f_1 = f_2f_3 \:\:\: \Longleftrightarrow \:\:\: \log{f_1} = \log{f_2} + \log{f_3}.$

In practice, these interval ratios will always be formed by taking rational numbers to rational powers.
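As a small illustration of addition in log-space, here is a sketch in Python. The choice of a pure fifth and a pure fourth as the two ratios is mine, made only because their product is a familiar interval.

```python
import math
from fractions import Fraction

# Two pure ratios chosen for illustration: a fifth and a fourth.
fifth = Fraction(3, 2)
fourth = Fraction(4, 3)

# Combining intervals multiplies their ratios...
product = fifth * fourth

# ...which is the same as adding their base-2 logarithms.
log_sum = math.log2(float(fifth)) + math.log2(float(fourth))

print(product)   # 2
print(log_sum)   # very close to 1, i.e. log2 of an octave
```

Any base of logarithm would do; base 2 is convenient because the octave then has size exactly 1.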

Constraints on a tuning system

Different musical instruments are suited to different methods of tuning notes. For example, the human voice can trivially produce pitches at any frequency in a certain range, and the same goes for string instruments. Wind instruments have a fixed number of ‘holes’, plus some standardised ways of shifting the basic pitches around. Brass instruments are even more restricted, and the notes they can play are closely related to the harmonic series.

Keyboard instruments are a somewhat different beast – in theory you could associate a button/key with any note imaginable, but due to practical limitations a one-dimensional array of keys is used. This obviously causes issues when we try to match up a notation system based on a two-dimensional system of intervals to the keys available. Therefore we’ll need to come up with some way of reducing (“projecting”) our two dimensions down to a single dimension. This is the Fundamental Keyboard Problem.

Intervals with rational coefficients

When defining a tuning system, what is typically given are particular ratios for certain intervals. Suppose we have a tuning system $t : \mathcal{I} \longrightarrow \mathbb{R}$ , i.e. a map that takes intervals to pitch ratios. We fix its value on two intervals, $t(i_1) = f_1$ and $t(i_2) = f_2$ . Assuming it is not the case that $i_1 \propto i_2$ , these two intervals span $\mathcal{I}$ , so $t(i)$ is now fixed for all $i \in \mathcal{I}$ . This is because any $i \in \mathcal{I}$ can be written in the $i_1, i_2$ basis,

$i = \alpha\cdot i_1 + \beta\cdot i_2$

and hence

$t(i) = \alpha\cdot t(i_1) + \beta\cdot t(i_2) \equiv f_1^\alpha f_2^\beta$

Many well-known tuning systems can be specified this way. They are called syntonic tuning systems, or rank-2 tuning systems. However, in practice there is only one interval ratio that is free to be specified arbitrarily, because the other fixed interval is always $t(\mathsf{P8}) = 2$ , otherwise octaves aren’t pure!
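The change of basis can be sketched in Python as follows. This is not code from the series: the coordinates (staff steps, semitones in 12-equal temperament) are an assumed way of writing intervals as integer pairs, and the names `INTERVALS`, `coefficients` and `make_tuning` are hypothetical.

```python
from fractions import Fraction

# Assumed coordinates (not from the post): (staff steps, 12-ET semitones).
# Any basis of the two-dimensional interval space would serve equally well.
INTERVALS = {
    "P1": (0, 0), "M2": (1, 2), "M3": (2, 4), "P4": (3, 5),
    "P5": (4, 7), "M6": (5, 9), "M7": (6, 11), "P8": (7, 12),
}

def coefficients(i, i1, i2):
    """Solve i = alpha*i1 + beta*i2 exactly, using Cramer's rule."""
    det = i1[0] * i2[1] - i1[1] * i2[0]
    if det == 0:
        raise ValueError("i1 and i2 are proportional and do not span the space")
    alpha = Fraction(i[0] * i2[1] - i[1] * i2[0], det)
    beta = Fraction(i1[0] * i[1] - i1[1] * i[0], det)
    return alpha, beta

def make_tuning(name1, f1, name2, f2):
    """Return t with t(name1) = f1 and t(name2) = f2, extended to all intervals."""
    i1, i2 = INTERVALS[name1], INTERVALS[name2]

    def t(name):
        alpha, beta = coefficients(INTERVALS[name], i1, i2)
        # f1^alpha * f2^beta, i.e. alpha*t(i1) + beta*t(i2) in log-space
        return float(f1) ** float(alpha) * float(f2) ** float(beta)

    return t

pythagorean = make_tuning("P5", Fraction(3, 2), "P8", 2)
print(coefficients(INTERVALS["M6"], INTERVALS["P5"], INTERVALS["P8"]))
print(pythagorean("M6"))
```

With these coordinates the major sixth comes out as $3\cdot\mathsf{P5} - 1\cdot\mathsf{P8}$ , so the Pythagorean major sixth is $27/16$ .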

This gives rise to the main problem: with the octave already fixed, only one further interval can be chosen to be pure, and every other ratio is then determined. This is distinct from the problem of designing keyboard instruments. The diatonic scale of Ptolemy specifies pure ratios for all eight degrees of the major scale (counting the unison and the octave):

degree	ratio
P1	1
M2	9/8
M3	5/4
P4	4/3
P5	3/2
M6	5/3
M7	15/8
P8	2

(There exist numerous slight variations of the Ptolemaic scale, as well as the minor scale etc.)

With a syntonic temperament, we can only get a few of these ‘correct’, unless we happen to get lucky with our ratios. P1 and P8 are correct by definition; then $i$ (e.g. P5) can be specified freely; then, perhaps $\mathsf{P8} - i$ (e.g. P4) will come out correct too. After that you’re out of luck.
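To see how many of the Ptolemaic ratios one particular syntonic tuning reproduces, here is a rough check in Python, assuming the fixed intervals $\mathsf{P5} \longrightarrow 3/2$ and $\mathsf{P8} \longrightarrow 2$ . The decomposition of each degree into fifths and octaves is my own bookkeeping, and the names are hypothetical.

```python
from fractions import Fraction

# Ratios from the Ptolemaic table above.
PTOLEMY = {
    "P1": Fraction(1), "M2": Fraction(9, 8), "M3": Fraction(5, 4),
    "P4": Fraction(4, 3), "P5": Fraction(3, 2), "M6": Fraction(5, 3),
    "M7": Fraction(15, 8), "P8": Fraction(2),
}

# Each degree written as (number of P5, number of P8), e.g. M3 = 4*P5 - 2*P8.
IN_FIFTHS_AND_OCTAVES = {
    "P1": (0, 0), "M2": (2, -1), "M3": (4, -2), "P4": (-1, 1),
    "P5": (1, 0), "M6": (3, -1), "M7": (5, -2), "P8": (0, 1),
}

def tuned(name, fifth=Fraction(3, 2)):
    """Ratio of a degree when P5 -> fifth and P8 -> 2."""
    fifths, octaves = IN_FIFTHS_AND_OCTAVES[name]
    return fifth ** fifths * Fraction(2) ** octaves

pure = [name for name in PTOLEMY if tuned(name) == PTOLEMY[name]]
errors = {name: tuned(name) / PTOLEMY[name] for name in PTOLEMY if name not in pure}

print(pure)
print(errors)
```

Under these assumptions the major second also happens to land on $9/8$ , while the third, sixth and seventh each miss by the same factor of $81/80$ .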

Syntonic tuning systems

In Pythagorean tuning, the given intervals are $3/2$ for the perfect fifth, and $2$ for the octave. As indicated above, this completely specifies the tuning. The procedure for general intervals is then as follows:

Define a map $t : \mathcal{I} \longrightarrow \mathbb{R}$ that takes intervals to pitch ratios, and define it for the two chosen basis intervals, e.g.

$t(\mathsf{P5}) = \frac{3}{2}\\ t(\mathsf{P8}) = 2$

Write your chosen interval in terms of the new basis and calculate the appropriate ratio, e.g.

$\mathsf{M6} = 3\cdot \mathsf{P5} - 1\cdot\mathsf{P8} \\ t(\mathsf{M6}) = \left(\frac{3}{2}\right)^3 \left(2\right)^{-1} = \frac{27}{16}$

Then, for notes, define a new map $T : \mathcal{P} \longrightarrow \mathbb{R}$ .
Fix the origin under $T$ , i.e. $T(p_0) = f_0$ for some note $p_0$ and frequency $f_0$ ; the common choice is $p_0 = \mathsf{A}$ and $f_0 = 440\:\mathrm{Hz}$ .
Extend $T$ to all notes by

$T(p) = t(p - p_0)\times T(p_0)$

For example,

$T(\mathsf{F\sharp}) = t(\mathsf{M6}) \times T(\mathsf{A}) = \frac{27}{16} \times 440\:\mathrm{Hz} = 742.5\:\mathrm{Hz}$
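The whole procedure can be sketched in a few lines of Python for the Pythagorean case. The table `OFFSET_FROM_A`, which records the interval from A up to each note as a number of fifths and octaves, is a hypothetical helper introduced only for this example.

```python
from fractions import Fraction

def t(fifths, octaves):
    """Pythagorean interval map: P5 -> 3/2 and P8 -> 2."""
    return Fraction(3, 2) ** fifths * Fraction(2) ** octaves

A_HZ = Fraction(440)  # T(A) = 440 Hz, the origin

# Interval from A up to each note, as (number of P5, number of P8).
# Hypothetical lookup table covering a handful of notes only.
OFFSET_FROM_A = {
    "A": (0, 0),
    "B": (2, -1),    # M2 above A
    "D": (-1, 1),    # P4 above A
    "E": (1, 0),     # P5 above A
    "F#": (3, -1),   # M6 above A
}

def T(note):
    """T(p) = t(p - p0) * T(p0), with p0 = A."""
    return t(*OFFSET_FROM_A[note]) * A_HZ

print(float(T("F#")))  # 742.5
print(float(T("E")))   # 660.0
```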

Here is a table of some common syntonic tuning systems, in each case assuming that the second constrained interval is $\mathsf{P8} \longrightarrow 2$ :

Tuning system	Fixed interval
Pythagorean	$\mathsf{P5} \longrightarrow \frac{3}{2}$
Quarter-comma meantone	$\mathsf{M3} \longrightarrow \frac{5}{4}$
Sixth-comma meantone	$\mathsf{A4} \longrightarrow \frac{45}{32}$
Third-comma meantone	$\mathsf{m3} \longrightarrow \frac{6}{5}$
Schismatic	$8\cdot\mathsf{P4} \longrightarrow 10$

Note that we quickly enter the realm of irrational numbers: for example, under quarter-comma meantone, $\mathsf{P5} \longrightarrow \left(\frac{5}{4}\right)^\frac{1}{4}\left(2\right)^\frac{1}{2} \approx 1.495$ .
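As a sketch of how the size of the fifth follows from each row of the table, the Python below writes every fixed interval as $a\cdot\mathsf{P5} + b\cdot\mathsf{P8}$ and solves for $t(\mathsf{P5})$ . The decompositions in the comments and the names `SYSTEMS` and `fifth` are my own additions.

```python
# Each entry: (a, b, ratio), meaning a*P5 + b*P8 -> ratio, with P8 -> 2.
SYSTEMS = {
    "Pythagorean": (1, 0, 3 / 2),
    "Quarter-comma meantone": (4, -2, 5 / 4),   # M3 = 4*P5 - 2*P8
    "Sixth-comma meantone": (6, -3, 45 / 32),   # A4 = 6*P5 - 3*P8
    "Third-comma meantone": (-3, 2, 6 / 5),     # m3 = -3*P5 + 2*P8
    "Schismatic": (-8, 8, 10),                  # 8*P4 = -8*P5 + 8*P8
}

def fifth(a, b, ratio):
    """Solve t(P5)^a * 2^b = ratio for t(P5)."""
    return (ratio / 2 ** b) ** (1 / a)

for name, (a, b, ratio) in SYSTEMS.items():
    print(f"{name}: P5 -> {fifth(a, b, ratio):.4f}")
```

For the quarter-comma row this reduces to $5^{1/4}$ , the same value as the expression above.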

You can immediately see that different tuning systems give different trade-offs: quarter-comma meantone provides you with sweet-sounding (and narrow) major thirds, while abandoning the pure fifths of Pythagorean tuning.

There is a link here between theory and practice: in Medieval music, for which Pythagorean tuning was used, phrase-endings rarely feature major thirds – normally open fifths and octaves are the only intervals considered ‘pure’ enough to end a phrase. In Renaissance and Baroque music, major thirds are used much more often, and this coincides with the use of quarter-comma meantone tuning.

Keyboard instruments with syntonic temperaments

Let us design a keyboard that will use notes from a syntonic temperament $t : \mathcal{I} \longrightarrow \mathbb{R}$ (with fixed interval $i$ , origin note $b$ , and note-mapping $T : \mathcal{P} \longrightarrow \mathbb{R}$ ); we know that octaves will be pure, so we make our one-dimensional keyboard periodic at the octave, and then place $n$ keys in each octave. Each key (attached to a physical string or pipe) will be tuned to some definite frequency $f \in \{T(p) \: | \: p \in \mathcal{P} \}$ .

Now we’ll attempt to distribute notes from our temperament to the physical keys on the keyboard. Starting at note $b$ (with frequency $T(b)$ ), assign the notes $(b\pm k\cdot i) \: \mathrm{mod} \: \mathsf{P8}$ to their keys (with frequencies $T\left((b\pm k\cdot i) \: \mathrm{mod} \: \mathsf{P8}\right)$ ), for $k = 1, 2, \ldots$ until all $n$ keys are used, i.e. up to $k \approx n/2$ (the exact endpoints depend on whether $n$ is odd or even). Unfortunately in general the cycle is not closed, as $\left(n\cdot i \: \mathrm{mod} \: \mathsf{P8}\right) \neq \mathsf{P1}$ . The interval left over between the two ends of the chain is therefore mistuned; it is called a wolf interval, and its existence limits the usefulness of syntonic tuning systems for keyboards.

To minimise disruption, the wolf interval is normally chosen to be one that is little used when playing in keys with few sharps and flats; for example $\mathsf{G\sharp} - \mathsf{E\flat} = \mathsf{A3}$ , which on the keyboard has to stand in for a perfect fourth. Under Pythagorean tuning, this A3 is about $1.35$ , in contrast to the pure P4, which is exactly $\frac{4}{3}$ .
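Here is a minimal sketch of the key assignment for $n = 12$ under Pythagorean tuning, with the chain of fifths running from E♭ up to G♯. The list `NOTES` and the helper `reduce_to_octave` are hypothetical names, and ratios are measured from E♭ rather than from a fixed frequency.

```python
from fractions import Fraction

# Chain of fifths, chosen so that the two ends are G# and Eb.
NOTES = ["Eb", "Bb", "F", "C", "G", "D", "A", "E", "B", "F#", "C#", "G#"]
FIFTH = Fraction(3, 2)  # Pythagorean; other syntonic tunings use other values

def reduce_to_octave(ratio):
    """Fold a ratio into the range [1, 2), i.e. reduce modulo P8."""
    while ratio >= 2:
        ratio /= 2
    while ratio < 1:
        ratio *= 2
    return ratio

# Ratio of every key relative to Eb, all within one octave.
ratios = {name: reduce_to_octave(FIFTH ** k) for k, name in enumerate(NOTES)}

# The augmented third Eb -> G# that has to serve as a fourth on the keyboard.
wolf_a3 = ratios["G#"] / ratios["Eb"]

print(wolf_a3, float(wolf_a3))
print(float(Fraction(4, 3)))
```

The twelve keys all receive distinct ratios, but the interval joining the two ends of the chain is noticeably wider than a pure fourth.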

Keyboard instruments with equal temperaments

Returning to the Fundamental Keyboard Problem, we see that the solution is to project the two dimensions of notated intervals down to a one-dimensional subspace. This necessarily involves one interval being set to zero (or to one, multiplicatively speaking). Our search therefore is effectively for syntonic tuning systems where the fixed ratios are $\mathsf{P8} \longrightarrow 2$ and $i \longrightarrow 1$ for some interval $i$ .

Before we know what $i$ is, can we say what such a tuning system would look like? Well, if we pick an interval $j \in \mathcal{I}$ that is not proportional to $i$ , and use $i, j$ as our new basis, then because $i \longrightarrow 1$ , we can generate all intervals with $\alpha\cdot j$ for some rational $\alpha$ . Furthermore, we can pick $j$ carefully so that all intervals can actually be represented by $\alpha\cdot j$ for integer $\alpha$ . Then we can use $j$ as a convenient “unit” with which to construct our notation system or keyboard. If $\mathsf{P8} = n\cdot j$ , then the tuning system is called $n$ -equal temperament.

A bit of experimentation (or suitably clever calculation) results in some promising-looking candidates for $i$ :

$i$	$j$	$n$
$\mathsf{A1}$	$\mathsf{M2}$	7
$\mathsf{d2}$	$\mathsf{A1}, \mathsf{m2}$	12
$\mathsf{dd2}$	$\mathsf{d2}$	19
$\mathsf{d^4 3}$	$\mathsf{d2}$	31
$\mathsf{d^7 6}$	$\mathsf{d2}$	53

(The $j$ interval is non-unique, as various intervals become identified under equal temperaments.)
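The table can be checked mechanically. The sketch below writes each interval in (number of P5, number of P8) coordinates and counts how many steps of $n$ -equal temperament it spans. It assumes that the fifth of $n$ -equal temperament is the number of steps closest to a pure $3/2$ , which is my assumption rather than something stated above; all names are hypothetical.

```python
import math

def combine(*terms):
    """Sum of k * interval over (k, interval) pairs, in (P5, P8) coordinates."""
    return (sum(k * iv[0] for k, iv in terms),
            sum(k * iv[1] for k, iv in terms))

# Basic intervals as (number of P5, number of P8).
A1, M2 = (7, -4), (2, -1)
m2, m3, m6 = (-5, 3), (-3, 2), (-4, 3)
d2 = combine((1, m2), (-1, A1))

# Rows of the table: (name of i, i, name of j, j, n).
ROWS = [
    ("A1", A1, "M2", M2, 7),
    ("d2", d2, "m2", m2, 12),
    ("dd2", combine((1, d2), (-1, A1)), "d2", d2, 19),
    ("d^4 3", combine((1, m3), (-4, A1)), "d2", d2, 31),
    ("d^7 6", combine((1, m6), (-7, A1)), "d2", d2, 53),
]

def steps(interval, n):
    """Size of an interval in steps of n-equal temperament."""
    fifth_steps = round(n * math.log2(3 / 2))  # assumed: best fifth
    fifths, octaves = interval
    return fifths * fifth_steps + octaves * n

for i_name, i, j_name, j, n in ROWS:
    print(n, i_name, steps(i, n), j_name, steps(j, n))
```

In every row $i$ spans zero steps and $j$ spans a single step. For $n = 53$ the single step comes out with a negative sign, because d2 is a descending interval in that temperament.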

As you may have guessed already, the favourite choice here is $n = 12$ and $i = \mathsf{d2} \longrightarrow 1$ . This means that $\mathsf{A1} \longrightarrow 2^\frac{1}{12}$ , and $\mathsf{m2} \longrightarrow 2^\frac{1}{12}$ . So A1 and m2 are identified, and are used as the generator $j$ . They are referred to interchangeably as a “semitone”. The other useful property of 12-equal temperament is that $\mathsf{P5} \longrightarrow 2^\frac{7}{12} \approx 1.498$ , which is extremely close to the Pythagorean value!

Thus the use of 12-equal temperament to resolve the Fundamental Keyboard Problem leads directly to keyboards with 12 keys per octave; seven “white” notes $\{\mathsf{A},\mathsf{B},\mathsf{C},\mathsf{D},\mathsf{E},\mathsf{F},\mathsf{G}\}$ , and five “black” notes $\{\mathsf{A\sharp},\mathsf{C\sharp},\mathsf{D\sharp},\mathsf{F\sharp},\mathsf{G\sharp}\}$ . There are no more notes to account for, because the equivalence of A1 and m2 means that notes differing by a d2 are identified, e.g. $\mathsf{B\sharp} \equiv \mathsf{C}$ and $\mathsf{F\sharp} \equiv \mathsf{G\flat}$ .
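For concreteness, here is a short Python sketch that assigns a frequency to each of the twelve keys in the octave above A, assuming the reference $\mathsf{A} = 440\:\mathrm{Hz}$ used earlier. The names `KEYS`, `SEMITONE` and `frequency` are hypothetical.

```python
# The twelve keys of one octave, starting from A.
KEYS = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

SEMITONE = 2 ** (1 / 12)  # the generator j: A1 and m2 both map to this ratio

def frequency(key, a_hz=440.0):
    """Frequency of a key in the octave above A, in 12-equal temperament."""
    return a_hz * SEMITONE ** KEYS.index(key)

for key in KEYS:
    print(f"{key:<2} {frequency(key):8.3f} Hz")
```

Under these assumptions the F♯ above A comes out at roughly 740 Hz, a little below the Pythagorean 742.5 Hz computed earlier.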

Twelve keys per octave is also fairly convenient, given the size of human hands and the difficulty of playing the resulting instrument.

Other instruments

Consider an ensemble of dynamically-tunable instruments (string instruments, human voices, etc.). If this ensemble plays a major chord, there’s no reason why the players can’t all agree to tune it totally purely – with ratios of $1, \frac{5}{4}, \frac{3}{2}$ .

As a general strategy, the ensemble could choose to fix just a few notes overall, and then tweak any chord slightly to maximise harmonicity. Or, locally fix any note that is constant between successive chords, and change all the other notes around it.

These systems of constant readjustment have one big advantage – much nicer-sounding intervals – and several major annoyances, which are:

There’s no longer an unambiguous mapping between written notes and sounding frequencies. This may or may not offend you greatly, depending on how you axiomatise musical notation (you can probably guess my position…)
A tendency for the pitch of the entire ensemble to drift over time (particularly with the second system).
Certain instruments (any keyboard instruments, certain wind instruments) cannot be included in the ensemble.

Nevertheless, it is hypothesised that certain ensembles (string quartets, unaccompanied choirs) do in fact adjust their intonation in this way.
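The drift mentioned in the list above can be illustrated with a small calculation. The particular progression is my own choice of example: move up four pure fifths, drop two octaves, then come down a pure major third. On paper this returns to the starting note, since $4\cdot\mathsf{P5} - 2\cdot\mathsf{P8} - \mathsf{M3} = \mathsf{P1}$ , but the sounding pitch does not return.

```python
from fractions import Fraction

PURE_FIFTH = Fraction(3, 2)
PURE_MAJOR_THIRD = Fraction(5, 4)
OCTAVE = Fraction(2)

# On paper: C -> G -> D -> A -> E (dropping two octaves along the way),
# then down a major third back to C. In pure ratios the loop does not close.
drift_per_cycle = PURE_FIFTH ** 4 / OCTAVE ** 2 / PURE_MAJOR_THIRD

def pitch_after(cycles, start_hz=440.0):
    """Reference pitch after repeating the progression a number of times."""
    return start_hz * float(drift_per_cycle) ** cycles

print(drift_per_cycle)  # 81/80
print(pitch_after(10))
```

Each repetition raises the ensemble by a factor of $81/80$ , so the error accumulates rather than averaging out.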