There’s a way of motivating the notions of tangent vectors and covectors that’s hinted at but generally glossed over – at least in the physics courses that I take. This post is a quick overview, serving mostly as a reminder to myself of the topic. Please excuse the lack of rigour.

I will use the Einstein summation convention throughout,

$$a^\mu b_\mu \equiv \sum_{\mu} a^\mu b_\mu,$$

and hopefully by the end I’ll even have explained why it makes sense.

## Tangent vectors

We have an $n$-dimensional manifold $M$, which contains points, but **not** vectors. You cannot subtract two points on a manifold and expect to get something useful; imagine a line drawn between two points on the Earth’s surface. It would go awkwardly underground, and wouldn’t measure any sort of quantity that’s appreciable by inhabitants on the surface.

Let $\gamma$ be a curve on $M$. It takes some real parameter (let’s call it $t$) and spits out points in $M$ along a line, as you evolve $t$. Let’s call the coordinates of these points $x^\mu(t)$ in some coordinate system, and $x'^\mu(t)$ in some other coordinate system. Then we can find a ‘velocity’ vector $v$, tangent to the curve, whose coordinates are $v^\mu = \mathrm{d}x^\mu/\mathrm{d}t$. The coordinates of $v$ in the primed coordinate system are then given by the chain rule,

$$v'^\mu = \frac{\mathrm{d}x'^\mu}{\mathrm{d}t} = \frac{\partial x'^\mu}{\partial x^\nu}\frac{\mathrm{d}x^\nu}{\mathrm{d}t} = \frac{\partial x'^\mu}{\partial x^\nu}\,v^\nu.$$

This motivates the study of all objects that transform this way, and they are called *contravariant vectors*, or *contravectors*, or just *vectors*.
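As a quick sanity check of the chain rule, here is a sketch using an example of my own choosing (it is not part of the argument above): the unit circle in the plane, with Cartesian coordinates $(x, y)$ playing the role of the unprimed system and polar coordinates $(r, \theta)$ the primed one.

```latex
\[
\begin{aligned}
(x, y) &= (\cos t, \sin t), \qquad (r, \theta) = (1, t), \\
(v^x, v^y) &= (-\sin t, \cos t), \\
v^r &= \frac{\partial r}{\partial x}\, v^x + \frac{\partial r}{\partial y}\, v^y
     = \frac{x}{r}(-\sin t) + \frac{y}{r}\cos t = 0, \\
v^\theta &= \frac{\partial \theta}{\partial x}\, v^x
          + \frac{\partial \theta}{\partial y}\, v^y
     = -\frac{y}{r^2}(-\sin t) + \frac{x}{r^2}\cos t = 1 .
\end{aligned}
\]
```

This agrees with differentiating $(r, \theta) = (1, t)$ directly, which gives $(0, 1)$ without going through the Jacobian at all.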

Now, so far the vectors are just $n$-tuples of numbers, with no particular geometric significance. I will however write down a vector $v$ with a basis by pairing up its components $v^\mu$ with a basis $e_\mu$, as well as the same in the primed coordinate system:

$$v = v^\mu e_\mu = v'^\mu e'_\mu,$$

and for now these basis vectors are formal placeholders. All we can say is, whatever the choice of $e_\mu$, they will have to transform using the inverse of the transformation matrix used by $v^\mu$, in order that the expression above remains true in any coordinate system.
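Spelled out, that requirement looks roughly like the following. This is a sketch in which $v^\mu$, $e_\mu$ and their primed counterparts are my labels for the components and basis in the two coordinate systems; the polar-coordinate lines at the end are an illustrative example rather than part of the argument.

```latex
\[
\begin{aligned}
e'_\mu &= \frac{\partial x^\nu}{\partial x'^\mu}\, e_\nu, \\
v'^\mu e'_\mu
  &= \frac{\partial x'^\mu}{\partial x^\rho}\, v^\rho \,
     \frac{\partial x^\nu}{\partial x'^\mu}\, e_\nu
   = \delta^\nu_\rho \, v^\rho e_\nu
   = v^\nu e_\nu, \\
e_r &= \frac{\partial x}{\partial r}\, e_x + \frac{\partial y}{\partial r}\, e_y
     = \cos\theta \, e_x + \sin\theta \, e_y, \\
e_\theta &= \frac{\partial x}{\partial \theta}\, e_x
          + \frac{\partial y}{\partial \theta}\, e_y
     = -r\sin\theta \, e_x + r\cos\theta \, e_y .
\end{aligned}
\]
```

Note that the polar basis vectors depend on $r$ and $\theta$, so they differ from point to point even on the flat plane.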

A vector lives at a single point $x$ of $M$, in a space called ‘the tangent space to $M$ at $x$’, or $T_xM$ for short: imagine a flat plane ($T_xM$) balancing on top of a lumpy surface ($M$), touching it at a point ($x$). If $v$ varies from point to point, it is strictly a *vector field*, or in other words a function $v(x)$; in this case we can just say that it lives in $TM$, and we have to be careful not to forget about its position-dependence even if we suppress it occasionally for notational convenience.

## Differential operators

Let $f$ be a scalar field (just a real-valued function) on our manifold $M$. We can write differentiation of $f$ along a vector $v$ (the *directional derivative* along $v$) in three ways, all defined to be equivalent:

$$\nabla_v f(x) \equiv \left.\frac{\mathrm{d}}{\mathrm{d}t} f(x + t v)\right|_{t=0} \equiv v^\mu \frac{\partial f}{\partial x^\mu},$$

and note that we’re not really adding the vector $v$ to the point $x$, because we’re evaluating the expression at $t = 0$. The $x$-dependence is casually suppressed in the last expression.

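We might worry that this is coordinate system-dependent, so let’s try to write the same quantity down in the primed coordinate system, using the transformation properties of $v^\mu$ that we already know, and the chain rule:

$$v'^\mu \frac{\partial f}{\partial x'^\mu} = \frac{\partial x'^\mu}{\partial x^\nu}\,v^\nu\,\frac{\partial x^\rho}{\partial x'^\mu}\frac{\partial f}{\partial x^\rho} = \delta^\rho_\nu\,v^\nu\frac{\partial f}{\partial x^\rho} = v^\nu\frac{\partial f}{\partial x^\nu},$$

so our directional derivative is coordinate-invariant after all! Note that multiplying the components of a matrix with those of its inverse (and summing according to the Einstein convention) gives the Kronecker delta, which is why we can swap out $\rho$ for $\nu$ in the last expression.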

Coordinate-invariance shouldn’t surprise us too much, because the first two ways of writing the directional derivative made no mention of any coordinate system for $M$.
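To see the invariance in a concrete case, here is a small worked sketch under assumptions of my own: the plane with Cartesian coordinates $(x, y)$ and polar coordinates $(r, \theta)$, the scalar field $f = x$, and the vector tangent to the unit circle at the point $(\cos t, \sin t)$, whose components are $(-\sin t, \cos t)$ in Cartesian coordinates and $(0, 1)$ in polar coordinates.

```latex
\[
\begin{aligned}
f &= x = r\cos\theta, \\
v^x \frac{\partial f}{\partial x} + v^y \frac{\partial f}{\partial y}
  &= (-\sin t)(1) + (\cos t)(0) = -\sin t, \\
v^r \frac{\partial f}{\partial r} + v^\theta \frac{\partial f}{\partial \theta}
  &= (0)(\cos\theta) + (1)(-r\sin\theta) = -\sin t
  \quad \text{at } (r, \theta) = (1, t).
\end{aligned}
\]
```

Both coordinate systems give the same number, even though the individual components and partial derivatives look nothing alike.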

Now, recall that first-order differential operators on real functions of $n$ variables all take the form

$$a^\mu \frac{\partial}{\partial x^\mu},$$

and so if we just interpret the values $a^\mu$ as components of a vector, we’ve found a one-to-one correspondence between vectors and first-order differential operators (strictly speaking it’s between vector *components* and operators, but all the transformation matrices between coordinate systems are one-to-one too so it doesn’t matter).

This correspondence with differential operators hints strongly at what quantities to use as our basis vectors: the individual derivative operators $\partial/\partial x^\mu$ certainly transform in the correct way. We now make the *formal* identification

$$e_\mu \equiv \frac{\partial}{\partial x^\mu}.$$

I say formal because we will not treat these basis vectors like ‘proper’ derivative symbols, as their ‘true’ meaning will only come into play in certain carefully-defined situations.

Let’s make the following abbreviations: $\partial/\partial x^\mu$ and $\partial/\partial x'^\mu$ when talking about operators; and $\partial_\mu$ and $\partial'_\mu$ when talking about basis vectors.

## Linear functionals

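A *linear functional* is a function $\omega: T_xM \to \mathbb{R}$ that satisfies linearity, i.e.

$$\omega(\lambda u + \kappa v) = \lambda\,\omega(u) + \kappa\,\omega(v)$$

for any vectors $u, v$ and real numbers $\lambda, \kappa$.

Linear functionals are ‘vector-like’, but live in a space called $T^*_xM$, rather than the $T_xM$ that contains vectors. They are totally determined by their action on the basis vectors of $T_xM$, so can be written down in components:

$$\omega = \omega_\mu e^\mu, \qquad \omega_\mu \equiv \omega(e_\mu),$$

where the $e^\mu$ are some as-yet-mysterious basis for our linear functionals, defined so that $e^\mu(e_\nu) = \delta^\mu_\nu$. Note the position of the indices on each quantity: the Einstein summation convention is working correctly, even if we don’t necessarily know yet what *sort* of quantities we’re dealing with.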

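The expression $\omega(v) = \omega_\mu v^\mu$ must be coordinate independent, as the left hand side makes no reference to any coordinate system; and we already know how to transform $v^\mu$. Therefore the components $\omega_\mu$ must use the opposite transformation, $\partial x^\nu/\partial x'^\mu$. So we have

$$\omega'_\mu = \frac{\partial x^\nu}{\partial x'^\mu}\,\omega_\nu, \qquad e'^\mu = \frac{\partial x'^\mu}{\partial x^\nu}\,e^\nu.$$

These linear functionals are also called *covariant vectors*, *covectors*, *differential one-forms*, or *one-forms*. Remember that both $\omega$ and $v$ can have $x$-dependence in general, making them covector fields and vector fields respectively.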
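A worked sketch of the covector transformation rule, using an example I have picked for illustration: a covector field on the plane with Cartesian components $(\omega_x, \omega_y) = (-y, x)$, converted to polar coordinates $(r, \theta)$ with $x = r\cos\theta$, $y = r\sin\theta$.

```latex
\[
\begin{aligned}
\omega_r &= \frac{\partial x}{\partial r}\,\omega_x
          + \frac{\partial y}{\partial r}\,\omega_y
  = \cos\theta\,(-r\sin\theta) + \sin\theta\,(r\cos\theta) = 0, \\
\omega_\theta &= \frac{\partial x}{\partial \theta}\,\omega_x
               + \frac{\partial y}{\partial \theta}\,\omega_y
  = (-r\sin\theta)(-r\sin\theta) + (r\cos\theta)(r\cos\theta) = r^2, \\
\omega_r v^r + \omega_\theta v^\theta
  &= r^2 \left( -\frac{y}{r^2}\, v^x + \frac{x}{r^2}\, v^y \right)
   = \omega_x v^x + \omega_y v^y .
\end{aligned}
\]
```

The last line uses the chain-rule expression for $v^\theta$ in terms of the Cartesian components, and shows that the number a covector assigns to a vector comes out the same in both coordinate systems.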

## Total differential of a function

The following formula for the ‘total’ differential of a function should be familiar:

$$\mathrm{d}f = \frac{\partial f}{\partial x^\mu}\,\mathrm{d}x^\mu,$$

where $x$-dependence has been suppressed on both sides. However, we don’t currently have a way to make geometric sense of the individual coordinate differentials $\mathrm{d}x^\mu$. This expression must be coordinate-independent (no mention of coordinates is made on the left side), so the coordinate differentials must transform as

$$\mathrm{d}x'^\mu = \frac{\partial x'^\mu}{\partial x^\nu}\,\mathrm{d}x^\nu.$$

This is exactly how the basis for our covectors transforms! So we can make the formal identification $e^\mu \equiv \mathrm{d}x^\mu$, much like how we earlier decided that $e_\mu \equiv \partial/\partial x^\mu$.

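The full component-wise expression for the action of our covectors on our vectors is

$$\omega(v) = \omega_\mu\,\mathrm{d}x^\mu\!\left(v^\nu \frac{\partial}{\partial x^\nu}\right) = \omega_\mu v^\nu\,\mathrm{d}x^\mu\!\left(\frac{\partial}{\partial x^\nu}\right) = \omega_\mu v^\nu\,\delta^\mu_\nu = \omega_\mu v^\mu.$$

The only trick here is $\mathrm{d}x^\mu(\partial/\partial x^\nu) = \delta^\mu_\nu$, which we have defined to be true.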

Our expression for $\mathrm{d}f$ now comes in handy as a way to generate new covectors. In fact, covectors generated in this way have the following useful property:

$$\mathrm{d}f(v) = \frac{\partial f}{\partial x^\mu}\,\mathrm{d}x^\mu\!\left(v^\nu \frac{\partial}{\partial x^\nu}\right) = v^\mu \frac{\partial f}{\partial x^\mu} = \nabla_v f.$$

You may have spotted by now the value of the Einstein summation convention: as long as you keep your up-indices on vector components (and covector bases), and down-indices on covector components (and vector bases), any scalar you end up with will be *coordinate-independent*. This is a useful ‘type-check’ on any expression; if the indices don’t match, something must be wrong (or you’ve violated relativity by finding a preferred coordinate system).

I finish with three warnings:

- Covectors generated from functions (like $\mathrm{d}f$ above) are not the only kind! Any linear combination of the basis covectors $\mathrm{d}x^\mu$ is a covector, and in general the arbitrary covector $\omega_\mu\,\mathrm{d}x^\mu$ will not be the differential of any function at all.
- The components of vectors transform in the opposite way to the components of covectors. The basis vectors transform oppositely to the vector components, and to the basis covectors. This is confusing! Hence physicists like to pretend that basis vectors don’t exist, and only work with components. This is a convenient way to work for many computations, but you can end up getting confused when your basis vectors change from point to point (as they do on most non-trivial manifolds and coordinate systems). Mathematicians never write down any coordinates, say they are working in a ‘coordinate-free’ way, and act all clever about it.

- There is one more way to write the directional derivative, which is $v(f) \equiv v^\mu\,\partial f/\partial x^\mu$, treating $v$ as a function $v: C^\infty(M) \to \mathbb{R}$. Unfortunately you also see people write the above as

which is very confusing, as it conflicts with our careful definitions of what the basis vectors and covectors mean – such is life.
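To make the first warning concrete, here is a sketch with an example of my own choosing: the covector field on the plane with Cartesian components $(-y, x)$, writing the basis covectors as $\mathrm{d}x$ and $\mathrm{d}y$. If it were the differential of some smooth function $f$, the mixed partial derivatives of $f$ would have to agree, and they do not.

```latex
\[
\begin{aligned}
\omega &= -y\,\mathrm{d}x + x\,\mathrm{d}y, \\
\omega = \mathrm{d}f
  \;&\Longrightarrow\;
  \frac{\partial f}{\partial x} = -y, \quad
  \frac{\partial f}{\partial y} = x \\
  \;&\Longrightarrow\;
  \frac{\partial^2 f}{\partial y\,\partial x} = -1
  \;\neq\; +1 = \frac{\partial^2 f}{\partial x\,\partial y} .
\end{aligned}
\]
```

So this $\omega$ is a perfectly good covector field that is not $\mathrm{d}f$ for any function $f$.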