Q&A What does upper indices represent?

posted 3y ago by r~~ · edited 3y ago by r~~

Answer

#7: Post edited by r~~ · 2021-12-07T19:51:31Z (over 3 years ago)

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, I'll make the case that 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get other dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added to each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
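A minimal Python sketch of this reading, assuming durations are plain numbers of seconds and frequencies plain numbers of hertz (representation choices made only for illustration): a frequency becomes a function of a duration, and adding two frequencies means adding their results.

```python
def frequency(rate_hz):
    """Return a frequency as a linear form: duration in seconds -> number of events."""
    def apply(duration_s):
        return rate_hz * duration_s
    return apply

def add_forms(f, g):
    """Sum of two linear forms, defined pointwise: (f + g)(t) = f(t) + g(t)."""
    return lambda duration_s: f(duration_s) + g(duration_s)

alice = frequency(5)  # 5 Hz
bob = frequency(4)    # 4 Hz
together = add_forms(alice, bob)

print(alice(1))     # 5 keys in 1 s
print(alice(4))     # 20 keys in 4 s: the result scales linearly with the duration
print(together(1))  # 9 keys in 1 s, i.e. Alice and Bob together type at 9 Hz
```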
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting—frequency, remember, is a function from an amount of time to a quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, we were changing the same units we were using in our representation; this time, we were changing units of time and using units of frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
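The two rules can be checked numerically. In this sketch (the helper names are our own, not any library's), `factor` is the amount by which the basis of the underlying space is multiplied; a vector's component is divided by it, a dual vector's component is multiplied by it, and applying the dual vector to the vector gives the same number in either basis. Exact fractions avoid floating-point noise.

```python
from fractions import Fraction

def rescale_vector_component(component, factor):
    """Contravariant rule: multiply the basis by `factor`, divide the component by it."""
    return component / factor

def rescale_covector_component(component, factor):
    """Covariant rule: multiply the basis by `factor`, multiply the component by it."""
    return component * factor

# 2 m expressed in cm: the basis shrinks by a factor of 100 (factor = 1/100).
print(rescale_vector_component(Fraction(2), Fraction(1, 100)))  # 200

# 2 min expressed in s: the basis shrinks by a factor of 60.
print(rescale_vector_component(Fraction(2), Fraction(1, 60)))   # 120

# 4 Hz expressed in min^-1: the Time basis grows from 1 s to 1 min (factor = 60).
print(rescale_covector_component(Fraction(4), Fraction(60)))    # 240

# Pairing 4 Hz with 60 s gives the same count as pairing 240 min^-1 with 1 min.
events_in_seconds = Fraction(4) * Fraction(60)
events_in_minutes = (rescale_covector_component(Fraction(4), Fraction(60))
                     * rescale_vector_component(Fraction(60), Fraction(60)))
print(events_in_seconds, events_in_minutes)  # 240 240
```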
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
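Here is the same computation done both ways, as a Python sketch with exact fractions. Representing a speed by a single number relative to a chosen (space unit, time unit) pair is an assumption of this example; the conversion factors come from 1 min = 60 s and 1 cm = 10 mm.

```python
from fractions import Fraction

speed_cm_per_min = Fraction(60)

# Way 1: re-express the input vector 1 s in minutes (contravariant: the time basis
# grows by 60, so the component is divided by 60), then apply the speed.
duration_min = Fraction(1) / 60
distance_cm = speed_cm_per_min * duration_min
print(distance_cm)  # 1

# Way 2: re-express the speed in cm/s. The time basis shrinks by 60 (min -> s),
# and speed is covariant in time, so its component is divided by 60 as well.
speed_cm_per_s = speed_cm_per_min / 60
print(speed_cm_per_s * Fraction(1))  # 1

# Changing the space unit instead: the space basis shrinks by 10 (cm -> mm),
# and speed is contravariant in space, so its component is multiplied by 10.
speed_mm_per_min = speed_cm_per_min * 10
print(speed_mm_per_min)  # 600
```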
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward. Taking up and right as the positive directions for the needle, and treating 0.5 cm west as $-0.5\\,\text{cm east}$, the needle moves $1.5 \cdot (-0.017)\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically (0.0255 mm down), and $1.5 \cdot (-0.003)\\,\text{mm} - 0.5 \cdot (-0.058)\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally (0.0245 mm to the right). But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
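Before taking the matrix apart, it's worth checking that it reproduces the answer. In this plain-Python sketch (no linear-algebra library assumed; `apply_matrix` is a hypothetical helper), rows are the needle directions (up, right), columns are Sticky's directions (north, east), and the westward slide is written as −0.5 cm east.

```python
from fractions import Fraction as F

# Rows: needle output (mm up, mm right). Columns: Sticky input (cm north, cm east).
detector = [
    [F("-0.017"), F("0")],
    [F("-0.003"), F("-0.058")],
]

# 1.5 cm north and 0.5 cm west, i.e. -0.5 cm east.
displacement = [F("1.5"), F("-0.5")]

def apply_matrix(matrix, vector):
    """Ordinary matrix-vector product: each output component is a row-by-vector dot product."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

needle = apply_matrix(detector, displacement)
print([float(x) for x in needle])  # [-0.0255, 0.0245]
```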
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north—which is to say, each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So are the other two non-zero components given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four total such combinations—and of course, these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as the Cartesian product of basis elements that represent a vector and basis elements that linearly accept a vector (or, in other words, basis linear forms).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into linear combinations of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. But it is useful to know that just like many vector identities are true regardless of the basis being used, so are the equivalent tensor identities—the bases are only useful for writing specific vectors and tensors down.)
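To make ‘defined by extension’ concrete, here is a small Python sketch. The representation is an assumption made for illustration: a vector is a dictionary from basis labels to coefficients, and a tensor is a dictionary keyed by pairs of labels, one entry per basis tensor $\vec e \otimes \vec f$. Distributing $\otimes$ over both decompositions makes each coefficient the product of the corresponding coefficients.

```python
def tensor_product(v, w):
    """Distribute (sum_i a_i e_i) ⊗ (sum_j b_j f_j) into sum_{i,j} a_i b_j (e_i ⊗ f_j).

    Vectors are dicts {basis label: coefficient}; the result is keyed by
    (e_i, f_j) label pairs, one entry per basis tensor e_i ⊗ f_j.
    """
    return {(e, f): a * b for e, a in v.items() for f, b in w.items()}

v = {"e1": 2, "e2": -1}  # 2 e1 - e2
w = {"f1": 3, "f2": 5}   # 3 f1 + 5 f2

t = tensor_product(v, w)
print(t)  # {('e1', 'f1'): 6, ('e1', 'f2'): 10, ('e2', 'f1'): -3, ('e2', 'f2'): -5}

# Bilinearity follows from the distribution: scaling v by 4 scales every coefficient by 4.
v4 = {e: 4 * a for e, a in v.items()}
print(tensor_product(v4, w) == {k: 4 * c for k, c in t.items()})  # True
```

Note that not every tensor is a single product $\vec v \otimes \vec w$; in general a tensor is a sum of such products, like the three-term detector tensor.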
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
In fact, for finite-dimensional spaces like the ones here, the space of linear functions from a vector space $V$ to a vector space $W$ is naturally isomorphic to the tensor product $W \otimes V^\*$, and this generalizes to multilinear functions of multiple vector spaces and so on. (This is why I described Speedy's speed as a tensor as well, at the start of this post.) By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
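One way to see this variance concretely is to rescale both bases and confirm that the physical answer is unchanged. In this Python sketch (the dictionary representation and the `contract` helper are our own, for illustration), Sticky's basis shrinks from cm to mm, so the coefficients, which are covariant in that space, are divided by 10 while Sticky's displacement components are multiplied by 10; the needle's basis shrinks from mm to µm, so the coefficients, which are contravariant in that space, are multiplied by 1000.

```python
from fractions import Fraction as F

# Detector coefficients keyed by (needle basis, Sticky basis), in mm per cm.
T_mm_per_cm = {
    ("up", "north"): F("-0.017"), ("up", "east"): F("0"),
    ("right", "north"): F("-0.003"), ("right", "east"): F("-0.058"),
}
v_cm = {"north": F("1.5"), "east": F("-0.5")}

def contract(T, v):
    """Feed v into T's input slot: w[n] = sum over s of T[(n, s)] * v[s]."""
    out = {}
    for (n, s), coeff in T.items():
        out[n] = out.get(n, 0) + coeff * v[s]
    return out

# Input basis cm -> mm (divided by 10): the covariant slot divides coefficients by 10,
# and the contravariant input components are multiplied by 10.
T_mm_per_mm = {k: c / 10 for k, c in T_mm_per_cm.items()}
v_mm = {s: x * 10 for s, x in v_cm.items()}

# Output basis mm -> um (divided by 1000): the contravariant slot multiplies coefficients by 1000.
T_um_per_mm = {k: c * 1000 for k, c in T_mm_per_mm.items()}

w_mm = contract(T_mm_per_cm, v_cm)
w_um = contract(T_um_per_mm, v_mm)
print({n: float(x) for n, x in w_mm.items()})  # {'up': -0.0255, 'right': 0.0245}
print({n: float(x) for n, x in w_um.items()})  # {'up': -25.5, 'right': 24.5}
```

The second result is the first expressed in µm (25.5 µm down, 24.5 µm right), as it should be.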
Finally, let's look at what happens when we solve the Sticky problem. Just like Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. The way we do that is by looking at the tensor when decomposed into primitive linear functions operating on single basis vectors, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way (though note that in this specific problem, $T_{\text{cm east}}^{\text{mm up}} = 0$). Following convention, I've placed the contravariant basis vectors in the upper position and the covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ was given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
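Substituting the coefficients of the tensor above, together with $v^{\text{cm north}} = 1.5$ and $v^{\text{cm east}} = -0.5$ (0.5 cm west), reproduces the direct solution term by term:
$$
\begin{align}
w\^{\text{mm up}} &= (-0.017)(1.5) + (0)(-0.5) = -0.0255 \\\\
w\^{\text{mm right}} &= (-0.003)(1.5) + (-0.058)(-0.5) = 0.0245
\end{align}
$$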
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the general:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that expects one as input! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which enables the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position.
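In Einstein notation, the general equation for Sticky's needle is written without the summation sign. The index $s$ appears once up and once down, so it is summed over; $n$ appears once on each side, so it is a free index, and the single line stands for both of the explicit equations for $w^{\text{mm up}}$ and $w^{\text{mm right}}$:
$$
w^n = T_s^n v^s
$$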
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors from that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions that consume vectors from that space, while tensors contravariant in a vector space represent linear functions that produce vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, I'll make the case that 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get more dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added with each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting—frequency, remember, is a function from an amount of time to a quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, we were changing the same units we were using in our representation; this time, we were changing units of time and using units of frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot -0.017\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically, and $1.5 \cdot -0.003\\,\text{mm} - 0.5 \cdot -0.058\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally. But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north—which is to say, each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So are the other two non-zero components given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four total such combinations—and of course, these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as the Cartesian product of basis elements that represent a vector and basis elements that linearly accept a vector (or, in other words, basis linear forms).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into linear combinations of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. But it is useful to know that just like many vector identities are true regardless of the basis being used, so are the equivalent tensor identities—the bases are only useful for writing specific vectors and tensors down.)
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
In fact, any space of linear functions from a vector space $V$ to a vector space $W$ is equivalent to the tensor product $W \otimes V^\*$, and this generalizes to multilinear functions of multiple vector spaces and so on. (This is why I described Speedy's speed as a tensor as well, at the start of this post.) By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
Finally, let's look at what happens when we solve the Sticky problem. Just like Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. The way we do that is by looking at the tensor when decomposed into primitive linear functions operating on single basis vectors, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way (though note that in this specific problem, $T_{\text{cm east}}^{\text{mm up}} = 0$). Following convention, I've placed the contravariant basis vectors in the upper position and the covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ was given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the single general equation:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that expects one! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which allows the $\sum$ to be omitted on terms where the same index variable appears once in an upper position and once in a lower position, so the equation above would be written simply as $w^n = T_s^n v^s$.
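To make the index bookkeeping concrete, here is a minimal plain-Python sketch of the contraction $w^n = \sum_s T_s^n v^s$ for Sticky's detector. Storing each coefficient in a dictionary keyed by a pair of basis labels, and the helper name `contract`, are illustrative choices of mine, not part of any library:

```python
# Coefficients of the detector tensor, keyed as T[(n, s)] = T_s^n, where
# n labels a needle basis vector and s labels a Sticky basis vector.
T = {
    ("mm up", "cm north"): -0.017,
    ("mm up", "cm east"): 0.0,
    ("mm right", "cm north"): -0.003,
    ("mm right", "cm east"): -0.058,
}

# Sticky's displacement v^s: 1.5 cm north and 0.5 cm west (= -0.5 cm east).
v = {"cm north": 1.5, "cm east": -0.5}

def contract(T, v):
    """Compute w^n = sum_s T_s^n v^s by summing over the shared index s."""
    w = {}
    for (n, s), coeff in T.items():
        # Match the lower index s of T with the upper index s of v.
        w[n] = w.get(n, 0.0) + coeff * v[s]
    return w

w = contract(T, v)
print({n: round(x, 6) for n, x in w.items()})  # {'mm up': -0.0255, 'mm right': 0.0245}
```

The loop visits every stored coefficient and multiplies it by the matching input component, which is exactly the "match the lower index with the upper index and sum" rule; the zero coefficient $T_{\text{cm east}}^{\text{mm up}}$ simply contributes nothing.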
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors from that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* Tensors covariant in a given vector space also represent linear functions consuming vectors from that space, while tensors contravariant in a vector space represent linear functions producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

#6: Post edited by r~~ · 2021-11-02T17:28:07Z (over 3 years ago)


A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, I'll make the case that 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get other dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order or grouping any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added to each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting: frequency, remember, is a function from an amount of time to a quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit of time by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, the units we changed were the same units in which the quantity was expressed; this time, we changed the units of time while expressing a frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
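The two rules can be sketched as a pair of tiny Python functions. The function names and the use of exact `Fraction` arithmetic are my own illustrative choices; `factor` is the amount by which the basis vector of $V$ is multiplied:

```python
from fractions import Fraction

def rescale_contravariant(coeff, factor):
    """New coefficient of a vector in V after V's basis vector is multiplied by `factor`.

    The vector itself does not change, so its coefficient is divided by `factor`.
    """
    return coeff / factor

def rescale_covariant(coeff, factor):
    """New coefficient of a dual vector in V* after V's basis vector is multiplied by `factor`.

    The linear form must give the same number on every vector, so its
    coefficient is multiplied by `factor`.
    """
    return coeff * factor

# 2 m in centimeters: the 1-Space basis goes from 1 m to 1 cm (factor 1/100).
print(rescale_contravariant(Fraction(2), Fraction(1, 100)))  # 200
# 2 min in seconds: the Time basis goes from 1 min to 1 s (factor 1/60).
print(rescale_contravariant(Fraction(2), Fraction(1, 60)))   # 120
# 4 Hz in min^-1: the Time basis goes from 1 s to 1 min (factor 60).
print(rescale_covariant(Fraction(4), Fraction(60)))          # 240

# Pairing a frequency with a duration gives the same count in either basis:
# 4 per second for 60 s, or 240 per minute for 1 min.
count_in_seconds = Fraction(4) * Fraction(60)
count_in_minutes = rescale_covariant(Fraction(4), 60) * rescale_contravariant(Fraction(60), 60)
assert count_in_seconds == count_in_minutes == 240
```

The final assertion is the point of the two opposite rules: because the vector's coefficient is divided by the same factor that the dual vector's coefficient is multiplied by, their pairing (the number of events) does not depend on the choice of unit.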
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
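The two routes to Speedy's answer, plus the cm-to-mm conversion, can be checked with exact arithmetic in a short Python sketch (the variable names are hypothetical, chosen only for readability):

```python
from fractions import Fraction

# Speedy's speed as a single coefficient in the basis (1 cm) ⊗ (1 min)^-1.
speed_cm_per_min = Fraction(60)

# Route 1: re-express the input time (a contravariant vector) in minutes.
t_s = Fraction(1)          # 1 s
t_min = t_s / 60           # 1 s = 1/60 min
distance_cm_1 = speed_cm_per_min * t_min

# Route 2: re-express the speed in cm/s. Speed is covariant in Time, so
# shrinking the time unit by a factor of 60 shrinks the coefficient by 60.
speed_cm_per_s = speed_cm_per_min / 60
distance_cm_2 = speed_cm_per_s * t_s

# Changing the space unit from cm to mm instead: speed is contravariant in
# 1-Space, so shrinking the length unit by 10 grows the coefficient by 10.
speed_mm_per_min = speed_cm_per_min * 10

print(distance_cm_1, distance_cm_2, speed_mm_per_min)  # 1 1 600
```

Both routes land on 1 cm, which is the consistency requirement that forces 60 cm/min to equal 1 cm/s.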
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot (-0.017)\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically, and $1.5 \cdot (-0.003)\\,\text{mm} - 0.5 \cdot (-0.058)\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally. But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
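As a quick sanity check of that arithmetic, here is a minimal Python sketch; the sign convention (up and right positive, west as negative east) is the one the worked solution already uses:

```python
# Sticky's motion: 1.5 cm north and 0.5 cm west (west = negative east).
north, east = 1.5, -0.5

# Needle response per centimeter, from the problem statement
# (down and left are recorded as negative up and negative right).
up_per_north, right_per_north = -0.017, -0.003
up_per_east, right_per_east = 0.0, -0.058

# Linearity: each input component contributes independently, then we add.
up = up_per_north * north + up_per_east * east
right = right_per_north * north + right_per_east * east
print(round(up, 6), round(right, 6))  # -0.0255 0.0245
```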
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces, namely Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north, which is to say that each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function $\text{2-Space} \to \text{2-Space}$. So is each of the other two non-zero components we're given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as pairs, drawn from a Cartesian product, of a basis element that represents a vector and a basis element that linearly accepts a vector (or, in other words, a basis linear form).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into linear combinations of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. But it is useful to know that just like many vector identities are true regardless of the basis being used, so are the equivalent tensor identities—the bases are only useful for writing specific vectors and tensors down.)
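Written out in bases, "distributing the $\otimes$ operator" just means that the coefficient of $\vec e \otimes \vec f$ in $\vec v \otimes \vec w$ is the coefficient of $\vec e$ in $\vec v$ times the coefficient of $\vec f$ in $\vec w$. A minimal Python sketch, with hypothetical basis labels `e1`, `e2`, `f1`, `f2` and arbitrary example coefficients:

```python
def tensor_product(v, w):
    """Coefficients of v ⊗ w, given v and w as {basis label: coefficient} dicts.

    Distributing ⊗ over both linear combinations yields one term per pair
    of basis vectors: (e, f) -> v[e] * w[f].
    """
    return {(e, f): v_e * w_f for e, v_e in v.items() for f, w_f in w.items()}

# Hypothetical vectors in two different 2-dimensional spaces.
v = {"e1": 2, "e2": 3}
w = {"f1": 5, "f2": -1}
print(tensor_product(v, w))
# {('e1', 'f1'): 10, ('e1', 'f2'): -2, ('e2', 'f1'): 15, ('e2', 'f2'): -3}
```

Because each coefficient is a product of one coefficient from each factor, scaling either factor scales the whole result; that is the bilinearity the distributive definition builds in.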
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
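The claim that the detector response is a sum of three primitive linear functions can be sketched directly in Python. Here `primitive` builds the function corresponding to one term $c\\,\text{out}\otimes(\text{in})^{-1}$, and `add_functions` sums linear functions pointwise; both helpers are hypothetical, written only for this illustration:

```python
def primitive(coeff, out_basis, in_basis):
    """Linear function for the term coeff * (out_basis ⊗ in_basis^-1).

    The dual basis vector in_basis^-1 picks out the in_basis coefficient of
    the input; the result is that many copies of coeff * out_basis.
    """
    def f(v):
        # v is a dict {input basis label: coefficient}; absent labels count as 0.
        return {out_basis: coeff * v.get(in_basis, 0.0)}
    return f

def add_functions(*fs):
    """Pointwise sum of linear functions whose outputs are {label: coeff} dicts."""
    def total(v):
        out = {}
        for f in fs:
            for label, x in f(v).items():
                out[label] = out.get(label, 0.0) + x
        return out
    return total

# The detector tensor above, one primitive function per term.
detector = add_functions(
    primitive(-0.017, "mm up", "cm north"),
    primitive(-0.003, "mm right", "cm north"),
    primitive(-0.058, "mm right", "cm east"),
)

w = detector({"cm north": 1.5, "cm east": -0.5})
print({k: round(x, 6) for k, x in w.items()})  # {'mm up': -0.0255, 'mm right': 0.0245}
```

Feeding in a single basis vector, say $1\\,\text{cm north}$, recovers one column of the matrix above, which is another way to see that the three-term tensor and the matrix carry the same information.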
In fact, for finite-dimensional vector spaces $V$ and $W$, the space of linear functions from $V$ to $W$ is isomorphic to the tensor product $W \otimes V^\*$, and this generalizes to multilinear functions of multiple vector spaces and so on. (This is why I described Speedy's speed as a tensor as well, at the start of this post.) By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
Finally, let's look at what happens when we solve the Sticky problem. Just as Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. We do that by decomposing the tensor into primitive linear functions that each operate on a single basis vector, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've written the labels of contravariant basis vectors in the upper position and the labels of covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement by its coefficients $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ is given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the single general equation:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that expects one! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which allows the $\sum$ to be omitted on terms where the same index variable appears once in an upper position and once in a lower position.
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors from that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* Tensors covariant in a given vector space also represent linear functions consuming vectors from that space, while tensors contravariant in a vector space represent linear functions producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, I'll make the case that 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get other dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order or grouping any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added to each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting: frequency, remember, is a function from an amount of time to a quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit of time by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, the units we changed were the same units in which the quantity was expressed; this time, we changed the units of time while expressing a frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot (-0.017)\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically, and $1.5 \cdot (-0.003)\\,\text{mm} - 0.5 \cdot (-0.058)\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally. But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces, namely Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north, which is to say that each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function $\text{2-Space} \to \text{2-Space}$. So is each of the other two non-zero components we're given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as pairs, drawn from a Cartesian product, of a basis element that represents a vector and a basis element that linearly accepts a vector (or, in other words, a basis linear form).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into linear combinations of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. But it is useful to know that just like many vector identities are true regardless of the basis being used, so are the equivalent tensor identities—the bases are only useful for writing specific vectors and tensors down.)
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
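As a quick consistency check, the sketch below stores the three-term tensor as coefficients keyed by (output basis vector, input basis vector) and lays them out as a matrix, with rows for mm up and mm right and columns for cm north and cm east. The labels and ordering are illustrative choices; pairs that do not appear in the tensor get coefficient 0.

```python
# Basis labels (illustrative): output basis for the needle, input basis for Sticky.
NEEDLE = ['mm up', 'mm right']
STICKY = ['cm north', 'cm east']

# The detector tensor as a sum of basis tensors (out_basis ⊗ (in_basis)^-1),
# keyed by (output label, input label). Absent pairs have coefficient 0.
T = {
    ('mm up', 'cm north'): -0.017,
    ('mm right', 'cm north'): -0.003,
    ('mm right', 'cm east'): -0.058,
}

# Lay the coefficients out as a matrix: one row per output basis vector,
# one column per input basis vector.
matrix = [[T.get((n, s), 0.0) for s in STICKY] for n in NEEDLE]
print(matrix)  # [[-0.017, 0.0], [-0.003, -0.058]]
```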
In fact, for finite-dimensional vector spaces $V$ and $W$, the space of linear functions from $V$ to $W$ is naturally isomorphic to the tensor product $W \otimes V^\*$, and this generalizes to multilinear functions of several vector spaces and so on. (This is why I described Speedy's speed as a tensor as well, at the start of this post.) By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
Finally, let's look at what happens when we solve the Sticky problem. Just like Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. We do that by decomposing the tensor into primitive linear functions that each operate on a single basis vector, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way (though note that in this specific problem, $T_{\text{cm east}}^{\text{mm up}} = 0$). Following convention, I've placed the labels of the contravariant basis vectors in the upper position and the labels of the covariant basis vectors in the lower position. Similarly, let's represent the components of Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ is given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the general:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts—you're just feeding an output vector into a linear form which expects an input vector! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which enables the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position.
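A minimal sketch of contraction, assuming the tensor and vector are stored as Python dicts keyed by basis labels (an illustrative representation, not any library's API). The loop sums over the index that appears as the tensor's lower index and the vector's upper index, which is exactly the sum that Einstein notation leaves implicit:

```python
def contract(T, v):
    """Contract a tensor with one upper and one lower index against a vector.

    T maps (upper_label, lower_label) -> coefficient; v maps label -> coefficient.
    Each output component w[n] is the sum over s of T[n, s] * v[s].
    """
    w = {}
    for (n, s), t in T.items():
        w[n] = w.get(n, 0.0) + t * v.get(s, 0.0)
    return w

# Sticky's detector tensor; the missing ('mm up', 'cm east') entry is 0.
T = {
    ('mm up', 'cm north'): -0.017,
    ('mm right', 'cm north'): -0.003,
    ('mm right', 'cm east'): -0.058,
}
# 1.5 cm north and 0.5 cm west, i.e. -0.5 cm east
v = {'cm north': 1.5, 'cm east': -0.5}

w = contract(T, v)
print(w)
```

With these inputs the result should be approximately `{'mm up': -0.0255, 'mm right': 0.0245}` (up to floating-point rounding), matching the intended solution to the Sticky problem.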
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors from that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions consuming vectors from that space, while tensors contravariant in a vector space represent linear functions producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

#5: Post edited by r~~ · 2021-11-02T16:50:36Z (over 3 years ago)

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, as it happens, 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless numbers to get other quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added with each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
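A tiny sketch of a frequency acting as a linear form, assuming frequencies are in Hz and durations in seconds so each is a single number; the helper name `events` is illustrative:

```python
def events(frequency_hz, duration_s):
    """Apply a frequency (a linear form on Time) to a duration: returns a pure number."""
    return frequency_hz * duration_s

alice, bob = 5, 4  # typing rates in Hz

print(events(alice, 1))  # 5 keys in one second
print(events(alice, 4))  # 20 keys in four seconds

# Linear in the time argument, and frequencies add the way vectors do:
assert events(alice, 1 + 3) == events(alice, 1) + events(alice, 3)
assert events(alice + bob, 1) == events(alice, 1) + events(bob, 1) == 9
```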
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
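The contravariant rule can be sketched in a few lines, assuming each unit is recorded by its size relative to a reference unit (meters for $\text{1-Space}$, seconds for $\text{Time}$). The dictionaries and the function name are illustrative, and `Fraction` keeps the arithmetic exact:

```python
from fractions import Fraction

# Sizes of some units relative to a reference unit of the same space.
LENGTH = {'m': Fraction(1), 'cm': Fraction(1, 100), 'mm': Fraction(1, 1000)}
TIME = {'s': Fraction(1), 'min': Fraction(60)}

def convert_vector(coeff, old_unit, new_unit, units):
    """Re-express a vector's coefficient when the basis changes from old_unit to new_unit.

    Vectors are contravariant: if the new basis vector is the old one divided by k,
    the coefficient is multiplied by k.
    """
    k = units[old_unit] / units[new_unit]
    return coeff * k

print(convert_vector(2, 'm', 'cm', LENGTH))  # 200
print(convert_vector(2, 'min', 's', TIME))   # 120
```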
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting: frequency, remember, is a function from an amount of time to a dimensionless quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit of time by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, we changed the same units we used to express the quantity; this time, we changed the units of time while expressing a frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
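The covariant rule for frequency can be sketched the same way, again with illustrative names and unit sizes measured in seconds: changing the time unit by a factor $k$ multiplies the frequency's coefficient by $k$, and the number of events counted stays the same.

```python
from fractions import Fraction

TIME = {'s': Fraction(1), 'min': Fraction(60)}  # unit sizes in seconds

def convert_dual(coeff, old_unit, new_unit, units):
    """Re-express a dual vector (e.g. a frequency) when the *time* basis changes.

    Dual vectors are covariant: if the new time unit is the old one multiplied by k,
    the coefficient is also multiplied by k.
    """
    k = units[new_unit] / units[old_unit]
    return coeff * k

f_per_s = Fraction(4)                                # 4 Hz = 4 s^-1
f_per_min = convert_dual(f_per_s, 's', 'min', TIME)
print(f_per_min)                                     # 240

# The physical count is unchanged: 4 s^-1 * 60 s == 240 min^-1 * 1 min
assert f_per_s * 60 == f_per_min * 1 == 240
```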
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
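Both routes through Speedy's problem can be checked with a short sketch, assuming unit sizes are recorded relative to meters and seconds (the tables and function name are illustrative). The speed's coefficient is converted contravariantly in its length unit and covariantly in its time unit:

```python
from fractions import Fraction

LENGTH = {'m': Fraction(1), 'cm': Fraction(1, 100), 'mm': Fraction(1, 1000)}
TIME = {'s': Fraction(1), 'min': Fraction(60)}

def convert_speed(coeff, old, new):
    """Convert a speed coefficient between (length_unit, time_unit) bases.

    Contravariant in length: divide the length unit by k -> multiply coeff by k.
    Covariant in time: multiply the time unit by k -> multiply coeff by k.
    """
    (old_len, old_time), (new_len, new_time) = old, new
    length_factor = LENGTH[old_len] / LENGTH[new_len]
    time_factor = TIME[new_time] / TIME[old_time]
    return coeff * length_factor * time_factor

speed = Fraction(60)  # 60 cm/min

# Route 1: convert the input 1 s to minutes, then apply the speed.
one_second_in_min = Fraction(1) * TIME['s'] / TIME['min']   # 1/60
print(speed * one_second_in_min)                            # 1, i.e. 1 cm

# Route 2: convert the speed itself.
print(convert_speed(speed, ('cm', 'min'), ('cm', 's')))     # 1, i.e. 1 cm/s
print(convert_speed(speed, ('cm', 'min'), ('mm', 'min')))   # 600, i.e. 600 mm/min
```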
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector on her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and moves 0.003 mm to the left; for every centimeter Sticky creeps east, the needle moves 0.058 mm to the left. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: taking up and right as positive, the needle moves $1.5 \cdot (-0.017\\,\text{mm}) - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically, and $1.5 \cdot (-0.003\\,\text{mm}) - 0.5 \cdot (-0.058\\,\text{mm}) = 0.0245\\,\text{mm}$ horizontally. But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
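For comparison, the conventional matrix route can be sketched in plain Python, assuming rows are ordered (mm up, mm right) and columns (cm north, cm east), and entering 0.5 cm west as -0.5 cm east:

```python
# Rows: needle displacement (mm up, mm right); columns: Sticky's displacement (cm north, cm east).
M = [[-0.017, 0.0],
     [-0.003, -0.058]]

# 1.5 cm north and 0.5 cm west (= -0.5 cm east)
v = [1.5, -0.5]

def matvec(M, v):
    """Ordinary matrix-vector product: each output is a row dotted with v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

up, right = matvec(M, v)
print(round(up, 6), round(right, 6))  # -0.0255 0.0245
```

The matrix hides the units and which index is which; the element-by-element view that follows keeps both.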
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north. In other words, each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So is each of the other two non-zero components given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as pairs drawn from the Cartesian product of the basis elements that represent a vector and the basis elements that linearly accept a vector (in other words, basis linear forms).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by pairs (more generally, tuples) made of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, and $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ of arbitrary vectors $\vec v$ and $\vec w$ is then defined by extension: decompose $\vec v$ and $\vec w$ into linear combinations of basis vectors and distribute the $\otimes$ operator over both decompositions. (Tensor products can also be defined without referencing specific bases, but that definition is more abstract and thus perhaps harder to connect to intuition. Still, it is useful to know that just as many vector identities hold regardless of the basis being used, so do the equivalent tensor identities; the bases are only useful for writing specific vectors and tensors down.)
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
In fact, for finite-dimensional vector spaces $V$ and $W$, the space of linear functions from $V$ to $W$ is naturally isomorphic to the tensor product $W \otimes V^\*$, and this generalizes to multilinear functions of several vector spaces and so on. By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
Finally, let's look at what happens when we solve the Sticky problem. Just like Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. We do that by decomposing the tensor into primitive linear functions that each operate on a single basis vector, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've placed the labels of the contravariant basis vectors in the upper position and the labels of the covariant basis vectors in the lower position. Similarly, let's represent the components of Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ is given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the general:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts—you're just feeding an output vector into a linear form which expects an input vector! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which enables the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position.
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors on that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions consuming vectors from that space, while tensors contravariant in a vector space represent linear functions producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, I'll make the case that 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless numbers to get other quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added with each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting: frequency, remember, is a function from an amount of time to a dimensionless quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit of time by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, we changed the same units we used to express the quantity; this time, we changed the units of time while expressing a frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector on her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and moves 0.003 mm to the left; for every centimeter Sticky creeps east, the needle moves 0.058 mm to the left. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: taking up and right as positive, the needle moves $1.5 \cdot (-0.017\\,\text{mm}) - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically, and $1.5 \cdot (-0.003\\,\text{mm}) - 0.5 \cdot (-0.058\\,\text{mm}) = 0.0245\\,\text{mm}$ horizontally. But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north. In other words, each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So is each of the other two non-zero components given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as pairs drawn from the Cartesian product of the basis elements that represent a vector and the basis elements that linearly accept a vector (in other words, basis linear forms).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into linear combinations of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. But it is useful to know that just like many vector identities are true regardless of the basis being used, so are the equivalent tensor identities—the bases are only useful for writing specific vectors and tensors down.)
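To make ‘defined by extension’ concrete, here is a small Python sketch that stores a vector as a dictionary from basis labels to coefficients and computes $\vec v \otimes \vec w$ by distributing over both expansions. The labels `e1`, `e2`, `f1`, `f2` and the numbers are made up purely for illustration.

```python
from itertools import product

def tensor_product(v, w):
    """Tensor product of two vectors given as {basis_label: coefficient} dicts.

    Distributing ⊗ over the two basis expansions yields one term per pair of
    basis labels (e, f), with coefficient v[e] * w[f] on the basis tensor e ⊗ f.
    """
    return {(e, f): a * b for (e, a), (f, b) in product(v.items(), w.items())}

# Hypothetical vectors in two different 2-dimensional spaces.
v = {"e1": 2.0, "e2": -1.0}
w = {"f1": 3.0, "f2": 0.5}

vw = tensor_product(v, w)
print(vw)
# {('e1', 'f1'): 6.0, ('e1', 'f2'): 1.0, ('e2', 'f1'): -3.0, ('e2', 'f2'): -0.5}
```

A general tensor is a sum of such products, not necessarily a single one. Sticky's detector response is an example: its matrix has a non-zero determinant (it has rank 2), so it cannot be written as one product of a single vector with a single linear form.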
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
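The same response can be sketched in Python as a literal sum of the three primitive linear functions, each built from one coefficient, one output basis label, and one input basis label. The helper names `primitive`, `add_vectors`, and `detector` are hypothetical, chosen for this illustration; vectors are again dictionaries from basis labels to coefficients.

```python
def primitive(coeff, out_basis, in_basis):
    """The primitive linear function coeff * out_basis ⊗ (in_basis)^-1.

    It reads only the in_basis coefficient of its input vector and returns
    coeff times that amount along out_basis.
    """
    def apply(v):
        return {out_basis: coeff * v.get(in_basis, 0.0)}
    return apply

def add_vectors(*vectors):
    """Sum vectors stored as {basis_label: coefficient} dicts."""
    total = {}
    for vec in vectors:
        for basis, c in vec.items():
            total[basis] = total.get(basis, 0.0) + c
    return total

# Sticky's detector: the three non-zero terms of the tensor above.
terms = [
    primitive(-0.017, "mm up", "cm north"),
    primitive(-0.003, "mm right", "cm north"),
    primitive(-0.058, "mm right", "cm east"),
]

def detector(v):
    return add_vectors(*(term(v) for term in terms))

w = detector({"cm north": 1.5, "cm east": -0.5})
print({k: round(c, 4) for k, c in w.items()})  # {'mm up': -0.0255, 'mm right': 0.0245}
```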
In fact, any space of linear functions from a vector space $V$ to a vector space $W$ is isomorphic to the tensor product $W \otimes V^\*$ (at least for finite-dimensional spaces, which is all this post needs), and this generalizes to multilinear functions of multiple vector spaces and so on. (This is why I described Speedy's speed as a tensor as well, at the start of this post.) By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
Finally, let's look at what happens when we solve the Sticky problem. Just as Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. We do that by decomposing the tensor into primitive linear functions operating on single basis vectors, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've placed the labels of contravariant basis vectors in the upper position and those of covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ is given by:
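Here is a minimal Python sketch of what those two words mean numerically for the detector. It re-expresses the coefficients after switching the needle's unit from millimeters to micrometers and Sticky's unit from centimeters to meters (unit choices made only for this illustration; the names `rebase` and `apply` are hypothetical), then checks that the same physical move still produces the same physical needle displacement.

```python
# Detector coefficients keyed by (needle direction, Sticky direction), in mm per cm.
T_mm_per_cm = {
    ("up", "north"): -0.017, ("up", "east"): 0.0,
    ("right", "north"): -0.003, ("right", "east"): -0.058,
}

def rebase(T, new_out_unit, new_in_unit):
    """Re-express T after changing units.

    new_out_unit: the new output unit measured in old output units. The output
        side is contravariant, so coefficients are divided by this factor.
    new_in_unit: the new input unit measured in old input units. The input side
        is covariant, so coefficients are multiplied by this factor.
    """
    return {k: c * new_in_unit / new_out_unit for k, c in T.items()}

# 1 µm = 0.001 mm on the needle side; 1 m = 100 cm on Sticky's side.
T_um_per_m = rebase(T_mm_per_cm, new_out_unit=0.001, new_in_unit=100)

def apply(T, v):
    """w[n] = sum over s of T[(n, s)] * v[s]."""
    w = {}
    for (n, s), c in T.items():
        w[n] = w.get(n, 0.0) + c * v[s]
    return w

w_mm = apply(T_mm_per_cm, {"north": 1.5, "east": -0.5})     # Sticky's move in cm
w_um = apply(T_um_per_m, {"north": 0.015, "east": -0.005})  # the same move in m
# Same needle displacement: the µm figures are 1000 times the mm figures.
for n in w_mm:
    assert abs(w_um[n] - 1000 * w_mm[n]) < 1e-9
```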
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the general:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that expects one! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), that lets you omit the $\sum$ on terms where the same index variable appears once in an upper position and once in a lower position. With this convention, the equation above is written simply as $w^n = T_s^n v^s$.
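A minimal Python sketch of contraction, assuming coefficients are stored in nested dictionaries keyed by basis labels, with the upper index outermost (`T[n][s]` stands for $T_s^n$). The function names `contract` and `pair` are hypothetical. The second function shows the fully contracted case, a linear form fed a vector, using the typing example from earlier in the post.

```python
def contract(T, v):
    """w^n = sum over s of T^n_s v^s, with T stored as T[n][s]."""
    return {n: sum(coeff * v[s] for s, coeff in row.items()) for n, row in T.items()}

def pair(alpha, v):
    """Contract a linear form alpha_s with a vector v^s, giving a scalar."""
    return sum(coeff * v[s] for s, coeff in alpha.items())

# Sticky's detector in mm per cm, upper (needle) index outermost.
T = {
    "mm up":    {"cm north": -0.017, "cm east": 0.0},
    "mm right": {"cm north": -0.003, "cm east": -0.058},
}
w = contract(T, {"cm north": 1.5, "cm east": -0.5})
print({n: round(c, 4) for n, c in w.items()})  # {'mm up': -0.0255, 'mm right': 0.0245}

# Typing at 5 Hz (a linear form on Time, per second) for 4 seconds: 20 key presses.
print(pair({"second": 5}, {"second": 4}))  # 20
```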
So in summary:
* Tensors are built by combining vectors from multiple vector spaces (including dual spaces) with the tensor product.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be contravariant or covariant in an underlying vector space, depending on whether it combines vectors from that space or dual vectors on it. The terminology just describes how the coefficients of tensors need to be adjusted when the basis of that vector space is scaled.
* Tensors covariant in a given vector space can also be read as linear functions consuming vectors from that space, while tensors contravariant in a vector space can be read as producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient of a tensor, given some choice of basis for each of the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* A matching pair of upper and lower indices is summed over, which corresponds to feeding contravariant vectors to linear forms (a.k.a. covariant vectors).

#4: Post edited by r~~ · 2021-11-02T05:17:59Z (over 3 years ago)

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, as it happens, 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get more dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added with each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
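The rule in this paragraph fits in a couple of lines of Python. This sketch uses the standard library's `fractions.Fraction` so the arithmetic is exact; the function name `rebase_vector` is made up for the illustration.

```python
from fractions import Fraction as F

def rebase_vector(coeff, new_unit_in_old_units):
    """Contravariant rule for a one-dimensional vector.

    If the new basis vector equals `new_unit_in_old_units` times the old one,
    the coefficient is divided by that same factor, so the represented
    quantity (coefficient times basis vector) does not change.
    """
    return coeff / new_unit_in_old_units

# 1 cm = 1/100 m, so 2 m expressed in centimeters:
print(rebase_vector(F(2), F(1, 100)))  # 200
# 1 s = 1/60 min, so 2 min expressed in seconds:
print(rebase_vector(F(2), F(1, 60)))   # 120
```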
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting. Frequency, remember, is a function from an amount of time to a dimensionless quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit of time by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, we changed the same units in which the quantity itself was expressed; this time, we changed the units of time while expressing a frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^\*$ by the same amount. This is a general rule for any vector space and its dual.
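The contrasting rule for the dual space can be sketched the same way, again with exact fractions and a made-up function name (`rebase_covector`). The final check shows the point of the paragraph: converting both the frequency and the duration leaves the number of events unchanged.

```python
from fractions import Fraction as F

def rebase_covector(coeff, new_unit_in_old_units):
    """Covariant rule for a one-dimensional linear form.

    If the new basis vector of Time equals `new_unit_in_old_units` old ones,
    the form's coefficient is multiplied by that same factor.
    """
    return coeff * new_unit_in_old_units

# 4 Hz is 4 per second; switch the Time basis from 1 s to 1 min = 60 s.
per_minute = rebase_covector(F(4), F(60))
print(per_minute)  # 240

# Events in one minute, evaluating the linear form on the duration in matching units.
events_in_seconds_basis = F(4) * F(60)       # 4 per second, 60 seconds
events_in_minutes_basis = per_minute * F(1)  # 240 per minute, 1 minute
assert events_in_seconds_basis == events_in_minutes_basis == 240
```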
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
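Both conversions in this paragraph can be sketched in a few lines of Python, treating a speed as contravariant in the space unit and covariant in the time unit. The function name `rebase_speed` is hypothetical, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction as F

def rebase_speed(coeff, new_space_unit, new_time_unit):
    """Re-express a speed (an element of 1-Space ⊗ Time*) in new units.

    new_space_unit: new space unit in old space units (e.g. 1 mm = 1/10 cm);
        contravariant, so divide by it.
    new_time_unit: new time unit in old time units (e.g. 1 s = 1/60 min);
        covariant, so multiply by it.
    """
    return coeff / new_space_unit * new_time_unit

speed_cm_per_min = F(60)
print(rebase_speed(speed_cm_per_min, F(1), F(1, 60)))  # 1    (cm/s)
print(rebase_speed(speed_cm_per_min, F(1, 10), F(1)))  # 600  (mm/min)

# Evaluate the speed on a duration of 1 s, two ways.
one_second_in_min = F(1, 60)                                     # 1 s = 1/60 min
via_duration = speed_cm_per_min * one_second_in_min
via_speed = rebase_speed(speed_cm_per_min, F(1), F(1, 60)) * 1   # 1 cm/s times 1 s
assert via_duration == via_speed == 1                            # 1 cm either way
```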
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector on her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot -0.017\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically, and $1.5 \cdot -0.003\\,\text{mm} - 0.5 \cdot -0.058\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally. But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north, which is to say, each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So is each of the other two non-zero components. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course, these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as pairs (elements of a Cartesian product) made of one basis element that represents a vector and one basis element that linearly accepts a vector (in other words, a basis linear form).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is then defined by extension: expand $\vec v$ and $\vec w$ as linear combinations of basis vectors and distribute the $\otimes$ operator over both expansions. (Tensor products can also be defined without referencing specific bases, but that definition is more abstract and thus perhaps harder to connect to intuition. It is still useful to know that, just as identities between vectors hold regardless of the basis being used, so do identities between tensors; the bases are only useful for writing specific vectors and tensors down.)
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
In fact, any space of linear functions from a vector space $V$ to a vector space $W$ is isomorphic to the tensor product $W \otimes V^\*$ (at least for finite-dimensional spaces, which is all this post needs), and this generalizes to multilinear functions of multiple vector spaces and so on. By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
Finally, let's look at what happens when we solve the Sticky problem. Just as Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. We do that by decomposing the tensor into primitive linear functions operating on single basis vectors, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've placed the labels of contravariant basis vectors in the upper position and those of covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ is given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the general:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that expects one! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), that lets you omit the $\sum$ on terms where the same index variable appears once in an upper position and once in a lower position. With this convention, the equation above is written simply as $w^n = T_s^n v^s$.
So in summary:
* Tensors are built by combining vectors from multiple vector spaces (including dual spaces) with the tensor product.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be contravariant or covariant in an underlying vector space, depending on whether it combines vectors from that space or dual vectors on it. The terminology just describes how the coefficients of tensors need to be adjusted when the basis of that vector space is scaled.
* Tensors covariant in a given vector space can also be read as linear functions consuming vectors from that space, while tensors contravariant in a vector space can be read as producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient of a tensor, given some choice of basis for each of the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* A matching pair of upper and lower indices is summed over, which corresponds to feeding contravariant vectors to linear forms (a.k.a. covariant vectors).

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, as it happens, 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get more dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added with each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting—frequency, remember, is a function from an amount of time to a quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, we were changing the same units we were using in our representation; this time, we were changing units of time and using units of frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot -0.017\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically, and $1.5 \cdot -0.003\\,\text{mm} - 0.5 \cdot -0.058\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally. But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left[
\begin{array}{cc}
-0.017 & 0 \\
-0.003 & -0.058
\end{array}
\right]
$$
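As a sketch, applying this matrix to Sticky's displacement with an ordinary matrix-vector product reproduces the same two numbers. The helper `mat_vec` is a hypothetical name; rows are assumed to index the needle's basis (mm up, mm right) and columns Sticky's basis (cm north, cm east), matching the layout of the matrix.

```python
from fractions import Fraction as F

# Rows: needle basis (mm up, mm right). Columns: Sticky basis (cm north, cm east).
M = [
    [F("-0.017"), F("0")],
    [F("-0.003"), F("-0.058")],
]


def mat_vec(matrix, vector):
    """Ordinary matrix-vector product: out[i] = sum over j of matrix[i][j] * vector[j]."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


displacement = [F("1.5"), F("-0.5")]  # 1.5 cm north, 0.5 cm west = -0.5 cm east
needle = mat_vec(M, displacement)
print([float(x) for x in needle])     # [-0.0255, 0.0245]
```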
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north—which is to say, each vector $1\,\text{cm north}$ in the input contributes the vector $-0.017\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So are the other two non-zero components given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total—and of course, these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as pairs drawn from the Cartesian product of basis elements that represent a vector and basis elements that linearly accept a vector (or, in other words, basis linear forms).
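The decomposition into primitive linear functions can be sketched directly. In this Python sketch, vectors are dictionaries mapping a basis label such as `"cm north"` to a coefficient; the helpers `primitive` and `add_vectors` are hypothetical names chosen for illustration. Summing the outputs of the three primitive functions gives the same answer as the matrix.

```python
from fractions import Fraction as F


def primitive(coeff, out_basis, in_basis):
    """Hypothetical helper: a linear function that reads only the `in_basis`
    coefficient of its input and returns `coeff` times that many `out_basis`."""
    def apply(vector):
        return {out_basis: coeff * vector.get(in_basis, 0)}
    return apply


def add_vectors(*vectors):
    """Add vectors stored as {basis label: coefficient} dictionaries."""
    total = {}
    for vec in vectors:
        for basis, c in vec.items():
            total[basis] = total.get(basis, 0) + c
    return total


# One primitive function per non-zero component of the detector response.
PIECES = [
    primitive(F("-0.017"), "mm up", "cm north"),
    primitive(F("-0.003"), "mm right", "cm north"),
    primitive(F("-0.058"), "mm right", "cm east"),
]


def response(vector):
    # The full response is the sum of the primitive functions' outputs.
    return add_vectors(*(p(vector) for p in PIECES))


result = response({"cm north": F("1.5"), "cm east": F("-0.5")})
print({k: float(c) for k, c in result.items()})
# {'mm up': -0.0255, 'mm right': 0.0245}
```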
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into sums of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. But it is useful to know that just like many vector identities are true regardless of the basis being used, so are the equivalent tensor identities—the bases are only useful for writing specific vectors and tensors down.)
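Here is a minimal sketch of the basis-dependent definition, assuming vectors are stored as dictionaries from basis labels to coefficients. Distributing ⊗ over the two decompositions yields one term per pair of basis vectors, so the coefficient on `e ⊗ f` is the product of the two input coefficients. The labels `e1`, `e2`, `f1`, `f2` are placeholders.

```python
def tensor_product(v, w):
    """v ⊗ w for vectors stored as {basis label: coefficient} dictionaries.

    Distributing ⊗ over both basis decompositions gives one term per pair of
    basis vectors (e, f), with coefficient v[e] * w[f] on the basis tensor e ⊗ f.
    """
    return {(e, f): a * b for e, a in v.items() for f, b in w.items()}


v = {"e1": 2, "e2": 3}   # placeholder basis labels
w = {"f1": 5, "f2": -1}
print(tensor_product(v, w))
# {('e1', 'f1'): 10, ('e1', 'f2'): -2, ('e2', 'f1'): 15, ('e2', 'f2'): -3}
```

Because every coefficient is a product, scaling either input scales the whole result, which is the bilinearity that the distributive definition builds in.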
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\,\text{mm right}\otimes(\text{cm north})^{-1}\\
{} - 0.058\,\text{mm right}\otimes(\text{cm east})^{-1}\;\text{.}
\end{multline}
$$
In fact, any space of linear functions from a finite-dimensional vector space $V$ to a vector space $W$ is equivalent to the tensor product $W \otimes V^*$, and this generalizes to multilinear functions of multiple vector spaces and so on. By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
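The contravariant/covariant behavior of the detector tensor can be checked numerically. This sketch stores T as a dictionary keyed by (upper basis, lower basis) label pairs and rescales both bases, switching the needle's unit from mm to micrometers (written `um`) and Sticky's unit from cm to m. The unit tables and the `rebase` helper are assumptions for illustration; the directions (up, right, north, east) are left unchanged.

```python
from fractions import Fraction as F

# Detector tensor T, keyed by (upper/needle basis, lower/Sticky basis).
T = {
    ("mm up", "cm north"): F("-0.017"),
    ("mm right", "cm north"): F("-0.003"),
    ("mm right", "cm east"): F("-0.058"),
}

# Assumed unit sizes: needle units relative to 1 mm, Sticky units relative to 1 cm.
NEEDLE_UNIT = {"mm": F(1), "um": F(1, 1000)}
STICKY_UNIT = {"cm": F(1), "m": F(100)}


def rebase(tensor, needle_to, sticky_to):
    """Re-express a tensor after rescaling both bases (directions unchanged).

    Upper (needle) slot, contravariant: coefficient *= old size / new size.
    Lower (Sticky) slot, covariant:     coefficient *= new size / old size.
    """
    out = {}
    for (upper, lower), c in tensor.items():
        u_unit, u_dir = upper.split(" ", 1)
        l_unit, l_dir = lower.split(" ", 1)
        c = c * NEEDLE_UNIT[u_unit] / NEEDLE_UNIT[needle_to]
        c = c * STICKY_UNIT[sticky_to] / STICKY_UNIT[l_unit]
        out[(f"{needle_to} {u_dir}", f"{sticky_to} {l_dir}")] = c
    return out


T2 = rebase(T, "um", "m")
print(T2[("um up", "m north")])  # -1700
```

This agrees with a direct check: 1 m north is 100 cm north, which moves the needle 100 × 0.017 mm = 1.7 mm = 1700 µm down.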
Finally, let's look at what happens when we solve the Sticky problem. Just as Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\,\text{cm north} - 0.5\,\text{cm east}$ to get the answer. The way we do that is by looking at the tensor when decomposed into primitive linear functions operating on single basis vectors, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've placed the contravariant basis vectors in the upper position and the covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ is given by:
$$
\begin{align}
w^{\text{mm up}} &= T_{\text{cm north}}^{\text{mm up}} v^{\text{cm north}} + T_{\text{cm east}}^{\text{mm up}} v^{\text{cm east}} \\
w^{\text{mm right}} &= T_{\text{cm north}}^{\text{mm right}} v^{\text{cm north}} + T_{\text{cm east}}^{\text{mm right}} v^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\{\text{cm north}, \text{cm east}\}$, and let the index $n$ represent one of the needle movement basis vectors $\{\text{mm up}, \text{mm right}\}$. Then we can replace the above equations with the single general equation:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts—you're just feeding an output vector into a linear form which expects an input vector! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which enables the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position.
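In code, contraction is just a loop over the shared index. This Python sketch stores `T` and `v` as dictionaries keyed by basis labels; the names `contract`, `NEEDLE_BASIS`, and `STICKY_BASIS` are hypothetical. The comprehension computes w^n as the sum over s of T_s^n v^s, which is the expression Einstein notation abbreviates by dropping the explicit sum.

```python
from fractions import Fraction as F

NEEDLE_BASIS = ["mm up", "mm right"]    # values taken by the upper index n
STICKY_BASIS = ["cm north", "cm east"]  # values taken by the summed index s

# T[(n, s)] holds the coefficient T_s^n; v[s] holds v^s.
T = {
    ("mm up", "cm north"): F("-0.017"),
    ("mm up", "cm east"): F("0"),
    ("mm right", "cm north"): F("-0.003"),
    ("mm right", "cm east"): F("-0.058"),
}
v = {"cm north": F("1.5"), "cm east": F("-0.5")}


def contract(tensor, vector):
    """w^n = sum over s of T_s^n v^s: pair T's lower index with v's upper index."""
    return {n: sum(tensor[(n, s)] * vector[s] for s in STICKY_BASIS) for n in NEEDLE_BASIS}


w = contract(T, v)
print({n: float(c) for n, c in w.items()})  # {'mm up': -0.0255, 'mm right': 0.0245}
```

With NumPy arrays in place of dictionaries, `numpy.einsum("ns,s->n", T, v)` expresses the same contraction, summing over the repeated index `s`.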
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors on that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions consuming vectors from that space, while tensors contravariant in a vector space represent linear functions producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

#3: Post edited by r~~ · 2021-11-02T05:01:22Z (over 3 years ago)

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, as it happens, 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\, \text{cm}), (1\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get more dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added with each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^*$. So the vector space of frequencies is actually $\text{Time}^*$, the dual space of $\text{Time}$.
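A minimal sketch of a frequency acting as a linear form on Time: it takes a duration in seconds and returns a dimensionless count. The function name `keypresses` is illustrative. The assertions check the two properties discussed above: linearity in the time argument, and addition of frequencies.

```python
def keypresses(frequency_hz, duration_s):
    """Apply a frequency (a linear form on Time) to a duration: seconds in, a count out."""
    return frequency_hz * duration_s


alice, bob = 5, 4  # Alice types at 5 Hz, Bob at 4 Hz
print(keypresses(alice, 1), keypresses(alice, 4))  # 5 20

# Linear in the duration: 4 s of typing is 1 s plus 3 s of typing.
assert keypresses(alice, 1 + 3) == keypresses(alice, 1) + keypresses(alice, 3)
# Frequencies add like vectors: together Alice and Bob type at 9 Hz.
assert keypresses(alice + bob, 1) == keypresses(alice, 1) + keypresses(bob, 1) == 9
```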
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
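The contravariant rule can be written as a one-line function. In this sketch, unit sizes are measured against a common reference (centimeters here), and `rebase_vector` is a hypothetical name.

```python
from fractions import Fraction


def rebase_vector(coeff, old_unit, new_unit):
    """Contravariant rule: re-express `coeff` old units as a multiple of the new unit.

    Shrinking the unit by a factor k (new = old / k) multiplies the coefficient by k.
    """
    return coeff * Fraction(old_unit) / Fraction(new_unit)


METER, CENTIMETER = 100, 1  # unit sizes measured in centimeters
print(rebase_vector(2, METER, CENTIMETER))  # 200
```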
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting—frequency, remember, is a function from an amount of time to a quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, we were changing the same units we were using in our representation; this time, we were changing units of time and using units of frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
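The covariant rule for frequency, and the fact that the event count does not depend on the unit choice, can be sketched the same way. Unit sizes are measured in seconds, `rebase_covector` is a hypothetical name, and the duration is converted inline with the contravariant rule.

```python
from fractions import Fraction

SECOND, MINUTE = Fraction(1), Fraction(60)  # unit sizes measured in seconds


def rebase_covector(coeff, old_unit, new_unit):
    """Covariant rule: the coefficient of a linear form on Time scales with the time unit."""
    return coeff * new_unit / old_unit


freq_per_s = 4   # 4 Hz, i.e. 4 per second
duration_s = 60  # a one-minute interval, in seconds

freq_per_min = rebase_covector(freq_per_s, SECOND, MINUTE)  # covariant: 240
duration_min = duration_s * SECOND / MINUTE                  # contravariant: 1

# The pairing (number of events) is the same in either unit system.
print(freq_per_s * duration_s, freq_per_min * duration_min)  # 240 240
```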
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60}\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\,\text{cm} = 1\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot (-0.017)\,\text{mm} - 0.5 \cdot 0\,\text{mm} = -0.0255\,\text{mm}$ vertically, and $1.5 \cdot (-0.003)\,\text{mm} - 0.5 \cdot (-0.058)\,\text{mm} = 0.0245\,\text{mm}$ horizontally; that is, 0.0255 mm down and 0.0245 mm to the right. But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left[
\begin{array}{cc}
-0.017 & 0 \\
-0.003 & -0.058
\end{array}
\right]
$$
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north—which is to say, each vector $1\,\text{cm north}$ in the input contributes the vector $-0.017\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So are the other two non-zero components given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total—and of course, these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as pairs drawn from the Cartesian product of basis elements that represent a vector and basis elements that linearly accept a vector (or, in other words, basis linear forms).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into sums of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. But it is useful to know that just like algebraic identities involving vectors hold regardless of the basis being used, so do identities involving tensors—the bases are only useful for writing specific vectors and tensors down.)
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\,\text{mm right}\otimes(\text{cm north})^{-1}\\
{} - 0.058\,\text{mm right}\otimes(\text{cm east})^{-1}\;\text{.}
\end{multline}
$$
In fact, any space of linear functions from a finite-dimensional vector space $V$ to a vector space $W$ is equivalent to the tensor product $W \otimes V^*$, and this generalizes to multilinear functions of multiple vector spaces and so on. By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
Finally, let's look at what happens when we solve the Sticky problem. Just as Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\,\text{cm north} - 0.5\,\text{cm east}$ to get the answer. The way we do that is by looking at the tensor when decomposed into primitive linear functions operating on single basis vectors, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've placed the contravariant basis vectors in the upper position and the covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ is given by:
$$
\begin{align}
w^{\text{mm up}} &= T_{\text{cm north}}^{\text{mm up}} v^{\text{cm north}} + T_{\text{cm east}}^{\text{mm up}} v^{\text{cm east}} \\
w^{\text{mm right}} &= T_{\text{cm north}}^{\text{mm right}} v^{\text{cm north}} + T_{\text{cm east}}^{\text{mm right}} v^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\{\text{cm north}, \text{cm east}\}$, and let the index $n$ represent one of the needle movement basis vectors $\{\text{mm up}, \text{mm right}\}$. Then we can replace the above equations with the single general equation:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts—you're just feeding an output vector into a linear form which expects an input vector! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which enables the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position.
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors on that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions consuming vectors from that space, while tensors contravariant in a vector space represent linear functions producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the input vector spaces.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, as it happens, 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\, \text{cm}), (1\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get more dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added with each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^*$. So the vector space of frequencies is actually $\text{Time}^*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting—frequency, remember, is a function from an amount of time to a quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the unit by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, we were changing the same units we were using in our representation; this time, we were changing units of time and using units of frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
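The same bookkeeping works for Speedy. Here's a rough Python sketch; the helper `speed_in_new_units` and its `length_scale`/`time_scale` parameters are hypothetical, with each scale being the size of the new unit measured in the old one. Both routes to the answer give 1 cm.

```python
from fractions import Fraction

# A speed is one coefficient relative to a (length unit, time unit) pair, i.e. a
# component of a tensor in 1-Space ⊗ Time*. Each scale below is the size of
# the new unit measured in the old one.


def speed_in_new_units(coefficient, length_scale, time_scale):
    """Contravariant in the length unit, covariant in the time unit."""
    return Fraction(coefficient) / length_scale * time_scale


speed_cm_per_min = 60

# cm/min -> cm/s: a second is 1/60 of a minute.
speed_cm_per_s = speed_in_new_units(speed_cm_per_min, 1, Fraction(1, 60))

# cm/min -> mm/min: a millimeter is 1/10 of a centimeter.
speed_mm_per_min = speed_in_new_units(speed_cm_per_min, Fraction(1, 10), 1)

# Route 1: express 1 s in minutes, then apply 60 cm/min.
distance_via_minutes = speed_cm_per_min * Fraction(1, 60)
# Route 2: convert the speed to cm/s, then apply it to 1 s.
distance_via_seconds = speed_cm_per_s * 1

print(speed_cm_per_s, speed_mm_per_min, distance_via_minutes, distance_via_seconds)
```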
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot (-0.017)\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically (that is, 0.0255 mm down), and $1.5 \cdot (-0.003)\\,\text{mm} - 0.5 \cdot (-0.058)\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally (0.0245 mm to the right). But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
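A quick Python check of that arithmetic, assuming up/right are the positive needle directions and north/east the positive directions for Sticky (so 0.5 cm west enters as -0.5 cm east). The variable names are illustrative only.

```python
from fractions import Fraction as F

# Needle response per centimeter of Sticky's motion, in mm. Positive directions
# are assumed to be: needle up/right, Sticky north/east.
up_per_cm_north, right_per_cm_north = F("-0.017"), F("-0.003")
up_per_cm_east, right_per_cm_east = F(0), F("-0.058")

north_cm = F("1.5")
east_cm = F("-0.5")  # 0.5 cm west is -0.5 cm east

needle_up_mm = up_per_cm_north * north_cm + up_per_cm_east * east_cm
needle_right_mm = right_per_cm_north * north_cm + right_per_cm_east * east_cm

# Negative "up" means the needle dips; positive "right" means it moves right.
print(float(needle_up_mm), float(needle_right_mm))
```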
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
Instead, let's look at each element individually, without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north, which is to say that each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So are the other two non-zero components given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as elements of the Cartesian product of two sets of basis elements, namely those that represent a vector and those that linearly accept a vector (in other words, basis linear forms).
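One way to make ‘a sum of primitive linear functions’ concrete is the Python sketch below. It assumes vectors are stored as dicts from basis labels to coefficients (a choice made for illustration); `primitive` and `detector` are hypothetical names. Feeding in each input basis vector recovers one column of the matrix above.

```python
# Vectors are dicts from basis labels to coefficients (an illustrative choice,
# not anything from the original problem). Each primitive map reads one input
# coordinate and emits a multiple of one output basis vector.


def primitive(coeff, out_basis, in_basis):
    """Linear map v -> coeff * v[in_basis] * out_basis."""
    def apply(v):
        return {out_basis: coeff * v.get(in_basis, 0.0)}
    return apply


def add_vectors(vectors):
    """Add dict-vectors basis label by basis label."""
    total = {}
    for vec in vectors:
        for basis, c in vec.items():
            total[basis] = total.get(basis, 0.0) + c
    return total


primitives = [
    primitive(-0.017, "mm up", "cm north"),
    primitive(-0.003, "mm right", "cm north"),
    primitive(-0.058, "mm right", "cm east"),
]


def detector(v):
    """The whole response is the sum of the three primitive responses."""
    return add_vectors(p(v) for p in primitives)


# Feeding in each input basis vector recovers one column of the matrix.
column_north = detector({"cm north": 1.0})
column_east = detector({"cm east": 1.0})
print(column_north, column_east)
```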
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into sums of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. Still, it is useful to know that just as algebraic identities involving vectors hold regardless of the basis being used, so do identities involving tensors; the bases are only useful for writing specific vectors and tensors down.)
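Here's a small Python sketch of that basis-wise definition, assuming vectors are dicts from basis labels to coefficients and tensors are dicts keyed by pairs of labels. The labels `e1`, `e2`, `f1`, `f2` and the helper names are hypothetical. The checks at the end illustrate that defining $\otimes$ this way makes it bilinear, which is what ‘distributing the $\otimes$ operator’ amounts to.

```python
from fractions import Fraction as F

# Vectors: dicts from basis labels to coefficients. Tensors in V ⊗ W: dicts from
# (V label, W label) pairs to coefficients. Both encodings are assumptions made
# for this sketch.


def tensor(v, w):
    """v ⊗ w, defined by distributing ⊗ over the basis decompositions."""
    return {(e, f): a * b for e, a in v.items() for f, b in w.items()}


def add(s, t):
    """Coefficient-wise sum; works for vectors and tensors alike."""
    out = dict(s)
    for key, c in t.items():
        out[key] = out.get(key, 0) + c
    return out


def scale(k, t):
    return {key: k * c for key, c in t.items()}


# Hypothetical basis labels e1, e2 (for V) and f1, f2 (for W).
v1 = {"e1": F(2), "e2": F(-1)}
v2 = {"e1": F(1, 2)}
w = {"f1": F(3), "f2": F(5)}

# The basis-wise definition is automatically bilinear.
sum_rule_holds = tensor(add(v1, v2), w) == add(tensor(v1, w), tensor(v2, w))
scalar_rule_holds = tensor(scale(4, v1), w) == scale(4, tensor(v1, w))
print(sum_rule_holds, scalar_rule_holds)
```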
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
In fact, any space of linear functions from a vector space $V$ to a vector space $W$ is equivalent to the tensor product $W \otimes V^\*$ (at least when the spaces are finite-dimensional, as all of ours are), and this generalizes to multilinear functions of multiple vector spaces and so on. By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
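To connect $W \otimes V^\*$ back to the matrix, here's a Python sketch that builds each term of Sticky's tensor as an outer product of a basis vector's coordinates with a basis form's coordinates, then sums them. The coordinate encoding and helper names are assumptions for illustration; the result should be the matrix given earlier, with rows for (mm up, mm right) and columns for (cm north, cm east).

```python
# Coordinates: W = needle displacements in the basis (mm up, mm right);
# V* = forms on Sticky's displacements in the dual basis of (cm north, cm east).
# Each term "coefficient * (W basis vector) ⊗ (V* basis form)" becomes an outer
# product, and the sum of these outer products is the familiar matrix.


def outer(column, row):
    """Matrix with entries column[i] * row[j]."""
    return [[c * r for r in row] for c in column]


def add_matrices(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


mm_up, mm_right = [1, 0], [0, 1]            # basis of W
per_cm_north, per_cm_east = [1, 0], [0, 1]  # dual basis of V*

terms = [
    (-0.017, mm_up, per_cm_north),
    (-0.003, mm_right, per_cm_north),
    (-0.058, mm_right, per_cm_east),
]

T = [[0.0, 0.0], [0.0, 0.0]]
for coeff, w_basis, form in terms:
    T = add_matrices(T, outer([coeff * x for x in w_basis], form))

print(T)  # rows: (mm up, mm right); columns: (cm north, cm east)
```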
Finally, let's look at what happens when we solve the Sticky problem. Just as Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. The way we do that is to decompose the tensor into primitive linear functions operating on single basis vectors, match each basis-vector-consuming term to the corresponding term in the input vector, and sum. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've written the labels of contravariant basis vectors in the upper position and the labels of covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ is given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the general:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that accepts vectors from the same space! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which allows the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position. In that notation, the equation above becomes simply $w^n = T_s^n v^s$.
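The contraction $w^n = \sum_s T_s^n v^s$ can be sketched in Python as a loop that sums over the repeated index. The dict encoding keyed by (upper label, lower label) is an assumption for illustration, and `contract` is a hypothetical name; with Sticky's numbers it reproduces the needle displacement computed earlier.

```python
# T^n_s stored as a dict keyed by (upper label n, lower label s); v^s as a dict
# keyed by s. Labels follow the text; the encoding is an assumption.


def contract(T, v):
    """w^n = sum over s of T^n_s v^s: the repeated index s is summed away."""
    w = {}
    for (n, s), coeff in T.items():
        w[n] = w.get(n, 0.0) + coeff * v.get(s, 0.0)
    return w


T = {
    ("mm up", "cm north"): -0.017,
    ("mm up", "cm east"): 0.0,
    ("mm right", "cm north"): -0.003,
    ("mm right", "cm east"): -0.058,
}
v = {"cm north": 1.5, "cm east": -0.5}

w = contract(T, v)
print(w)
```

With NumPy available, the same contraction on plain arrays can be written `numpy.einsum("ns,s->n", T_array, v_array)` (array names hypothetical); the subscript string transcribes $T_s^n v^s$ directly. Note that `einsum` does not track which indices are upper or lower, so that bookkeeping is still up to us.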
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors on that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions consuming vectors from that space, while tensors contravariant in a vector space represent linear functions producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.
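As a closing check on the summary, here's a Python sketch showing that index placement tells you how to convert coefficients when the bases change, and that the contraction still describes the same needle displacement afterwards. It assumes a switch to micrometers (µm) for the needle and meters for Sticky, which is not part of the original problem; `apply_map` is a hypothetical helper.

```python
from fractions import Fraction as F

# Assumed change of units (not from the original problem): needle displacements
# in micrometers instead of millimeters, Sticky's displacements in meters
# instead of centimeters. Each scale is the new unit measured in the old one.
needle_scale = F(1, 1000)  # 1 um = 1/1000 mm
sticky_scale = F(100)      # 1 m = 100 cm

T_mm_per_cm = [[F("-0.017"), F(0)],
               [F("-0.003"), F("-0.058")]]   # T[n][s]
v_cm = [F("1.5"), F("-0.5")]                 # v[s]


def apply_map(T, v):
    """w^n = sum over s of T^n_s v^s."""
    return [sum(T[n][s] * v[s] for s in range(len(v))) for n in range(len(T))]


# Upper index n (contravariant): divide by the needle scale.
# Lower index s (covariant): multiply by the Sticky scale.
T_um_per_m = [[c / needle_scale * sticky_scale for c in row] for row in T_mm_per_cm]
# Sticky's displacement has an upper index in Sticky's space: divide.
v_m = [c / sticky_scale for c in v_cm]

w_mm = apply_map(T_mm_per_cm, v_cm)
w_um = apply_map(T_um_per_m, v_m)

# Converting the micrometer answer back to millimeters recovers w_mm exactly.
w_um_as_mm = [c * needle_scale for c in w_um]
print([float(c) for c in w_mm], [float(c) for c in w_um_as_mm])
```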

#2: Post edited by r~~ · 2021-11-02T02:54:18Z (over 3 years ago)

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, as it happens, 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get other quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added to each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting: frequency, remember, is a function from an amount of time to a dimensionless quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the time unit by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, the unit we changed was the same unit we used to express the quantity; this time, we changed the unit of time while expressing a frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This, too, is a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot (-0.017)\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically (that is, 0.0255 mm down), and $1.5 \cdot (-0.003)\\,\text{mm} - 0.5 \cdot (-0.058)\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally (0.0245 mm to the right). But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
Instead, let's look at each element individually, without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north, which is to say that each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So are the other two non-zero components given. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as elements of the Cartesian product of two sets of basis elements, namely those that represent a vector and those that linearly accept a vector (in other words, basis linear forms).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec{e} \otimes \vec{f}$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec{e}$ and $\vec{f}$. By extension, $\vec{v} \otimes \vec{w}$ for arbitrary vectors $\vec{v}$ and $\vec{w}$ can be defined by distributing the $\otimes$ operator over the decomposition of $\vec{v}$ and $\vec{w}$ into sums of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition.)
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor $-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1} - 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}$. In fact, any space of linear functions from a vector space $V$ to a vector space $W$ is equivalent to the tensor product $W \otimes V^\*$ (at least when the spaces are finite-dimensional, as all of ours are), and this generalizes to multilinear functions of multiple vector spaces and so on. By applying reasoning that should now be familiar, we can see that Sticky's detector is contravariant with respect to the basis $(1\\,\text{mm up}, 1\\,\text{mm right})$ of the needle's displacements, and covariant with respect to the basis $(1\\,\text{cm north}, 1\\,\text{cm east})$ of Sticky's displacements.
Finally, let's look at what happens when we solve the Sticky problem. Just as Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. The way we do that is to decompose the tensor into primitive linear functions operating on single basis vectors, match each basis-vector-consuming term to the corresponding term in the input vector, and sum. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've written the labels of contravariant basis vectors in the upper position and the labels of covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec{w}$ is given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ represent one of the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ represent one of the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the general:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that accepts vectors from the same space! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which allows the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position. In that notation, the equation above becomes simply $w^n = T_s^n v^s$.
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors on that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions consuming vectors from that space, while tensors contravariant in a vector space represent linear functions producing vectors in that space.
* A variable with upper and/or lower indices represents a coefficient from a tensor given some set of bases for the vector spaces being tensored.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?
Well, as it happens, 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!
When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get other quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added to each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.
Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.
However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)
The same logic applies for time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting: frequency, remember, is a function from an amount of time to a dimensionless quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the time unit by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, the unit we changed was the same unit we used to express the quantity; this time, we changed the unit of time while expressing a frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This, too, is a general rule for any vector space and its dual.
The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.
With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.
The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.
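A short Python sketch can compare both routes. The unit tables and function names are assumptions for illustration. Each unit is recorded by its size in meters or seconds, and exact fractions avoid rounding noise:

```python
from fractions import Fraction

# Sizes of a few units relative to meters and seconds (illustrative tables).
LENGTH = {"m": Fraction(1), "cm": Fraction(1, 100), "mm": Fraction(1, 1000)}
TIME = {"s": Fraction(1), "min": Fraction(60)}

def convert_time(value, unit, new_unit):
    # A duration is contravariant in time units: bigger unit, smaller number.
    return value * TIME[unit] / TIME[new_unit]

def convert_speed(value, space, time, new_space, new_time):
    # Contravariant in space units: a smaller length unit gives a bigger number.
    value = value * LENGTH[space] / LENGTH[new_space]
    # Covariant in time units: a smaller time unit gives a smaller number.
    return value * TIME[new_time] / TIME[time]

# Route 1: express 1 s in minutes, then apply 60 cm/min (result in cm).
route1 = 60 * convert_time(1, "s", "min")
# Route 2: express the speed in cm/s, then apply it to 1 s (result in cm).
route2 = convert_speed(60, "cm", "min", "cm", "s") * 1
print(route1, route2)                               # 1 1
print(convert_speed(60, "cm", "min", "mm", "min"))  # 600
```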
All this without having to talk about tensors yet! Let's fix that.
Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?
The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot (-0.017)\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically (0.0255 mm down), and $1.5 \cdot (-0.003)\\,\text{mm} - 0.5 \cdot (-0.058)\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally (0.0245 mm to the right). But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.
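The arithmetic can be double-checked with exact decimal fractions. This sketch follows the solution above: up and right are positive, and moving 0.5 cm west counts as -0.5 cm east:

```python
from fractions import Fraction as F

# Needle response per centimeter of travel (mm), up/right positive.
up_per_north, up_per_east = F("-0.017"), F("0")
right_per_north, right_per_east = F("-0.003"), F("-0.058")

north, east = F("1.5"), F("-0.5")  # 0.5 cm west is -0.5 cm east

vertical = up_per_north * north + up_per_east * east
horizontal = right_per_north * north + right_per_east * east
print(float(vertical), float(horizontal))  # -0.0255 0.0245
```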
Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:
$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$
Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north, which is to say, each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So is each of the other two non-zero components. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as elements of the Cartesian product of basis elements that represent a vector and basis elements that linearly accept a vector (or, in other words, basis linear forms).
This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec e \otimes \vec f$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec e$ and $\vec f$. The tensor product $\vec v \otimes \vec w$ for arbitrary vectors $\vec v$ and $\vec w$ is defined by extension, by distributing the $\otimes$ operator over the decomposition of $\vec v$ and $\vec w$ into sums of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition. Still, it is useful to know that, just as algebraic operations on vectors give the same results regardless of the basis being used, so do operations on tensors; the bases are only needed for writing specific vectors and tensors down.)
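The ‘distribute over basis decompositions’ definition translates almost directly into code. In this minimal sketch, a vector is assumed to be a dict from basis-vector names to coefficients. The names `e1`, `e2`, `f1`, `f2` are placeholders. The result maps each basis tensor, written as a pair of names, to its coefficient:

```python
from itertools import product

def tensor_product(v, w):
    """Tensor product of two vectors given as {basis name: coefficient} dicts.

    Distributing the tensor product over v = sum a_i e_i and w = sum b_j f_j
    gives sum a_i b_j (e_i ⊗ f_j); the pair (e_i, f_j) stands for e_i ⊗ f_j.
    """
    return {(e, f): a * b for (e, a), (f, b) in product(v.items(), w.items())}

v = {"e1": 2, "e2": 3}   # 2 e1 + 3 e2
w = {"f1": 5, "f2": -1}  # 5 f1 - f2
print(tensor_product(v, w))
# {('e1', 'f1'): 10, ('e1', 'f2'): -2, ('e2', 'f1'): 15, ('e2', 'f2'): -3}
```

Scaling `v` by a number scales every coefficient of the result by that number. This is the bilinearity that the definition relies on.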
So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor
$$
\begin{multline}
-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1}\\\\
\- 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}\\;\text{.}
\end{multline}
$$
In fact, any space of linear functions from a vector space $V$ to a vector space $W$ is equivalent to the tensor product $W \otimes V^\*$, and this generalizes to multilinear functions of multiple vector spaces and so on. By applying reasoning that should now be familiar, we can see that this representation of Sticky's detector is contravariant with respect to the space of the needle's displacement vectors, and covariant with respect to the space of Sticky's displacement vectors.
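Here is a sketch that shows the mixed behavior concretely. It uses a hypothetical `rescale` helper and stores the detector coefficients as a dict keyed by (needle basis, Sticky basis). It switches the needle's unit from millimeters to micrometers and Sticky's unit from centimeters to millimeters:

```python
from fractions import Fraction as F

# Detector coefficients in mm of needle movement per cm of Sticky's travel,
# keyed by (needle basis vector, Sticky basis vector).
T_mm_per_cm = {
    ("up", "north"): F("-0.017"), ("up", "east"): F(0),
    ("right", "north"): F("-0.003"), ("right", "east"): F("-0.058"),
}

def rescale(T, needle_factor, sticky_factor):
    """Re-express T after dividing the needle basis by `needle_factor`
    and Sticky's basis by `sticky_factor`.

    Contravariant (needle) slot: coefficients are multiplied by needle_factor.
    Covariant (Sticky) slot: coefficients are divided by sticky_factor.
    """
    return {key: c * needle_factor / sticky_factor for key, c in T.items()}

# 1 mm = 1000 um, and 1 cm = 10 mm.
T_um_per_mm = rescale(T_mm_per_cm, needle_factor=1000, sticky_factor=10)
print(T_um_per_mm[("up", "north")])  # -17/10

# The physical answer is unchanged: 1.5 cm north and 0.5 cm west is 15 mm north
# and -5 mm east, giving -25.5 um = -0.0255 mm of vertical needle movement.
vertical_um = T_um_per_mm[("up", "north")] * 15 + T_um_per_mm[("up", "east")] * -5
print(vertical_um)  # -51/2
```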
Finally, let's look at what happens when we solve the Sticky problem. Just like Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. The way we do that is by looking at the tensor when decomposed into primitive linear functions operating on single basis vectors, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've placed the contravariant basis vectors in the upper position and the covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec w$ was given by:
$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$
Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ range over the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ range over the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the single general equation:
$$
w^n = \sum_s T_s^n v^s
$$
This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that expects a vector from that space! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which enables the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position.
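Contraction is short to write out directly. The sketch below assumes the tensor is stored as a dict keyed by (upper index, lower index) pairs and the vector as a dict keyed by its upper index. The repeated index `s` is the one summed over:

```python
def contract(T, v):
    """Compute w^n = sum over s of T_s^n v^s.

    T maps (n, s) pairs to coefficients: n is the upper (contravariant,
    needle) index and s the lower (covariant, Sticky) index. v maps s to
    coefficients. Each T_s^n is paired with the matching v^s and summed.
    """
    w = {}
    for (n, s), coeff in T.items():
        w[n] = w.get(n, 0) + coeff * v.get(s, 0)
    return w

T = {
    ("mm up", "cm north"): -0.017, ("mm up", "cm east"): 0.0,
    ("mm right", "cm north"): -0.003, ("mm right", "cm east"): -0.058,
}
v = {"cm north": 1.5, "cm east": -0.5}
w = contract(T, v)
print({n: round(c, 6) for n, c in w.items()})  # {'mm up': -0.0255, 'mm right': 0.0245}
```

With NumPy arrays instead of dicts, the same Einstein-notation expression can be written as `np.einsum("ns,s->n", T, v)`. There, the subscript string repeats `s` to request the sum.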
So in summary:
* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors on that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions that consume vectors from that space, while tensors contravariant in a vector space represent functions that produce vectors in that space.
* A variable with upper and/or lower indices represents a coefficient of a tensor relative to some chosen set of bases for the underlying vector spaces.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.

#1: Initial revision by r~~ · 2021-11-02T00:57:32Z (over 3 years ago)


A textbook homework problem might ask, ‘Speedy the snail creeps along at a steady pace of 60 cm per minute. How far does Speedy travel each second?’ The correct answer is, of course, one centimeter. Hopefully your teacher would also accept 10 mm or 0.01 m as equally valid answers; after all, they all represent the same physical length. But how do we know that? And how do we get that answer from 60 cm/min? What even is a quantity like 60 cm/min, mathematically?

Well, as it happens, 60 cm/min is a tensor in $\text{1-Space} \otimes \text{Time}^*$ expressed using the bases $((1\\, \text{cm}), (1\\, \text{min}^{-1}))$, where the first basis is contravariant relative to the units we usually use for physics, and the second basis is covariant relative to those units. All of the above questions follow from this. But I'm getting ahead of myself!

When we work with units such as meters, centimeters, minutes, seconds, etc., we bring a certain set of assumptions along. For starters, we assume that the quantities that we use units to represent can be added to each other if the units match: 1 meter + 1 meter = 2 meters. We also assume that these quantities can be multiplied by dimensionless quantities to get more dimensioned quantities in the same units: 5(2 m) = 10 m. We assume a bunch of intuitive facts that more or less boil down to not caring in what order any of these additions or multiplications are performed: 1 m + 2 m = 2 m + 1 m, 1 m + (2 m + 3 m) = (1 m + 2 m) + 3 m, (2 + 3)(10 m) = 2(10 m) + 3(10 m), 2(3(5 m)) = (2×3)(5 m), etc. Some of our units are compatible with each other, like meters and centimeters or minutes and seconds, and if we appropriately scale the quantities expressed in those units, they can also be added with each other. Finally, we assume that there's a ‘nothing’ associated with each of these families of compatible units, and that it doesn't matter which unit is used to express that nothing: 0 m = 0 cm. We expect it to behave like a proper nothing: 0 m + 5 m = 5 m, and 0(7 m) = 0 m.

Those assumptions are precisely what make each ‘family of compatible units’ into a vector space, for which the appropriate units are alternate choices of bases. I'll call the vector space for which meters are appropriate basis elements $\text{1-Space}$ (because it's only one dimension of space, not the three we often work with), and the vector space for which seconds are appropriate basis elements $\text{Time}$. These are both one-dimensional vector spaces, and as such vectors in these spaces only require a single number to represent them. It may seem that there isn't much point to calling them vectors at all, but we'll see how these concepts generalize to more interesting vector spaces later.
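These assumptions can be mirrored in a small Python sketch. The `Length` class and its unit table are hypothetical illustrations, not a units library. Each value is one coefficient relative to one basis vector (the unit). Compatible units are converted before adding, and zero is the same vector in every unit:

```python
from fractions import Fraction

# Hypothetical table of compatible length units, each given in meters.
METERS_PER = {"m": Fraction(1), "cm": Fraction(1, 100), "mm": Fraction(1, 1000)}

class Length:
    """A vector in 1-Space: one coefficient relative to one basis vector (unit)."""

    def __init__(self, value, unit):
        self.value = Fraction(value)
        self.unit = unit

    def to(self, unit):
        # Change of basis: rescale the coefficient by the ratio of unit sizes.
        return Length(self.value * METERS_PER[self.unit] / METERS_PER[unit], unit)

    def __add__(self, other):
        # Compatible units can be added once expressed in the same basis.
        return Length(self.value + other.to(self.unit).value, self.unit)

    def __rmul__(self, scalar):
        # Scaling by a dimensionless number stays in the same space and unit.
        return Length(scalar * self.value, self.unit)

    def __eq__(self, other):
        return self.value == other.to(self.unit).value

    def __repr__(self):
        return f"{self.value!s} {self.unit}"

print(Length(1, "m") + Length(2, "m"))    # 3 m
print(5 * Length(2, "m"))                 # 10 m
print(Length(1, "m") + Length(50, "cm"))  # 3/2 m
print(Length(0, "m") == Length(0, "cm"))  # True
```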

Here's a question: is a [hertz](https://en.wikipedia.org/wiki/Hertz) (Hz, or $\text{s}^{-1}$) a vector in $\text{Time}$? Say Alice types at a rate such that she presses a key five times in a second; we can say she types at 5 Hz. If Bob types at 4 Hz, then together, nine keys are being pressed in a second; Alice and Bob together type at 9 Hz. So it certainly seems like frequencies can be meaningfully added. You can check that the rest of the vector axioms end up holding as well, and so it does seem like a hertz is a vector in some vector space. But that space isn't $\text{Time}$; adding hertz and seconds isn't a thing we can ever make sense of.

However, there is a relationship between frequency and time: a frequency tells you how much of some dimensionless quantity you get for a given amount of time. If you type at 5 Hz for 1 s, you have pressed 5 keys. If you type at 5 Hz for 4 s, you have pressed 20 keys; the result scales linearly with the amount of time. So we can say that a frequency, in addition to being a vector in its own right, is also a linear function from a vector in $\text{Time}$ to dimensionless quantities. Linear functions from vector spaces to scalars are so important that they have gotten a bunch of special names: we call them linear forms, or one-forms, or dual vectors, or covectors. For *any* vector space $V$, the linear forms of $V$ form their own vector space called the dual space of $V$, denoted $V^\*$. So the vector space of frequencies is actually $\text{Time}^\*$, the dual space of $\text{Time}$.
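A frequency has two roles: it is a vector that can be added, and it is a linear function of time. Both can be sketched together. The `Frequency` class is hypothetical, and durations are plain numbers of seconds for simplicity:

```python
from fractions import Fraction

class Frequency:
    """A dual vector on Time: a linear map from durations to counts.

    Hypothetical sketch; durations are plain numbers of seconds here.
    """

    def __init__(self, per_second):
        self.per_second = Fraction(per_second)

    def __call__(self, seconds):
        # Linear in its argument: doubling the duration doubles the count.
        return self.per_second * seconds

    def __add__(self, other):
        # Frequencies form a vector space of their own (Time*).
        return Frequency(self.per_second + other.per_second)

alice, bob = Frequency(5), Frequency(4)
print(alice(1), alice(4))  # 5 20
print((alice + bob)(1))    # 9
```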

When I was talking about compatible units before, I was vague about how to scale values when converting between units. Let's hammer that out now. We know that one meter equals 100 centimeters. So if we want to express 2 m in centimeters, we know to multiply 2 by 100 to get 200 cm. In general, in a vector space, if you have expressed a vector in some basis, and then you want to change your basis by dividing it by some amount, you have to multiply the numbers in your vector representation by that same amount. That's all that's happening here: as we change basis from 1 m to 1 cm (dividing by 100), we have to multiply the number 2 by the same number, 100, to get 200 cm from 2 m. (And of course, since dividing by $x$ is the same as multiplying by $\frac{1}{x}$, we can equivalently say that if you change basis by multiplying by some amount, you have to divide the numbers in your vector representation by that amount.)

The same logic applies to time: 2 min = 120 s because we divide a minute by 60 to get a second, and so we have to multiply 2 by 60 to get the equivalent representation in seconds. Similarly for frequency: 4 kHz = 4000 Hz, because 1 kHz = 1000 Hz. But now notice something interesting. Frequency, remember, is a function from an amount of time to a quantity. If something is happening at 4 Hz for 60 seconds, it happens 240 times. What if we want to change units from seconds to minutes? We'd like to express 4 Hz, which is the same as 4 $\text{s}^{-1}$, in $\text{min}^{-1}$ instead. Well, the frequency itself isn't changing; it must still be the case that something occurring at this frequency for 60 seconds, or one minute, results in 240 events. That means that 4 Hz = 240 $\text{min}^{-1}$. So this time, multiplying the time unit by 60 meant *also multiplying* the numeric representation of the frequency by 60. But this isn't mysterious at all; we did something different here than in the previous examples. Before, the units we changed were the same units we used in our representation; this time, we changed units of time while representing a frequency. In the language of linear algebra, we scaled our basis in one vector space, $\text{Time}$, and scaled the numeric representation of vectors in the *dual* vector space $\text{Time}^*$ by the same amount. This is also a general rule for any vector space and its dual.

The words ‘contravariant’ and ‘covariant’ just distinguish between these two rules. When there is a particular vector space of interest $V$, and we are sometimes working with vectors in that space and sometimes vectors in its dual space, we say that the vectors in $V$ are contravariant because we divide their numerical representations when we multiply basis vectors in $V$, and the dual vectors in $V^*$ are covariant because we multiply their numeric representations when we multiply basis vectors in $V$.

With all that out of the way, let's revisit Speedy's speed. When we say ‘60 cm/min’, we are expressing an exchange rate of a certain number of centimeters for every minute provided. Much like a frequency was a linear function from an amount of time to a dimensionless quantity, a speed is a linear function from an amount of time to an amount of one-dimensional space. If Speedy can travel 60 centimeters given one minute, Speedy can travel 120 centimeters given two minutes, and so on. Unlike frequency, the result of a speed when given an amount of time is not a dimensionless quantity, but another vector. So a speed is a linear function from vectors in one space to vectors in another.

The original math problem asked us to evaluate the function ‘60 cm/min’ on the vector ‘1 s’. We could do this by translating 1 s to $\frac{1}{60} \\,\text{min}$, since time vectors are contravariant with respect to time units. Then $\frac{1}{60} \cdot 60\\,\text{cm} = 1\\,\text{cm}$ is the result of the function and the answer to the question. But instead, we could translate 60 cm/min directly to cm/s. We ought to get the same answer either way, so we know that 60 cm/min must equal 1 cm/s. This implies that speed is covariant with respect to time units; when we scale down our unit of time, we scale down our representation of speed by the same amount. However, if we wanted to translate from cm/min to mm/min instead for some reason, we would have to scale *up* our representation of speed to 600 mm/min; speed is contravariant with respect to space units.

All this without having to talk about tensors yet! Let's fix that.

Speedy's cousin Sticky is treasure hunting and has mounted a compass-like metal detector to her shell. At Sticky's current location, for every centimeter Sticky creeps north, the needle of the detector dips down 0.017 mm and to the left 0.003 mm; for every centimeter Sticky creeps east, the needle moves to the left 0.058 mm. Assuming the detector has a linear response in Sticky's vicinity, how far does the needle move in total if she slides a total of 1.5 cm north and 0.5 cm west?

The intended solution to this problem is quite straightforward: the needle moves $1.5 \cdot (-0.017)\\,\text{mm} - 0.5 \cdot 0\\,\text{mm} = -0.0255\\,\text{mm}$ vertically (0.0255 mm down), and $1.5 \cdot (-0.003)\\,\text{mm} - 0.5 \cdot (-0.058)\\,\text{mm} = 0.0245\\,\text{mm}$ horizontally (0.0245 mm to the right). But just as with Speedy's speed, let's see what kind of mathematical object is represented by the response of Sticky's metal detector.

Just as a speed expresses an exchange rate between one-dimensional space and time, the numbers in this problem express exchange rates between two different two-dimensional spaces—namely, Sticky's displacement and the needle's displacement. The problem tells us to assume this relationship is linear, so it's natural to try to look at these numbers as a linear function from $\text{2-Space} \to \text{2-Space}$. Indeed, that's precisely what they are, and if you've done any linear algebra you might expect such a linear function between vector spaces to be represented as a matrix:

$$
\left\[
\begin{array}{cc}
-0.017 & 0 \\\\
-0.003 & -0.058
\end{array}
\right\]
$$

Instead, let's look at each element individually, and without dropping the units. The first element we're given is that the needle dips down 0.017 mm for every centimeter Sticky travels north, which is to say, each vector $1\\,\text{cm north}$ in the input contributes the vector $-0.017\\,\text{mm up}$ to the output. Notice that this component all by itself is a linear function from $\text{2-Space} \to \text{2-Space}$. So is each of the other two non-zero components. So another way to represent this response is as the sum of three primitive linear functions, each of which consumes one basis vector in $\text{2-Space}$ and produces a multiple of a basis vector in the other $\text{2-Space}$. For a given basis for each copy of $\text{2-Space}$, there are only four such combinations in total, and of course these correspond to the four components of the matrix above. The symmetry of this representation suggests an alternate way to look at these primitive functions: as elements of the Cartesian product of basis elements that represent a vector and basis elements that linearly accept a vector (or, in other words, basis linear forms).
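A short sketch can check that the sum of the three primitive functions is the same map as the matrix. The helper names and the dict representation (basis-vector names to coefficients) are assumptions for illustration:

```python
import math

IN_BASIS = ["cm north", "cm east"]
OUT_BASIS = ["mm up", "mm right"]

def primitive(coeff, out_basis, in_basis):
    """Linear map that reads the `in_basis` coefficient of its input and
    returns `coeff` times the single output basis vector `out_basis`."""
    return lambda v: {out_basis: coeff * v.get(in_basis, 0)}

def add_vectors(vectors):
    # Sum vectors given as {basis name: coefficient} dicts.
    total = {}
    for vec in vectors:
        for basis, c in vec.items():
            total[basis] = total.get(basis, 0) + c
    return total

# Sticky's detector as a sum of three primitive linear functions (mm per cm).
parts = [
    primitive(-0.017, "mm up", "cm north"),
    primitive(-0.003, "mm right", "cm north"),
    primitive(-0.058, "mm right", "cm east"),
]

def response(v):
    return add_vectors(p(v) for p in parts)

# The same map as a matrix: rows follow OUT_BASIS, columns follow IN_BASIS.
MATRIX = [[-0.017, 0.0], [-0.003, -0.058]]

def response_by_matrix(v):
    x = [v[b] for b in IN_BASIS]
    return {out: sum(m * xi for m, xi in zip(row, x)) for out, row in zip(OUT_BASIS, MATRIX)}

v = {"cm north": 1.5, "cm east": -0.5}
a, b = response(v), response_by_matrix(v)
print(all(math.isclose(a[k], b[k]) for k in OUT_BASIS))  # True
```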

This motivates the definition of a tensor product of vector spaces (over a common field of scalars): a new vector space generated by tuples of one basis element from each of the factor vector spaces. We write $V \otimes W$ for the tensor product of vector spaces $V$ and $W$, as well as $\vec{e} \otimes \vec{f}$ for the ‘basis tensor’ represented by the pair of basis vectors $\vec{e}$ and $\vec{f}$. By extension, $\vec{v} \otimes \vec{w}$ for arbitrary vectors $\vec{v}$ and $\vec{w}$ can be defined by distributing the $\otimes$ operator over the decomposition of $\vec{v}$ and $\vec{w}$ into sums of basis vectors. (Tensor products can also be defined without referencing specific bases, but that definition is a little more abstract and thus perhaps harder to connect to intuition.)

So Sticky's metal detector response can also be represented as a tensor in $\text{2-Space} \otimes \text{2-Space}^\*$. Specifically, it is the tensor $-0.017\\,\text{mm up}\otimes(\text{cm north})^{-1} - 0.003\\,\text{mm right}\otimes(\text{cm north})^{-1} - 0.058\\,\text{mm right}\otimes(\text{cm east})^{-1}$. In fact, any space of linear functions from a vector space $V$ to a vector space $W$ is equivalent to the tensor product $W \otimes V^\*$, and this generalizes to multilinear functions of multiple vector spaces and so on. By applying reasoning that should now be familiar, we can see that Sticky's detector is contravariant with respect to the basis $(1\\,\text{mm up}, 1\\,\text{mm right})$ of the needle's displacement space, and covariant with respect to the basis $(1\\,\text{cm north}, 1\\,\text{cm east})$ of Sticky's displacement space.

Finally, let's look at what happens when we solve the Sticky problem. Just like Speedy's speed was a function that we applied to an input vector to get an output vector, Sticky's detector response is also such a function. We simply have to apply the above tensor as a function to the vector $1.5\\,\text{cm north} - 0.5\\,\text{cm east}$ to get the answer. The way we do that is by looking at the tensor when decomposed into primitive linear functions operating on single basis vectors, matching each basis-vector-consuming term to the corresponding term in the input vector, and summing. Let's call the detector tensor $T$, and call the coefficients of the tensor $T_{\text{cm north}}^{\text{mm up}}$, $T_{\text{cm east}}^{\text{mm up}}$, $T_{\text{cm north}}^{\text{mm right}}$, and $T_{\text{cm east}}^{\text{mm right}}$ in what I hope is the obvious way. Following convention, I've placed the contravariant basis vectors in the upper position and the covariant basis vectors in the lower position. Similarly, let's represent Sticky's displacement as $v^{\text{cm north}}$ and $v^{\text{cm east}}$. Then the answer vector $\vec{w}$ was given by:

$$
\begin{align}
w\^{\text{mm up}} &= T_{\text{cm north}}\^{\text{mm up}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm up}} v\^{\text{cm east}} \\\\
w\^{\text{mm right}} &= T_{\text{cm north}}\^{\text{mm right}} v\^{\text{cm north}} + T_{\text{cm east}}\^{\text{mm right}} v\^{\text{cm east}}
\end{align}
$$

Now we can save quite a bit of writing by abstracting over the basis vectors with indices. Let the index $s$ range over the Sticky displacement basis vectors $\\{\text{cm north}, \text{cm east}\\}$, and let the index $n$ range over the needle movement basis vectors $\\{\text{mm up}, \text{mm right}\\}$. Then we can replace the above equations with the single general equation:

$$
w^n = \sum_s T_s^n v^s
$$

This operation of matching a covariant index with a contravariant index and summing is called contraction. It is extremely common, and also quite intuitive once you're comfortable with the underlying concepts: you're just feeding a vector into a linear form that expects a vector from that space! Contraction is such a common operation in tensor algebra that there is a convention, [Einstein notation](https://en.wikipedia.org/wiki/Einstein_summation), which enables the $\sum$ to be omitted on terms where the same index variable is used once in an upper position and once in a lower position.

So in summary:

* Tensors are combinations of vectors from multiple vector spaces.
* Basis vectors are like physical units; tensors are like multidimensional quantities involving several units.
* A space of tensors can be covariant or contravariant in an underlying vector space, depending on whether it combines vectors or dual vectors on that space. The terminology just describes how the coefficients of tensors need to be adjusted if the basis of that vector space is scaled.
* But tensors covariant in a given vector space also represent linear functions that consume vectors from that space, while tensors contravariant in a vector space represent functions that produce vectors in that space.
* A variable with upper and/or lower indices represents a coefficient of a tensor relative to some chosen set of bases for the underlying vector spaces.
* Upper indices correspond to contravariant basis vectors and lower indices correspond to covariant basis vectors.
* Matching upper and lower indices are meant to be summed over, which corresponds to feeding contravariant vectors to linear forms, a.k.a. covariant vectors.
