# Getting backward of partial differentiation's chain rule

We know that the chain rule for partial derivatives looks like this (where $z$ is a function of the variables $x$ and $y$, and $x$ and $y$ are functions of $t$):

$$\frac{dz}{dt}=\frac{\partial z}{\partial x} \frac{\partial x}{\partial t}+\frac{\partial z}{\partial y} \frac{\partial y}{\partial t}$$

Then I tried working backward from the equation:

$$\frac{\partial z}{\partial x} \frac{\partial x}{\partial t}+\frac{\partial z}{\partial y} \frac{\partial y}{\partial t} =\frac{\partial z}{\partial t}+\frac{\partial z}{\partial t}=2\frac{\partial z}{\partial t}$$

Can't we write $\frac{\partial z}{\partial t}=\frac{dz}{dt}$? If not, then what's the difference between them? If yes, then why couldn't I prove it backward?

From my study of calculus, I read that $\frac{d}{dt}$ denotes total differentiation, and that $\frac{\partial }{\partial t}$ is some kind of differentiation subordinate to $\frac{d}{dt}$. But I am not sure whether that's correct.

The difference between a partial derivative and a total derivative is that the partial derivative measures the change in a function when only one of its arguments varies, while the total derivative measures the change in a function when all of its arguments vary. The $\frac{\partial z}{\partial x}$ notation is somewhat misleading in this way, because you can't detach the top from the bottom and still have the $\partial$ symbol make sense—$\partial z$ only has a meaning when you know which part of the change in $z$ is being measured, the part obtained by varying $x$ or the part obtained by varying $y$.

So your error is in simplifying $\frac{\partial z}{\partial x}\frac{\partial x}{\partial t}$ to $\frac{\partial z}{\partial t}$, even though the notation suggests that this should be valid. The product $\frac{\partial z}{\partial x}\frac{\partial x}{\partial t}$ actually represents just the change in $z$ per unit change in $t$ that is attributed to variation in $x$. If you add that quantity to the change in $z$ per unit change in $t$ that is attributed to variation in $y$, you get the total change in $z$ per unit change in $t$, which is the total derivative.

Your main error is to treat the derivatives as fractions. By definition, they are not fractions; they are limits of fractions.

For total derivatives, treating them like fractions generally gives correct results (at least, I have never seen a case where it fails), so one might consider this nitpicking. But the difference shows up quite clearly as soon as you get to partial derivatives, and your calculation is one example. Indeed, a more striking example is the following relation, which also illustrates another point I'll mention below (and yes, this is exactly how it was written in the math lecture I attended): $$\frac{\partial y}{\partial x} \frac{\partial z}{\partial y} \frac{\partial x}{\partial z} = -1 \tag 1$$ If partial derivatives could be treated like fractions, then all terms in the above expression would clearly cancel, and the result would be $+1$.

The other point this equation illustrates is that the partial derivative notation does not give the full information. The missing information is generally inferred from the context (which, of course, I've not given for the above equation, in order to illustrate the point; I'll give it below).

Let's go back to the partial derivative chain rule formula: $$\frac{\mathrm dz}{\mathrm dt} = \frac{\partial z}{\partial x} \frac{\mathrm dx}{\mathrm dt} + \frac{\partial z}{\partial y} \frac{\mathrm dy}{\mathrm dt} \tag 2$$ You may notice that this differs from the form you gave by having total derivatives on the right-hand side as well. In this form, it is also more readily visible that you cannot just cancel things out.

Now, what are the hidden assumptions here?

1. $z$ is a function of two arguments, the first one we label $x$, and the second one we label $y$.

2. Furthermore, we have two functions $x$ and $y$ (which we name the same as the arguments of $z$, despite them being very different objects), whose argument we name $t$.

3. From those we form a different function, also named $z$, which takes one argument, named $t$, and is formed as follows:

• We pass the single argument $t$ to the functions $x$ and $y$.

• The results of those function applications are then used as arguments to the two-argument function $z$, matching the function names with the argument names.

Doing this in a mathematically clean way, we would write it as follows:

We have one function of two arguments, $$z:\mathbb R^2\to\mathbb R, (u,v)\mapsto z(u,v)$$ and two functions of one argument each, \begin{align} x: \mathbb R\to\mathbb R&, t\mapsto x(t)\\ y: \mathbb R\to\mathbb R&, t\mapsto y(t) \end{align} We now form a new function $$h: \mathbb R\to\mathbb R, t\mapsto z(x(t),y(t)).$$

Now with those definitions, the chain rule formula reads $$\frac{\mathrm dh}{\mathrm dt} = \frac{\partial z}{\partial u} \frac{\mathrm dx}{\mathrm dt} + \frac{\partial z}{\partial v} \frac{\mathrm dy}{\mathrm dt} \tag 3$$ where the partial derivatives of $z$ are evaluated at $(u,v) = (x(t),y(t))$. Written this way, there clearly isn't anything to cancel.
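To check $(3)$ numerically, here is a minimal sketch. The concrete choices $z(u,v) = uv$, $x(t) = t^2$, $y(t) = \sin t$ and the evaluation point are hypothetical, picked only for illustration, and central differences stand in for the exact limits. Each product term is only the part of the change in $h$ attributed to one of $x$, $y$; only their sum reproduces $\frac{\mathrm dh}{\mathrm dt}$.

```python
import math

STEP = 1e-6  # finite-difference step size (an arbitrary choice for this sketch)

def d_dt(f, t):
    """Central-difference approximation of the derivative of a one-argument function f at t."""
    return (f(t + STEP) - f(t - STEP)) / (2 * STEP)

def d_du(z, u, v):
    """Partial derivative of z with respect to its first argument u, with v held fixed."""
    return (z(u + STEP, v) - z(u - STEP, v)) / (2 * STEP)

def d_dv(z, u, v):
    """Partial derivative of z with respect to its second argument v, with u held fixed."""
    return (z(u, v + STEP) - z(u, v - STEP)) / (2 * STEP)

# Hypothetical example functions, chosen only for illustration.
def z(u, v):
    return u * v

def x(t):
    return t ** 2

def y(t):
    return math.sin(t)

def h(t):
    # The composite function h(t) = z(x(t), y(t)).
    return z(x(t), y(t))

t0 = 1.3
u0, v0 = x(t0), y(t0)  # the partials of z are evaluated at (x(t0), y(t0))

term_x = d_du(z, u0, v0) * d_dt(x, t0)  # part of dh/dt attributed to x
term_y = d_dv(z, u0, v0) * d_dt(y, t0)  # part of dh/dt attributed to y
total = d_dt(h, t0)                     # dh/dt itself

print(f"term_x         = {term_x:.6f}")
print(f"term_y         = {term_y:.6f}")
print(f"term_x + term_y = {term_x + term_y:.6f}")
print(f"dh/dt          = {total:.6f}")
```

Neither `term_x` nor `term_y` alone is close to `dh/dt`, which is what cancelling "$\partial x$ against $\partial x$" would predict; only the sum agrees (up to finite-difference error) with the exact value $2t\sin t + t^2\cos t$.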

Now why would we then normally write things in the form $(2)$ instead of $(3)$? Well, for one, $(2)$ needs far fewer letters, and moreover by the matching of letters it is far easier to memorize and to get right.

But on the other hand, it is also a conceptual thing: in physics in particular, we don't usually think in terms of functions, but in terms of relations. That is, we have a relation between $x,y,z$ which happens to be such that, given $x$ and $y$, we can infer $z$; that is, $z = z(x,y)$, where I again abused notation by giving the quantity and the function the same name. So in physics, the functions are generally given implicitly, and the function arguments are also generally inferred from the context.

Indeed, there is one area in physics where context is generally not sufficient to determine the function, and that is thermodynamics. Therefore, in thermodynamics it is customary to write the other arguments (those not being differentiated with respect to) as subscripts on the partial derivative. Thus $\left(\frac{\partial S}{\partial T}\right)_{V,N}$ and $\left(\frac{\partial S}{\partial T}\right)_{p,N}$ are in general not the same. Written in “thermodynamics notation”, $(2)$ would read: $$\frac{\mathrm dz}{\mathrm dt} = \left(\frac{\partial z}{\partial x}\right)_{y} \frac{\mathrm dx}{\mathrm dt} + \left(\frac{\partial z}{\partial y}\right)_{x} \frac{\mathrm dy}{\mathrm dt}$$
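To see concretely why the subscript matters, here is a small numerical sketch with a hypothetical example (not taken from thermodynamics): let $z = xy$, and introduce a second coordinate $w = x + y$, so that the same quantity can also be written as $z = x(w - x)$. Then $\left(\frac{\partial z}{\partial x}\right)_y = y$, while $\left(\frac{\partial z}{\partial x}\right)_w = w - 2x = y - x$. Central differences stand in for the exact partial derivatives.

```python
STEP = 1e-6  # finite-difference step size (arbitrary choice for this sketch)

# Hypothetical example: z as a function of (x, y) ...
def z_of_xy(x, y):
    return x * y

# ... and the same quantity z re-expressed in terms of (x, w), where w = x + y.
def z_of_xw(x, w):
    return x * (w - x)

def partial_first(f, a, b):
    """Partial derivative of f(a, b) with respect to a, holding b fixed (central difference)."""
    return (f(a + STEP, b) - f(a - STEP, b)) / (2 * STEP)

x0, y0 = 2.0, 5.0
w0 = x0 + y0

# Both functions give the same value of z at this point.
assert abs(z_of_xy(x0, y0) - z_of_xw(x0, w0)) < 1e-12

dz_dx_y_fixed = partial_first(z_of_xy, x0, y0)  # (dz/dx)_y, analytically y = 5
dz_dx_w_fixed = partial_first(z_of_xw, x0, w0)  # (dz/dx)_w, analytically w - 2x = 3

print(f"(dz/dx) holding y fixed: {dz_dx_y_fixed:.6f}")
print(f"(dz/dx) holding w fixed: {dz_dx_w_fixed:.6f}")
```

The plain symbol $\frac{\partial z}{\partial x}$ does not say which of these two numbers is meant; the subscript does.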

On the question of replacing the total with the partial derivative: if you have a mathematical function of one argument, then in that context the total and partial derivatives are indeed the same (as there are no other arguments, the question of what you do with them is moot). But imagine you've got some quantity that depends on (one-dimensional) position and time, $Q = Q(x,t)$, and some particle moving according to $x = x(t)$, and let $Q(t) = Q(x(t),t)$ be the quantity at the particle's position at time $t$. Then the rate of change (time derivative) reads $$\frac{\mathrm dQ}{\mathrm dt} = \frac{\partial Q}{\partial x} \frac{\mathrm dx}{\mathrm dt} + \frac{\partial Q}{\partial t}$$ As you can see, this formula contains both the total and the partial derivative with respect to $t$, and the two are clearly not the same. This is because the partial derivative tells you how $Q$ changes at a fixed position (the particle's current one), while the total derivative also takes into account the indirect effect of the changing position.
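A quick numerical sketch of this formula, using a hypothetical field $Q(x,t) = x^2 + 3t$ and a hypothetical trajectory $x(t) = 2t$ (both chosen only for illustration). Along the trajectory $Q(t) = 4t^2 + 3t$, so $\frac{\mathrm dQ}{\mathrm dt} = 8t + 3$, while $\frac{\partial Q}{\partial t} = 3$ everywhere. Central differences stand in for the exact derivatives.

```python
STEP = 1e-6  # finite-difference step size (arbitrary choice for this sketch)

# Hypothetical field Q depending on position x and time t.
def Q(x, t):
    return x ** 2 + 3 * t

# Hypothetical particle trajectory.
def x_particle(t):
    return 2 * t

# The quantity seen by the particle: Q(x(t), t), a function of t alone.
def Q_along(t):
    return Q(x_particle(t), t)

def dQ_dx(x, t):
    """Partial derivative of Q with respect to x, holding t fixed."""
    return (Q(x + STEP, t) - Q(x - STEP, t)) / (2 * STEP)

def dQ_dt_partial(x, t):
    """Partial derivative of Q with respect to t, holding x fixed."""
    return (Q(x, t + STEP) - Q(x, t - STEP)) / (2 * STEP)

def d_dt(f, t):
    """Ordinary derivative of a one-argument function (central difference)."""
    return (f(t + STEP) - f(t - STEP)) / (2 * STEP)

t0 = 1.5
x0 = x_particle(t0)

total = d_dt(Q_along, t0)                                 # dQ/dt along the particle
partial_t = dQ_dt_partial(x0, t0)                         # dQ/dt at fixed position
convective = dQ_dx(x0, t0) * d_dt(x_particle, t0)         # (dQ/dx)(dx/dt)

print(f"total dQ/dt    = {total:.6f}")
print(f"partial dQ/dt  = {partial_t:.6f}")
print(f"(dQ/dx)(dx/dt) = {convective:.6f}")
```

At $t = 1.5$ the total derivative is $15$ while the partial derivative is $3$; the difference of $12$ is exactly the term coming from the particle's motion.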

And now I can also lift the mystery of eq. $(1)$: it is about a relation $f(x,y,z)=0$ in which each of $x,y,z$ can be written as a function of the other two: $$x = x(y,z), y = y(x,z), z = z(x,y)$$ Using the “thermodynamic notation”, $(1)$ would read

$$\left(\frac{\partial y}{\partial x}\right)_z \left(\frac{\partial z}{\partial y}\right)_x \left(\frac{\partial x}{\partial z}\right)_y = -1$$

In that notation, it looks less as if cancelling anything were possible.
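Here is a numerical sketch of this identity, assuming the hypothetical relation $f(x,y,z) = xy - z = 0$ (any smooth relation where each variable can be solved for would do). Solving for each variable gives $z = xy$, $y = z/x$, $x = z/y$, hence $\left(\frac{\partial y}{\partial x}\right)_z = -z/x^2$, $\left(\frac{\partial z}{\partial y}\right)_x = x$, $\left(\frac{\partial x}{\partial z}\right)_y = 1/y$, and the product is $-z/(xy) = -1$. Central differences stand in for the exact partials.

```python
STEP = 1e-6  # finite-difference step size (arbitrary choice for this sketch)

# Hypothetical relation f(x, y, z) = x*y - z = 0, solved for each variable
# in terms of the other two (valid where x and y are nonzero).
def z_of(x, y):
    return x * y

def y_of(x, z):
    return z / x

def x_of(y, z):
    return z / y

def partial_first(f, a, b):
    """Partial derivative of f(a, b) with respect to a, second argument held fixed."""
    return (f(a + STEP, b) - f(a - STEP, b)) / (2 * STEP)

def partial_second(f, a, b):
    """Partial derivative of f(a, b) with respect to b, first argument held fixed."""
    return (f(a, b + STEP) - f(a, b - STEP)) / (2 * STEP)

# A point on the surface x*y = z.
x0, y0 = 2.0, 3.0
z0 = z_of(x0, y0)

dy_dx_z = partial_first(y_of, x0, z0)   # (dy/dx) with z fixed = -z/x**2
dz_dy_x = partial_second(z_of, x0, y0)  # (dz/dy) with x fixed = x
dx_dz_y = partial_second(x_of, y0, z0)  # (dx/dz) with y fixed = 1/y

product = dy_dx_z * dz_dy_x * dx_dz_y
print(f"product = {product:.6f}")
```

Each factor holds a different variable fixed, which is exactly why nothing cancels and the product comes out as $-1$ rather than $+1$.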

Traditional mathematical notation for calculus (both integral and differential) is rather incoherent. I don't think there exists a write-up providing systematic rules that would allow you to correctly and unambiguously parse this kind of notation, i.e. the kind of notation used in a typical undergrad multivariable calculus textbook. By "systematic", I mean you could write a program to do it (for simplicity, say the input comes as a subset of MathJax; I'm not asking for optical character recognition). By "correctly and unambiguously", I mean that the program produces one result and that the result is the one intended by the author (modulo typos). I'm also not just talking about inconsistent notation between authors/books, though that doesn't help either. I'm imagining a scenario where you randomly select a multivariable calculus textbook and build a system that only needs to (correctly) handle notation in the style of that book. Certainly, something like this for a "traditional" notation isn't common knowledge. If you try to build a formal system for this notation, you quickly find that it is, at best, non-obvious and unusual.

The fact that traditional notation is unclear doesn't mean that there don't exist other notations that are clear. As an extreme example, the notation used in the book Structure and Interpretation of Classical Mechanics is undoubtedly unambiguous as it's literally executable code. Of course, it's also quite far from traditional notation. I recommend the preface and the footnote quoting Spivak's Calculus on Manifolds for more specific and authoritative critiques of traditional notation. (The footnote talks about literally exactly this example of the chain rule.)

The starting point for most approaches to clearer notation is the fact that, semantically, differentiation acts on functions. Before continuing, note another common conflation: that of a function, $f$, with an open expression, $f(x)$. To say that differentiation acts on functions means (syntactically) that it should act on $f$, not $f(x)$. This will be illustrated below. The simplest example is differentiating a real-valued real function. We might write the derivative of such a function, $f$, as $Df$. The result of this is also a function. So, for example, if $f(x) = x^2$ then $(Df)(x) = 2x$. The differential operator $D$ takes highest precedence, so $(Df)(x) = Df(x)$. This should be easy to remember because it doesn't make sense to apply $D$ to a real number $f(x)$. Note that since $Df$ is a function, we need to apply it to an argument. That argument could be anything standing for a real number; in particular, $Df(y) = 2y$ and $Df(3) = 6$. To really make this notation usable, it helps to have a notation for anonymous functions, e.g. $D(x \mapsto x^2) = x \mapsto 2x$.
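The idea that $D$ maps functions to functions is easy to mimic with higher-order functions. In this minimal sketch, `D` is a hypothetical helper that approximates the derivative with a central difference (a numerical stand-in for the exact limit); the point is its type: it takes a function and returns a function, which is then applied to an argument.

```python
STEP = 1e-6  # finite-difference step size (an assumption of this sketch)

def D(f):
    """Return (a numerical approximation of) the derivative of f, as a new function.

    D acts on the function f itself, not on a number f(x).
    """
    def Df(x):
        return (f(x + STEP) - f(x - STEP)) / (2 * STEP)
    return Df

def f(x):
    return x ** 2

Df = D(f)                      # Df is a function ...
print(Df(3))                   # ... which we then apply to an argument: approximately 6
print(D(f)(3))                 # the same: D is applied to f first, then the result to 3
print(D(lambda x: x ** 2)(3))  # anonymous-function form, mirroring D(x -> x^2)
```

Python's call syntax enforces the precedence rule from the answer: `D(f)(3)` can only mean $(Df)(3)$, and something like `D(f(3))` would fail as soon as the result is called, since `f(3)` is a number, not a function.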

However, this one-dimensional case is too simple and, as we'll see, a bit misleading. One approach for handling multiparameter differentiation, i.e. partial derivatives, is to have the differential operator indicate which parameter it's operating on, e.g. if $f$ took two arguments, then you might write $\partial_1 f$ to indicate (partial) differentiation with respect to the first argument and $\partial_2 f$ with respect to the second. However, I think it's cleaner and more effective to talk about directional derivatives. In fact, I think directional derivatives make a powerful yet approachable basis for differential calculus. There are other rather elegant foundations too, such as taking the vector derivative as primitive, but that takes a bit more setup, and often you end up working with directional derivatives anyway.

So, a (potentially vector-valued) function $f$ taking $n$ real arguments can instead be viewed as a function taking a single argument that is an $n$-dimensional vector. That is, we can think of $f(x, y, z)$ as being $f(x\mathbf e_x + y\mathbf e_y + z\mathbf e_z)$ where $\mathbf e_x$, $\mathbf e_y$, $\mathbf e_z$ are orthonormal basis vectors of $\mathbb R^3$, in this case. Given a function $f : \mathbb R^n \to \mathbb R^m$ and an $n$-dimensional vector $\mathbf v$, we can define the directional derivative of $f$ in the direction $\mathbf v$ as $$\partial_{\mathbf v}f(\mathbf x) = \lim_{\epsilon \to 0}\frac{f(\mathbf x + \epsilon\mathbf v) - f(\mathbf x)}{\epsilon}$$ $\partial_i f$ can now be identified with $\partial_{\mathbf e_i} f$ where we (arbitrarily) label the basis vectors $\mathbf e_1, \dots, \mathbf e_n$. The $D$ operator from above is also recovered as the $n=1$ case of $\partial_{\mathbf e_1}$.
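A minimal sketch of $\partial_{\mathbf v}$ as an operator, under a few assumptions: vectors are plain Python lists, only real-valued $f$ is handled, basis vectors are indexed from 0 (the answer indexes from 1), and a central difference approximates the one-sided limit in the definition (both converge to the same value for differentiable $f$). The names `directional`, `basis`, and `partial` are hypothetical.

```python
import math

STEP = 1e-6  # finite-difference step used in place of the limit (an assumption of this sketch)

def directional(v):
    """Return the operator partial_v, mapping f: R^n -> R to an approximation of
    x |-> lim_{eps -> 0} (f(x + eps*v) - f(x)) / eps, using a central difference."""
    def op(f):
        def df(x):
            plus = [xi + STEP * vi for xi, vi in zip(x, v)]
            minus = [xi - STEP * vi for xi, vi in zip(x, v)]
            return (f(plus) - f(minus)) / (2 * STEP)
        return df
    return op

def basis(i, n):
    """The i-th standard basis vector of R^n (0-indexed here)."""
    return [1.0 if k == i else 0.0 for k in range(n)]

def partial(i, n):
    """partial_i is just the directional derivative along the i-th basis vector."""
    return directional(basis(i, n))

# Example: f(x, y, z) viewed as a function of a single vector argument.
def f(p):
    x, y, z = p
    return x * y + math.sin(z)

p0 = [1.0, 2.0, 0.5]
print(partial(0, 3)(f)(p0))                 # df/dx = y, approximately 2
print(partial(2, 3)(f)(p0))                 # df/dz = cos(z), approximately cos(0.5)
print(directional([1.0, 1.0, 0.0])(f)(p0))  # along e_x + e_y: y + x, approximately 3

# n = 1: the derivative along the single basis vector recovers the one-variable D.
def g(p):
    return p[0] ** 2

print(partial(0, 1)(g)([3.0]))              # approximately 6
```

Note the shape of the calls: `directional(v)` is the operator, `directional(v)(f)` is a new function, and only then is it applied to a point.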

For your specific case we have the functions $\mathbf x : \mathbb R \to \mathbb R^2$ and $z : \mathbb R^2 \to \mathbb R$, where $\mathbf x(t) = x(t)\mathbf e_1 + y(t)\mathbf e_2$. Since $z \circ \mathbf x : \mathbb R \to \mathbb R$, we can apply $D$ to it, i.e. $D(z \circ \mathbf x)$ makes sense. The chain rule is then saying $$D(z \circ \mathbf x)(t) = \partial_1 z(\mathbf x(t))(\mathbf e_1 \cdot (D\mathbf x)(t)) + \partial_2 z(\mathbf x(t))(\mathbf e_2 \cdot (D\mathbf x)(t))$$ This notation makes your error harder to make and easier to understand. Namely, $\partial z/\partial t$ doesn't make sense, since it would mean differentiating $z$, a function defined on $\mathbb R^2$, in the direction of a vector in $\mathbb R^1$. This doesn't make sense because vectors in $\mathbb R^1$ aren't vectors in $\mathbb R^2$. (They can certainly be embedded into $\mathbb R^2$, but in many distinct ways.)
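The formula above can be checked with a function-oriented sketch. Assumptions: the example $\mathbf x(t) = (t^2, \sin t)$ and $z(\mathbf p) = p_1 p_2$ are hypothetical, vectors are Python lists, and central differences stand in for exact limits. The sketch also evaluates the equivalent form $\partial_{(D\mathbf x)(t)} z(\mathbf x(t))$, the directional derivative of $z$ along the velocity, which is a vector in $\mathbb R^2$ as required; this is the only well-typed way to feed "the direction of $t$" to $z$.

```python
import math

STEP = 1e-6  # central-difference step standing in for exact limits (assumption of this sketch)

def D(f):
    """Derivative of a real function of one real variable, returned as a function."""
    return lambda t: (f(t + STEP) - f(t - STEP)) / (2 * STEP)

def D_vec(curve):
    """Componentwise derivative of a curve R -> R^2, returned as a function."""
    # k=k binds the current component index inside each lambda.
    return lambda t: [D(lambda s, k=k: curve(s)[k])(t) for k in range(2)]

def directional(v):
    """partial_v: maps f: R^2 -> R to the function p |-> derivative of f at p along v."""
    def op(f):
        return lambda p: (f([p[0] + STEP * v[0], p[1] + STEP * v[1]])
                          - f([p[0] - STEP * v[0], p[1] - STEP * v[1]])) / (2 * STEP)
    return op

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def compose(f, g):
    return lambda t: f(g(t))

e1, e2 = [1.0, 0.0], [0.0, 1.0]

# Hypothetical example functions: x: R -> R^2 and z: R^2 -> R.
def x(t):
    return [t ** 2, math.sin(t)]

def z(p):
    return p[0] * p[1]

t0 = 1.3
lhs = D(compose(z, x))(t0)  # D(z o x)(t0): differentiate the one-variable function z o x
rhs = (directional(e1)(z)(x(t0)) * dot(e1, D_vec(x)(t0))
       + directional(e2)(z)(x(t0)) * dot(e2, D_vec(x)(t0)))
# Equivalent form: derivative of z at x(t0) along the velocity (Dx)(t0), a vector in R^2.
velocity_form = directional(D_vec(x)(t0))(z)(x(t0))
print(lhs, rhs, velocity_form)
```

All three numbers agree up to finite-difference error. At no point is $z$ differentiated "with respect to $t$": `z` only ever receives points and directions in $\mathbb R^2$.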

One thing you might have noticed is that I didn't define a "total derivative". Here, "total derivative" serves to disambiguate whether we mean differentiation of $z$ or of $z \circ \mathbf x$. (This is more ambiguous when $z$ is itself explicitly a function of $t$, so that partial differentiation of $z$ by $t$ also makes sense.) The problem is caused by the common conflation of $f$ with $f(x)$ I mentioned before: it becomes ambiguous whether $z$ means $z(x, y)$ or $z(x(t), y(t))$ (or $z(x(t), y)$ or $z(x, y(t))$, for that matter).

All this said, I'm not saying you should never use traditional notation or that you should only use this notation. Instead, when traditional notation seems confusing, falling back to a function-oriented approach and the directional derivative can help clear things up. Also, it's important to understand that a huge amount of relevant information is left implicit in traditional notation.

1. This distinction is even clearer when we consider the more general case of functions between arbitrary manifolds. In that case, we need to consider vectors in the tangent spaces of those manifolds, and those could be totally different.

2. We can define what total derivative actually means, and doing so only further illustrates the ambiguity and hidden complexity of traditional notation. Here's a simple example. Let $f(x, y) = x^2 + y^2$. The idea is that $x$ and $y$ will represent the components of a trajectory. The total derivative of $f$ is then a function $g(x, y, u, v) = 2xu + 2yv$. The idea here is that $$\frac{df(x(t), y(t))}{dt} = g\left(x(t), y(t), \frac{dx}{dt}(t), \frac{dy}{dt}(t)\right)$$ To really describe what's happening in general leads to the notion of a jet bundle, which is usually not discussed, even in simplified form, unless you go deep into certain fields. For the purposes of the discussion here, if we actually want the total derivative of $x^2 + y^2$, which would traditionally be written something like $2x\frac{dx}{dt} + 2y\frac{dy}{dt}$, and not just the resulting function of $t$, then $\frac{dx}{dt}$ and $\frac{dy}{dt}$ are effectively new parameters. Technically, they are constrained to be values of some derivative of some trajectory that goes through $(x, y)$, but for the Euclidean plane that's no constraint at all. With the notation in this answer, the total derivative of $f$ is $(\mathbf x, \mathbf v) \mapsto \partial_{\mathbf v}f(\mathbf x)$. That is, it's the function which takes the point at which to evaluate the directional derivative of $f$ and also the direction (on which it depends linearly). This notation makes it clear that there are extra parameters involved and what they mean.
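To make the "extra parameters" point from the footnote concrete, here is a small sketch with two hypothetical trajectories (chosen only for illustration) that pass through the same point $(1, 0)$ at $t = 0$ but with different velocities. The rate of change of $f(x, y) = x^2 + y^2$ along each differs, so the total derivative cannot be a function of $(x, y)$ alone; the footnote's $g(x, y, u, v) = 2xu + 2yv$ reproduces both once $u, v$ are set to the velocity components. Central differences stand in for exact derivatives.

```python
import math

STEP = 1e-6  # central-difference step (assumption of this sketch)

def f(x, y):
    return x ** 2 + y ** 2

def g(x, y, u, v):
    """Total derivative of f from the footnote: u and v stand in for dx/dt and dy/dt."""
    return 2 * x * u + 2 * y * v

def d_dt(h, t):
    return (h(t + STEP) - h(t - STEP)) / (2 * STEP)

# Two hypothetical trajectories through (1, 0) at t = 0, with different velocities there.
def traj_a(t):
    return math.cos(t), math.sin(t)  # velocity (0, 1) at t = 0

def traj_b(t):
    return 1 + 3 * t, 2 * t          # velocity (3, 2) at t = 0

results = {}
for traj in (traj_a, traj_b):
    along = d_dt(lambda t: f(*traj(t)), 0.0)  # d f(x(t), y(t)) / dt at t = 0
    x0, y0 = traj(0.0)
    u0 = d_dt(lambda t: traj(t)[0], 0.0)      # dx/dt at t = 0
    v0 = d_dt(lambda t: traj(t)[1], 0.0)      # dy/dt at t = 0
    results[traj.__name__] = (along, g(x0, y0, u0, v0))
    print(traj.__name__, along, g(x0, y0, u0, v0))
```

Both trajectories sit at the same point, yet the rates of change are about $0$ and about $6$; only the velocity arguments distinguish them.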
