Maxwell’s Equations

It’s of no use whatsoever, this is just an experiment that proves Maestro Maxwell was right – we just have these mysterious electromagnetic waves that we cannot see with the naked eye, but they are there.

– Heinrich Hertz, when asked about applications after demonstrating the existence of EM waves

Particles and Classical Fields: The Action Principle

Classical electromagnetism is, in essence, an exercise in geometry and calculus, or more succinctly, vector calculus. In this article I will lead the reader through an abridged derivation of Maxwell’s equations from a purely mathematical perspective, following Susskind and Friedman’s The Theoretical Minimum: Special Relativity and Classical Field Theory [1]. Historically, these equations came about as Maxwell’s collation of a number of empirical observations (Faraday’s Law, Ampère’s Law, Coulomb’s Law), along with an inspired theoretical contribution known as the displacement current. I presume most readers, having reached the upper-division undergraduate level in physics, already know this story. I share Lenny Susskind’s opinion that this historical route hinders the deep understanding one can gain rather quickly by instead constructing the equations from definitions and mathematics alone. For this reason, I am presenting such a construction here, although it is perhaps not strictly necessary. Readers already familiar with Maxwell’s equations who simply want to brush up on their application to EM waves should check out other Learn pages, such as the page on the dielectric permittivity or the one on the question “what is light?”

In preparation for the remainder of this article, I’m going to introduce some notation and mathematical formalism for describing classical field theory through the lens of Lagrangian mechanics. This is necessary for deriving Maxwell’s equations from a simple field theory. Many readers may not be familiar with Lagrangian mechanics, so I’ll attempt to introduce the ideas gently as I go. The central idea of Lagrangian mechanics (indeed, the central idea of basically all of physics) is the principle of least action, which states that, of all trajectories connecting given start and end points in spacetime, the one a system actually follows makes the action \A stationary (often, but not always, a minimum). The action is the integral of the Lagrangian \La along the trajectory, and for non-relativistic classical mechanical systems the Lagrangian is the kinetic energy minus the potential energy, T-V.

[Figure: Joseph-Louis Lagrange]

The reader may have encountered this form of the Lagrangian before and, like me, may be puzzled by the question of where in the world it comes from (this sort of a priori presentation of the Lagrangian always bothered me). The simple, unsatisfying answer is that the form of the Lagrangian often cannot be derived from anything more fundamental. By definition, a valid Lagrangian is any function that, when substituted into the Euler-Lagrange equations, produces the correct equations of motion, and that’s it. When \La=T-V is used for a non-relativistic system (more specifically, on a Riemannian manifold), the Euler-Lagrange equations return F=ma, which is the correct equation for describing the dynamics of such a system, so that is the Lagrangian we use. For this simple case, you can ‘derive’ the Lagrangian by postulating Galilean relativity (see Landau and Lifshitz for the details [2]), but in general this approach cannot be used to generate a Lagrangian that correctly describes your system. In short, as always, we physicists are slaves to the natural world. If our choice of Lagrangian produces equations of motion that match our observations, then it is the ‘correct’ Lagrangian, and we are left with a convenient mathematical framework for describing our system.

The action principle itself is likewise a somewhat mystical framework in physics. The principle has its intellectual roots in Fermat’s principle, which was Pierre de Fermat’s 1662 postulate that light takes the path through space that can be traveled in the least amount of time. Maupertuis and Euler would nearly simultaneously (a bit of a scandal at the time) generalize Fermat’s principle to what we would recognize as the principle of least action in 1744, planting a flag in a new frontier for determining the equations of motion for a given system.

The general idea is the following: the action has the form

    \begin{gather*}\A = \int_a^b \La\s \Te{d}t,\end{gather*}

between two points in time, a and b. We then use variational calculus to ‘try’ every trajectory between a and b and pick the one that makes the action stationary. The calculus of variations tells us that a trajectory makes the action stationary if and only if

(1)   \begin{gather*}\pder[\La]{q_i}-\der{t}\pder[\La]{\dot{q}_i} = 0,\end{gather*}

where q_i are the generalized coordinates and \dot{q}_i are the time derivatives of each generalized coordinate (i.e. the generalized velocities). The Lagrangian \La is taken to be a function of q_i, \dot{q}_i, and the time t. This equation is known as the Euler-Lagrange equation, and it holds for each generalized coordinate q_i. In most cases, the q_i are just your normal spatial coordinates, e.g. x, y, z. The \dot{q}_i would then be the time derivatives of the spatial coordinates, and it is important to note that the Lagrangian treats them as additional independent variables. The quantities p_i = \pder[\La]{\dot{q}_i} are known as the components of the generalized momentum of the system. Unsurprisingly, for non-relativistic mechanics in Cartesian coordinates with a velocity-independent potential, \Bf{p}=m\Bf{v}.
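To make the definition of the generalized momentum concrete, here is a minimal numerical sketch in Python (standard library only) that estimates p = \pder[\La]{\dot{q}} with a central finite difference. The helper name `generalized_momentum`, the sample potential V(q) = q^2, and the numbers are assumptions for illustration only. For a Lagrangian of the form m\dot{q}^2/2 - V(q) it recovers p = mv, while the relativistic free-particle Lagrangian -m\sqrt{1-\dot{q}^2} (units with c = 1) gives mv/\sqrt{1-v^2} instead.

```python
import math

def generalized_momentum(lagrangian, q, qdot, t=0.0, h=1e-6):
    """Central-difference estimate of p = dL/d(qdot), holding q and t fixed."""
    return (lagrangian(q, qdot + h, t) - lagrangian(q, qdot - h, t)) / (2.0 * h)

m = 2.0  # particle mass (arbitrary illustrative value)

def lagrangian_nonrel(q, qdot, t):
    # L = T - V with an assumed velocity-independent potential V(q) = q**2
    return 0.5 * m * qdot**2 - q**2

def lagrangian_rel(q, qdot, t):
    # Relativistic free particle in units with c = 1 (requires |qdot| < 1)
    return -m * math.sqrt(1.0 - qdot**2)

v = 0.6
p_nonrel = generalized_momentum(lagrangian_nonrel, q=1.0, qdot=v)
p_rel = generalized_momentum(lagrangian_rel, q=1.0, qdot=v)
print(p_nonrel)  # close to m*v = 1.2
print(p_rel)     # close to m*v/sqrt(1 - v**2) = 1.5
```

Note that the potential drops out of the non-relativistic momentum entirely, since it does not depend on \dot{q}.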

Let’s just look at one very simple example, a particle of constant mass m in a gravitational potential described by \phi(x) = gx. Then, the Lagrangian is just \La = T-V = m\dot{x}^2/2-mgx. If we substitute this into the Euler-Lagrange equation (Eq. 1), we obtain

    \begin{gather*}-mg-\der{t}(m\dot{x}) = 0\\ m\ddot{x} = -mg,\end{gather*}

which we easily recognize as the equation of motion for a particle in a gravitational field.
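The phrase ‘try every trajectory’ can be made concrete numerically. The Python sketch below (standard library only) discretizes the action for the Lagrangian \La = m\dot{x}^2/2 - mgx, evaluates it on the exact solution x(t) = x_0 + v_0 t - gt^2/2, and compares it with paths perturbed by a bump that vanishes at both endpoints. The function names, the midpoint discretization, and the numerical values are assumptions made for illustration. Because the kinetic term is quadratic and the potential is linear, the change in the action is quadratic in the perturbation size: the first-order change vanishes, so the action is stationary (here, a minimum) on the true path.

```python
import math

m, g = 1.0, 9.81          # mass and gravitational acceleration (illustrative values)
x0, v0 = 0.0, 10.0        # initial position and velocity
T, N = 2.0, 200           # total time and number of time steps
dt = T / N
times = [k * dt for k in range(N + 1)]

def discrete_action(path):
    """Approximate S = integral of (m*xdot^2/2 - m*g*x) dt for positions sampled every dt.

    The velocity on each step is a forward difference, and the potential
    energy is evaluated at the midpoint of the step.
    """
    S = 0.0
    for k in range(N):
        v = (path[k + 1] - path[k]) / dt
        x_mid = 0.5 * (path[k] + path[k + 1])
        S += (0.5 * m * v**2 - m * g * x_mid) * dt
    return S

# Exact solution of m*xddot = -m*g, sampled on the time grid
true_path = [x0 + v0 * t - 0.5 * g * t**2 for t in times]

def bump(t):
    # Vanishes at t = 0 and t = T, so the endpoints of the path stay fixed
    return math.sin(math.pi * t / T)

def delta_S(eps):
    """Change in the action when the true path is perturbed by eps * bump(t)."""
    perturbed = [x + eps * bump(t) for x, t in zip(true_path, times)]
    return discrete_action(perturbed) - discrete_action(true_path)

for eps in (0.5, 0.1, -0.1, 0.01):
    print(f"eps = {eps:+.2f}   delta S = {delta_S(eps):.6e}")
```

The printed changes are all positive, do not depend on the sign of eps, and shrink by a factor of 25 when eps goes from 0.5 to 0.1, which is the signature of a stationary point.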

Choosing the Action, and Physics-Flavored Notation

So the question remains: how do we choose the Lagrangian in general? As I said, there’s no way to know if a given Lagrangian will produce the correct equations of motion for a particular physical system until you substitute it into the Euler-Lagrange equation, but there are certain rules it must adhere to. The laws of physics must be the same for all inertial observers, so the action must be invariant under a change of inertial reference frame. Taking relativistic effects into account, the appropriate action for a free particle is

    \begin{gather*}\A = -\int m \Te{d}\tau\end{gather*}

where \int\de\tau is the ‘proper time’: the integral of the differential line element that describes the length of an infinitesimal time-like interval in spacetime, which is invariant under Lorentz transformations (I realize this sentence may sound like the adults in Charlie Brown, but don’t worry, the jargon is unimportant and will be at least partially explained). The Lorentz transformation is how you boost from one inertial reference frame to another while keeping the speed of light the same for all observers. We measure distances in spacetime in terms of the Lorentz-invariant interval s^2, which is given by

    \begin{gather*}s^2 = -t^2+x^2+y^2+z^2=X^{\mu}X_{\mu},\end{gather*}

where x,y,z,t are the spacetime coordinates you know and love, and X^{\mu}/X_{\mu} are known as the contravariant/covariant components of the spacetime position 4-vector. The only difference between contravariant and covariant 4-vectors in this context is that the sign of the time-like component switches; here we take X^{\mu}=(-t,\s x,\s y,\s z) and X_{\mu}=(t,\s x,\s y,\s z). (Many textbooks attach the minus sign to the covariant components instead; only the product X^{\mu}X_{\mu} is fixed.) Since \tau is used to measure time-like distances, for which t^2>x^2+y^2+z^2, we define it through \tau^2=-s^2. The term X^{\mu}X_{\mu} is an example of the Einstein summation convention that we use for 4-vectors. Pairing a superscript with a subscript tells us that, in contrast to an ordinary dot product, the product of the time-like (0th) components enters with a negative sign; this sign structure is encoded in the Minkowski metric. In practice, an index repeated as a superscript and a subscript implies a sum over all 4 coordinates when, by convention, the index is Greek, like \mu. If the summing index is a Latin letter like i, then you should assume it is a dot product over just the three spatial coordinates; X^iX_i =X_iX_i = \Bf{x}\cdot\Bf{x} (notice that we could move the superscript to the subscript position since the dot product no longer includes the time-like coordinate). To denote a differential change in spacetime, we use the 4-vector \de X_\mu = (\de t,\s \de x,\s \de y,\s \de z). The differential relativistic line element is thus

    \begin{gather*}\de s^2 = \de X^{\mu}\de X_{\mu} = -\de t^2 + \de x^2 + \de y^2 + \de z^2.\end{gather*}
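The sign bookkeeping above is easy to check numerically. This Python sketch (standard library only) builds X^{\mu} and X_{\mu} with the convention stated in the text, contracts them, and confirms that the result does not change under a boost along x, t' = \gamma(t-\beta x), x' = \gamma(x-\beta t), in units with c = 1. The helper names and the sample event are assumptions for illustration.

```python
import math

def contravariant(t, x, y, z):
    # X^mu with the sign convention used in the text: time component negated
    return (-t, x, y, z)

def covariant(t, x, y, z):
    # X_mu with the sign convention used in the text
    return (t, x, y, z)

def contract(upper, lower):
    """Einstein sum X^mu X_mu over all four components."""
    return sum(a * b for a, b in zip(upper, lower))

def boost_x(t, x, y, z, beta):
    """Lorentz boost with velocity beta (in units of c) along the x axis."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    return gamma * (t - beta * x), gamma * (x - beta * t), y, z

event = (3.0, 1.0, -2.0, 0.5)   # (t, x, y, z), an arbitrary event
s2 = contract(contravariant(*event), covariant(*event))
print(s2)  # -t^2 + x^2 + y^2 + z^2 = -9 + 1 + 4 + 0.25 = -3.75 (time-like)

for beta in (0.3, 0.9, -0.99):
    boosted = boost_x(*event, beta)
    s2_boosted = contract(contravariant(*boosted), covariant(*boosted))
    print(beta, boosted, s2_boosted)  # s2_boosted agrees with s2 up to rounding
```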

All this to say that, in general, we try to construct a Lagrangian out of Lorentz-invariant scalars like \int \de \tau, often incorporating 4-vector products like X^{\mu}X_{\mu}.
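As an illustration of why -m\int\de\tau is a sensible free-particle action, the Python sketch below (standard library only) integrates \de\tau = \sqrt{1-\dot{\Bf{x}}^2}\,\de t along two worldlines that share the same endpoint events: a particle at rest, and one that moves out and back at constant speed. The specific worldlines, speed, and step count are assumptions chosen for illustration. The unaccelerated worldline accumulates the most proper time, so it gives the smaller value of -m\int\de\tau, consistent with a free particle moving in a straight line.

```python
import math

def proper_time(velocity, T, steps=10_000):
    """Integrate d(tau) = sqrt(1 - v(t)^2) dt from t = 0 to t = T (units with c = 1).

    velocity: function returning the particle's speed at coordinate time t (|v| < 1).
    Uses the midpoint rule.
    """
    dt = T / steps
    return sum(math.sqrt(1.0 - velocity((k + 0.5) * dt) ** 2) * dt for k in range(steps))

T = 10.0   # coordinate time between the two shared endpoint events
m = 1.0    # particle mass

def at_rest(t):
    return 0.0

def out_and_back(t):
    # Moves out at speed 0.6 for the first half, then returns at the same speed,
    # so it starts and ends at the same spatial point as the particle at rest.
    return 0.6 if t < T / 2 else -0.6

for name, path in (("at rest", at_rest), ("out and back", out_and_back)):
    tau = proper_time(path, T)
    print(f"{name:13s} tau = {tau:.6f}   action -m*tau = {-m * tau:.6f}")
```

The traveling worldline accumulates a proper time of 8 instead of 10, which is the familiar twin-paradox result.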

Maxwell’s Equations: A Purely Geometrical Approach

The Lorentz Force Law

Let’s build a field theory for electromagnetism using the ideas we just touched on. We begin with a 4-vector potential A^{\mu}. The time-like component of its covariant form A_{\mu}, namely A_0, is best understood as the scalar potential of the electric field divided by the speed of light, \phi/c. The remaining three space-like components can be recognized as the magnetic vector potential \Bf{A}. We will be compelled to make these identifications later. Also, as a note, we’re not going to worry about the unit system at the moment. Instead, we use what physicists call ‘natural’ units, in which c = 1. This is a form of nondimensionalization in which velocity is measured in units of c. We are trying to illuminate the basic geometric structures of electromagnetism right now, so the dimensions are not important. As an example, 1-\dot{\Bf{x}}^2 is the same as 1-\dot{\ti{\Bf{x}}}^2/c^2, where \ti{\Bf{x}} is the conventionally dimensionalized spatial coordinate vector. We will come back and insert the dimensions later. Let’s choose for our action integral the following quantity:

    \begin{gather*}\A = -\int_a^b m \sqrt{1-\dot{\Bf{x}}^2}\de t+q\int_a^b \de X^{\mu}A_{\mu}\end{gather*}

where m and q are constants (the particle’s mass and charge). The second term is not yet in the form we need to read off the Lagrangian (the action should be an integral over time), so we make a slight adjustment:

    \begin{align*}q\int_a^b \de X^{\mu}A_{\mu} =&\s q\int_a^b \der[X^{\mu}]{t}A_{\mu}dt\\=&\s q\int_a^b (-A_0 + \dot{\Bf{x}}\cdot\Bf{A})dt\end{align*}

since \der[X^0]{t} = \der[(-t)]{t} = -1. So then our Lagrangian in conventional vector notation is

    \begin{gather*}\La = -m\sqrt{1-\dot{\Bf{x}}^2}-qA_0+q \dot{\Bf{x}}\cdot\Bf{A}.\end{gather*}

Now let’s find out if this Lagrangian produces equations of motion that make sense! Of course, we do this by substituting \La into Eq. 1:

    \begin{gather*}\pder[\La]{X^i}-\der{t}\pder[\La]{\dot{X}^i} = 0\\-q\pder[A_0]{X^i}+q\dot{X}^j\pder[A_j]{X^i}-\der{t}\left[m\frac{\dot{X}_i}{\sqrt{1-\dot{\Bf{x}}^2}}+qA_i\right]= 0\\m\der{t}\frac{\dot{X}_i}{\sqrt{1-\dot{\Bf{x}}^2}} = -q\pder[A_i]{t}-q\pder[A_i]{X_j}\dot{X}_j-q\pder[A_0]{X^i}+q\dot{X}^j\pder[A_j]{X^i},\end{gather*}

where we were careful to account for the full time dependence of A_i: both its explicit dependence on t and its implicit dependence through the particle’s position (remember: \dot{X}_j = \der[X_j]{t}). I know this notation might be making your head spin, but don’t worry, I will soon relate things back to standard vector notation. Let’s proceed by grouping like terms.

    \begin{gather*}m\der{t}\frac{\dot{X}_i}{\sqrt{1-\dot{\Bf{x}}^2}} = -q\left(\pder[A_0]{X^i}+\pder[A_i]{t}\right)+q\dot{X}^j\left(\pder[A_j]{X^i}-\pder[A_i]{X^j}\right)\end{gather*}

Does this look familiar to you yet? Suppose we define

    \begin{gather*}E_i = -\left(\pder[A_0]{X^i}+\pder[A_i]{t}\right),\end{gather*}

or in vector notation;

(2)   \begin{gather*}\Bf{E} = -\left(\nabla A_0+\pder[\Bf{A}]{t}\right).\end{gather*}

Then the first term is exactly what we expect for the electric field portion of the Lorentz force law (recall that A_0 = \phi/c).
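Eq. 2 can be spot-checked numerically. The Python sketch below (standard library only) evaluates \Bf{E} = -(\nabla A_0 + \pder[\Bf{A}]{t}) with central finite differences. As assumed test inputs it uses a static point-charge potential A_0 = 1/r (units and the 1/4\pi\eps_0 prefactor dropped, in the spirit of the text) together with a hypothetical vector potential \Bf{A} = (0, 0, -E_0 t), whose time derivative encodes a uniform field E_0\hat{\Bf{z}}. The expected result is therefore \Bf{r}/r^3 + E_0\hat{\Bf{z}}. The helper names and step size are also illustrative assumptions.

```python
import math

h = 1e-5  # finite-difference step (illustrative)

def A0(t, x, y, z):
    # Assumed test potential: static point charge, A_0 = 1/r (units dropped)
    return 1.0 / math.sqrt(x * x + y * y + z * z)

E0 = 0.7  # strength of an assumed uniform field, encoded in the vector potential

def A_vec(t, x, y, z):
    # Assumed test vector potential A = (0, 0, -E0 * t); -dA/dt is then (0, 0, E0)
    return (0.0, 0.0, -E0 * t)

def electric_field(t, x, y, z):
    """E = -(grad A_0 + dA/dt), Eq. 2, using central differences."""
    grad = (
        (A0(t, x + h, y, z) - A0(t, x - h, y, z)) / (2 * h),
        (A0(t, x, y + h, z) - A0(t, x, y - h, z)) / (2 * h),
        (A0(t, x, y, z + h) - A0(t, x, y, z - h)) / (2 * h),
    )
    A_plus, A_minus = A_vec(t + h, x, y, z), A_vec(t - h, x, y, z)
    dA_dt = tuple((ap - am) / (2 * h) for ap, am in zip(A_plus, A_minus))
    return tuple(-(g + d) for g, d in zip(grad, dA_dt))

point = (0.3, 1.0, -0.5, 2.0)             # (t, x, y, z)
x, y, z = point[1:]
r3 = (x * x + y * y + z * z) ** 1.5
expected = (x / r3, y / r3, z / r3 + E0)  # Coulomb field plus the uniform field
print(electric_field(*point))
print(expected)
```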

So then how can we convince ourselves that the second term is q\Bf{v}\times\Bf{B}? Those familiar with the magnetic vector potential \Bf{A} will already see the pattern. Consider the i=1 (in vector notation, the x) component of the right-most term:

    \begin{gather*}q\dot{y}\left(\pder[A_y]{x}-\pder[A_x]{y}\right) + q\dot{z}\left(\pder[A_z]{x}-\pder[A_x]{z}\right).\end{gather*}

The x component of \Bf{v}\times\Bf{B} is v_yB_z-v_zB_y, so then we can see that if we choose

    \begin{align*}B_x =&\s \pder[A_z]{y}-\pder[A_y]{z}\\B_y =&\s \pder[A_x]{z} - \pder[A_z]{x}\\B_z =&\s \pder[A_y]{x}-\pder[A_x]{y},\end{align*}

otherwise known as

(3)   \begin{gather*}\Bf{B} = \nabla\times\Bf{A},\end{gather*}

then the last term is indeed what we expect and our equation of motion becomes

(4)   \begin{gather*} m\der{t}\frac{\dot{\Bf{x}}}{\sqrt{1-\dot{\Bf{x}}^2}} = q\Bf{E} + q\Bf{v}\times\Bf{B},\end{gather*}

which most readers will recognize as the Lorentz force law. The term on the left is the relativistic generalization of m\ddot{\Bf{x}}, namely the time derivative of the relativistic momentum m\dot{\Bf{x}}/\sqrt{1-\dot{\Bf{x}}^2}. Had we chosen mv^2/2 as the free-particle Lagrangian, the equation would take its more familiar form. Thus, we have shown how the Lorentz force law can be derived from an action principle, and we have arrived at definitions of \Bf{E} and \Bf{B} in terms of a 4-vector potential A^{\mu}.
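The identification of the magnetic term can be spot-checked numerically as well. The Python sketch below (standard library only) takes an arbitrary smooth vector potential, evaluates \dot{X}^j(\pder[A_j]{X^i}-\pder[A_i]{X^j}) component by component with central differences, and compares the result with \Bf{v}\times(\curl\Bf{A}). The test potential, velocity, evaluation point, and helper names are assumptions; any smooth \Bf{A} should give agreement, because the relation is an algebraic identity once the partial derivatives are fixed.

```python
import math

h = 1e-5  # finite-difference step (illustrative)

def A(p):
    # Arbitrary smooth test vector potential (an assumption, not from the text)
    x, y, z = p
    return (math.sin(y) * z, x * x * math.cos(z), math.exp(0.3 * x) * y)

def partial(f, p, j, i):
    """Central-difference estimate of d f_i / d x_j at point p."""
    p_plus, p_minus = list(p), list(p)
    p_plus[j] += h
    p_minus[j] -= h
    return (f(p_plus)[i] - f(p_minus)[i]) / (2 * h)

def magnetic_term(f, p, v):
    """Component i of sum_j v_j (dA_j/dx_i - dA_i/dx_j), as in the equation of motion."""
    return tuple(
        sum(v[j] * (partial(f, p, i, j) - partial(f, p, j, i)) for j in range(3))
        for i in range(3)
    )

def curl(f, p):
    """B = curl A, Eq. 3."""
    return (
        partial(f, p, 1, 2) - partial(f, p, 2, 1),
        partial(f, p, 2, 0) - partial(f, p, 0, 2),
        partial(f, p, 0, 1) - partial(f, p, 1, 0),
    )

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

p = (0.4, -1.2, 0.8)   # evaluation point
v = (0.1, 0.5, -0.3)   # particle velocity (units with c = 1)
print(magnetic_term(A, p, v))
print(cross(v, curl(A, p)))  # should agree with the line above
```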

Two of Maxwell’s Equations from Definitions

We can actually already get two of Maxwell’s equations for free, solely from our definitions of the electric and magnetic fields. It is a fact of vector calculus that the divergence of the curl of any (twice continuously differentiable) vector field is zero. Thus,

    \begin{gather*}\Div\Bf{B} = \Div(\curl\Bf{A})\\ \Div\Bf{B} = 0,\end{gather*}

which is the Maxwell equation that tells us that magnetic monopoles do not exist. If we take the curl of the electric field as defined in Eq. 2, noting that the curl of a gradient is likewise zero, we obtain

    \begin{gather*}\curl\Bf{E} =  -\curl\nabla A_0-\curl\pder[\Bf{A}]{t}\\\curl\Bf{E} = -\pder{t}(\curl\Bf{A})\\\curl\Bf{E} = -\pder[\Bf{B}]{t},\end{gather*}

also known as Faraday’s law.
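Both homogeneous equations can be checked numerically for arbitrary potentials. The Python sketch below (standard library only) builds \Bf{E} and \Bf{B} from assumed, arbitrary time-dependent test potentials via Eq. 2 and Eq. 3 using nested central differences, then evaluates \Div\Bf{B} and \curl\Bf{E} + \pder[\Bf{B}]{t}. Both should vanish; because central differences along different coordinates commute, the cancellation is exact apart from floating-point rounding. The potentials, step size, and helper names are illustrative assumptions.

```python
import math

h = 1e-4  # finite-difference step (illustrative)

def A0(t, p):
    # Arbitrary smooth test scalar potential (an assumption)
    x, y, z = p
    return math.sin(x + 2 * t) * math.cos(y) + z * z * t

def A_vec(t, p):
    # Arbitrary smooth test vector potential (an assumption)
    x, y, z = p
    return (y * math.cos(t), math.exp(-x * x) * z, math.sin(x * y + t))

def _diff(fp, fm):
    # Central difference of scalar or tuple values
    if isinstance(fp, tuple):
        return tuple((a - b) / (2 * h) for a, b in zip(fp, fm))
    return (fp - fm) / (2 * h)

def d_dx(f, t, p, j):
    """Central difference of f(t, p) with respect to spatial coordinate j."""
    pp, pm = list(p), list(p)
    pp[j] += h
    pm[j] -= h
    return _diff(f(t, pp), f(t, pm))

def d_dt(f, t, p):
    """Central difference of f(t, p) with respect to time."""
    return _diff(f(t + h, p), f(t - h, p))

def curl(F, t, p):
    dF = [d_dx(F, t, p, j) for j in range(3)]  # dF[j][i] = dF_i/dx_j
    return (dF[1][2] - dF[2][1], dF[2][0] - dF[0][2], dF[0][1] - dF[1][0])

def B(t, p):
    return curl(A_vec, t, p)  # Eq. 3

def E(t, p):
    grad = [d_dx(A0, t, p, j) for j in range(3)]
    dA_dt = d_dt(A_vec, t, p)
    return tuple(-(g + a) for g, a in zip(grad, dA_dt))  # Eq. 2

t, p = 0.5, (0.3, -0.7, 1.1)
div_B = sum(d_dx(B, t, p, j)[j] for j in range(3))
faraday = tuple(c + b for c, b in zip(curl(E, t, p), d_dt(B, t, p)))
print(div_B)     # ~0, up to rounding error
print(faraday)   # each component ~0, up to rounding error
```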

Maxwell’s Equations with Source Terms

You might have noticed that the two equations we got from definitions were both homogeneous. The remaining Maxwell equations are inhomogeneous, with source terms given by the charge density and the current density for Gauss’s and Ampère’s laws, respectively. To accommodate these sources, we introduce the current 4-vector with covariant components J_{\mu} = (\rho,\s J_x,\s J_y,\s J_z); with the sign convention introduced earlier, its contravariant components are J^{\mu} = (-\rho,\s J_x,\s J_y,\s J_z). With this structure we can write the charge continuity equation compactly as

    \begin{gather*}\pder[J^{\mu}]{X^{\mu}} = 0,\end{gather*}

which expands to \pder[\rho]{t}+\Div\Bf{J} = 0. The time-like term carries no net minus sign, because the minus signs in J^0 = -\rho and X^0 = -t cancel.

To finish deriving Maxwell’s equations, we have to introduce yet another new Lagrangian that we again cannot justify except to say that it has some properties we like and it produces the correct equations. For those who are curious, the desirable properties are Lorentz invariance (as before), locality, and gauge invariance. I’ll leave it to the reader to learn what we mean by these (Susskind and Friedman explain them well in ref. [1]); for now, we will just push forward. The Lagrangian that will get us to the finish line is

    \begin{gather*}\La = -\frac{1}{4}\left(\pder[A_{\nu}]{X^{\mu}}-\pder[A_{\mu}]{X^{\nu}}\right)\left(\pder[A^{\nu}]{X_{\mu}}-\pder[A^{\mu}]{X_{\nu}}\right) + J^{\mu}A_{\mu}.\end{gather*}

The first quantity in parentheses is the field tensor F_{\mu\nu}, and the second is the same tensor with both indices raised, F^{\mu\nu}. It is an antisymmetric tensor with the components of the electric and magnetic fields off its diagonal (go ahead and check this for yourself using Eq. 2 and Eq. 3), and the repeated indices tell you this is a big double sum over the two indices \mu and \nu. Thus the Lagrangian can be written more compactly as \La = -\frac{1}{4}F^{\mu\nu}F_{\mu\nu}+J^{\mu}A_{\mu}. One more thing: since we will be treating the fields in this Lagrangian as dynamical variables in their own right, we need to modify the Euler-Lagrange equation to the form

(5)   \begin{gather*}\pder{X^{\mu}}\pder[\La]{A_{\nu,\mu}}-\pder[\La]{A_{\nu}}=0\end{gather*}

where A_{\nu,\mu} = \pder[A_{\nu}]{X^{\mu}} keeps things clean. This is the Euler-Lagrange equation of field theory, in which the fields themselves, rather than the coordinates X^{\mu}, are the dynamical variables. I’ll leave it to the reader to review how this version of the equation is derived (UT has nice public lecture notes on this). Now we just need to compute the relevant derivatives, starting with

    \begin{gather*}\pder[\La]{A_{\nu,\mu}} =\s \pder[\La]{\left(\pder[A_{\nu}]{X^{\mu}}\right)}.\end{gather*}

For this derivative, only the first term (the big double sum) matters. Each term in this sum has the same form as the expression in the Lagrangian, evaluated for one particular pair of indices \mu and \nu. The derivative is taken with respect to one specific pair, however, so most of the terms drop out. Say we are looking at the derivative with respect to \pder[A_x]{y}. Only the (\mu,\nu)=(y,x) and (x,y) terms contain this quantity (for purely spatial indices, raising them changes nothing), and together they give

    \begin{gather*}-\frac{1}{2}\left(\pder[A_x]{y}-\pder[A_y]{x}\right)^2,\end{gather*}

so

    \begin{gather*}\pder[\La]{\left(\pder[A_{x}]{y}\right)} = \pder{\left(\pder[A_{x}]{y}\right)}\left[-\frac{1}{2}\left(\pder[A_x]{y}-\pder[A_y]{x}\right)^2\right] = \pder[A_y]{x}-\pder[A_x]{y}.\end{gather*}

The same bookkeeping works for every other pair of indices, provided we keep track of which indices are raised: pairs involving the time-like index pick up the minus sign of the metric. In general, then, \pder[\La]{A_{\nu,\mu}} = \pder[A^{\mu}]{X_{\nu}}-\pder[A^{\nu}]{X_{\mu}}, which reduces to the result above for \nu = x, \mu = y. The left term in the field-theory Euler-Lagrange equation (Eq. 5) is therefore

    \begin{gather*}\pder{X^{\mu}}\pder[\La]{A_{\nu,\mu}} = \pder{X^\mu}\left(\pder[A^{\mu}]{X_{\nu}}-\pder[A^{\nu}]{X_{\mu}}\right).\end{gather*}

The other term in the Lagrangian is easy to evaluate:

    \begin{gather*}\pder[\La]{A_{\nu}} = J^{\nu},\end{gather*}

finally yielding our equation of motion:

    \begin{gather*}\pder{X^\mu}\left(\pder[A^{\mu}]{X_{\nu}}-\pder[A^{\nu}]{X_{\mu}}\right) = J^{\nu}.\end{gather*}

It may not seem like it, but both of the remaining Maxwell equations are represented here. To see Gauss’s law, we look at the \nu=0 component, recalling Eq. 2 and our sign convention: raising a time-like index flips its sign, so A^0 = -A_0, J^0 = -\rho, \pder{X^0} = -\pder{t}, and \pder{X_0} = \pder{t}. The \mu = 0 term vanishes identically, leaving

    \begin{gather*}\pder{X^i}\left(\pder[A_i]{t}+\pder[A_0]{X^i}\right) = -\rho\\-\Div\Bf{E} = -\rho\\\Div\Bf{E}  = \rho.\end{gather*}

We can see that the units are slightly off (in SI units the right-hand side would be \rho/\eps_0), but the form of the equation is exactly as we expect. What about the other three components? For \nu = i, the \mu = 0 term no longer vanishes:

    \begin{gather*}\pder{X^0}\left(\pder[A^0]{X_i}-\pder[A^i]{X_0}\right)+\pder{X^j}\left(\pder[A^j]{X_i}-\pder[A^i]{X_j}\right) = J^i\\\pder{t}\left(\pder[A_0]{X^i}+\pder[A_i]{t}\right)+\pder{X^j}\left(\pder[A_j]{X^i}-\pder[A_i]{X^j}\right) = J_i\\-\pder[E_i]{t}+\left[\curl(\curl\Bf{A})\right]_i = J_i\\\curl\Bf{B} - \pder[\Bf{E}]{t} = \Bf{J},\end{gather*}

et voilà, we have Ampère’s law, complete with Maxwell’s displacement current (again, constants are missing because we have not restored units; in SI units it reads c^2\curl\Bf{B} - \pder[\Bf{E}]{t} = \Bf{J}/\eps_0).
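The text invites the reader to check, using Eq. 2 and Eq. 3, that the field tensor has the components of \Bf{E} and \Bf{B} off its diagonal. The Python sketch below (standard library only) does this numerically under the sign convention used on this page, X^{\mu} = (-t, x, y, z) with A_{\mu} = (A_0, \Bf{A}) as in the particle derivation, so that \pder{X^0} = -\pder{t}. It builds F_{\mu\nu} = \pder[A_{\nu}]{X^{\mu}} - \pder[A_{\mu}]{X^{\nu}} from arbitrary test potentials with central differences, then checks that F_{0i} = E_i, that (F_{yz}, F_{zx}, F_{xy}) = (B_x, B_y, B_z), and that the invariant F_{\mu\nu}F^{\mu\nu} equals 2(B^2 - E^2). The test potentials, evaluation point, and helper names are assumptions.

```python
import math

h = 1e-5                      # finite-difference step (illustrative)
ETA = (-1.0, 1.0, 1.0, 1.0)   # diagonal of the Minkowski metric, signature (-,+,+,+)

def A_lower(t, x, y, z):
    # Arbitrary smooth test potentials A_mu = (A_0, A_x, A_y, A_z); an assumption
    return (
        math.sin(x) * math.cos(t) + y * z,
        z * t + math.cos(y),
        x * x * math.sin(t),
        math.exp(0.2 * y) * x,
    )

def dA(mu, nu, t, x, y, z):
    """Central-difference estimate of dA_nu/dX^mu, using X^0 = -t so d/dX^0 = -d/dt."""
    point = [t, x, y, z]
    plus, minus = list(point), list(point)
    plus[mu] += h
    minus[mu] -= h
    deriv = (A_lower(*plus)[nu] - A_lower(*minus)[nu]) / (2 * h)
    return -deriv if mu == 0 else deriv

def field_tensor(t, x, y, z):
    """F_{mu nu} = dA_nu/dX^mu - dA_mu/dX^nu as a 4x4 nested list."""
    return [[dA(mu, nu, t, x, y, z) - dA(nu, mu, t, x, y, z) for nu in range(4)]
            for mu in range(4)]

def fields(t, x, y, z):
    """E from Eq. 2 and B from Eq. 3, built directly from the potentials."""
    def d(mu, nu):
        return dA(mu, nu, t, x, y, z)
    # Eq. 2: E_i = -(dA_0/dx^i + dA_i/dt), and dA_i/dt = -d(0, i) in this convention
    E = tuple(-(d(i, 0) - d(0, i)) for i in (1, 2, 3))
    # Eq. 3: B = curl A
    B = (d(2, 3) - d(3, 2), d(3, 1) - d(1, 3), d(1, 2) - d(2, 1))
    return E, B

event = (0.4, 0.9, -0.3, 1.7)   # (t, x, y, z)
F = field_tensor(*event)
E, B = fields(*event)
# F^{ab} = ETA[a] * ETA[b] * F_{ab} for a diagonal metric
invariant = sum(ETA[a] * ETA[b] * F[a][b] ** 2 for a in range(4) for b in range(4))

print("F_0i:", [F[0][i] for i in (1, 2, 3)], "E:", E)
print("(F_yz, F_zx, F_xy):", (F[2][3], F[3][1], F[1][2]), "B:", B)
print("F_mn F^mn:", invariant, "2(B^2 - E^2):",
      2 * (sum(b * b for b in B) - sum(e * e for e in E)))
```

The invariant check is the useful one here: the time-space entries enter with a minus sign from the metric, which is what turns the Lagrangian into the familiar (E^2 - B^2)/2 form.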

So why in the world do you need to know this? You certainly don’t need this derivation to use Maxwell’s equations effectively. Nevertheless, this is something I learned during my doctoral studies that I found particularly beautiful, and I want more people to be aware of it. I hope that, at the very least, you have a bit more of an appreciation for the beauty of Maxwell’s equations than you had before you started reading this page.

Bibliography

This article is an excerpt from Prof. Rodríguez’s PhD thesis. If you would like to cite it for your own work, please cite:

– J. A. Rodríguez, Electromagnetic Wave Manipulation with Plasma Metamaterials, Ph.D. thesis, Stanford University (2023).

[1]: Leonard Susskind and Art Friedman. The Theoretical Minimum: Special Relativity and Classical Field Theory. Basic Books, New York, NY, 2017.

[2]: Lev Davidovich Landau and Evgeni Mikhailovich Lifshits. Mechanics, volume 1. CUP Archive, 1960.