You could have invented quantum mechanics

Quantum mechanics is somewhat infamous for its alleged abstraction. It is often stated, in varying degrees of seriousness, that quantum mechanics cannot be understood in any real sense and must be treated as a piece of black magic. This could not be further from the truth.

The issue at play here is primarily didactic in nature. A cursory glance at any introductory book on quantum mechanics, particularly modern ones, will reveal that they all dive straight into the mathematical formalism underlying quantum mechanics. They introduce, without any real physical motivation, the concept of a (separable) Hilbert space, and some form or other of the Schrödinger equation. If you’re lucky, the author will bother to point out the analogy with Newton’s second law, but often this analogy is confined to a footnote or an off-hand remark.

This is quite a sad state of affairs. Many of the seemingly arbitrary postulates governing quantum mechanics are in fact entirely natural when viewed from the right angle. In fact, I claim that, once one has digested the fundamentals of classical Hamiltonian mechanics, effectively the entire framework of quantum mechanics may be deduced from just a single modification of propositional logic. The goal of this document is to elaborate on this approach, working our way up to the Schrödinger equation itself.

I will assume that the reader is acquainted with classical mechanics, including the Hamiltonian formalism and its mathematical backbone. I can wholeheartedly recommend V. I. Arnold’s Mathematical Methods of Classical Mechanics for more on this topic. Additionally, I assume some basic familiarity with functional analysis. Conway’s A Course in Functional Analysis is pretty nice.

A mathematical summary of Hamiltonian mechanics

I will begin with a brief review of classical Hamiltonian mechanics. In fact, I will just capture the bare formalism, with the tacit assumption that the reader has enough familiarity with the subject matter to know where all these formalities come from.

Our primary object of interest is a symplectic manifold $M$, which may be regarded as the phase space or state space of a physical system. When considering $N$ particles in $3$-dimensional space, this manifold is taken to be the cotangent bundle of $\mathbb{R}^{3N}$ with its canonical symplectic structure, though there are situations in which other state spaces are better suited; for instance, when studying rotations of rigid bodies, it makes sense to let $M$ be the cotangent bundle of $\mathrm{SO}(3)$.

In Hamiltonian mechanics, an observable is any measurable quantity depending on the state of a system, and as such, it may simply be defined to be a Borel measurable function on $M$, typically with values in $\mathbb{R}$. In practice, such an observable would typically be something like momentum or energy. A true–false statement about a system will, tautologically, depend only on the state of that system, and as such is captured entirely by a measurable subset of $M$; in fact, we may as well define a statement in this way.

The symplectic $2$-form $\omega$ on $M$ defines a correspondence between $1$-forms and vector fields; specifically, if $\varphi$ is a $1$-form, then the corresponding vector field $X_{\varphi}$ is uniquely defined by the property that $\omega(X_{\varphi},Y) = \varphi(Y)$ for all vector fields $Y$. As a special case, for any smooth function $H \colon M \to \mathbb{R}$ the vector field $X_{dH}$ associated with the $1$-form $dH$ is called the Hamiltonian vector field of $H$. Typically $H$ is taken to be a suitable energy function, in which case the flow along $X_{dH}$ is precisely the time evolution of the physical system.

By a theorem of Darboux, any $2n$-dimensional symplectic manifold $M$ has local coordinates $(x^1,\ldots,x^n,p_1,\ldots,p_n)$ on which $\omega$ takes on the form $\sum_i dx^i \wedge dp_i$. With respect to such a coordinate system, a curve $\gamma(t) = \big(x(t),p(t)\big)$ is an integral curve for the Hamiltonian vector field $X_{dH}$ if and only if it is a solution to Hamilton’s equations \[ \dot{x}^i = \frac{\partial H}{\partial p_i} \qquad \text{and} \qquad \dot{p}_i = -\frac{\partial H}{\partial x^i}\text{.} \] Note that $H$ is constant along integral curves — as would be expected from the principle of conservation of energy.
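To make this concrete, here is a small numerical sketch (the harmonic oscillator with $m = k = 1$ is my own illustrative choice, not part of the discussion above): integrating Hamilton’s equations with a symplectic leapfrog scheme, the value of $H$ along the integral curve indeed stays put, up to integrator error.

```python
import numpy as np

# Harmonic oscillator with m = k = 1: H(x, p) = p^2/2 + x^2/2.
# Hamilton's equations read dx/dt = dH/dp = p and dp/dt = -dH/dx = -x.
def hamiltonian(x, p):
    return 0.5 * p**2 + 0.5 * x**2

def leapfrog(x, p, dt, steps):
    """Integrate Hamilton's equations with a symplectic leapfrog scheme."""
    for _ in range(steps):
        p -= 0.5 * dt * x   # half kick: dp/dt = -dH/dx = -x
        x += dt * p         # full drift: dx/dt = dH/dp = p
        p -= 0.5 * dt * x   # half kick
    return x, p

x0, p0 = 1.0, 0.0
x1, p1 = leapfrog(x0, p0, dt=1e-3, steps=10_000)
drift = abs(hamiltonian(x1, p1) - hamiltonian(x0, p0))
print(drift)  # tiny: H is conserved up to O(dt^2) integrator error
```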

Quantum logic

The passage from classical mechanics to quantum mechanics is captured entirely by the following postulate: In a quantum system, true–false statements do not adhere to the distributive law of ordinary propositional logic. By this we mean the following. If $P$, $Q$, and $R$ are three true–false statements, then the classical equivalence between “($P$ or $Q$) and $R$” and “($P$ and $R$) or ($Q$ and $R$)” fails to hold.

Although abstract, this postulate is motivated entirely by actual physics. In fact, the postulate is apparent already in Young’s classical double-slit experiment. In this experiment, a laser beam illuminates a barrier with two narrow slits in it, and the light particles [Note 1] that pass through the slits then arrive at a screen where the particles are detected. The wave-like nature of light causes the light passing through the slits to interfere, producing bright and dark patterns on the screen.

Now let $P$ be the statement that a given light particle goes through one specified slit, and let $Q$ be the statement that the light particle goes through the other slit. Finally, let $R$ be the statement that the light particle arrives at a given spot on the screen. Observing the light which has hit the screen is effectively a verification that these light particles satisfy “($P$ or $Q$) and $R$”.

At this point, let’s add detectors to each of the slits which are capable of detecting single photons passing through them. This allows us to keep track of which slit each photon passes through; in other words, whether $P$ holds, or $Q$ holds. When we run the experiment in this way, the interference pattern disappears. Instead, one merely finds a sum of two adjacent diffraction patterns, one from each slit. In effect, owing to the detectors, observing a light particle hitting the screen now amounts to a verification of the statement “($P$ and $R$) or ($Q$ and $R$)”.

The propositional lattice

Consider again a symplectic manifold $M$, interpreted as the state space of a Hamiltonian system. Recall our observation that the set of true–false statements about our system may be identified with the Borel subsets of $M$. [Note 2] Moreover, the basic logical connectives ‘or’, ‘and’, and ‘not’ correspond to the operations of union, intersection, and complement. The resulting mathematical structure is that of a distributive lattice equipped with an orthocomplementation, which we call the propositional lattice of our physical system.

By our single postulate, the propositional lattice of a quantum system must no longer be distributive; rather, we expect it to be an orthomodular lattice, which is an orthocomplemented lattice such that, for any two elements $P$ and $Q$, \[ P \leq Q \implies P \vee (P^{\perp} \wedge Q) = Q \text{.} \] It is at this point that we reach a crucial insight of Birkhoff and von Neumann, which is that the propositional calculus of an orthomodular lattice may be realised using Hilbert spaces. More precisely, if $\mathcal{H}$ is a Hilbert space, then the collection of closed linear subspaces of $\mathcal{H}$ defines an orthomodular lattice. The lattice operations join (‘or’) and meet (‘and’) are given by the closed linear span and intersection, while the complement (‘not’) is given by the orthogonal complement. [Note 3]
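The failure of distributivity in this lattice is easy to exhibit already in $\mathbb{R}^2$. The following sketch (my own toy computation, using dimension counting via matrix ranks) takes $P$, $Q$, and $R$ to be three distinct lines through the origin:

```python
import numpy as np

# Three distinct lines through the origin of R^2, each given as the
# span of its column.
P = np.array([[1.0], [0.0]])   # the x-axis
Q = np.array([[0.0], [1.0]])   # the y-axis
R = np.array([[1.0], [1.0]])   # the diagonal

def join_dim(A, B):
    """Dimension of the closed linear span A v B ('or')."""
    return np.linalg.matrix_rank(np.hstack([A, B]))

def meet_dim(A, B):
    """Dimension of A ^ B ('and'), via dim(A) + dim(B) - dim(A v B)."""
    return (np.linalg.matrix_rank(A) + np.linalg.matrix_rank(B)
            - join_dim(A, B))

lhs = meet_dim(np.hstack([P, Q]), R)     # (P v Q) ^ R: P v Q is all of R^2
pr, qr = meet_dim(P, R), meet_dim(Q, R)  # both meets are 0-dimensional
rhs = pr + qr                            # join of two trivial subspaces
print(lhs, rhs)  # 1 0 -- the distributive law fails
```

The left-hand side is the full line $R$, while the right-hand side is the trivial subspace, mirroring the double-slit discussion above.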

The states of a system captured by a Hilbert space $\mathcal{H}$ are taken to be unit vectors. In fact, to be more precise, we ought to let the state space be the projective Hilbert space $\mathbb{P}\mathcal{H}$, but we will typically be sloppy in this regard. [Note 4] In any case, suppose now that $v$ is a unit vector, and $P$ is a proposition corresponding to a closed linear subspace of $\mathcal{H}$. Then the statement $P$ is true for $v$ if $v$ resides in this closed subspace. But when is the statement false? To understand this, it helps to identify a closed linear subspace with the orthogonal projection onto that subspace, which for convenience we’ll also denote by $P$. Then it seems reasonable to say that $P$ is false for $v$ if $v$ is in the kernel of the projection $P$. But what if $v$ is neither in the range nor in the kernel of $P$? In general, $v$ can be decomposed uniquely as $v_0 + v_1$, where $v_0$ is in the kernel, and $v_1$ is in the range. By orthogonality, we have $||v_0||^2 + ||v_1||^2 = 1$, and so it seems reasonable enough to interpret $||v_0||^2$ and $||v_1||^2$ as the probabilities that $P$ is false and $P$ is true, respectively.
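In finite dimensions this probabilistic interpretation is a one-liner. The following sketch (my own toy example in $\mathbb{C}^2$; the specific state is arbitrary) decomposes a unit vector against a projection and reads off the two probabilities:

```python
import numpy as np

# The proposition P, identified with the orthogonal projection onto the
# closed subspace spanned by the first basis vector of C^2.
P = np.array([[1.0, 0.0],
              [0.0, 0.0]])

# A unit-vector state lying in neither the range nor the kernel of P.
v = np.array([np.sqrt(0.3), np.sqrt(0.7)])

v1 = P @ v        # component in the range of P   ("P is true")
v0 = v - v1       # component in the kernel of P  ("P is false")

p_true = np.linalg.norm(v1) ** 2
p_false = np.linalg.norm(v0) ** 2
print(round(p_true, 12), round(p_false, 12))  # 0.3 0.7, summing to 1
```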

What are the observables? In line with classical mechanics, it may seem tempting to say that observables should be real-valued functions on $\mathcal{H}$. However, suppose that $O \colon \mathcal{H} \to \mathbb{R}$ is such an observable, and we make the true–false statement “$O = 2$”, a statement that we’ll denote by $P(2)$. The states for which $P(2)$ holds would be given by the pre-image $O^{-1}(2)$, which in general is not a closed linear subspace. In addition, from the (as yet heuristic) uncertainty principle we know that it would not make sense to attach explicit numbers to each and every state.

We wish to find a model for observables so that a statement such as $P(2)$ is a closed linear subspace. In fact, more generally, for any Borel subset $E$ of $\mathbb{R}$, we have a true–false statement $P(E)$ saying that the value of $O$ is in $E$. This association $E \mapsto P(E)$ is a familiar mathematical structure: it is a projection-valued measure on $\mathbb{R}$. Inasmuch as the observable $O$ is entirely captured by the mapping $E \mapsto P(E)$, we may as well define observables as projection-valued measures.

At this point we make use of an important mathematical tool known as the spectral theorem, which says that projection-valued measures are in one-to-one correspondence with (possibly unbounded) self-adjoint operators. The correspondence is nontrivial to state; it relies on a theory known as functional calculus. The upshot which matters to us is that observables may be identified with self-adjoint operators, and we shall continue to do so henceforth.
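In finite dimensions the spectral theorem is just the eigendecomposition, and the projection-valued measure can be written down explicitly. The following sketch (my own toy example; the predicate `member` plays the role of a Borel subset of $\mathbb{R}$) illustrates the correspondence:

```python
import numpy as np

# A self-adjoint matrix; its eigenvalues are 1 and 3.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)

def P(member):
    """The projection attached to the Borel set {l : member(l) is True}."""
    cols = eigvecs[:, [bool(member(l)) for l in eigvals]]
    return cols @ cols.conj().T

identity = P(lambda l: True)      # P(R) is the identity
P_low = P(lambda l: l < 2)        # the statement "the observable is < 2"
# A is recovered by integrating the identity function against the measure.
recon = sum(l * P(lambda m, l=l: np.isclose(m, l)) for l in eigvals)
print(np.allclose(identity, np.eye(2)),
      np.allclose(P_low @ P_low, P_low),   # idempotent, hence a projection
      np.allclose(recon, A))               # True True True
```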

The proto-Schrödinger equation

Now that we know what observables are, we wish to associate to every observable a “Hamiltonian flow” governing the evolution of the system. The natural analogue of a Hamiltonian flow on a symplectic manifold is a one-parameter group of automorphisms of the state space $\mathbb{P}\mathcal{H}$. In other words, we want to look at projective representations $\mathbb{R} \to \operatorname{Aut}(\mathbb{P}\mathcal{H})$.

First things first. What is $\operatorname{Aut}(\mathbb{P}\mathcal{H})$? In other words, what are the operators on $\mathbb{P}\mathcal{H}$? The answer is known as Wigner’s theorem, which states that all such automorphisms come from either unitary or anti-unitary operators on $\mathcal{H}$, which form two connected components of the automorphism group. Thus, whenever we have a projective representation $\mathbb{R} \to \operatorname{Aut}(\mathbb{P}\mathcal{H})$, we may hope that it lifts to a unitary representation on $\mathcal{H}$. As it happens, this is always true, and so we may safely identify flows with one-parameter groups of unitary operators. [Note 5]

So what are the one-parameter groups of unitary operators? This brings us to another piece of mathematics. Stone’s theorem, named after the American mathematician Marshall Stone, states that every strongly continuous one-parameter unitary group is of the form $t \mapsto e^{itA}$ for some (possibly unbounded) self-adjoint operator $A$. This suggests an obvious candidate for our Hamiltonian flow: starting with an observable in the form of a self-adjoint operator $A$, the resulting Hamiltonian flow must surely be $e^{itA}$.
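In finite dimensions the content of Stone’s theorem is easy to verify directly. The following sketch (with a self-adjoint $2 \times 2$ matrix of my own choosing) checks that $t \mapsto e^{itA}$ is indeed a one-parameter group of unitaries:

```python
import numpy as np
from scipy.linalg import expm

# A self-adjoint (here: real symmetric) matrix playing the role of A.
A = np.array([[1.0, 2.0],
              [2.0, -1.0]])

def U(t):
    """The one-parameter group t -> exp(itA) from Stone's theorem."""
    return expm(1j * t * A)

s, t = 0.7, 1.3
unitary = np.allclose(U(t).conj().T @ U(t), np.eye(2))
group_law = np.allclose(U(s) @ U(t), U(s + t))
print(unitary, group_law)  # True True
```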

Or is it? Actually, upon fixing any nonzero real scalar $c$, we could instead form the correspondence $A \leftrightarrow e^{ictA}$, and it will turn out that one wants $c$ to be negative to ensure that our formalism ends up resembling classical mechanics. Moreover, dimensional analysis reveals that $c$ must not be dimensionless, but must have dimensions $[\mathrm{m}^{-1}\, \mathrm{l}^{-2}\, \mathrm{t}]$. In fact, it turns out that $c$ must be $-1/\hbar$, where $\hbar$ is the reduced Planck constant $1.05 \times 10^{-34}\, \mathrm{kg} \,\mathrm{m}^2 \,\mathrm{s}^{-1}$. If you wish, you can leave the precise value of $c$ open for now; as we proceed with the quantisation of classical observables in the next section and work out the precise shape of the Schrödinger equation, we will see that $c$ must be negative for the equation to make physical sense.

The ‘Hamiltonian flow’ associated to an observable $A$ can be stated in terms of a differential equation: if $\Psi_t$ is the state of the system at time $t$, then \[ \frac{d\Psi_t}{dt} = ic A \Psi_t\text{.}\] This equation may be regarded as a first iteration of the Schrödinger equation. The real Schrödinger equation will arise from a suitable choice of $A$, as we shall see below.
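Since $A$ is self-adjoint, the solution $\Psi_t = e^{ictA}\Psi_0$ of the proto-Schrödinger equation is a unitary evolution, so the total probability carried by the state is conserved. A quick numerical sketch (the two-level system and the value $c = -1$ are my own illustrative choices):

```python
import numpy as np
from scipy.linalg import expm

# dPsi/dt = i c A Psi is solved by Psi_t = exp(ictA) Psi_0; since A is
# self-adjoint, the norm of the state (the total probability) is conserved.
c = -1.0                                      # illustrative value
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])                    # a self-adjoint observable
psi0 = np.array([1.0, 0.0], dtype=complex)    # initial unit-vector state

norms = [np.linalg.norm(expm(1j * c * t * A) @ psi0) for t in (0.5, 1.0, 5.0)]
print(norms)  # each norm stays equal to 1
```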

Quantisation of position and momentum

Although we now know that observables correspond to self-adjoint operators, we haven’t considered the question of which operators correspond to position and momentum. To answer this question, we observe that there is an algebraic property that we expect these observables to have, after which we are able to invoke a mathematical theorem which essentially shows that there’s only one way to fill in the blanks.

In a Hamiltonian system of $N$ moving particles in $\mathbb{R}^3$, we have $3N$ position and momentum observables. What will end up being a crucial observation is that these observables satisfy the canonical relations \[ \{x_i,p_j\} = \delta_{ij}\qquad \text{and} \qquad \{x_i,x_j\} = \{p_i,p_j\} = 0\text{,}\] where $\{\,\cdot\,,\,\cdot\,\}$ denotes the Poisson bracket. What would be the quantum-mechanical analogue of these relations? To start off with, what is the analogue of the Poisson bracket? Self-adjoint operators form a Lie algebra under the association $(A,B) \mapsto ic[A,B]$, and I claim that this association is the correct analogue of the Poisson bracket. This can be motivated by the fact that the flows generated by observables interact with this bracket ‘as would be expected’. Indeed, in classical mechanics, the evolution of an observable $g$ under the Hamiltonian flow $\Phi_t$ along the vector field $X_f$ generated by an observable $f$ is given by \[ \frac{d}{dt} (g \circ \Phi_t) = \{g,f\} \circ \Phi_t \text{.} \] By analogy, the one-parameter group $e^{ictA}$ generated by an observable $A$ can be regarded as transforming the other observables by $B \mapsto B_t = e^{-ictA} \circ B \circ e^{ictA}$, and a short computation shows that the evolution of $B$ under this flow is \[ \frac{d}{dt} B_t = e^{-ictA} \circ ic[B,A] \circ e^{ictA} \text{,} \] in perfect accordance with the classical formula. With this analogy in place, we are able to make the educated guess that position and momentum correspond to operators $X_i$ and $P_j$ such that $ic [X_i,P_j] = \delta_{ij}$ and $[X_i,X_j] = [P_i,P_j] = 0$.

Mathematically speaking, we have reduced our problem to the Lie algebra representation theory of the Heisenberg algebra, and the key input from mathematics that we now need is known as the Stone–von Neumann theorem. Although a bit too complicated to state formally, what it effectively says is that there is only one way to fill in the details: the Hilbert space $\mathcal{H}$ must be $L^2(\mathbb{R}^{3N})$, $X_i$ must be multiplication by the $i$-th coordinate $x_i$, and $P_j$ must be $(i/c)\,\partial_j = -i\hbar\,\partial_j$. It is worth pointing out that the dimensions of these operators match up with our physical intuition.
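One can verify the canonical commutation relation symbolically on smooth test functions; the following sketch applies the commutator of the standard representation, $X$ acting as multiplication by $x$ and $P$ as $-i\hbar\,\partial_x$, to an arbitrary test function $f$:

```python
import sympy as sp

# X acts as multiplication by x; P acts as -i*hbar*(d/dx). Applying the
# commutator [X, P] = XP - PX to a smooth test function f recovers the
# canonical commutation relation [X, P] = i*hbar.
x = sp.symbols('x', real=True)
hbar = sp.symbols('hbar', positive=True)
f = sp.Function('f')(x)

X = lambda g: x * g
P = lambda g: -sp.I * hbar * sp.diff(g, x)

commutator = sp.expand(X(P(f)) - P(X(f)))
print(commutator)  # I*hbar*f(x)
```

With $c = -1/\hbar$ this gives $ic[X,P] = 1$ acting on test functions, as the educated guess above demands.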

We are now ready to state the Schrödinger equation in its final form. The classical Hamiltonian, stated in terms of momentum and potential energy, would be \[ H(x,p,t) = \frac{p^2}{2m} + V(x,t)\text{,} \] and so in quantum mechanics, we expect the Hamiltonian to be the self-adjoint operator $H$ on $L^2(\mathbb{R}^{3N})$ given by \[ H = -\frac{1}{2c^2 m} \nabla^2 + V(x,t)\text{,}\] where $V(x,t)$ is to be interpreted as a scalar multiplication. Upon inserting this choice of operator into the proto-Schrödinger equation, we find that a state (or wave function) $\Psi_t$ must evolve as \[ i \hbar \frac{d \Psi_t}{dt} = \bigg(-\frac{\hbar^2}{2m} \nabla^2 + V(x,t)\bigg) \Psi_t \text{,}\] which, at last, is the Schrödinger equation as we know and love it.
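As a final numerical sketch (the grid discretisation, harmonic potential, and Gaussian initial state are all my own illustrative choices), one can discretise this Hamiltonian on a finite grid, evolve a wave function by the matrix exponential $e^{-iHt/\hbar}$, and check that total probability is conserved:

```python
import numpy as np
from scipy.linalg import expm

hbar, m = 1.0, 1.0
n, L = 200, 20.0
dx = L / n
xs = np.linspace(-L / 2, L / 2, n)

# Finite-difference Laplacian with Dirichlet walls at the boundary.
lap = (np.diag(np.ones(n - 1), -1) - 2 * np.eye(n)
       + np.diag(np.ones(n - 1), 1)) / dx**2
V = np.diag(0.5 * xs**2)                    # harmonic potential
H = -(hbar**2 / (2 * m)) * lap + V          # the Hamiltonian on the grid

# A normalised Gaussian wave packet as the initial state.
psi = np.exp(-(xs - 1.0)**2).astype(complex)
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)

U = expm(-1j * H * 0.5 / hbar)              # evolution over t = 0.5
psi = U @ psi
norm = np.sum(np.abs(psi)**2) * dx
print(abs(norm - 1.0) < 1e-8)               # True: probability is conserved
```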

Loose ends

We have worked our way up to the Schrödinger equation, but of course the story doesn’t end there. Textbook quantum mechanics typically comes with several postulates that I haven’t even mentioned, such as a postulate on expectation values of observables and a postulate on the physical meaning of eigenvalues. As it happens however, these can all be inferred from things that have been mentioned in some way or other.

Additionally, I have skipped over another fundamental classical observable, which is angular momentum. Its quantisation proceeds by considering its vector components, which can be expressed in terms of position and momentum. Here, I may as well mention that there is another famous quantum observable called spin which is not the quantisation of any classical observable, and which is usually placed into the theory in an ad hoc manner, though it is worth pointing out that its existence can be inferred once one modifies our setup to be consistent with special relativity.

Finally, in regards to the mathematics, I have entirely ignored the issue of boundedness. The self-adjoint operators arising from the spectral theorem, and indeed most operators that are of physical relevance, are unbounded, and so in principle one has to worry about the domains on which these operators are defined. For instance, what is the domain of the Lie bracket of two unbounded operators, or of the Hamiltonian flow of an unbounded operator? Frankly speaking, life is too short to fuss about these technicalities.

If the reader wants to learn more, I highly recommend Folland’s Quantum Field Theory: A Tourist Guide for Mathematicians. Among many other things, it contains an introductory section on quantum mechanics, which is in fact where I learned most of what I’ve written up here.

Footnotes

  1. If ‘light particle’ rubs you the wrong way, just think about an electron gun instead.
  2. Measure theorists would call this the Borel $\sigma$-algebra of $M$.
  3. Is there any intrinsic connection between orthomodular lattices and Hilbert spaces? Say, are there any characterisations saying that an orthomodular lattice of such-and-such nature is always isomorphic to a Hilbert lattice?
  4. Physicists call $\mathbb{P}\mathcal{H}$ the ray space of the system.
  5. More generally, there is a cohomological obstruction theory governing whether a projective representation of a simply connected Lie group $G$ can be lifted to a unitary representation — a fact known as Bargmann’s theorem — and these obstructions vanish when $G = \mathbb{R}$.