TL;DR
Entropy always increases (2nd law of thermodynamics), but complexity — like the swirling patterns of coffee and milk mixing — first rises, then falls. This paper formalizes that intuition using a cellular automaton and Kolmogorov complexity after coarse-graining. It shows that "apparent complexity" peaks at an intermediate time, connecting physics, information theory, and the story of complexity in our universe.
1. The Coffee Question
Pour a splash of milk into your morning coffee. At first, you see a clear boundary: black coffee, white milk, sharply separated. Then you stir. The boundary explodes into intricate swirls — tendrils of white curling through the black, fractal-like eddies at every scale. Finally, after enough time, the cup is uniform: a homogeneous beige.
This mundane observation hides a deep puzzle. The Second Law of Thermodynamics tells us that entropy — disorder — monotonically increases. The coffee cup goes from low entropy (separated) to high entropy (mixed). But the complexity of the system does not monotonically increase. It rises from simple (separated layers) to complex (swirling patterns) and then falls back to simple (uniform gray). Entropy increases; complexity does not.
This paper asks: can we make this precise? Can we formally define complexity so that it provably rises and then falls in a closed physical system? And what does this tell us about complexity in the universe at large?
2. Entropy vs. Complexity
Entropy, in the Boltzmann sense, counts the number of microstates consistent with the macroscopic description of a system. For our coffee cup, entropy tracks how many ways the coffee and milk molecules could be arranged at a given moment. The Second Law guarantees this count only grows.
Complexity is something different. Intuitively, a highly ordered state (all black on the left, all white on the right) is not complex — it is simple to describe. A completely random state (every pixel independently black or white) is also not complex in the interesting sense — it is just noise, also simple to describe ("pick each color with probability 1/2"). Complexity peaks somewhere in between: where there is structure, but structure that takes many bits to describe.
The formal tool for measuring descriptive complexity is Kolmogorov complexity. The Kolmogorov complexity K(x) of a string x is the length of the shortest program that outputs x. A fully ordered state has low K (short program: "output half black, half white"). A fully random state has high K at the fine-grained level, since it is incompressible, but its coarse-grained description is short: every block averages out to roughly the same gray. It is the intermediate, swirling state whose coarse-grained image requires a long program to describe.
3. The Coffee Automaton Setup
To make things concrete and simulatable, the paper defines a cellular automaton that models the coffee-mixing process. A cellular automaton is a grid of cells, each in a discrete state, that evolves in synchronous discrete time steps according to a local rule.
The coffee automaton uses a two-dimensional grid where each cell is either black (coffee, B) or white (milk, W). The initial state has all black cells on one side and all white cells on the other — a clean separation, like just-poured milk. The automaton evolves by probabilistically swapping adjacent cells, mimicking thermal diffusion.
The key parameters are: grid size N×N, initial condition (half black / half white), swap probability p, and observation time t. The system is a closed system — no energy or information flows in or out. Like a sealed coffee cup.
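The dynamics can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's exact rule: it uses periodic boundaries and a simple random-neighbor swap, and the names (`init_grid`, `step`) are mine, not the paper's.

```python
import random

def init_grid(n):
    """Just-poured state: black (1) on the left half, white (0) on the right."""
    return [[1 if col < n // 2 else 0 for col in range(n)] for _ in range(n)]

def step(grid, p, rng):
    """One sweep of thermal diffusion: n*n random cells each attempt,
    with probability p, to swap with a randomly chosen neighbor.
    Periodic boundaries are a simplification; a real cup has walls."""
    n = len(grid)
    for _ in range(n * n):
        r, c = rng.randrange(n), rng.randrange(n)
        if rng.random() < p:
            dr, dc = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
            r2, c2 = (r + dr) % n, (c + dc) % n
            grid[r][c], grid[r2][c2] = grid[r2][c2], grid[r][c]
```

Note that swaps conserve the number of black and white cells, so the system is closed in the sense the text describes: nothing enters or leaves the cup.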
4. Coarse-Graining and Apparent Complexity
Here is the central technical idea. The fully fine-grained state of the coffee automaton at late times looks random — and indeed, the Kolmogorov complexity K(x) of the raw state is high (near-random). But when you look at the cup with your naked eye, you do not see individual molecules. You see regions of roughly uniform color. This is coarse-graining.
Coarse-graining at resolution ε means: divide the grid into blocks of size ε×ε, and replace each block with its average color (the fraction of black cells in the block). The coarse-grained state x_ε is a lower-resolution image of the system.
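The block-averaging operation is straightforward to implement. A minimal sketch, assuming the grid is a list of lists of 0/1 values and N is a multiple of ε (the name `coarse_grain` is mine):

```python
def coarse_grain(grid, eps):
    """Divide the grid into eps x eps blocks and replace each block
    with its average color (the fraction of black cells in the block)."""
    n = len(grid)
    out = []
    for br in range(0, n, eps):
        row = []
        for bc in range(0, n, eps):
            s = sum(grid[r][c]
                    for r in range(br, br + eps)
                    for c in range(bc, bc + eps))
            row.append(s / (eps * eps))
        out.append(row)
    return out
```

For example, a 4×4 checkerboard of 2×2 blocks coarse-grains at ε = 2 to a 2×2 image of pure 1.0s and 0.0s, while a well-mixed grid coarse-grains to values near 0.5 everywhere.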
The apparent complexity at resolution ε is then simply the Kolmogorov complexity of the coarse-grained image:

AC_ε(x) := K(x_ε)
Intuitively: AC_ε(x) answers the question "how hard is it to describe what this system looks like from a distance of ε?" It strips away fine-grained randomness that an observer at resolution ε cannot see anyway.
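Since K is uncomputable, any implementation must use a computable stand-in; compressed length is the usual choice. The sketch below quantizes the coarse-grained image (as produced by block averaging) to bytes and uses the zlib-compressed length as a proxy for K. The function name and the 16-level quantization are my assumptions, not the paper's.

```python
import zlib

def apparent_complexity(coarse, levels=16):
    """Computable proxy for AC_eps: quantize each block average to one of
    `levels` gray values, then take the zlib-compressed length as a
    stand-in for the (uncomputable) Kolmogorov complexity."""
    data = bytes(min(levels - 1, int(v * levels))
                 for row in coarse for v in row)
    return len(zlib.compress(data, 9))
```

A uniform gray image compresses to almost nothing, while an image with structure at many scales resists compression, which is exactly the behavior AC_ε is meant to capture.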
5. The Three Phases
The coffee automaton simulation confirms the three-phase picture:
The system starts in a low-entropy, low-complexity state: half black, half white, perfectly separated. The coarse-grained image is easy to describe — one edge, two colors. AC_ε is small.
As cells mix, intricate patterns emerge. At intermediate times, the coarse-grained image shows complex spatial structure — tendrils, gradients, islands of black in white and white in black. No short description captures this. AC_ε peaks.
At equilibrium, every block has roughly equal fractions of black and white. The coarse-grained image is approximately uniform gray. It can be described very compactly: "constant 0.5 everywhere". AC_ε collapses near zero.
Throughout all three phases, the fine-grained entropy H(x) is non-decreasing — it starts low and plateaus at its maximum. This is consistent with the Second Law. But AC_ε(x) traces out an arc: low → high → low. The two quantities measure genuinely different things.
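The arc can be observed numerically by combining the pieces above: simulate the swap dynamics, coarse-grain at a fixed ε, and record the compressed size over time. This is an illustrative sketch under my own assumptions (periodic boundaries, random-neighbor swaps, zlib as a proxy for K), not a reproduction of the paper's experiment.

```python
import random
import zlib

def init_grid(n):
    """Half black (1), half white (0), sharply separated."""
    return [[1 if c < n // 2 else 0 for c in range(n)] for _ in range(n)]

def step(grid, p, rng):
    """One diffusion sweep: random cells swap with random neighbors."""
    n = len(grid)
    for _ in range(n * n):
        r, c = rng.randrange(n), rng.randrange(n)
        if rng.random() < p:
            dr, dc = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
            r2, c2 = (r + dr) % n, (c + dc) % n
            grid[r][c], grid[r2][c2] = grid[r2][c2], grid[r][c]

def apparent_complexity(grid, eps, levels=16):
    """Coarse-grain into eps x eps blocks, quantize, compress."""
    n = len(grid)
    data = bytearray()
    for br in range(0, n, eps):
        for bc in range(0, n, eps):
            s = sum(grid[r][c] for r in range(br, br + eps)
                               for c in range(bc, bc + eps))
            data.append(min(levels - 1, int(s / (eps * eps) * levels)))
    return len(zlib.compress(bytes(data), 9))

def complexity_trace(n=32, eps=4, p=1.0, sweeps=1024, every=64, seed=0):
    """Record the AC proxy at regular intervals as the system mixes."""
    rng = random.Random(seed)
    grid = init_grid(n)
    trace = [apparent_complexity(grid, eps)]
    for t in range(1, sweeps + 1):
        step(grid, p, rng)
        if t % every == 0:
            trace.append(apparent_complexity(grid, eps))
    return trace
```

On a run like this, the first value (two uniform halves, one edge) is small, intermediate values are larger while tendrils and gradients dominate the coarse-grained image, and late values shrink again as the image approaches uniform gray.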
6. Mathematical Formalization
The paper provides rigorous definitions for the quantities involved. Let x ∈ {0,1}^{N×N} be a state of the N×N grid, and let x_ε denote its coarse-graining at resolution ε. Define entropy and apparent complexity as follows:

H(x) := the Shannon entropy of the probability distribution over fine-grained states consistent with the coarse-grained observation x_ε

AC_ε(x) := K(x_ε)

Here H(x) plays the role of Boltzmann entropy: it counts, in bits, how many microstates match what a resolution-ε observer sees. The Second Law guarantees that H is monotonically non-decreasing.
The key finding: there exists a time t* such that AC_ε(x_{t*}) > AC_ε(x_0) and AC_ε(x_{t*}) > AC_ε(x_∞). The complexity has a non-monotone arc.
The paper also introduces complextropy, a way to define a single complexity measure by minimizing over all coarse-graining scales, so that no particular resolution ε has to be fixed by hand.
The relationship between entropy H and apparent complexity AC_ε can be summarized compactly. Let x_0 be the initial state, x_t the state at time t, and x_∞ the equilibrium state:

H(x_0) ≤ H(x_t) ≤ H(x_∞)   (entropy rises monotonically to its maximum)

AC_ε(x_0) small, AC_ε(x_{t*}) large, AC_ε(x_∞) small   (complexity rises, then falls)
7. Connection to the First Law of Complexodynamics
This paper is directly motivated by Scott Aaronson's 2011 blog post "The First Law of Complexodynamics", which introduced the term complextropy and posed the question: is there a formal analog of entropy for complexity, a quantity that tracks the interesting structure of a system rather than just its disorder?
The coffee automaton paper provides concrete evidence for what Aaronson called the "First Law of Complexodynamics": complexity first increases, then decreases, as a closed system evolves from an ordered initial state to thermal equilibrium. It is not a law in the strict sense (unlike the Second Law of Thermodynamics), but a robust empirical finding confirmed in the automaton setting.
The key asymmetry between entropy and complexity: Entropy is a function of the probability distribution over microstates — it is a property of our knowledge. Complexity (apparent complexity / complextropy) is a property of the coarse-grained state itself — what the system looks like at a given resolution. This distinction is physically meaningful: an observer with limited resolution will perceive complex patterns even when the fine-grained state is already maximally entropic.
The paper also notes that complextropy is related to logical depth (Bennett 1988): the time required by a near-shortest program to reproduce the coarse-grained state. A state with high complextropy is one that is quick to specify but slow to produce: generating it from its shortest description takes a long computation.
8. Why This Matters for AI and Physics
The coffee automaton is a toy model, but its implications are profound.
For physics: the universe started in a very low-entropy state (the Big Bang). Entropy has been increasing ever since. But the universe has also been getting more complex — stars, galaxies, planets, life, minds emerged where none existed before. The Second Law cannot explain this: rising entropy should lead to eventual heat death (maximum disorder), not increasing complexity. The coffee automaton framework suggests that complexity is a transient, non-monotone quantity: we are in the "rising complexity" phase of the universe's arc.
For AI and machine learning: the coarse-graining framework has a direct analog in representation learning. A neural network that learns to represent data is doing something like coarse-graining: it is compressing the raw input (which may be high-entropy) into a compact representation that captures structure at the right level of abstraction. The question of what makes a good representation is related to the question of what resolution ε minimizes AC_ε — where is the information that is genuinely complex and useful?
For complexity theory: Kolmogorov complexity is uncomputable, but the ideas here suggest practical approximations. Compression algorithms like gzip or LZMA approximate K(x); applying them to coarse-grained images gives computable versions of AC_ε. This has been used in practice to detect structure in time series, images, and biological data.
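The practical approximation is easy to demonstrate: compressed length is an upper bound on K (plus a constant), so comparing compressed sizes separates ordered strings from noise. A minimal sketch using Python's zlib; the helper name `k_estimate` is mine.

```python
import random
import zlib

def k_estimate(s: bytes) -> int:
    """Crude, computable upper bound on K(s): zlib-compressed length."""
    return len(zlib.compress(s, 9))

# "Half black, half white": long runs, highly compressible.
ordered = b"0" * 512 + b"1" * 512

# Independent random bytes: essentially incompressible.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(1024))
```

Here `k_estimate(ordered)` is a few dozen bytes while `k_estimate(noise)` exceeds the input length, mirroring the low-K / high-K distinction from Section 2; applying the same estimator to coarse-grained images yields a computable version of AC_ε.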