PhD Qualification Musings · Computational Neuroscience · Network Science

The Rich Get Richer: Preferential Attachment in Neural Networks

From party dynamics to power laws—how the mathematics of 'rich-get-richer' networks connects to Hebbian plasticity and the self-organization of synaptic weights.

This essay was written in response to a question from Professor Chris Lynn for my doctoral qualifying examination. The question asked me to explain preferential attachment in complex networks, contrast it with a newer version used to explain synaptic weight distributions in connectomes, and relate both to Hebbian plasticity.

What appears at first as a narrow technical question opens onto something vast. The same mathematical structure—power laws, scale-free distributions, "rich-get-richer" dynamics—appears wherever you look: the connectivity of neurons, the branching of rivers, the distribution of earthquake magnitudes, the spread of forest fires, the clustering of galaxies. This is not coincidence. These systems share a deep kinship through concepts like percolation theory, self-organized criticality, and deterministic chaos.

Don't let the symbols scare you. Behind every equation here is something you can touch, watch, or imagine: sand piling until it avalanches, a magnet aligning its spins, a fire leaping from tree to tree, a thought crystallizing from neural noise. The mathematics isn't abstraction—it's the skeleton key that unlocks why these wildly different phenomena obey the same statistical laws. If you want to watch these principles unfold visually, see Veritasium on power laws and the systems that tune themselves to criticality, and Delta t's elegant primer on percolation—the mathematics of when a network suddenly connects.

What I find most compelling about Lynn et al.'s 2024 work is how it closes a loop: the Ising model (originally built to describe magnets, now a cornerstone of statistical physics) meets forest-fire dynamics, meets Hebbian plasticity, meets network theory. The "rich-get-richer" phenomenon isn't just a metaphor—it's the inevitable consequence of activity-dependent learning under resource constraints, operating at the critical boundary where order and chaos meet.

The Essay

The Rich Get Richer

Preferential Attachment in Neural Networks

Imagine you're at a party where people are constantly arriving. When someone new walks in, who are they most likely to start talking to? Probably the person who already seems to know everyone—the social hub who's surrounded by people. This is preferential attachment in action: the popular get more popular.

More formally, preferential attachment, as introduced by Barabási and Albert (1999), represents a fundamental growth mechanism whereby the probability of connecting to a node increases linearly with the number of connections that node already possesses. This mechanism captures a ubiquitous phenomenon in evolving systems: entities with greater connectivity, visibility, or resources tend to accumulate advantages at an accelerating rate. In the canonical model, when a new node enters the network with $m$ edges to distribute, it connects to existing node $i$ with probability $\Pi(k_i) = k_i / \sum_j k_j$, where $k_i$ represents node $i$'s current degree.
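
As a concrete sketch (my own illustration, not Barabási and Albert's code), the growth rule can be implemented with a standard trick: keep a list with one entry per edge endpoint, so that sampling uniformly from that list automatically picks node $i$ with probability $k_i / \sum_j k_j$:

```python
import random

def barabasi_albert(n, m, seed=None):
    """Grow a network of n nodes where each new node attaches m edges,
    choosing targets with probability proportional to current degree."""
    rng = random.Random(seed)
    # Seed network: a complete graph on m + 1 nodes, so every node has degree m.
    edges = [(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)]
    # One entry per edge endpoint: uniform sampling from this list is
    # exactly degree-proportional (preferential) attachment.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for old in targets:
            edges.append((new, old))
            endpoints.extend((new, old))
    return edges

edges = barabasi_albert(n=2000, m=2, seed=42)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
print(max(degree.values()))  # hubs acquire degrees far above the mean of ~4
```

Running this, a handful of early nodes end up with degrees an order of magnitude above the average, which is precisely the rich-get-richer signature.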

This mechanism embodies several real-world processes. In citation networks, highly cited "classic" papers gain visibility and attract further citations at increasing rates. On the World Wide Web, popular websites accumulate inbound links more rapidly than obscure ones; new websites tend to link to already-popular sites like Wikipedia or YouTube. In social networks, individuals with many connections have greater opportunities to form new relationships. And when proteins evolve new interaction capabilities, they often bind to proteins that already interact with many partners.

The defining feature of scale-free networks is their power-law degree distribution: $P(k) \sim k^{-\gamma}$. This functional form fundamentally differs from the exponential or Gaussian distributions characterizing random and regular networks. The power-law distribution is "scale-free" because it satisfies the scaling relation $P(ak) = a^{-\gamma} P(k)$, meaning the distribution retains its functional form under rescaling. This mathematical property has a structural implication: it guarantees the presence of nodes with degrees far exceeding what random chance would produce.
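
The scaling relation can be checked directly (a toy numerical check; $\gamma = 3$ and the sample points are arbitrary choices of mine):

```python
import math

gamma = 3.0
P = lambda k: k ** -gamma  # unnormalized power law P(k) ~ k^{-gamma}

# Rescaling k -> a*k only multiplies P by the constant a^{-gamma}: the form is unchanged.
a, k = 10.0, 7.0
assert abs(P(a * k) - a ** -gamma * P(k)) < 1e-12 * P(k)

# Equivalently, log P is linear in log k with slope -gamma.
k1, k2 = 3.0, 300.0
slope = (math.log(P(k2)) - math.log(P(k1))) / (math.log(k2) - math.log(k1))
print(slope)  # -3.0, up to floating-point error
```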

Three key features distinguish scale-free architectures. First, they exhibit extreme heterogeneity, with most nodes having few connections while a small fraction become highly connected hubs. The maximum degree scales as $k_\text{max} \sim N^{1/(\gamma-1)}$, growing unboundedly with network size $N$. Second, they display the small-world property, with average shortest path lengths scaling as $\langle \ell \rangle \approx \log N / \log \log N$ for large $N$, even more compressed than the $\log N$ scaling of random networks. Third, they show correlated resilience: remarkable robustness to random node failures (percolation threshold approaching zero as $N \to \infty$) coupled with extreme fragility to targeted hub removal.
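
The hub scaling can be illustrated by direct sampling (my own sketch using inverse-transform sampling from a continuous power law, not data from any real network):

```python
import random

def power_law_sample(gamma, rng):
    """Draw k >= 1 from P(k) ~ k^{-gamma} by inverting the CDF F(k) = 1 - k^{-(gamma-1)}."""
    return (1 - rng.random()) ** (-1 / (gamma - 1))

rng = random.Random(1)
for n in (10 ** 3, 10 ** 4, 10 ** 5):
    k_max = max(power_law_sample(3.0, rng) for _ in range(n))
    # The largest draw grows roughly like n^{1/(gamma-1)} = sqrt(n): no typical scale.
    print(n, round(k_max))
```

The maximum fluctuates heavily from run to run (as heavy-tailed maxima do), but its growth with $n$ is unmistakable, unlike samples from a Gaussian, whose maximum barely moves.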

The "scale-free" name comes from a mathematical property: if you zoom in or out on a power-law distribution, it looks the same—plotted on log-log axes, a power law forms a straight line with slope $-\gamma$, a direct signature of this self-similarity. Whether you're looking at nodes with 10–100 connections or 1,000–10,000 connections, the statistical pattern is identical. There's no "typical" node degree, unlike in random networks where most nodes have similar connectivity. This creates networks that are simultaneously robust and fragile: remove random nodes and the network barely notices (you probably weren't connected through that random small airport anyway), but remove the hubs and everything falls apart (cancel flights at Atlanta and the whole system cascades into chaos).

The Barabási-Albert model demonstrates that these complex structural properties emerge from just two ingredients: growth (networks expand through node addition) and preferential attachment (new connections favor well-connected nodes). This minimal mechanism generates $\gamma = 3$ universally, though variations incorporating fitness parameters, aging effects, or nonlinear attachment kernels can tune the exponent. The model's elegance lies in explaining the ubiquity of scale-free structure across technological, biological, and social systems through a simple, parameter-free mechanism.

In contrast, the application of preferential attachment to neuronal connectomes by Lynn et al. (2024) represents a reconceptualization of the mechanism, shifting focus from global network topology to how local connection weights can self-organize into heavy-tailed distributions. While traditional preferential attachment governs which nodes become connected during network growth, the connectome version governs how synaptic strength redistributes among existing connections. This distinction reflects a critical difference between engineered or social networks, which can expand indefinitely, and neural networks, which operate under strict biological constraints on total synaptic resources. These patterns emerge under two antagonistic optimization pressures: evolution balances minimizing wiring length against maximizing computational connectivity.

In this model, the network topology remains fixed while synaptic weights undergo continuous reorganization. The dynamics proceed through two coupled processes: random pruning and biased redistribution. At each time step, a connection is randomly selected and pruned (its weight set to zero), and the pruned synaptic strength is redistributed unit by unit. Each unit is allocated either preferentially—with probability proportional to existing connection strengths—or randomly with uniform probability. The balance between these modes is controlled by the parameter $p$, the probability of preferential redistribution.
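
These dynamics are simple enough to simulate directly. The following is my own sketch of the prune-and-redistribute rule as described above, with integer weight units and arbitrary parameter values, not the authors' code:

```python
import random

def redistribute_step(w, p, rng):
    """Prune one randomly chosen connection, then reallocate its weight
    unit by unit: preferentially (proportional to current weights) with
    probability p, uniformly at random otherwise."""
    i = rng.randrange(len(w))
    units, w[i] = w[i], 0
    total = sum(w)
    for _ in range(units):
        if rng.random() < p and total > 0:
            # Preferential: pick connection j with probability w[j] / total.
            r = rng.uniform(0, total)
            acc = 0
            for j, wj in enumerate(w):
                acc += wj
                if r <= acc:
                    break
        else:
            j = rng.randrange(len(w))  # uniform reallocation
        w[j] += 1
        total += 1

rng = random.Random(0)
w = [10] * 100  # 100 connections, total strength S = 1000 held fixed
for _ in range(5000):
    redistribute_step(w, p=0.8, rng=rng)
print(sum(w), max(w))  # total is conserved; a tail of strong connections emerges
```

With $p$ close to 1, nearly all weight is reallocated preferentially and the rich-get-richer feedback sharpens; $p$ near 0 recovers uniform redistribution and a much narrower weight distribution.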

This mechanism yields a steady-state weight distribution following $P(s) \propto \Gamma[s + \bar{s}(1/p - 1)] / \Gamma[s + \bar{s}(1/p - 1) + 1 + 1/p]$, which exhibits power-law scaling $P(s) \sim s^{-\gamma}$ for large $s$, with exponent $\gamma = 1 + 1/p$. Crucially, the total synaptic strength $S = \sum_{ij} A_{ij}$ remains constant throughout the dynamics, reflecting metabolic and spatial constraints in neural tissue. This conservation principle fundamentally distinguishes the connectome model from traditional preferential attachment, where total network connectivity grows without bound.
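
The quoted tail exponent can be checked numerically by evaluating the stated distribution with log-gamma functions, using the asymptotic fact that $\Gamma(a)/\Gamma(a+c) \sim a^{-c}$ for large $a$ (my own check; the values of $p$ and $\bar{s}$ are arbitrary):

```python
import math

def log_P(s, p, s_bar):
    """Unnormalized log of P(s) = Gamma[s + s_bar(1/p-1)] / Gamma[s + s_bar(1/p-1) + 1 + 1/p]."""
    a = s + s_bar * (1 / p - 1)
    return math.lgamma(a) - math.lgamma(a + 1 + 1 / p)

p, s_bar = 0.5, 5.0
# For large s the log-log slope of P(s) should approach -(1 + 1/p) = -3.
s1, s2 = 1e4, 1e6
slope = (log_P(s2, p, s_bar) - log_P(s1, p, s_bar)) / (math.log(s2) - math.log(s1))
print(slope)  # close to -3
```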

The mathematical framework also differs substantially. Traditional preferential attachment uses a forward-time approach: new nodes arrive and form connections based on existing degree distributions. The connectome model employs a redistribution approach: existing resources continuously reallocate through competitive dynamics. This generates heavy-tailed distributions not of node degrees but of connection weights—addressing the empirical observation that while most neuron pairs remain unconnected or weakly connected, some develop extremely strong multi-synaptic connections.

Three key innovations distinguish this approach. First, it operates on weighted rather than binary networks, recognizing that connection strength, not mere presence, determines neural information flow. Second, it implements competition for finite resources, capturing the fundamental constraint that neurons cannot indefinitely increase their total synaptic output. Third, it produces power-law weight distributions while maintaining sparse topology—most neuron pairs remain unconnected ($A_{ij} \approx 0$), but connected pairs show extreme heterogeneity in strength. This framework thus explains how neural networks achieve their characteristic structure: sparse topology for efficiency combined with heavy-tailed weights for computational power.

Further, the Lynn et al. model showed that preferential attachment in neural networks emerges naturally from Hebbian plasticity—the fundamental principle that synaptic connections strengthen between neurons with correlated activity. This unification bridges abstract network theory with concrete biological mechanisms, illustrating that the "rich-get-richer" dynamics of preferential attachment can be understood as the mathematical consequence of "fire together, wire together" at the cellular level.

To establish this connection, Lynn et al. extend their model to incorporate neural activity through a mean-field Ising framework. Neurons reach steady-state activities defined by $x_i = \tanh(\beta \sum_j A_{ij} x_j)$, where $\beta$ parameterizes interaction strength and $A_{ij}$ represents the synaptic weight matrix normalized by the average connection strength. For symmetric connections, the fluctuation-dissipation theorem yields pairwise covariances $C = \beta(I - \beta DA)^{-1}D$, where $D_{ij} = (1 - x_i^2)\delta_{ij}$ is a diagonal matrix capturing single-neuron variability.
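
A minimal two-neuron sketch makes this concrete (my own toy parameters, chosen in the weak-coupling regime): iterate the fixed-point equation, then evaluate the covariance formula with a hand-coded 2×2 inverse.

```python
import math

# Toy two-neuron network with weak symmetric coupling (arbitrary illustrative values).
beta, a = 0.2, 0.8
A = [[0.0, a], [a, 0.0]]

# Iterate the mean-field equation x_i = tanh(beta * sum_j A_ij x_j) to its fixed point.
x = [0.1, 0.1]
for _ in range(200):
    x = [math.tanh(beta * sum(A[i][j] * x[j] for j in range(2))) for i in range(2)]

# D is diagonal with entries 1 - x_i^2; here x -> (0, 0), so D reduces to the identity.
D = [1 - x[0] ** 2, 1 - x[1] ** 2]

# C = beta * (I - beta*D*A)^{-1} * D, with the 2x2 inverse written out by hand.
M = [[beta * D[i] * A[i][j] for j in range(2)] for i in range(2)]
det = (1 - M[0][0]) * (1 - M[1][1]) - M[0][1] * M[1][0]
inv = [[(1 - M[1][1]) / det, M[0][1] / det],
       [M[1][0] / det, (1 - M[0][0]) / det]]
C = [[beta * inv[i][j] * D[j] for j in range(2)] for i in range(2)]
print(C[0][1] > 0)  # True: coupled neurons show positive pairwise covariance
```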

The key insight emerges from the Taylor expansion of covariances in the weak-interaction regime ($\beta \ll 1$), with summation implied over repeated indices:

$C_{ij} = \beta \bigl( A_{ij} + \beta A_{ik} A_{kj} + \beta^2 A_{ik} A_{kl} A_{lj} + \cdots \bigr)$

To first order in $\beta$, covariances are directly proportional to connection strengths: $C_{ij} \approx \beta A_{ij}$. This linear relationship means that Hebbian growth—where connection strength changes proportionally to correlated activity—becomes mathematically identical to preferential attachment based on existing weights. The biological mechanism of activity-dependent plasticity thus naturally implements the abstract principle of cumulative advantage.
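
To see the first-order claim concretely, one can sum the series for a small matrix and compare it with $\beta A$ alone (an illustrative 3×3 example with made-up weights):

```python
# Multiply two small square matrices (plain lists, no dependencies).
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# A small symmetric weight matrix with made-up values, and a weak interaction strength.
A = [[0.0, 0.5, 0.1],
     [0.5, 0.0, 0.3],
     [0.1, 0.3, 0.0]]
beta = 0.05

# Sum the series C_ij = beta*(A_ij + beta*(A^2)_ij + beta^2*(A^3)_ij + ...).
term = A
C = [[0.0] * 3 for _ in range(3)]
for _ in range(50):
    C = [[C[i][j] + beta * term[i][j] for j in range(3)] for i in range(3)]
    term = [[beta * t for t in row] for row in matmul(term, A)]

# At weak coupling the first term dominates: C_ij ~ beta * A_ij, correction is O(beta^2).
err = max(abs(C[i][j] - beta * A[i][j]) for i in range(3) for j in range(3))
print(err < beta ** 2)  # True
```

The leftover error comes entirely from the higher-order path terms, which is exactly the structure the next paragraph describes.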

Higher-order terms reveal additional structure-function relationships. The second-order term $\beta A_{ik} A_{kj}$ promotes connections between neurons sharing common inputs, fostering triangular motifs and increasing clustering. The third-order term strengthens longer paths, creating recurrent loops. As interaction strength $\beta$ increases toward the critical value ($\beta = 1$ for random networks), these higher-order effects become prominent, transitioning the network from purely preferential (weight-based) to more complex (path-based) growth patterns.

This framework resolves a fundamental puzzle: why do neural networks exhibit heavy-tailed weight distributions? The answer lies in the feedback loop between structure and function. Strong connections generate correlated activity, which through Hebbian plasticity further strengthens those connections. This positive feedback, operating under resource constraints, inevitably produces power-law distributions. The exponent $\gamma = 1 + 1/p$ now gains biological interpretation: $p$ represents the fraction of synaptic resources allocated through activity-dependent (Hebbian) versus activity-independent (homeostatic) mechanisms.

This model suggests that scale-free synaptic distributions are not genetically hardwired but emerge from general plasticity principles. It also predicts that manipulating interaction strengths or correlation structures should systematically alter weight distributions. Lastly, it provides a principled framework for understanding how local learning rules generate global network architectures optimized for information processing.

Afterword

The party metaphor that opens this essay came from trying to make preferential attachment viscerally intuitive before diving into the mathematics. But as I wrote, I kept bumping into something larger: the same equations kept appearing in different costumes.

The Ising model was invented to explain why iron becomes magnetic—how local interactions between neighboring spins cascade into global order. The forest-fire model was built to understand how trees catch fire from their neighbors, how blazes spread and die. The sandpile model asks when adding one more grain triggers an avalanche. These seem like entirely different problems. And yet they share a mathematical skeleton: local rules, critical thresholds, power-law distributions of event sizes, and the strange regime where the system tunes itself to the boundary between order and chaos.

This is self-organized criticality—the discovery that many complex systems don't need to be fine-tuned to reach criticality; they evolve there naturally. The brain, it turns out, may be one of these systems. Neural networks operating near criticality maximize their dynamic range, their sensitivity to inputs, their capacity to transmit information. The power-law weight distributions that Lynn et al. explain through preferential attachment may be a signature of this deeper principle: the brain balancing itself at the edge.

The Taylor expansion showing $C_{ij} \approx \beta A_{ij}$ is the crux of the whole argument. It's the moment where network theory and neuroscience become the same mathematics. Correlated activity is preferential attachment, expressed in different notation.

What strikes me most is the closure of the loop. Start with abstract network theory. Add the physics of phase transitions. Incorporate the biology of Hebbian learning. You arrive at the same place from every direction—a convergence that suggests we're not just finding patterns but touching something true about how complex systems organize themselves across every scale, from neural synapses to ecosystems to the large-scale structure of the universe.

There's a broader lesson here about the relationship between structure and dynamics in complex systems. The network shapes the dynamics (connectivity determines correlations), but the dynamics reshape the network (correlations drive plasticity). This circular causation—where structure and function continuously co-evolve—may be the defining feature of biological computation.

The symbols are just shorthand. The reality is avalanches, and magnets, and thoughts catching fire.