Born to Fly: Developmental Programming and Neural Machine Code
How evolution programs neural circuits before birth—exploring the computational demands of precocial species and what they reveal about developmental 'machine code' in the brain.
This essay was written in response to a question from Professor Gregory McCarthy for my doctoral qualifying examination. The question asked whether developmental processes in the brain might serve as an analogue to "inserting machine code" into a neural system—and if so, what kinds of computations might be "pre-compiled" by neurodevelopment.
The prompt drew on Kim and Bassett's 2023 framework for directly programming recurrent neural networks without training data—a paper that resonates deeply with my own research on analytically programming reservoir computers. Where machine learning typically discovers structure through gradient descent on massive datasets, these approaches ask: what if we could simply write the dynamics we want?
I found myself drawn to an extreme test case: the megapode chick, a bird that hatches fully feathered and capable of sustained flight and hunting on its first day of life. No matched filter or simple reflex can explain such capabilities. Something more must be pre-installed.
Born to Fly
Developmental Programming and Neural Machine Code
Consider the computational puzzle of precocial species. Newly hatched Trichogramma wasps immediately assess host egg volumes to optimize offspring deposition. Toad hatchlings (Bufo) snap at prey with no prior experience. Conventional wisdom, championed by Rüdiger Wehner's influential work on neural matched filters, dismisses these as peripheral sensory tricks requiring no internal world model. The wasp merely matches bristle activation to motor output; the toad triggers when visual features cross specialized detectors. No computation, just reflexes.
This comfortable narrative shatters when we encounter the "superprecocial" birds of the family Megapodiidae—megapodes, colloquially. These Australasian birds hatch fully feathered, are capable of sustained flight, and hunt successfully on their first day of life. No matched filter can explain how a day-old chick navigates turbulent air currents, exploits thermals, compensates for wind shear, tracks prey in 3D space, predicts evasion trajectories, and distinguishes edible from inedible targets—all while coordinating complex flight mechanics it has never practiced. This essay argues that these capabilities must be pre-installed before hatching, suggesting that evolution has discovered how to program neural circuits with sophisticated computational capabilities, not just peripheral reflexes.
The computational demands of megapode behavior expose the poverty of the matched filter hypothesis. Flight alone requires real-time solutions to coupled nonlinear differential equations governing turbulent flow dynamics, thermal convection patterns, and variable air density. Add hunting, and the complexity explodes exponentially. The chick must categorize prey (beyond simple motion detection), predict nonlinear 3D escape trajectories evolved over millions of years, plan intercept courses, and execute precise motor control—all while maintaining stable flight. This is not stimulus-response mapping. This is model-based prediction, planning, and control, all embedded before the first breath.
How might evolution pre-install such sophisticated computations? Recent advances in our understanding of neural dynamics, particularly the role of attractor networks in computation, offer a compelling framework. As Khona and Fiete's comprehensive 2022 review details, attractor dynamics provide computational primitives well suited to the challenges faced by precocial species: noise-robust representation through massive dimensionality reduction, integration of sensory evidence over time via continuous attractors, structured dynamics encoding predictive models, and compositional flexibility through modular combinations. Crucially, these attractor geometries need not be learned—they can be programmed through developmental processes that shape synaptic connectivity according to genetic and epigenetic instructions.
This brings us to Kim and Bassett's framework for programming recurrent neural networks. They demonstrate that complex computations—from solving differential equations to playing Pong—can be directly compiled into RNN connectivity without any training data. Their approach reveals that desired dynamics can be expressed as analytical functions and compiled into synaptic weights through closed-form solutions, bypassing iterative learning entirely. The key insight is that developmental processes could implement a biological version of this programming framework through molecular gradients establishing spatial coordinates, activity-independent synaptic specification via molecular recognition codes, and spontaneous activity patterns serving as initialization procedures.
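To make the "compilation" idea concrete, here is a minimal sketch in the same spirit—not Kim and Bassett's actual algorithm, and with all parameters invented for illustration. We want a line attractor (a one-dimensional memory), and we write the weight matrix down in closed form, with no training data: activity along a chosen direction persists, everything orthogonal to it decays.

```python
import numpy as np

# Sketch: writing a desired dynamical property directly into RNN weights.
# This is NOT Kim & Bassett's compiler -- just an illustration of the
# closed-form spirit. Goal: a line attractor along direction v.

N = 50
rng = np.random.default_rng(0)
v = rng.standard_normal(N)
v /= np.linalg.norm(v)               # the attractor direction

decay = 0.5                          # contraction rate off the attractor
P = np.outer(v, v)                   # projector onto the line
W = P + decay * (np.eye(N) - P)      # the "compiled" weight matrix

# Run the linear RNN r_{t+1} = W r_t from a random initial state.
r = rng.standard_normal(N)
along0 = v @ r                       # initial component along the line
for _ in range(40):
    r = W @ r

# The component along v is preserved (a memory); orthogonal activity
# has contracted away by a factor of decay**40.
```

The point is the absence of any learning loop: the desired dynamics were expressed analytically and the synaptic weights followed by construction.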
Could developmental processes implement such programming? Several mechanisms suggest this is not only possible but likely. Molecular gradients and morphogen fields establish spatial coordinates that could directly specify the weight profiles needed for continuous attractors. Just as Kim and Bassett use analytical derivation to program specific dynamics, biochemical gradients could establish the translation-invariant connectivity patterns required for ring and grid attractors seen throughout spatial navigation circuits. Activity-independent synaptic specification through molecular recognition codes could implement precise connectivity patterns without requiring experience. The genome doesn't need to specify every synapse—only the rules for generating appropriately structured connectivity. Spontaneous activity patterns during development, far from being noise, could serve as boot sequences that position attractors at appropriate operating points.
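The translation-invariant connectivity mentioned above can be sketched directly. In the toy below (illustrative parameters, not fit to any real circuit), a single rule—connection strength depends only on the angular offset between neurons, exactly the kind of quantity a molecular gradient could specify—produces a ring attractor that sustains a bump of activity anywhere on the ring, with no learning and only uniform, featureless drive.

```python
import numpy as np

# Sketch: a "developmentally specified" ring attractor. One offset-dependent
# rule sets every weight: local excitation plus global inhibition.
N = 100
theta = 2 * np.pi * np.arange(N) / N
dtheta = theta[:, None] - theta[None, :]
W = (8.0 * np.cos(dtheta) - 2.0) / N     # translation-invariant profile

def f(x):
    return np.clip(x, 0.0, 1.0)          # saturating rate nonlinearity

# Seed a bump at theta0, then run the recurrent dynamics with only a
# uniform drive -- the bump is sustained by connectivity alone.
theta0 = theta[30]
r = np.maximum(np.cos(theta - theta0), 0.0)
dt, drive = 0.2, 0.5
for _ in range(300):
    r = r + dt * (-r + f(W @ r + drive))

# Population-vector readout: the angle the ring is "remembering".
decoded = np.angle(np.sum(r * np.exp(1j * theta)))
```

Because the weight rule is invariant to rotation, the same circuit holds a bump at any angle—a continuous family of stable states installed by one compact specification, which is precisely what makes this architecture cheap for a genome to encode.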
Several well-studied neural circuits support this programming interpretation. Central pattern generators produce rhythmic motor patterns without sensory feedback, implementing limit cycle dynamics that emerge from developmentally programmed connectivity. Head direction circuits across species exhibit circular attractor geometry with precisely structured inhibitory profiles—a computational architecture present from birth. The oculomotor integrator performs mathematical integration through line attractor dynamics, with synaptic weights precisely tuned to maintain eye position. While refined by experience, the basic computational architecture emerges developmentally. Grid cells implement modular arithmetic on spatial positions through hexagonal firing patterns, with specific periodicities potentially programmed via molecular gradients.
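The limit-cycle claim about central pattern generators can be illustrated with a deliberately minimal stand-in: a van der Pol oscillator playing the role of a half-center CPG. This is a mathematical caricature, not a biophysical model, but it shows the two properties that matter here—rhythm is a property of fixed dynamics requiring no sensory feedback, and the cycle is an attractor, so different initial states converge to the same motor pattern.

```python
import numpy as np

# Sketch: a limit cycle as a pre-installed motor program (van der Pol
# oscillator standing in for a CPG half-center pair; parameters invented).
def run(x0, y0, mu=1.0, dt=0.01, steps=6000):
    x, y = x0, y0
    xs = []
    for _ in range(steps):
        # Euler integration of x' = y, y' = mu*(1 - x^2)*y - x
        x, y = x + dt * y, y + dt * (mu * (1 - x * x) * y - x)
        xs.append(x)
    return np.array(xs)

a = run(0.1, 0.0)      # start near the (unstable) rest state
b = run(2.5, -1.0)     # start far away

# Both trajectories settle onto the same rhythmic cycle: the late-time
# peak-to-peak amplitudes agree, independent of initial conditions.
amp_a = a[-2000:].max() - a[-2000:].min()
amp_b = b[-2000:].max() - b[-2000:].min()
```

The convergence from wildly different starting points is the developmental payoff: the genome need only install dynamics whose attractor is the rhythm, not the rhythm itself.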
This perspective has several implications. If development programs neural circuits by installing specific attractor geometries, then evolutionary optimization operates on the developmental processes that install computational primitives, not directly on behaviors. This provides a more evolvable substrate for complex behaviors. Learning would then refine pre-existing structure rather than create it from scratch, explaining the remarkable efficiency of animal learning compared to artificial systems. The contrast with machine learning is stark: while we train networks from random initialization, requiring massive datasets to discover basic operations, biology takes a fundamentally different approach by pre-installing computational infrastructure.
Yet this analogy has important limitations. Development is not deterministic compilation but a noisy, probabilistic process requiring robustness to variability. Biological networks must satisfy constraints at multiple scales—from molecular interactions to systems-level dynamics—creating trade-offs absent in engineered systems. The boundary between "programmed" and "learned" computation is necessarily blurry, as circuits remain plastic throughout life. Moreover, biological systems must balance the benefits of pre-programmed structure against the need for flexibility in unpredictable environments.
However, the very indeterminacy of development also points toward a probabilistic form of neural programming—one that looks less like a hard-wired ROM and more like the diffusion-based generative models now powering much of state-of-the-art AI. In a diffusion model, a signal is repeatedly corrupted by Gaussian noise whose increments follow a Brownian motion; the network is then trained not to reproduce the original data directly but to predict the noise that was added at each step. Because the loss is expressed in the space of noise perturbations rather than as pixel-perfect reconstructions, the model learns an entire manifold of valid solutions and can "walk" stochastically across it at inference time. This trick buys two things that classical gradient-descent training on deterministic targets does not: (i) generalizability—every training example covers a neighborhood of nearby states, making few-shot transfer far easier; and (ii) flexibility—the same network can sample multiple plausible outcomes instead of committing to a single over-fitted mapping.
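The noise-prediction objective is simple enough to demonstrate in a few lines. The toy below uses Gaussian data at a single fixed noise level—chosen because the optimal noise predictor is then known in closed form, so the fit can be checked—whereas a real diffusion model uses a deep network across many noise levels.

```python
import numpy as np

# Toy DDPM-style objective: corrupt data, then learn to predict the
# injected noise rather than reconstruct the data.
rng = np.random.default_rng(0)
n = 20_000
abar = 0.6                          # signal retention at one noise level

x0 = rng.standard_normal(n)         # "data" (Gaussian, for checkability)
eps = rng.standard_normal(n)        # the injected noise
xt = np.sqrt(abar) * x0 + np.sqrt(1 - abar) * eps   # corrupted signal

# Least-squares fit of a linear noise predictor eps_hat = a * xt.
a = (xt @ eps) / (xt @ xt)

# For Gaussian data the optimum is a = sqrt(1 - abar): the model recovers
# the statistics of the perturbation it must undo, not the data itself.
```

The fitted coefficient converging to √(1 − ᾱ) is the essay's point in miniature: what gets learned is a model of the noise process, and the data manifold is reached by repeatedly undoing that noise.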
Chaotic dynamical systems offer a useful intuition. Suppose the circuit must approximate a Lorenz or logistic attractor, whose trajectories are bounded yet extremely sensitive to initial conditions. Trying to learn a one-to-one mapping from every possible starting point to a specific future state is hopeless; after enough "stretch-and-fold" iterations even infinitesimal differences explode. The better strategy—much like kneading dye into dough—is to learn the statistics of intermediate swirl patterns: each batch of dough ends up uniquely streaked, yet every batch exhibits the same characteristic striations at a given kneading time. Diffusion models capture exactly that ensemble regularity, producing one admissible striation pattern out of the infinitely many that live on the attractor manifold.
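The dough-kneading intuition can be checked numerically with the logistic map at full chaos, x → 4x(1 − x): two trajectories from almost identical starts become uncorrelated, yet the ensemble statistics are perfectly reproducible—the regularity lives at the level of the distribution, not the trajectory.

```python
import numpy as np

# Sensitivity to initial conditions vs. ensemble regularity on the
# chaotic logistic map.
def step(x):
    return 4.0 * x * (1.0 - x)

x, y = 0.3, 0.3 + 1e-10             # nearly identical initial conditions
gap = []
for _ in range(60):
    x, y = step(x), step(y)
    gap.append(abs(x - y))

# A 1e-10 difference is amplified to order one within a few dozen steps.
diverged = max(gap[40:])

# Ensemble view: many independent trajectories share the same invariant
# statistics (the invariant density of this map has mean 1/2).
rng = np.random.default_rng(0)
pts = rng.uniform(0.01, 0.99, 5000)
for _ in range(100):
    pts = step(pts)
ens_mean = pts.mean()
```

Predicting any single trajectory is hopeless; predicting the striation statistics is easy—which is exactly the kind of target a diffusion-style objective optimizes.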
Seen in this light, a way forward for the Kim-and-Bassett compiler is to treat its analytically derived, dense sub-circuits as development's "firmware" while allowing a diffusion-style, noise-conditioned learning rule to fill in the sparse, long-range connections that stitch those primitives into task-specific programs. Concretely, one could (i) compile each dynamical primitive into a tight N×N diagonal block via the existing L2-norm formulation, (ii) initialize all off-diagonal blocks to zero, and (iii) optimize only those sparse weights with an L1-regularized objective that predicts injected Gaussian noise between blocks. The result would marry evolution's pre-compiled attractor libraries with an online, diffusion-driven mechanism for rapid adaptation—an idea developed further in the accompanying response to Chris Lynn's question.
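The three-step recipe above has a simple structural skeleton, sketched below. Everything here is hypothetical: the "compiled" blocks are random placeholders rather than outputs of the actual compiler, and a random vector stands in for the gradient of the diffusion-style noise-prediction loss. What the sketch does show faithfully is the architecture—frozen firmware on the diagonal, and an L1 (soft-threshold) update confined to the off-diagonal wiring.

```python
import numpy as np

# Structural sketch of the proposed hybrid (placeholders throughout).
rng = np.random.default_rng(1)
n = 4                                        # units per primitive
A = rng.standard_normal((n, n)) * 0.3        # "compiled" primitive 1 (stand-in)
B = rng.standard_normal((n, n)) * 0.3        # "compiled" primitive 2 (stand-in)

W = np.zeros((2 * n, 2 * n))
W[:n, :n], W[n:, n:] = A, B                  # (i) firmware on the diagonal
mask = np.ones_like(W, dtype=bool)
mask[:n, :n] = mask[n:, n:] = False          # (ii) trainable = off-diagonal only

def soft(x, lam):
    # proximal operator of the L1 penalty: shrink toward zero
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

lr, lam = 0.1, 1.5                           # L1 strength controls sparsity
g = rng.standard_normal(W.shape)             # placeholder gradient
W[mask] = soft(W[mask] - lr * g[mask], lr * lam)   # (iii) sparse update

# The compiled blocks are untouched; the learned long-range wiring is sparse.
```

One proximal step already exhibits the intended division of labor: the attractor library is preserved verbatim while only a small fraction of inter-block connections becomes nonzero.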
The megapode chick, soaring on its first day of life, hunting in a complex three-dimensional world, stands as a testament to the power of developmental programming. Its capabilities cannot be dismissed as simple matched filters or reflexive responses. Instead, they point to a deeper principle: evolution has discovered how to program neural computers by installing structured attractor dynamics through developmental processes. This perspective bridges Kim and Bassett's neural programming framework with the biological reality of developmental specification, suggesting that complex competencies can be developmentally installed as dynamical primitives, then refined and contextualized through experience.
This view opens new avenues for both neuroscience and artificial intelligence. For neuroscience, it suggests searching for the developmental "dynamical programming languages" that specify neural dynamics. For AI, it points toward new architectures that combine the robustness of programmed dynamics with the flexibility of learning. The humble megapode chick, taking flight on its first day, may teach us how minds can be born ready to engage with the world's complexity—not through matched filters or simple reflexes, but through developmentally programmed dynamical manifolds that encode the computational primitives of thought itself. Thus, neurodevelopment does not merely resemble machine-code insertion—it is a biologically plausible instantiation of it, endowed with the constraints and flexibilities we have traced.
Afterword
The megapode example came to me while struggling with the question's framing. "Machine code" suggests rigid, deterministic programming—but biology is anything but deterministic. I needed a case so extreme that no amount of hand-waving about "reflexes" or "matched filters" could explain it away.
What strikes me most, revisiting this essay, is how it connects to a broader theme in my research: the idea that structure and dynamics are not opposed but deeply intertwined. The Kim-Bassett framework shows that you can derive connectivity from desired dynamics. Development shows that you can grow connectivity that produces desired dynamics. Learning shows that you can refine connectivity to adapt dynamics to context.
These are not three separate processes but three aspects of a single phenomenon: the continuous negotiation between what a system is built to do and what its environment asks of it. The megapode chick embodies this negotiation in its most compressed form—a lifetime of computational infrastructure installed in the span of embryonic development, ready to unfold into flight.
The speculative connection to diffusion models emerged late in the writing process, almost as an afterthought. Now it feels like the most important idea in the piece. If development is probabilistic programming rather than deterministic compilation, then the noise isn't a bug—it's the feature that enables generalization.