The Last Twenty Percent, and the Next

Building software with agentic coding tools has changed what we spend our time on. We ship faster and own more lines of code than we used to, often at a pace we only dreamed about a few years ago. What we trade away is the recall and precision that come from building a system yourself, from the ground up.

It is worth sitting with how slippery that word “faster” is. When METR ran a randomized controlled trial in 2025 on experienced developers working in repositories they knew well, the result was not the one anyone expected:

AI makes them slower.

METR, early-2025 AI developer productivity trial

The same developers came away certain AI had sped them up, even as the trial clocked them roughly 19% slower. METR has since published an update urging caution about reading too much into any single result, but the gap between felt speed and measured speed is the part that stays with me. The speed is real, but it lags. It arrives once you have rebuilt your instincts around the tools, not the moment you adopt them.

It feels like being back in my first years of driving, before smartphones and GPS were everywhere. I could remember how to get anywhere I needed to go. As information got cheaper to pull on the fly, my awareness of the terrain shifted. I stopped reading the scenery for landmarks or gauging distance by feel. Instead I tracked the traffic around me more closely, and I absorbed more of whatever audiobook was playing. The map in my head got thinner. The rest of my attention got richer.

It felt like a fair trade, right up until the day I needed to find my way without the little blue dot. The research is unsentimental about this. A 2020 study in Scientific Reports put numbers to the intuition:

GPS use reduces people’s propensity to gain and memorize spatial information.

Dahmani & Bohbot, Scientific Reports (2020)

The deficit shows up precisely when you are forced to navigate on your own. The capability does not just go quiet. It atrophies, and the atrophy transfers.

We have a name for the same thing in our world now. Some call it comprehension debt: the understanding you quietly skip when you accept code you did not write and could not have written. One engineer, quoted in a piece on how the work has moved upstream, put the cost more sharply: AI-assisted coding, he wrote, had “increased my productivity and the distance between me and the code.” The faster you move, the further you stand from the thing you are responsible for. The bottleneck moves with you. Teams shipping AI-assisted code are watching pull requests balloon (Faros AI reports a 154% jump in PR size) while review becomes the real constraint, which is to say the hard part has relocated from writing to reading.

This is where I think the popular framing undersells itself. People talk about the “70% problem”, the stubborn gap between a generated draft and a system you would put in front of users. But the cleaner lens is one we have had for a century. I picked 80% on purpose, because this is just the Pareto principle wearing new clothes, and I am surprised more people have not connected the two. 20% of the effort has always produced 80% of the result. The agents are extraordinary at that 80%: the boilerplate, the scaffolding, the plumbing, the obvious shape of a thing. What they cannot reach is the inverse, the last stretch where 80% of the real effort buys you the final 20%.

They can get 70% of the way there surprisingly quickly, but that final 30% becomes an exercise in diminishing returns.

Addy Osmani, The 70% Problem
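
To put rough numbers on the Pareto framing, here is a minimal sketch of the arithmetic. It assumes the idealized 80/20 split holds exactly, which real projects never do, and the names in it are mine, chosen for illustration.

```python
# Idealized Pareto split (an assumption, not a measurement):
# 20% of the effort yields the first 80% of the result.
EASY_EFFORT, EASY_RESULT = 0.20, 0.80
HARD_EFFORT, HARD_RESULT = 1 - EASY_EFFORT, 1 - EASY_RESULT


def cost_per_point(effort: float, result: float) -> float:
    """Effort spent per unit of result delivered."""
    return effort / result


easy_cost = cost_per_point(EASY_EFFORT, EASY_RESULT)
hard_cost = cost_per_point(HARD_EFFORT, HARD_RESULT)
ratio = hard_cost / easy_cost

# If the easy stretch became free, this share of the original effort remains.
remaining_effort = HARD_EFFORT

print(f"a point of the last 20% costs about {ratio:.0f}x a point of the first 80%")
print(f"effort left after the easy part is automated: {remaining_effort:.0%}")
```

Under that assumption, each point of the last stretch costs about sixteen times what a point of the first stretch does, and taking the easy part off the table still leaves four fifths of the original effort.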

That final stretch is, in Osmani’s words, “the part that makes software production-ready, maintainable, and robust.” The novel problems nobody has solved yet. The distributed systems we have to keep breathing. The easy part has been lifted off the table. What is left is only the hard part.

Take reliability, the promise sitting underneath all of it. We made a quiet bargain when we moved our foundations onto other people’s platforms, and the bill comes due in public now. In October 2025, a DNS race condition in a single AWS region took a long list of services down for the better part of a day and dragged half the consumer internet along with it. GitHub Actions and Copilot have logged their own steady drip of degradations across the same stretch. Anthropic has had repeated outages this year, including a global one, which means the model writing your code can simply stop answering in the middle of your workday.

Here is the part that should bother us. These are missed SLAs and availability numbers we would never have dared to ship a few short years ago. We would have been paged, postmortemed, and quietly embarrassed. Now we absorb them like weather. When you lean your entire stack on a handful of providers, their technical debt becomes your operational crisis, and you find yourself defending uptime you no longer control.
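
For a sense of scale, this is a small sketch of how an availability target turns into a yearly downtime budget. The 12-hour outage is an assumed placeholder for “the better part of a day,” not a figure taken from any provider’s report, and the targets are common examples rather than anyone’s actual SLA.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def downtime_budget_hours(availability: float) -> float:
    """Hours of downtime per year allowed by an availability target."""
    return (1 - availability) * HOURS_PER_YEAR


def availability_after_outage(outage_hours: float) -> float:
    """Best possible yearly availability after a single outage of this length."""
    return 1 - outage_hours / HOURS_PER_YEAR


for target in (0.999, 0.9995, 0.9999):
    print(f"{target:.2%} allows {downtime_budget_hours(target):.2f} hours per year")

ASSUMED_OUTAGE_HOURS = 12.0  # hypothetical stand-in for "the better part of a day"
best_case = availability_after_outage(ASSUMED_OUTAGE_HOURS)
print(f"one {ASSUMED_OUTAGE_HOURS:.0f}-hour outage caps the year at {best_case:.3%}")
```

On these assumptions, a single incident of that length overruns a full year of three-nines budget on its own, before counting anything that goes wrong in your own code.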

So the questions keep coming, and they do not get easier. Do all the pieces connect the way I expect them to? How do I make the product feel better for a human to use? And the one I think about most: how do I make it work well for an LLM agent to use, now that agents are a real class of user? A practical guide to this emerging discipline, agent experience, draws the line plainly: where a human “can infer context from a sentence or connect steps based on intuition,” agents “need clear instructions about what to call, when, and in what order.” They need structure where people improvise, and designing for them is its own craft that most of us are inventing as we go.
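
One way to picture “what to call, when, and in what order” is to write the ordering down as data instead of leaving it to inference. The sketch below is an illustration only: `ToolSpec`, the three tool names, and the ordering rules are all hypothetical, and it does not follow any particular agent framework’s format.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class ToolSpec:
    """A hypothetical agent-facing description of one callable tool."""
    name: str
    when_to_use: str
    requires: tuple[str, ...] = ()  # tools that must be called first


# Illustrative tools, deliberately listed out of order.
TOOLS = [
    ToolSpec("create_order", "Call once the customer has confirmed the cart.",
             requires=("get_customer", "check_inventory")),
    ToolSpec("get_customer", "Call first to resolve a customer ID from an email."),
    ToolSpec("check_inventory", "Call before creating any order.",
             requires=("get_customer",)),
]


def call_order(tools: list[ToolSpec]) -> list[str]:
    """Return tool names ordered so every prerequisite comes first."""
    graph = {tool.name: set(tool.requires) for tool in tools}
    return list(TopologicalSorter(graph).static_order())


def render_instructions(tools: list[ToolSpec]) -> str:
    """Spell out the call sequence as numbered steps an agent can follow."""
    by_name = {tool.name: tool for tool in tools}
    lines = []
    for step, name in enumerate(call_order(tools), start=1):
        tool = by_name[name]
        after = f" (only after: {', '.join(tool.requires)})" if tool.requires else ""
        lines.append(f"{step}. {name}: {tool.when_to_use}{after}")
    return "\n".join(lines)


print(render_instructions(TOOLS))
```

A person skimming the three descriptions would work out the sequence unprompted. The point of the sketch is that the agent never has to, because the prerequisites are stated and the order is derived from them.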

But step back from the day-to-day, and a larger question sits underneath all of it. How do we keep growing the business and the craft without losing the skills that got us here? It is the same tension as the GPS, scaled up to a career. We are standing roughly where Blockbuster stood, watching Netflix walk through the door. Do we defend what has worked, or reinvent ourselves until we find what works for everyone now? Clayton Christensen named this decades ago, the innovator’s dilemma, and the graveyard is full of incumbents who kept perfecting the thing they were already good at while the ground moved under them.

Here is where I think it goes. The near future is an explosion of bespoke software riding on ever-evolving models. Software that was never worth building becomes a weekend project, and more of it will ship with little human involvement. But software is the part racing toward commodity. The interesting move is what comes after.

Once the models level off, we start casting them into silicon. We have run this play before. Bitcoin mining began on CPUs in 2009, moved to GPUs, passed briefly through FPGAs, and by 2013 had collapsed onto purpose-built ASICs that made everything before them obsolete. Those ASICs were only worth fabricating because the algorithm underneath, SHA-256, was fixed from the start and was never going to change. Models are now approaching their own version of that stability. Pretraining gains are flattening, the frontier is rotating toward post-training and inference-time compute, and architecture is becoming something you tune for the cost of serving rather than a fixed decision made up front. That is exactly the moment it starts paying to bake a model into hardware.

It is already underway. Taalas, a Toronto-based startup, prints a specific model’s weights straight into custom silicon. Their first chip hardwires an 8-billion-parameter Llama and is claimed to run inference many times faster than a top Nvidia card at a fraction of the power. The company has raised more than $200 million to pursue it. Etched, Groq, and Cerebras are circling the same idea from different directions. The wager underneath all of it is that flexibility was never free, and that once a model is worth keeping, you stop simulating it and start etching it.

The deeper turn is that the agents are starting to design the hardware. Google’s AlphaChip already generates floorplans for the company’s TPUs with reinforcement learning, collapsing a job that once took human experts weeks or months into hours, and a lab founded by its creators recently raised at a four-billion-dollar valuation. Right now we are able to drive. Soon the real enrichment moves back down the stack, out of software and into the silicon and the factories that print it. Automating manufacturing, and producing new hardware with agentic development riding alongside, is where I think the next fortunes get made. The interns helped us write the code. The next thing they help us build is the machine.

Sources

The claims and quotes above, with the pieces they came from, so you can check the work:

“AI makes them slower” (a measured ~19% slowdown): METR, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (Jul 2025), with a later update urging caution (Feb 2026).
“GPS use reduces people’s propensity to gain and memorize spatial information”: Dahmani & Bohbot, Habitual use of GPS negatively impacts spatial memory during self-guided navigation, Scientific Reports (2020).
“Increased my productivity and the distance between me and the code”: Dmytro Gaivoronsky, quoted in Csaba Okrona’s The Work Moved, on the tradeoff of working at a higher altitude over the code.
Review as the new bottleneck, with pull requests ballooning (a 154% jump in PR size): Faros AI, The AI Productivity Paradox Report 2025.
The “70% problem”, where the final 30% is “the part that makes software production-ready, maintainable, and robust”: Addy Osmani, The 70% Problem: Hard Truths About AI-Assisted Coding.
The two agent-experience quotes: a human who “can infer context from a sentence or connect steps based on intuition” and agents that “need clear instructions about what to call, when, and in what order” both come from Nolan Sullivan, Designing Agent Experience (Speakeasy).
Agent experience as an emerging discipline: Art Anthony, What Is Agent Experience (AX)? (Nordic APIs).
The October 2025 AWS outage (a latent DNS race condition in a single region, down for the better part of a day): Amazon Web Services, Summary of the Amazon DynamoDB Service Disruption in the Northern Virginia (US-EAST-1) Region (Oct 2025).
Provider reliability across the same stretch: GitHub’s incident history; and Anthropic’s repeated outages, including the June 2026 global outage that took Claude.ai, the API, and Claude Code down together.
Taalas printing a model into silicon (an 8-billion-parameter Llama hardwired into a chip, on more than $200M in total funding): Taalas raises $169M to develop model-specific AI chips (SiliconANGLE, Feb 2026); the Toronto base is listed on the company’s own careers page.
AlphaChip and the lab its creators founded: Google DeepMind, How AlphaChip transformed computer chip design; and TechCrunch on Ricursive Intelligence’s $335M raise at a $4B valuation, founded by AlphaChip creators Anna Goldie and Azalia Mirhoseini.