AI Strategy

The AI Capability Overhang: Why Sam Altman Still Codes Like It's 2023

Medianeth AI
February 6, 2026
14 minute read


Sam Altman, CEO of OpenAI, made a confession recently. Despite having the best access to the most capable AI tools on the planet, despite his own internal data showing that AI now beats human experts on three-quarters of well-scoped knowledge tasks, he still hasn't really changed how he works.

Altman admitted at a town hall that he still runs his workflow the same way, even though, in his own words, "I know that I could be using AI much more than I am."

This is the strange paradox at the center of AI right now. Something fundamental shifted in December 2025. The people closest to the technology are calling it a phase transition, a threshold crossing, a break in the timeline. And yet, most people—including the CEO of OpenAI—haven't caught up.

The capability is there. The adoption is not. And that gap is the biggest opportunity in 2026.

What Happened in December 2025

The shift wasn't one thing. Previously, you could point to a model release and say "this was the change." Not anymore.

This was a convergence of three things that all crossed their respective thresholds in the same compressed window:

1. Models Designed for Sustained Autonomous Work

In the space of just 6 days late last year, three frontier releases landed:

  • Google's Gemini 3 Pro
  • OpenAI's GPT 5.1 Codex Max (followed by 5.2)
  • Anthropic's Claude Opus 4.5

These aren't incremental improvements. They're explicitly optimized for something previous models couldn't do well: sustained autonomous work over hours or days, not just minutes.

GPT 5.1 and 5.2 are designed for continuous operation, sustaining more than a day of autonomous work. Claude Opus 4.5 introduced an effort parameter that lets developers dial reasoning up or down. And Anthropic priced it roughly two-thirds cheaper than the previous version.

Context compaction techniques from both OpenAI and Anthropic now let models summarize their own work as sessions extend, maintaining coherence over longer time frames.
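Neither lab has published the exact mechanism, but the general shape is easy to sketch. Here's a rough illustration in Python—not OpenAI's or Anthropic's implementation, and llm() is a placeholder for whatever model call you'd actually make:

    # Sketch of context compaction: once the transcript grows past a budget,
    # older turns are replaced by a model-written summary.
    def llm(prompt: str) -> str:
        raise NotImplementedError("wire this to your model provider of choice")

    MAX_CHARS = 40_000   # stand-in for a real token budget
    KEEP_RECENT = 10     # always keep the most recent turns verbatim

    def compact(history: list[str]) -> list[str]:
        """Replace older turns with a summary once the transcript gets too long."""
        if len(history) <= KEEP_RECENT or sum(len(t) for t in history) < MAX_CHARS:
            return history

        older, recent = history[:-KEEP_RECENT], history[-KEEP_RECENT:]
        summary = llm(
            "Summarize the work done so far: decisions made, files touched, "
            "and anything still unresolved.\n\n" + "\n".join(older)
        )
        # The summary becomes the agent's memory; the detail it compressed is gone.
        return ["[summary of earlier work]\n" + summary] + recent

The trade-off is plain once you see it: coherence over long sessions comes at the cost of detail, which is why the external memory in the patterns below—git, files, task lists—matters so much.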

The Cursor team has been testing these models. Reports are coming back of models doing a week of work autonomously and writing up to three million lines of code before coming back for more.

This is a new category of work. It's not the same thing we were seeing in September and October 2025.

2. Orchestration Patterns Went Viral

Better models were necessary, but not sufficient. The real unlock came from orchestration patterns that spread in late December.

Ralph was the first. Geoffrey Huntley, an open-source developer in rural Australia, got frustrated with agentic coding's central limitation: models keep stopping to ask permission or report progress, requiring human attention every time.

So he wrote a bash script that runs Claude Code in a loop, using git commits and files as memory between iterations. When the context window fills up, a fresh agent picks up where the last one left off.

The technique is embarrassingly simple. While the AI industry was building elaborate multi-agent frameworks, Huntley discovered that you can just be relentlessly persistent. Repeat the goal. Wipe the context window. Keep going at the task.
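Stripped to its skeleton, the pattern looks something like this (translated into Python for illustration—Ralph itself is a bash script wrapping Claude Code, and agent-cli here is a placeholder for whatever coding agent you run):

    import subprocess

    GOAL = "Make every test in tests/ pass. Commit working changes as you go."
    MAX_ITERATIONS = 50

    def tests_pass() -> bool:
        """The exit code of the test suite is the only signal the loop trusts."""
        return subprocess.run(["pytest", "-q"]).returncode == 0

    for i in range(MAX_ITERATIONS):
        if tests_pass():
            print(f"Done after {i} iterations.")
            break

        # A fresh agent process every pass: the context window starts empty,
        # and the only memory is what earlier passes left in git and on disk.
        subprocess.run(["agent-cli", "--prompt", GOAL], timeout=60 * 30)

        # Snapshot progress so the next pass can pick up where this one stopped.
        subprocess.run(["git", "add", "-A"])
        subprocess.run(["git", "commit", "-m", f"ralph iteration {i}", "--allow-empty"])
    else:
        print("Hit the iteration cap without passing the tests.")

The loop supplies the persistence, git supplies the memory, and the tests supply the stop condition.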

A loop that keeps running until tests pass is more reliable than very carefully choreographed agent handoffs. VentureBeat called it the biggest name in AI right now.

Gas Town followed on January 1st. While Ralph is minimalist, Gas Town is unabashedly maximalist—a workspace manager that spawns and coordinates dozens of AI agents working in parallel.

Both patterns share the same core insight: the bottleneck has shifted. Your productive capacity is now limited only by your attention span and your ability to scope tasks well.

Then in late January, Anthropic shipped Claude Code's task system. Suddenly even Ralph looked like a clever workaround. A simple task system—essentially a shared to-do list—was what it took to coordinate agents across complex problems.

The task system changes the architecture. Each agent focuses on just one thing. When a task completes, anything blocked by it automatically unblocks and the next wave of agents kicks off. The key innovation: dependencies are structural, not cognitive.

You externalize the dependencies so the graph can't be forgotten or drift. You never need to re-explain the plan to an agent, because it was never held only in a model's memory to begin with.
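A toy sketch of what that means in practice—this is not Claude Code's actual implementation, just the shape of the idea, with invented task names:

    from dataclasses import dataclass, field

    @dataclass
    class Task:
        name: str
        depends_on: set[str] = field(default_factory=set)
        done: bool = False

    # The plan lives in a data structure, not in any single agent's context window.
    tasks = {
        "schema":   Task("schema"),
        "api":      Task("api", depends_on={"schema"}),
        "frontend": Task("frontend", depends_on={"api"}),
        "docs":     Task("docs", depends_on={"api"}),
    }

    def ready_tasks() -> list[Task]:
        """Tasks whose dependencies are all complete and that haven't run yet."""
        return [
            t for t in tasks.values()
            if not t.done and all(tasks[d].done for d in t.depends_on)
        ]

    def run_agent_on(task: Task) -> None:
        # A real system would hand this to a coding agent; here we just mark it done.
        print(f"agent working on: {task.name}")
        task.done = True

    # Each wave runs everything that's unblocked; completions unblock the next wave.
    while wave := ready_tasks():
        for task in wave:
            run_agent_on(task)

When the schema task completes, the api task becomes ready without any model needing to remember that it was waiting; the unblocking is a mechanical check against the graph.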

3. Proof Points That Autonomous Work Actually Works

Cursor has been running experiments using AI agents to build:

  • A browser (3 million lines of code)
  • A Windows emulator
  • An Excel clone
  • A Java language server

These are big codebases—from half a million to several million lines—all being generated autonomously. The point isn't that Cursor is about to ship an Excel clone or compete with Windows. The point is that autonomous AI agents can build complex software.

At Davos, Anthropic CEO Dario Amodei described what he called the self-acceleration loop. Engineers at Anthropic tell him, "I don't write code anymore. I let the model write the code."

And this is important to understand: fundamentally, they're accelerating the production of the next AI systems using AI. AI has entered a self-acceleration loop.

OpenAI is even slowing hiring because of this. Altman announced plans to dramatically slow down hiring because of the capability and leverage he sees from existing engineers. The expectation for new hires is sky-high because of what AI tooling can give them.

The Capability Overhang

Here's the paradox: if models are beating human experts on three-quarters of well-scoped knowledge tasks and doing it faster, why hasn't work transformed more? Why is the CEO of OpenAI still running his workflow in much the same way?

This is a capability overhang—capability has jumped way ahead, and humans don't change that fast. Adoption hasn't kept up.

Most knowledge workers are still using AI at a ChatGPT 3.5 or ChatGPT 4 level. Ask a question, get an answer, move on. Summarize this document. Draft this email.

They're not running AI agent loops overnight. They're not assigning hour-long tasks to their AI co-workers. They're not managing fleets of parallel workers across their backlog.

The overhang explains why the discourse feels so disconnected. Living at the edge of capability and then looking back at how most work gets done today feels like constant jet lag.

Someone running agent loops with Claude Code's task system or Ralph is living in a different technical reality than someone who queries ChatGPT four or five times a day—even though they have daily access to the exact same underlying tools.

One person is seeing acceleration, everything happening all at once. The other is seeing incremental improvement and wondering why AI is such a big deal.

This creates a very temporary arbitrage. If you figure out how to use these models before your competitors do—if you can get your teams to do that—you have a massive edge.

If you're waiting for AI to get smart enough before changing your workflow, you're already behind—at this point the limiting factor isn't the models, it's how well you use them.

Closing the Overhang: What Power Users Do Differently

What does closing this overhang look like? What specific skills do power users describe? A few patterns emerge.

1. Assign Tasks, Don't Ask Questions

Power users assign tasks. They don't ask questions. When you treat AI as an oracle, you're in the wrong mental model.

The shift is toward what I'd call declarative specification. Describe the end state you want. Provide the success criteria and let the system figure out how to get there.

This is beyond prompting—it's specifying. It looks a lot more like writing a spec than chatting.
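For illustration, here's roughly what the difference looks like—the project, field names, and thresholds below are all invented:

    # A question: "How should I add rate limiting to our API?"
    #
    # A spec: an end state plus checkable success criteria, handed to the agent.
    task_spec = {
        "goal": "Add per-API-key rate limiting to the public REST API.",
        "end_state": [
            "Requests over 100/minute per key return HTTP 429 with a Retry-After header.",
            "Limits are configurable per key in config/rate_limits.yaml.",
            "Responses for requests under the limit are unchanged.",
        ],
        "success_criteria": [
            "All existing tests still pass.",
            "New tests in tests/test_rate_limit.py cover the 429 path and the config override.",
        ],
        "out_of_scope": [
            "Billing changes, dashboard UI, distributed rate limiting across regions.",
        ],
    }

The agent decides how to get there; you decide what "done" means.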

2. Accept Imperfections and Iterate

Ralph works because it embraces failure. The AI will produce broken code, so the loop just makes it retry until the code is fixed. It never gets tired, and it keeps retrying.

You go make coffee or lunch, come back, and it's done. This requires abandoning the expectation that AI should get things right the first time. It often won't, and it doesn't matter because it doesn't get tired.

3. Invest in Specification and Reviews, Not Implementation

The work is shifting. It's less time writing code. It's much more time defining what you want. It's much more time evaluating whether you got there.

Most engineers have spent years developing intuitions around implementation, and much of that intuition is suddenly worth less. The new skills are:

  • Describing the system precisely enough that AI can build it
  • Writing tests that capture real success criteria
  • Reviewing AI-generated code for subtle conceptual errors

The errors get very interesting. Maggie Appleton, a designer who's been analyzing these tools, puts it well: "When agents write the code, design becomes a bottleneck."

The questions that slow you down are less about code syntax and more about architecture, user experience, composability. What should this feel like? Do we have the right abstraction here?

These are decisions agents cannot make for you. They require your context, your taste, your vision.

4. Use Multiple Agents in Parallel

This is transformative because every single agent stacks your capability. Some developers are going from a few PRs per day to dozens. The constraint moves from coding to coordination.

How can you scope your tasks? How can you review outputs? Fundamentally, even if it's tricky and you have to figure out what review looks like in this new world, this is where we're all going.

The multiplicative effect of agents pointing in the right direction—just stacking up on top of each other and solving multiple tasks at once—is why this matters.
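The mechanics don't have to be fancy. A minimal sketch, reusing the placeholder agent-cli from the Ralph sketch above: each task gets its own git worktree and branch so the agents never edit the same files, and a thread pool fans the work out.

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    # Scoped, independent tasks; the names and goals are invented for illustration.
    TASKS = {
        "fix-flaky-auth-test": "Make tests/test_auth.py deterministic.",
        "add-csv-export": "Add CSV export to the reports endpoint, with tests.",
        "upgrade-logging": "Replace print calls with structured logging.",
    }

    def run_agent(branch: str, goal: str) -> tuple[str, int]:
        """Run one agent in an isolated git worktree and report its exit code."""
        workdir = f"../worktrees/{branch}"
        subprocess.run(["git", "worktree", "add", "-b", branch, workdir], check=False)
        result = subprocess.run(["agent-cli", "--prompt", goal], cwd=workdir)
        return branch, result.returncode

    # Fan out; the human's job shrinks to scoping tasks and reviewing branches.
    with ThreadPoolExecutor(max_workers=len(TASKS)) as pool:
        for branch, code in pool.map(lambda kv: run_agent(*kv), TASKS.items()):
            print(f"{branch}: {'ready for review' if code == 0 else 'needs attention'}")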

5. Let Agents Run All the Time

Ralph was designed for overnight sessions. Define the work, start the loop, and go to bed. This is the new engineer's day.

This only works with proper guardrails, but when it works, you're getting productive hours around the clock from time that was previously idle.

6. Actually Try It

This sounds incredibly obvious, but it's the main barrier. Most people haven't run an agent loop for more than a couple of minutes.

The models improved a lot in December. If you have not revisited your AI workflow since, you're probably operating on stale assumptions about what is actually possible.

The Shape of Work Is Changing

Andrej Karpathy noted something important about the errors current models make. They're not simple syntax errors. The conceptual errors the models make now are the kind a hasty junior developer would make.

This is a good thing. It means the models are getting stronger—up to roughly the level of a junior developer—because the failure modes are now a junior developer's: wrong assumptions, running ahead without checking, failing to surface trade-offs.

These are supervision problems, not capability problems. The solution isn't to do the work yourself. It's to get better at your management skills.

You have to watch the agents, but if you do, you can catch the moments when the agent has written a thousand lines to solve a problem that could have taken a hundred.

This is what Sam means when he talks about engineering changing so quickly. You're not spending time typing. You're not debugging. You're spending most of your time as a manager.

Yes, the ability to code manually is going to start to atrophy as a skill, because you're just not using it as much. Generation and discrimination are very different skill sets, and discrimination is the one you're exercising every day now.

This isn't a failure and it's not something to be embarrassed about. It's a reallocation of scarce human cognitive resources toward a skill that has higher leverage.

How Close Should Developers Stay to the Code?

There are widely differing opinions among senior developers here, and I think the right answer is a function of what you're building.

If your tolerance for mistakes is very low, you're going to have to watch the agent code in an IDE, and write your evals very carefully if you ever want to leave it alone.

If you're willing to experiment, if you're willing to iterate, if it's a greenfield project and it's a prototype, you really can step back.

This calls for another level of abstraction in engineering leadership. As technical leaders, we need to think about where engineers should stand in relation to the code based on the risk profile of the codebase itself.

That becomes something we can intentionally set as a policy for teams. Hey, this is production. This is not something we can mess up. This is our expectation for how you code with agents against this codebase.
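Written down, such a policy can be as simple as this—the repo names, risk levels, and oversight modes below are purely illustrative:

    # Hypothetical team policy: how closely engineers stay to agent-written code,
    # keyed to the risk profile of each codebase.
    AGENT_OVERSIGHT_POLICY = {
        "payments-service": {
            "risk": "critical",
            "mode": "pair-in-IDE",          # watch the agent work, line by line
            "review": "two human reviewers, full diff read",
        },
        "internal-admin-tools": {
            "risk": "moderate",
            "mode": "supervised-loop",      # agent runs, human checks in periodically
            "review": "one human reviewer, tests must pass",
        },
        "prototype-sandbox": {
            "risk": "low",
            "mode": "overnight-autonomous", # define the work, review in the morning
            "review": "spot-check outputs, keep or discard",
        },
    }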

Otherwise it's just going to be a free-for-all and everyone will make their own rules, and you're going to get all sorts of issues in production.

Where This Leaves Us

The December convergence of more capable models, orchestration patterns like Ralph, and public proof points established a new baseline. Models can now maintain coherence for days. Orchestration patterns exist that manage fleets of agents. The economics absolutely work.

This doesn't mean you have to use Ralph specifically. The point is that the problems these tools wrestle with are fundamentally different and point to a very rapid change in how we work—particularly in technical domains.

If you've been wrestling with context persistence and parallel coordination, and those problems suddenly get an order of magnitude easier—because of task systems, better workflows, and more capable models designed for long-running work—well, suddenly it's like the ceiling lifts.

Everything gets an order of magnitude easier when you're building big stuff.

And the overhang that generates when this happens all at once is real. If Dario is right and AI can handle end-to-end software engineering tasks within 6 to 12 months, then the gap between what we're doing today and full automation has never felt larger.

If the overhang feels big after the last few weeks, it's only going to get bigger, because AI is continuing to accelerate. Look at how quickly Anthropic turned around and shipped Cowork just 10 days after its task system, and how quickly it shipped its own, more natively integrated version of Ralph.

The people building this moment aren't always fully living in it yet. They're still moving their furniture into the new AI way of working. Sam Altman admitted as much about himself.

But the future is here now. And if you can get through the overhang and start to accelerate into a world where you're asking the AI to do big tasks for you—moving from prompting with questions to defining specifications, running multi-agent patterns—this is going to fundamentally change your day.

The future belongs to people who know how to handle that speed responsibly and be thoughtful with it.

What This Means for Developers and Teams

For developers reading this, here's the actionable takeaway:

  1. Revisit your AI workflow. If you haven't tried running an agent loop for more than 10 minutes since December 2025, your assumptions about what's possible are stale.

  2. Shift from questions to specs. Stop treating AI as an oracle. Treat it as a contractor that needs clear specifications, success criteria, and the freedom to figure out implementation.

  3. Embrace iteration over perfection. Ralph works because it doesn't expect the AI to get it right the first time. Build guardrails, not perfectionism.

  4. Invest in reviews, not implementation. Your value now comes from defining what needs to be built and evaluating what got built. Implementation is increasingly a commodity.

  5. Use multiple agents in parallel. Every agent you can run multiplies your capability. The constraint isn't coding—it's coordination and task scoping.

  6. Let agents run overnight. Define the work, start the loop, and go to sleep. That's the new engineer's day in 2026.

For engineering leaders:

  1. Set policies on risk profiles. Not all code deserves the same level of oversight. Define where your team should be hands-on vs. hands-off based on risk.

  2. Train on supervision skills. The bottleneck is shifting from technical execution to management and discrimination. Invest in those skills.

  3. Invest in specification practices. Clear specs are the new superpower. Build tools and practices for writing them.

  4. Don't wait for "smart enough." If you're waiting for AI to get smarter before changing your workflow, you're behind. The tools are already there.

The overhang is real. The capability gap is widening. And the arbitrage opportunity won't last forever. The developers and teams who figure this out first are going to build a massive lead.

Everyone else is going to be left wondering why Sam Altman is still coding like it's 2023.


