Claude Opus 4.6: The AI That Just Built a Skateboarding Game From Scratch

Anthropic just released Opus 4.6, an update to its most powerful model. On paper, the specs are impressive: adaptive thinking with four reasoning levels, a 1-million-token context window, and pricing that's competitive with GPT 5.2.

But benchmarks and spec sheets don't tell the real story. What matters is what the model can actually build.

After watching extensive tests of Opus 4.6, I saw something that genuinely surprised me — and it says a lot about where AI capabilities are going.

The Specs (Briefly)

Let's get the technical stuff out of the way:

Adaptive thinking: Four levels (low, medium, high, max) instead of a binary on/off switch for extended thinking
1M token context: The first Opus model to offer it, with premium pricing above 200k tokens ($10 per million input tokens, $37.50 per million output tokens)
128k output tokens: Lets Claude complete larger tasks without breaking them into multiple requests
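
To put the premium pricing in perspective, here is a rough back-of-the-envelope sketch. It uses only the two long-context rates quoted above and assumes they apply to every token in a request once it crosses the 200k threshold; the function name and the example token counts are made up for illustration, so check the provider's pricing page before budgeting.

```python
# Premium long-context rates quoted above, in USD per million tokens.
PREMIUM_INPUT_PER_M = 10.00
PREMIUM_OUTPUT_PER_M = 37.50


def premium_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one long-context request at the premium rates.

    Assumption: the premium rate covers the whole request, not just the
    tokens beyond 200k.
    """
    total = input_tokens * PREMIUM_INPUT_PER_M + output_tokens * PREMIUM_OUTPUT_PER_M
    return total / 1_000_000


# Hypothetical request: 300k tokens in, 20k tokens out.
cost = premium_request_cost(300_000, 20_000)
print(f"${cost:.2f}")
```

At those rates, a single large request costs a few dollars, which is worth knowing before you point the full context window at an entire codebase.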

Benchmarks show Opus 4.6 is neck-and-neck with GPT 5.2 Codex. It's not a massive leap, but the improvements are solid.

But the real test is what happens when you ask it to build something complex.

What Opus 4.6 Actually Built

The tests covered browser OS simulation, 3D printing, multimodal coding, and several games. Here's what stood out.

The Browser OS: Competent but Not Revolutionary

Opus 4.6 built a browser-based operating system called "Novos" with:

Clock, right-click menu, wallpaper changer
Terminal with matrix rain effect
Notepad with word count that saves to local system
Snake and Minesweeper games
File browser with placeholder files
Calculator and settings
A "glitch mode" special feature

It was competent. It worked. But we've seen similar results from other models. Nothing revolutionary here.

The 3D Printer Simulation: Surprisingly Good

This was impressive. The simulation included:

A realistic bed slinger design (bed moves back and forth, like older Ender 3 printers)
Proper nozzle movement near the print
Layer-by-layer animation with visible layer lines
LCD screen and bed leveling knobs
Complex UI with status information

What stood out was the bed slinger mechanic. Most models go for the simpler CoreXY-style design, where the bed is static. Opus 4.6 understood the nuances of different printer types and implemented the option that models pick less often and that is more complex to simulate.
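
To make the difference concrete, here is a minimal sketch of how a simulation might place the moving parts for each design. This is not the code Opus 4.6 generated. The `Pose` class, the function names, and the millimetre units are all invented for illustration, and the static-bed case is a simplification (some real CoreXY machines also move the bed vertically).

```python
from dataclasses import dataclass


@dataclass
class Pose:
    """Where the moving parts sit, in millimetres (illustrative units)."""
    head_x: float
    head_y: float
    head_z: float
    bed_y: float


def bed_slinger_pose(x: float, y: float, z: float) -> Pose:
    # The toolhead handles X and Z; the bed slides the print along Y instead.
    return Pose(head_x=x, head_y=0.0, head_z=z, bed_y=-y)


def static_bed_pose(x: float, y: float, z: float) -> Pose:
    # Simplified static-bed case: the toolhead does all the travelling.
    return Pose(head_x=x, head_y=y, head_z=z, bed_y=0.0)


def nozzle_over_print(pose: Pose) -> tuple:
    # Nozzle position measured from the bed's own origin.
    return (pose.head_x, pose.head_y - pose.bed_y, pose.head_z)


# Hypothetical target point on the print.
target = (10.0, 20.0, 0.2)
print(bed_slinger_pose(*target))
print(static_bed_pose(*target))
```

Both poses put the nozzle over the same spot on the print. The difference is which part moves, and that is exactly what an animation has to get right.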

This wasn't just copying templates — it was understanding the domain.

The Multimodal Portfolio: The Disappointment

This test gave Opus 4.6 a hand-drawn wireframe and asked it to create a portfolio website based on the design.

The result was... disappointing. It stuck too close to the wireframe, essentially turning the minimal sketch into a website without adding much visual flair or creativity.

The expectation was that it would take the wireframe as inspiration and create something impressive. Instead, it just followed the drawing too literally.

This is a reminder that AI still struggles with creative interpretation when given too specific a reference.

The Flight Combat Simulator: The Best We've Seen

This was the standout result. Opus 4.6 created a flight combat simulator game that was genuinely impressive:

Better plane models than previous attempts
Sound effects (first model to implement sound in this test)
Functional mini-map showing enemies
Playable mechanics with actual difficulty
Cloud effects and proper enemy AI

The fact that it added sound on its own is significant. The prompt didn't ask for sound — the model decided that would make the game better and implemented it.

This is the kind of proactive design decision that separates good AI from great AI.

The Virtual Drum Kit: Most Realistic Ever

The drum kit simulator test asks for either 3D assets or 2D photorealistic assets. Opus 4.6 went with 2D photorealistic and nailed it:

Proper drum layout (snare, toms, cymbals in the right positions)
Realistic sounds
Velocity sensitivity
Key mapping reference
Hi-hat open/closing animation

The reviewer said this was "one of the most realistic Phil Collins tests that we've ever performed on this channel" and that the sounds were fantastic.

It went with 2D instead of 3D, but the execution was so good that it didn't matter. It prioritized quality over showing off 3D capabilities.

The C++ Skateboarding Game: The Star of the Show

This is the result that genuinely impressed me — and it should impress anyone paying attention to AI coding capabilities.

Opus 4.6 was asked to create a self-contained C++ skateboarding game with:

Simple skate park environment
Player controls a skateboarder, performs tricks, earns points
Immediately playable with no setup or external assets
Everything in a single C++ file

The result was phenomenal:

1,950 lines of clean, error-free C++ code
A realistic human model (not the usual capsule or stick figure)
Proper skateboarder with legs that actually move
Grind mechanics (jump onto rail, turn + space + air to spin)
Trick system (kickflip, heelflip, 360 flip)
Airborne mechanics with proper lean

The reviewer said this was "hands down the best result" they'd received for this test, which they had previously run only with pro-tier subscription models (Gemini 3 Pro DeepThink and GPT 5.2 Pro).

Let that sink in: Opus 4.6, on a standard tier, produced better results than the expensive pro tiers of other models.

What This Means for Business

Let's talk about what these capabilities actually mean for companies and developers.

1. Zero-Shot Coding Is Getting Real

The skateboarding game was generated in a single shot. No iteration. No debugging. Just "here's the prompt, here's the game, it works."

This is the holy grail of AI coding — not just assisting developers, but generating complete, functional applications that work out of the box.

For businesses, this means:

Faster prototyping: Turn an idea into a working demo in minutes
Lower technical debt: Clean code that doesn't need massive refactoring
Access to specialized domains: You don't need skateboarding game developers on staff

2. AI Can Now Add Features You Didn't Ask For

The flight simulator adding sound effects is a big deal. The model realized that a combat game without sound is less immersive and implemented it on its own.

This is the shift from "AI does exactly what you tell it" to "AI understands what you're building and adds what makes sense."

For product development, this is huge. You're not just getting the minimum viable implementation — you're getting thoughtful features that improve the user experience.

3. The Gap Between Models Is Narrowing

Opus 4.6 is competitive with GPT 5.2 on benchmarks. In some tests, it performed better than pro-tier versions of other models.

This means businesses have real options. You're not locked into one provider. You can choose based on:

Pricing
Context window needs
Specialty areas (some models are better at certain types of tasks)
API reliability and tooling

The monopoly on frontier capabilities is breaking.

4. Complex Simulations Are Within Reach

The 3D printer simulation showed that AI can now understand and simulate complex real-world systems with nuance. It didn't just create a generic printer. It distinguished between bed slinger and CoreXY designs and implemented the more demanding one.

This opens up applications in:

Training simulations for industrial equipment
Virtual prototypes for hardware testing
Educational tools that demonstrate real-world mechanics

5. Audio and Multimedia Integration Is Improving

The drum kit and flight simulator both featured audio. We're seeing AI models start to understand that modern applications aren't just visual — they're multi-sensory experiences.

For game development and interactive media, this is crucial. Sound design is hard to get right, and AI is starting to understand what makes audio feel natural and immersive.

The Limitations Still Exist

Let's be real about what Opus 4.6 can't do yet.

Creative Interpretation Still Struggles

The portfolio test shows that when given too specific a reference, AI struggles to add creative value. It follows the input too literally instead of interpreting and elevating it.

This is where human designers still have an edge — taking a concept and transforming it into something exceptional.

Not All Tests Were Perfect

One of the other tests, a Python FPS game, initially spawned the player in a locked room, making it impossible to play. It required a second prompt to fix the spawn point and open the center structure.

Even in impressive results, there can be bugs or design flaws that require human intervention.

Specialized Domains Need Expertise

While the skateboarding game was impressive, it's worth noting that the reviewer had personal experience with skateboarding games from their youth. They knew what good controls felt like and could appreciate the mechanics.

AI can build functional applications, but truly exceptional products still require domain expertise to understand what "good" means.

The Bigger Picture: Where Is This Going?

We're moving toward a future where AI can generate entire applications — not just code snippets or components, but complete, functional systems.

Opus 4.6 isn't there yet, but the skateboarding game shows we're getting close. 1,950 lines of clean, error-free C++ code generated in a single shot is the best result the reviewer had seen on this test.

For businesses, this changes the economics of software development:

Idea to prototype: From days to hours
Specialized applications: You don't need niche developers on staff
MVP development: Faster, cheaper, with higher initial quality
Iteration speed: Rapid testing of multiple approaches

The competitive advantage shifts from who can code fastest to who can direct AI most effectively.

What You Should Do

If you're running a business that builds software or digital products:

1. Start Experimenting With Complex Tasks

Don't just use AI for code snippets or simple scripts. Give it complex, open-ended tasks like "build a skateboarding game" and see what happens.

2. Build Human-AI Workflows

AI isn't replacing developers — it's amplifying them. Figure out how to combine AI's generation capabilities with human expertise in design, UX, and domain knowledge.

3. Invest in Prompt Engineering

The difference between good and great AI outputs comes down to how you prompt. Learn to give enough direction without constraining creativity.

4. Build for Iteration

Even impressive results like the FPS game had bugs. Design your workflow to handle rapid iteration: generate, test, refine, repeat.
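
As a rough sketch, that loop might look like the following. Everything here is hypothetical: `generate` stands in for whatever model call you use, `check` stands in for your own tests or a human review step, and the stubs below only imitate the locked-room spawn bug described earlier so the example runs on its own.

```python
from typing import Callable, Optional, Tuple


def generate_until_passing(
    prompt: str,
    generate: Callable[[str], str],
    check: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Generate, test, and refine until the check passes or rounds run out."""
    request = prompt
    for round_no in range(1, max_rounds + 1):
        artifact = generate(request)
        problem = check(artifact)  # None means the artifact passed
        if problem is None:
            return artifact, round_no
        # Feed the failure back in, like the second prompt in the FPS test.
        request = f"{prompt}\n\nThe previous attempt had this problem: {problem}"
    raise RuntimeError(f"still failing after {max_rounds} rounds")


# Stand-in stubs (not a real model or test suite).
def fake_generate(request: str) -> str:
    return "spawn=open_area" if "problem" in request else "spawn=locked_room"


def fake_check(artifact: str) -> Optional[str]:
    return "player spawns in a locked room" if "locked_room" in artifact else None


result, rounds = generate_until_passing("build an FPS game", fake_generate, fake_check)
print(result, rounds)
```

The useful part is the shape, not the stubs: a capped number of rounds, an explicit check, and the failure description carried into the next request.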

The Bottom Line

Opus 4.6 is more than an incremental improvement. The fact that it can generate a complex, functional C++ game in a single shot, and that the result beat what pro-tier models from other companies produced on the same test, tells us something important.

AI coding capabilities are accelerating faster than most people realize. We're not far from the point where "I'll have the AI build that" isn't a joke — it's a viable approach to software development.

The businesses that figure out how to leverage this first will have a massive advantage. Not because they have better AI — everyone has access to the same models — but because they've built the workflows and expertise to direct AI effectively.

The skateboarding game is impressive. But what's more impressive is what it represents about where AI is going.

Want to leverage AI coding capabilities for your business? That's what we do at Medianeth. We help companies figure out how to use AI not just as a tool, but as a force multiplier. Let's talk about what you're building.