I Built an iOS Game With AI as My Architect and Developer. Here’s What Actually Happened.

A few weeks ago I decided to recreate a game James Surber wrote for the HP-41C calculator: a cave exploration game I remembered from my early computing days. You navigate 68 caves, collect treasure, avoid hazards, and escape alive. Simple concept. Non-trivial to build properly.

I decided to build it for iOS 26 using Swift, SwiftUI, and the new Liquid Glass design system — and to use AI not just as a coding assistant, but as a structured member of the team.

Here’s what that actually looked like, what worked, what surprised me, and what I’d do differently.

It Started With a Spec, Not Code

Before any AI involvement, I wrote the original game design in plain language. Cave types, movement rules, treasure values, possession behavior, win conditions. One document. This became the non-negotiable source of truth the AI was never allowed to modify.

That distinction mattered. The AI’s job was to implement the design, not to define it. Keeping that boundary clear was one of the best decisions I made.

Two Roles, Two Agents

I structured the AI into two distinct roles with separate briefing documents:

The Architect maintained the design, defined new stories, made technology decisions, and kept the plan document current. It didn’t write production code. When I asked “what should we build next?” or “how should this feature be structured?”, that was architect work.

The Developer implemented the stories the architect defined. It read the plan, wrote code and tests, verified everything compiled and passed, and kept implementation notes updated. When I said “build this,” that was developer work.

Each role had its own agent.md briefing that spelled out responsibilities, files owned, patterns to follow, and things to avoid. At the start of any session, the AI read the briefing for the role it was playing.

This was not a gimmick. It genuinely changed the quality of the output. When the developer hat was on, the AI stayed focused on implementation without trying to re-architect things. When the architect hat was on, it thought about tradeoffs without diving into code details.

The Planning Process Was Collaborative, Not Delegated

Every feature went through a story. Not a vague “add sound” ticket — a full story with a description, a use case, a goal statement, and explicit acceptance criteria.

A typical story looked like:

Story 28: Sound Toggle in Menu
Use case: A player is in a quiet environment and wants to mute all sound without quitting.
Acceptance criteria:
  SFX toggle disables all playback immediately.
  Ambient toggle stops the loop when off and restarts when turned on.
  Settings persist across app launches.
  Default state is both enabled.

I wrote the seed for some of these. The AI wrote others based on requirements I described. But both of us knew the story wasn’t done until every acceptance criterion had a test that proved it.

We shipped 32 stories. 129 unit tests. All passing.

Where AI Was Genuinely Valuable

Holding architecture consistent across sessions. iOS 26’s Liquid Glass layout has unusual constraints — you can’t use NavigationStack on the game screen, safeAreaInset is unreliable for pinned bars, and fullScreenCover(isPresented:) with an optional engine has a race condition that causes a blank screen. These rules were documented once in the plan and the AI never violated them — even weeks later.

Writing tests for things I would have skipped. The AI wrote tests for edge cases I wouldn’t have prioritized: all 68 caves producing exactly 3 valid move options, the same seed always generating the same cave layout across sessions, wand charges not restoring items when charges are zero. That coverage caught real bugs.
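The two layout checks mentioned above (same seed, same layout; exactly 3 moves per cave) can be sketched roughly like this. It is not the game’s actual generator: SplitMix64, makeLayout, hasValidMoves, and the rule of drawing three distinct exits per cave are all assumptions made for illustration.

```swift
// SplitMix64: a small, well-known deterministic generator.
// Illustrative stand-in; the game's real generator is not described in this post.
struct SplitMix64 {
    private var state: UInt64
    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

let caveCount = 68
let movesPerCave = 3

// Hypothetical layout: for each cave, the indices of the caves it leads to.
func makeLayout(seed: UInt64) -> [[Int]] {
    var rng = SplitMix64(seed: seed)
    return (0..<caveCount).map { cave -> [Int] in
        // Every cave except the current one is a candidate exit.
        var candidates = (0..<caveCount).filter { $0 != cave }
        var moves: [Int] = []
        for _ in 0..<movesPerCave {
            // Draw without replacement so the three exits are distinct.
            let index = Int(rng.next() % UInt64(candidates.count))
            moves.append(candidates.remove(at: index))
        }
        return moves
    }
}

// True when every cave has exactly three distinct, in-range exits
// and none of them leads back to the cave itself.
func hasValidMoves(_ layout: [[Int]]) -> Bool {
    layout.count == caveCount && layout.enumerated().allSatisfy { cave, moves in
        moves.count == movesPerCave
            && Set(moves).count == movesPerCave
            && !moves.contains(cave)
            && moves.allSatisfy { (0..<caveCount).contains($0) }
    }
}

// Same seed, same layout; every cave offers exactly three moves.
let first = makeLayout(seed: 42)
let second = makeLayout(seed: 42)
print(first == second, hasValidMoves(first))
```

Drawing indices by hand instead of calling shuffle(using:) is deliberate in this sketch: the standard library documentation notes that the shuffling algorithm may change between Swift versions, which would change the layout produced for an existing seed.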

Making technology decisions with reasoning. When I needed high scores to sync across devices, the AI compared NSUbiquitousKeyValueStore against CloudKit, explained why KV Store was correct for this use case (simple Codable payload, 1MB limit is fine, no async fetch complexity), and documented the merge strategy (best score per seed wins — conflict-free by design). I didn’t have to figure that out from scratch.
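A “best score per seed wins” merge can be sketched in a few lines. The type and property names here (HighScores, bestBySeed, merged(with:)) are hypothetical, and the sketch assumes a higher score is better; it shows only the merge rule, not the NSUbiquitousKeyValueStore plumbing.

```swift
// Hypothetical payload: the best score recorded for each cave seed.
struct HighScores: Codable, Equatable {
    var bestBySeed: [UInt64: Int] = [:]

    // Keep the higher score for every seed that appears on either side.
    // Assumes "higher is better"; flip max to min if the game scores the other way.
    func merged(with other: HighScores) -> HighScores {
        HighScores(bestBySeed: bestBySeed.merging(other.bestBySeed) { max($0, $1) })
    }
}

// Two devices with overlapping history.
let phone = HighScores(bestBySeed: [1: 500, 2: 120])
let tablet = HighScores(bestBySeed: [2: 300, 3: 80])

// Merging in either order gives the same result, so no device has to "win".
let fromPhone = phone.merged(with: tablet)
let fromTablet = tablet.merged(with: phone)
print(fromPhone == fromTablet)
```

Because taking the maximum is commutative and idempotent, it doesn’t matter which device syncs first or how often the same data arrives, which is what makes the strategy conflict-free.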

Moving fast on implementation without losing context. Once a story was defined with acceptance criteria, the AI could implement it, write the tests, run them, fix failures, and update the documentation — in a single pass. Across many sessions.

Where Human Judgment Was Still Essential

Catching bugs introduced by the AI itself. At one point, I added sound toggles to the menu. The AI switched from object(forKey:) as? Bool ?? true to bool(forKey:) for the UserDefaults check — but forgot that bool(forKey:) returns false by default for missing keys. Ambient sound was silenced on every fresh install. The AI didn’t catch this. I did, because I tested on device and noticed silence.
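The difference is easy to reproduce in isolation. This sketch uses a throwaway UserDefaults suite to stand in for a fresh install; the suite name and the ambientEnabled key are made up for the example. The register(defaults:) variant at the end is one common way to keep bool(forKey:) and still default to on, not necessarily the fix used in the game.

```swift
import Foundation

// A throwaway suite standing in for a fresh install (names are hypothetical).
let suiteName = "caves.example.fresh-install"
let defaults = UserDefaults(suiteName: suiteName)!
defaults.removePersistentDomain(forName: suiteName)

let ambientKey = "ambientEnabled"

// The buggy read: bool(forKey:) returns false when the key was never written.
let buggy = defaults.bool(forKey: ambientKey)

// The original read: a missing key falls back to true.
let original = defaults.object(forKey: ambientKey) as? Bool ?? true

// An alternative: register a default once, then bool(forKey:) is safe to use.
defaults.register(defaults: [ambientKey: true])
let registered = defaults.bool(forKey: ambientKey)

print(buggy, original, registered)
```

On a fresh install the first read silences the ambient loop, while the other two leave it enabled.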

Knowing when specs need to change. The original design spec said wizard sight covers ±7 caves. Partway through development I decided ±10 was richer and more useful. That was my call. The AI updated everything downstream — the engine, the tests, the documentation — but the judgment to change the spec was mine.

Visual feel and layout. The AI implemented the layout I described, but I was the one who said “the buttons should be stacked vertically — they’re unreadable side by side.” I was the one who said “it ends up below the dynamic island, leaving too much space at the top.” Pixel feel requires a human eye.

Knowing what to build at all. The AI never suggested the cave exploration game. I did. The architect could define stories for features I described, but it didn’t have product intuition.

The Failure Mode Nobody Talks About

The AI’s biggest failure mode wasn’t writing wrong code. It was writing confident, plausible code that quietly failed at runtime.

The best example: I moved the sounds/ and images/ folders into the project and assumed they’d be included in the app bundle. They weren’t. The AI had written the resource path configuration correctly in project.yml, but the xcodegen build tool was silently ignoring root-level folder references. Every audio file failed to load. Every image failed to load. The app compiled clean. Tests passed. No error.

Catching that required running on a real device and noticing nothing worked.
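A cheap guard against this class of failure is to ask the bundle directly whether the expected files are there. This is a sketch, not code from the project: missingResources and the file names are hypothetical, and it assumes the resources sit at the top level of the bundle.

```swift
import Foundation

// Returns the expected resources that the given bundle does not contain.
func missingResources(_ expected: [(name: String, ext: String)], in bundle: Bundle) -> [String] {
    expected
        .filter { bundle.url(forResource: $0.name, withExtension: $0.ext) == nil }
        .map { "\($0.name).\($0.ext)" }
}

// Hypothetical file names; substitute the real audio and image assets.
let expected = [(name: "drip", ext: "wav"), (name: "cave-entrance", ext: "png")]
let missing = missingResources(expected, in: .main)

if !missing.isEmpty {
    print("Missing from bundle: \(missing.joined(separator: ", "))")
}
```

Run at launch in a debug build, a check like this turns a silent failure into a visible one. If the files are copied as folder references, the lookup would need the subdirectory variant, url(forResource:withExtension:subdirectory:).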

The lesson: the AI can maintain correctness within its context window, but it cannot test physical reality. That’s still your job.

What the Structure Looked Like in Practice

Specs/
  The Caves Design.md     ← written by me, never modified by AI
  cavelist.md             ← cave type reference, maintained by architect

plan.md                   ← architecture + all 32 stories with acceptance criteria
Developer/
  progress.md             ← story-by-story status and implementation notes
  scratchpad.md           ← developer's working notes and decisions
  agent.md                ← developer role briefing

Architect/
  agent.md                ← architect role briefing

Every session started with the AI reading the relevant files. Every session ended with documentation updated. Nothing existed only in chat history.

This is the thing I’d emphasize most to anyone trying to work this way: if it’s not written down, it doesn’t exist for the next session. AI has no persistent memory. The documentation is the continuity.

What I’d Do Differently

Start with the acceptance criteria, not the description. I sometimes wrote stories as descriptions and had to add the testable criteria later. Starting with “how will we know this is done?” produces better stories and better code.

Establish the “things to avoid” list earlier. My CLAUDE.md file has a table of hard rules: don’t use NavigationStack in the game flow, don’t use safeAreaInset for pinned bars, don’t use ObservableObject. Most of those rules were discovered through bugs. I’d write that table before the first line of code.

Test on device earlier and more often. The simulator hid several real problems: missing SF Symbols, audio routing differences, layout gaps near the Dynamic Island. Device testing isn’t optional.

The Bottom Line

This wasn’t “I described an app and the AI built it.” That’s not how it worked. It was closer to: I designed the game, I made every significant product decision, I caught the runtime failures, I set the constraints. The AI held the plan, implemented the stories, wrote the tests, maintained the documentation, and kept the codebase consistent across dozens of sessions.

That’s a genuinely useful collaboration — not because the AI is a replacement for engineering judgment, but because it removes a category of work that would otherwise consume most of your time.

I’m preparing to ship the game. It runs on iOS 26. It has 129 passing tests and a cave layout that is fully deterministic from any seed.

The caves are waiting.

The Caves is a turn-based iOS cave exploration game built for iOS 26 with Swift 6, SwiftUI, and Liquid Glass. The complete development plan, architecture, and 32 implementation stories are documented in the project repository.