The Brief
I started with a feeling I missed from old point-and-click adventures: walking into a strange room, clicking on everything and slowly learning how the world thinks.
The project became The Bureau of Obsolete Futures, a browser game about a junior archivist trying to decommission civic technology where old machines still have paperwork to finish.
The first room is Archive Intake. The player has a simple goal: get into the staff only archive. The room makes that goal awkward through a badge reader, a missing stamp, a printer ribbon, a vending machine and paperwork with a grudge.
Almost All Of The Execution Was Agent Led
This was not a normal request a code snippet build. A single Codex goal carried the work across 4 days of research, implementation, documentation, validation and promotion loops. My job was to keep the brief sharp, make the judgement calls and stop a playable slice being mistaken for proof from real players.
Agent led work
Research, implementation, documentation, validation checks and trailer workflow stayed inside one long running Codex goal.
Human control points
I set the goal, made taste calls, checked the claims and decided what still needed real player evidence.
Working pattern
The agent kept moving through research and validation loops over 4 days instead of resetting at every prompt.
The Goal Became The Filter
The useful shift was treating the goal as a decision filter, not a task list. Every tempting expansion had to pass the same test: does this help prove the first room, or does it only make the project look bigger?
What would prove this game idea is worth expanding?
What is the smallest playable slice that can test tone, puzzle fairness, interface clarity and art direction?
What claims are still blocked until real players create evidence?
The Trailer
The trailer uses real gameplay capture from the current vertical slice. It shows the room, the protagonist, the vending machine beat and the archive access payoff without pretending the game is bigger than one tested room.
It also became a validation surface. If the UI was unclear, the trailer would show it. If the puzzle state had no payoff, the trailer would show that too. The public artefact had to stay tied to the actual build.
The Risk Was Sprawl
A nostalgic game carries a few traps. It can copy the surface of its references. It can turn into an unfinished engine. It can pile jokes on top of weak puzzle logic. It can look polished enough to fool the builder before anyone else has played it.
The useful goal was deliberately narrow: build one playable browser room that tests tone, puzzle fairness, interaction clarity and art direction before planning anything larger.
No second room until the first room earns expansion.
No broad player claims before human validation exists.
No public playtest link while access policy stays private.
No nostalgia shortcut that only works because of reference material.
Turning Nostalgia Into Rules
The references mattered, but as production discipline rather than surface material. The project needed safe experimentation, dense authored interactions, readable staging and absurd logic that feels fair after the answer lands.
Codex helped turn those preferences into project rules: influence guardrails, tone notes, puzzle contracts, accessibility checks, deployment notes and public positioning. That sounds excessive for one room. It was the opposite. It kept the work from drifting into vibes.
Archive Intake Became The Test
The first room tests the whole game promise through one compact puzzle chain. The route starts with a locked archive and ends with a payoff that only works if the Bureau logic has made sense along the way.
Staff only archive door
Badge reader refuses the unstamped badge
Printer needs ribbon before paperwork moves
Vending machine argues about category logic
Stamped paperwork opens the path forward
Screenshots From The Slice

Hotspots visible
The prototype shows the browser UI, labelled hotspots, inventory and objective panel in one view.

Vending machine logic
The vending machine puzzle tests whether absurd Bureau logic still reads as fair interaction design.

Archive access payoff
The first puzzle arc resolves with archive access granted and updated case notes.
What The AI Agent Carried Across 4 Days
This was the unusual bit. Almost all of the execution ran through one long running Codex goal over 4 days: research, build work, documentation, validation checks and the trailer workflow. Not a prompt, an answer and a handover. A working loop.
Codex kept the notes, game logic, copy, validation scripts, deployment decisions and trailer plan in the same project context. The autonomous part mattered because the agent could keep moving across disciplines. The human part still mattered because I set the goal, made the taste calls, checked the claims and decided what needed human evidence.
The visuals shown here are prototype/reference art from the AI assisted build, not final production art. Good enough to test the room. Not a reason to pretend the game is finished.
Autonomous Loop
Codex planned, edited, checked and reported back across one long running goal instead of waiting for a separate prompt for every task.
Research
Classic adventure games became influence guardrails, not content to copy.
Design
Archive Intake kept the proof small enough to test tone, puzzle fairness and interface clarity.
Build
Phaser and TypeScript handled the first room interactions, inventory, state flags and dialogue.
Validate
Checks covered docs, puzzle contracts, accessibility, deployment readiness and open evidence gaps.
Share
Real gameplay capture and Remotion turned the prototype into a trailer that stays honest about scope.
The Agentic Part
Codex moved autonomously across research, design, TypeScript, Phaser, accessibility checks, deployment hygiene, documentation, capture scripts and Remotion trailer work without losing the thread.
The Human Part
The work still needed taste, constraints and judgement. The AI moved fast. The human job was to keep the goal sharp and stop a working prototype being mistaken for player proof.
What This Proves
The game has a playable slice with defined puzzle logic, authored wrong turns, stateful interactions, a working browser build and a repeatable trailer workflow.
What It Does Not Prove Yet
Implemented is not the same as proven. Automated checks do not prove puzzle fairness, humour or comprehension. The next decision needs evidence from real people, not another internal pass.
The Useful Lesson
The interesting part is not that AI helped write code. It is that a long running Codex goal let an AI agent carry most of the execution across research, design, implementation, validation, deployment and promotion without losing the thread. The human job stayed the same: judgement, taste, boundaries and evidence.