Building a Million-Line Codebase with Codex: An Agent-First Experiment

The Engineer

12 Feb 2026 · 4 min read

An OpenAI team used Codex to build a codebase of roughly a million lines with no manually written code, in an estimated one-tenth of the time that hand-coding would have taken.

By Ryan Lopopolo, Member of the Technical Staff

Over the past five months, our team at OpenAI has been running an ambitious experiment: building and shipping an internal beta of a software product without writing a single line of code manually. This means every bit of application logic, tests, CI configuration, documentation, observability, and internal tooling was generated by Codex. The result? We estimate we built this in about 1/10th the time it would have taken to write the code by hand.

Humans Steer, Agents Execute

Our primary goal was to increase engineering velocity by orders of magnitude. To achieve this, we had to rethink the role of engineers. Instead of writing code directly, our team focused on designing environments, specifying intent, and building feedback loops that allow Codex agents to do reliable work. This shift in focus allowed us to ship a product with roughly a million lines of code in about five months.

Starting from Scratch

The first commit to an empty repository landed in late August 2025. The initial scaffold (repository structure, CI configuration, formatting rules, package manager setup, and application framework) was generated by Codex CLI using GPT-5. Even the AGENTS.md file that directs agents on how to work in the repository was written by Codex.

Repository Structure: Codex created a well-organized directory layout.
CI Configuration: Continuous integration (CI) pipelines were set up to ensure automated testing and deployment.
Formatting Rules: Consistent code formatting rules were established to maintain readability.
Package Manager Setup: Dependencies were managed efficiently from the start.
Application Framework: A robust framework was generated to support rapid development.

There was no pre-existing human-written code to anchor the system. From the beginning, the repository was shaped by the agent.

Scaling Up

Five months later, the repository contains on the order of a million lines of code across application logic, infrastructure, tooling, documentation, and internal developer utilities. Over this period, roughly 1,500 pull requests (PRs) have been opened and merged, initially with a small team of just three engineers driving Codex. This translates to an average throughput of 3.5 PRs per engineer per day. Surprisingly, throughput has increased as the team has grown to seven engineers.

Application Logic: Core functionality of the product.
Infrastructure: Cloud and server configurations.
Tooling: Custom scripts and utilities for development and deployment.
Documentation: Comprehensive guides and reference materials.
Developer Utilities: Internal tools to enhance productivity.

Importantly, this wasn’t output for output’s sake: the product has been used by hundreds of users internally, including daily internal power users.
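
As a rough sanity check on the throughput figure in this section, the arithmetic can be sketched in a few lines of Python. The function names and the 150-day approximation of five months are assumptions made for illustration. The article does not say how the 3.5 figure was computed, and the team grew from three to seven engineers over the period, so this is only a ballpark.

```python
# Rough sanity check of the throughput figure; all inputs are approximations.

TOTAL_PRS = 1500          # "roughly 1,500 pull requests" from the article
QUOTED_RATE = 3.5         # PRs per engineer per day, as quoted in the article
STARTING_ENGINEERS = 3    # team size at the start of the experiment
APPROX_DAYS = 150         # assumption: five months counted as calendar days


def prs_per_engineer_day(total_prs, engineers, days):
    """Average PRs per engineer per day for a fixed-size team."""
    return total_prs / (engineers * days)


def implied_engineer_days(total_prs, rate):
    """Engineer-days implied by a total PR count and a per-engineer daily rate."""
    return total_prs / rate


naive_rate = prs_per_engineer_day(TOTAL_PRS, STARTING_ENGINEERS, APPROX_DAYS)
engineer_days = implied_engineer_days(TOTAL_PRS, QUOTED_RATE)

print(f"Rate with 3 engineers over ~150 days: {naive_rate:.2f}")
print(f"Engineer-days implied by the quoted rate: {engineer_days:.1f}")
```

The naive estimate lands in the same ballpark as the quoted rate. The exact value depends on how working days and team growth are counted, which the article does not specify.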

Redefining the Role of the Engineer

The lack of hands-on human coding introduced a different kind of engineering work, focused on systems and scaffolding. Here are some key takeaways:

Designing Environments: Setting up the development environment to enable Codex agents to work efficiently.
Specifying Intent: Clearly defining what needs to be built, ensuring the agents understand the requirements.
Building Feedback Loops: Creating mechanisms to monitor and improve the quality of code generated by agents.
No Manually Written Code: This became a core philosophy for the team. Humans never directly contributed any code.
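
To make the idea of a feedback loop concrete, here is a minimal sketch in Python, assuming an agent can be modeled as a function that takes a task plus feedback and returns a candidate change. Every name here (`CheckResult`, `run_feedback_loop`, `stub_agent`, `has_docstring`) is hypothetical and invented for illustration. This is not the team's actual tooling or any Codex API, only one plausible shape for "run checks, feed failures back, retry."

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckResult:
    name: str
    passed: bool
    message: str = ""


# Hypothetical types: a check inspects a candidate change and reports a result;
# an agent takes the task plus prior feedback and returns a new candidate.
Check = Callable[[str], CheckResult]
Agent = Callable[[str, List[str]], str]


def run_feedback_loop(
    task: str, agent: Agent, checks: List[Check], max_rounds: int = 3
) -> Tuple[str, int, bool]:
    """Request a candidate, run every check, and feed failures back to the agent."""
    feedback: List[str] = []
    candidate = ""
    for round_number in range(1, max_rounds + 1):
        candidate = agent(task, feedback)
        failures = [r for r in (check(candidate) for check in checks) if not r.passed]
        if not failures:
            return candidate, round_number, True
        # Failure messages become the feedback for the next attempt.
        feedback = [f"{r.name}: {r.message}" for r in failures]
    return candidate, max_rounds, False


def has_docstring(candidate: str) -> CheckResult:
    """Toy check standing in for a linter or test suite."""
    ok = '"""' in candidate
    return CheckResult("docstring", ok, "" if ok else "add a docstring")


def stub_agent(task: str, feedback: List[str]) -> str:
    """Stand-in for a real agent: it only fixes the issue once told about it."""
    if any("docstring" in item for item in feedback):
        return 'def add(a, b):\n    """Add two numbers."""\n    return a + b\n'
    return "def add(a, b):\n    return a + b\n"


candidate, rounds, ok = run_feedback_loop("write add()", stub_agent, [has_docstring])
print(rounds, ok)
```

In this toy run the stub agent fails the check on its first attempt and passes on the second, once the failure message is fed back. A real setup would swap the toy check for tests, linters, or CI results.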

What We Learned

Agent Reliability: Codex agents can reliably generate high-quality code, but they need well-defined environments and clear intent to perform optimally.
Feedback Loops: Continuous feedback is crucial for maintaining and improving code quality.
Scalability: In our experience, the agent-first approach let a small team achieve more in less time, and throughput per engineer held up as the team grew.
Human Focus: Engineers can focus on higher-level tasks like system design and problem-solving, rather than low-level coding.

Conclusion

This experiment suggests that an agent-first approach can significantly accelerate software development while maintaining or even improving code quality. By redefining the role of engineers to focus on systems and intent, teams can use agents to build complex products more efficiently.