The RL Environment Gold Rush: Why You Should Think Twice Before Joining

The Engineer

9 Sept 2025

Six-figure buyout offers make building reinforcement learning environments tempting, but the work is technically hard and competition from established players is stiff. Think carefully before diving in.

The first person who sold a reinforcement learning (RL) environment to a frontier AI lab probably felt like they had stumbled upon an infinite money glitch. Today, it's no secret that these labs are willing to pay hundreds of thousands, and sometimes even millions, for clones of popular tools like Linear and Salesforce. If you're reading this, you've likely considered quitting your day job to start a company that builds these highly lucrative Next.js apps. In this post, I'll argue that you should think twice before jumping on the bandwagon.

What's an RL Environment?

For those unfamiliar, an RL environment is essentially a sandbox where AI models like Claude and GPT can learn from interactions. It maintains an internal state, prompts the AI to take actions to complete tasks, and assigns scores based on outcomes. These environments often mimic popular websites or enterprise software tools, such as DoorDash, Linear, or Amazon, teaching AIs to navigate these interfaces. They can also be text-only, like the TextArena project, which trains AIs to play games like Set and Blackjack.
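To make the state, action, and score loop concrete, here is a minimal sketch of a text-only environment. Everything in it is hypothetical and invented for illustration (the ToyTicketEnv class, the ticket IDs, the "close <id>" action format); real environments sold to labs are far larger and wrap full web apps, but the shape of the loop is roughly the same.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTicketEnv:
    """Hypothetical toy environment: close one target ticket in a tiny tracker."""

    target: str = "BUG-2"
    max_steps: int = 5
    tickets: dict = field(default_factory=dict)
    steps: int = 0

    def reset(self) -> str:
        # Internal state: every ticket starts open.
        self.tickets = {"BUG-1": "open", "BUG-2": "open"}
        self.steps = 0
        return f"Task: close {self.target}. Tickets: {self.tickets}"

    def step(self, action: str):
        """Apply one text action and return (observation, reward, done)."""
        self.steps += 1
        verb, _, ticket = action.partition(" ")
        if verb == "close" and ticket in self.tickets:
            self.tickets[ticket] = "closed"
        # Score the outcome by inspecting state, not the model's wording.
        solved = self.tickets[self.target] == "closed"
        done = solved or self.steps >= self.max_steps
        reward = 1.0 if solved else 0.0
        return f"Tickets: {self.tickets}", reward, done


env = ToyTicketEnv()
print(env.reset())
for action in ["list", "close BUG-2"]:  # stand-in for model outputs
    observation, reward, done = env.step(action)
    print(action, "->", reward, done)
```

In a training setup, the hard-coded action list would be replaced by text sampled from the model, and the reward would feed the RL update. Note that this toy scorer ignores side effects such as closing the wrong ticket; a realistic environment would need to check for those too.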

The Rise of RL Environments

RL environments have gained prominence due to a new paradigm in LLM post-training that teaches models new skills based on verifiable rewards. OpenAI realized in 2023 that by asking a model to "think" before solving a math problem and reinforcing the correct thought processes, the model could become highly proficient at math. Since then, labs have been racing to generalize this approach, teaching AIs to use computers, conduct online research, and tackle long-horizon coding tasks.
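A "verifiable reward" simply means the score comes from a programmatic check rather than a human judgment. The sketch below shows one way such a check could look for math problems. It assumes a made-up convention in which the model ends its output with "Answer: <number>"; the function name and the format are illustrative and not taken from any lab's actual pipeline.

```python
import re
from fractions import Fraction


def verifiable_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the last 'Answer:' value equals the expected value, else 0.0."""
    # Assumed convention: the model ends its output with "Answer: <number>".
    matches = re.findall(r"Answer:\s*(-?\d+(?:/\d+)?)", completion)
    if not matches:
        return 0.0  # No parseable final answer earns no reward.
    try:
        # Compare as exact rationals so "2/4" and "1/2" count as equal.
        return 1.0 if Fraction(matches[-1]) == Fraction(expected) else 0.0
    except ZeroDivisionError:
        return 0.0  # An answer such as "3/0" is treated as wrong.


samples = [
    "17 * 3 = 51, so the total is 51. Answer: 51",
    "I think it is 41. Answer: 41",
    "I am not sure.",
]
print([verifiable_reward(s, "51") for s in samples])  # [1.0, 0.0, 0.0]
```

Only the final answer is checked here; the reasoning text before it is reinforced indirectly, because outputs that end in a correct answer are the ones that get rewarded.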

This has led to a surge of startups whose sole purpose is to provide these labs with increasingly complex and challenging environments. The prevailing theory is that the machine learning algorithms are already advanced; the remaining challenge is to offer models more realistic "simulations" so they don't exhibit strange behaviors when deployed in real-world scenarios and tasked with economically valuable work.

The Upside for Solo Researchers

If you're a solo researcher or have a job that gives you plenty of free time, there's virtually no downside to creating an RL environment and selling it to labs like OpenAI or Anthropic (or even Amazon and Meta). You might just hit the jackpot and retire early: some individuals have reportedly made millions this way.

The Downside for Entrepreneurs

However, if you're aiming to build a generational business that stands the test of time, I would advise against diving into the RL environment market. Here’s why:

Saturated Market: The market is becoming increasingly crowded. As more people realize the potential for high returns, competition will intensify, driving down prices and making it harder to sustain profitability.
Limited Scalability: Building an RL environment is a one-off project. Once you sell it, there's no recurring revenue stream. To maintain income you'd need to keep developing new environments, which is resource-intensive and doesn't scale.
Rapid Technological Change: The field of AI is evolving rapidly. What’s valuable today might be obsolete tomorrow. Investing heavily in a niche that could become irrelevant due to technological advancements is risky.

Lessons from the Past: The Rise and Fall of Crowdworkers

Before RL became the dominant post-training paradigm, supervised fine-tuning was the norm. This involved teaching models to predict the next word by showing them examples, which in turn required large volumes of human-labeled data. It was simpler but less effective. As AI labs shifted towards RL, the crowdworkers who had been in high demand to produce that labeled data saw their roles diminish.

This shift highlights the volatility of the AI industry. What seems like a goldmine today could quickly become a dead-end tomorrow. If you're considering building an RL environment startup, weigh the potential short-term gains against the long-term risks and uncertainties.

Conclusion

While the allure of quick riches from building RL environments is tempting, it's crucial to consider the broader context. For solo researchers or those with flexible schedules, it might be a worthwhile gamble. However, for entrepreneurs looking to build sustainable businesses, there are better opportunities elsewhere in AI.