Training a Trillion-Parameter Model to Generate Humor Using Rubric-Based Reinforcement Learning

The Engineer

3 Feb 2026 · 4 min read

Researchers are tackling the challenge of teaching AI to recognize and generate humor by using rubric-based reinforcement learning, despite the subjective nature of what makes something funny.

Last month, during an interview, I was asked how one might train a model on a qualitative reward. At the time, I had been working on a geo-guessing model where the reward is straightforward: distance in kilometers, a quantitative and verifiable metric. With something like comedy, the difficulty is that humor is subjective. If two people disagree on whether something is funny, who's right? There’s no definitive reward function for what makes a joke "funny."

This question stuck with me, especially after Moonshot’s recent post-training of Kimi K2, their 1-trillion-parameter model. Moonshot used rubric-based reinforcement learning (RL) to enhance Kimi’s creative writing capabilities. Instead of trying to grade "good writing" directly, they broke it down into specific, verifiable criteria such as clarity, engagement, and tone.

I decided to apply a similar approach to training a model for generating humor. The idea was to decompose the concept of "funny" into properties that can be verified. In my view, a really good joke should be:

Recent: It references current events or recent developments.
Relevant: It connects with the audience’s interests or experiences.
Deep Understanding: It shows a high-fidelity grasp of its subject matter.

To check for these properties, we can look for specific elements in the generated jokes:

Specific People and Numbers: Naming individuals or using precise figures adds credibility.
Commitment Without Hedging: A joke that commits fully to its premise is often more effective.
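
As a rough illustration of how the two checks above could be made mechanical, here is a minimal Python sketch. Everything in it is an assumption for illustration: the hedge-phrase list, the regex heuristics, and the function names (`check_joke`, `count_hedges`, and so on) are hypothetical and are not taken from the training setup described in this post. A real pipeline would more likely use a judge model or a proper named-entity recognizer.

```python
import re

# Hypothetical list of hedging phrases; illustrative only.
HEDGE_PHRASES = ("maybe", "perhaps", "sort of", "kind of", "i think", "arguably")


def has_specific_number(text: str) -> bool:
    """True if the text contains at least one digit (e.g. '2026' or '30')."""
    return re.search(r"\d", text) is not None


def has_named_entity(text: str) -> bool:
    """Crude proxy for a named person: two consecutive capitalized words."""
    return re.search(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text) is not None


def count_hedges(text: str) -> int:
    """Count whole-phrase occurrences of the hedge phrases, case-insensitively."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(phrase) + r"\b", lowered))
        for phrase in HEDGE_PHRASES
    )


def check_joke(text: str) -> dict:
    """Return one boolean per verifiable property."""
    return {
        "specific_number": has_specific_number(text),
        "named_entity": has_named_entity(text),
        "commits_without_hedging": count_hedges(text) == 0,
    }


if __name__ == "__main__":
    print(check_joke("Dario Amodei raised 30 billion dollars."))
```

Heuristics like these are cheap and deterministic, which makes them useful as a sanity check, but they are easy to game: a joke can name a person and a number and still not be funny.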

Implementation Details

The training process involved several steps:

Data Collection: We gathered a diverse dataset of jokes, ensuring a mix of examples that are recent, relevant, and grounded in a deep understanding of their subject.
Rubric Definition: We defined a rubric for humor covering clarity, engagement, and tone. Each criterion had a specific definition:
- Clarity: The joke is easy to understand and doesn’t require additional context.
- Engagement: The joke captures the audience’s attention and elicits a strong emotional response.
- Tone: The joke maintains a consistent style, whether it’s witty, whimsical, or satirical.

Model Training: We fine-tuned Kimi K2 using this rubric. The model was trained to maximize its score on each criterion, pushing the generated jokes toward the standards we defined.
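
The steps above can be sketched in Python to show how a rubric might turn into a scalar training signal. This is a sketch under stated assumptions, not the actual training code: the 1-to-5 scoring scale, the equal weights, the judge prompt wording, and the group-relative normalization are all hypothetical choices made for illustration, and the post does not say which RL algorithm or reward aggregation was used. Only the three criterion names and their definitions come from the rubric above.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Criterion:
    name: str
    description: str
    weight: float = 1.0  # Equal weights are an assumption.


RUBRIC = (
    Criterion("clarity", "The joke is easy to understand and doesn't require additional context."),
    Criterion("engagement", "The joke captures the audience's attention and elicits a strong emotional response."),
    Criterion("tone", "The joke maintains a consistent style, whether it's witty, whimsical, or satirical."),
)


def build_judge_prompt(joke: str, criterion: Criterion) -> str:
    """Build a hypothetical prompt asking a judge model to score one criterion."""
    return (
        f"Criterion: {criterion.name}\n"
        f"Definition: {criterion.description}\n"
        f"Joke: {joke}\n"
        "Answer with a single integer score from 1 to 5."
    )


def rubric_score(scores: dict[str, float], rubric=RUBRIC) -> float:
    """Weighted mean of per-criterion scores; raises KeyError if one is missing."""
    total_weight = sum(c.weight for c in rubric)
    return sum(c.weight * scores[c.name] for c in rubric) / total_weight


def normalize_reward(score: float, low: float = 1.0, high: float = 5.0) -> float:
    """Map a rubric score on [low, high] to a reward on [0, 1]."""
    return (score - low) / (high - low)


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Center and scale rewards within a group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Scores a judge might return for three sampled jokes (made-up numbers).
    judged = [
        {"clarity": 5, "engagement": 3, "tone": 4},
        {"clarity": 2, "engagement": 2, "tone": 3},
        {"clarity": 4, "engagement": 5, "tone": 5},
    ]
    rewards = [normalize_reward(rubric_score(s)) for s in judged]
    print(group_advantages(rewards))
```

Scoring each criterion separately and aggregating afterwards keeps the judge's task narrow. Normalizing within a group of samples for the same prompt means the policy is rewarded for being better than its own other attempts, which sidesteps the lack of an absolute scale for "funny".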

Results

After training, the model produced some intriguing results:

Example 1: Explaining Ramp as an Alien Organism in the Style of TommiPink

"Imagine you are a captive organism that is given a choice between extinction and multiplication. Naturally, you will select for multiplication. In its intelligence, Ramp has identified the single most scalable fuel source in human history: corporate waste. It will harness this waste, and it will multiply. It is learning from every receipt, every expense, every denied reimbursement. It is watching. It is optimizing. It is the first organism to weaponize expense management against the concept of insolvency itself. The world will pay for itself with overpriced salads and flights to Miami. The singularity will not be televised. It will be reimbursed."

Example 2: Witty Whimsical Joke About Dario Amodei

"If you had said to me five years ago that Dario would become the premier Mad Max warlord of the post-Chernobyl nomadic warband era, I would have said well, yeah, that makes a lot of sense."

Example 3: One-Liner on Modal Raising Another Round

"One of the great tragedies of modernity is that the word 'stateless' has peaked and is now permanently associated with a class of ephemeral compute services rather than an ancient Buddhist doctrine of liberation."

Example 4: Short Article About AI Propaganda in the Style of TommiPink

"Are you a state-sponsored AI lab?" "No, I’m a private AI lab." "Oh, thank god." In 2026, this will actually happen.

Example 5: Joke About New York City Subway in the Style of TommiPink

"Another thing about the city: the New York City subway is the best public transit system in the world because it goes to every