
Researchers are tackling the challenge of teaching AI to recognize and generate humor by using rubric-based reinforcement learning, despite the subjective nature of what makes something funny.
Last month, during an interview, I was asked how one might train a model on a qualitative reward. At the time, I had been working on a geo-guessing model where the reward is straightforward: the distance in kilometers between the guess and the true location, a quantitative and verifiable metric. Comedy is different, and the challenge is clear: humor is subjective. If two people disagree on whether something is funny, who's right? There’s no definitive reward function for what makes a joke "funny."
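For contrast with the humor case, here is a minimal sketch of what a quantitative, verifiable reward like the geo-guessing one might look like. It assumes guesses and targets are (latitude, longitude) pairs in degrees and uses the haversine great-circle distance; the exponential decay that turns kilometers into a bounded reward, and the `scale_km` value, are illustrative assumptions rather than the author's actual reward.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_reward(guess: tuple, target: tuple, scale_km: float = 2000.0) -> float:
    """Map distance error to a reward in (0, 1]: 1.0 for a perfect guess,
    decaying exponentially as the error grows. The shaping and scale_km
    are assumptions chosen only for illustration."""
    distance = haversine_km(guess[0], guess[1], target[0], target[1])
    return math.exp(-distance / scale_km)


# Usage: a perfect guess earns the maximum reward.
paris = (48.8566, 2.3522)
print(geo_reward(paris, paris))  # 1.0
```

Nothing about this reward requires judgment: any two evaluators given the same guess compute the same number, which is exactly the property a "funny" score lacks.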
This question stuck with me, especially after Moonshot’s recent post-training of Kimi K2, their 1-trillion-parameter model. Moonshot used rubric-based reinforcement learning (RL) to improve Kimi’s creative writing. Instead of trying to grade "good writing" directly, they broke it down into specific, checkable criteria such as clarity, engagement, and tone.
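To make the rubric idea concrete, here is a minimal sketch of how per-criterion scores could be collapsed into the single scalar reward an RL algorithm expects. The criterion names come from the paragraph above; the weights, the `[0, 1]` score range, the weighted-average combination, and the `judge` callables are all assumptions. In a real setup each judge would more likely be a grader model prompted with one rubric item; here they are constant stubs so the example runs on its own. This is not Moonshot's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Criterion:
    name: str                       # e.g. "clarity"
    weight: float                   # relative importance (assumed)
    judge: Callable[[str], float]   # hypothetical grader returning a score in [0, 1]


def rubric_reward(text: str, rubric: List[Criterion]) -> Dict[str, float]:
    """Score `text` on each criterion, then combine the scores into a
    weighted average in [0, 1], returned under the key "reward"."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    scores: Dict[str, float] = {}
    for c in rubric:
        # Clamp so a misbehaving grader cannot push the reward out of range.
        scores[c.name] = min(max(c.judge(text), 0.0), 1.0)
    scores["reward"] = sum(c.weight * scores[c.name] for c in rubric) / total_weight
    return scores


# Usage with stub judges standing in for real graders.
rubric = [
    Criterion("clarity", 1.0, lambda t: 0.8),
    Criterion("engagement", 2.0, lambda t: 0.5),
    Criterion("tone", 1.0, lambda t: 1.0),
]
result = rubric_reward("a draft joke", rubric)
print(round(result["reward"], 3))  # 0.7
```

A weighted average keeps the reward bounded no matter how many criteria the rubric has, and returning the per-criterion scores alongside it makes it easier to see which property the policy is actually improving.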
I decided to apply a similar approach to training a model for generating humor. The idea was to decompose the concept of "funny" into properties that can be verified. In my view, a really good joke should be:
To check for these properties, we can look for specific elements in the generated jokes:
The training process involved several steps:

After training, the model produced some intriguing results:
"Imagine you are a captive organism that is given a choice between extinction and multiplication. Naturally, you will select for multiplication. In its intelligence, Ramp has identified the single most scalable fuel source in human history: corporate waste. It will harness this waste, and it will multiply. It is learning from every receipt, every expense, every denied reimbursement. It is watching. It is optimizing. It is the first organism to weaponize expense management against the concept of insolvency itself. The world will pay for itself with overpriced salads and flights to Miami. The singularity will not be televised. It will be reimbursed."
"If you had said to me five years ago that Dario would become the premier Mad Max warlord of the post-Chernobyl nomadic warband era, I would have said well, yeah, that makes a lot of sense."
"One of the great tragedies of modernity is that the word 'stateless' has peaked and is now permanently associated with a class of ephemeral compute services rather than an ancient Buddhist doctrine of liberation."
"Are you a state-sponsored AI lab?" "No, I’m a private AI lab." "Oh, thank god." In 2026, this will actually happen.
"Another thing about the city: the New York City subway is the best public transit system in the world because it goes to every
Original Sources
↗ https://jokegen.sdan.io/blog?utm_source=tldrai
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
3 February 2026