
Researchers introduce post-training techniques to boost creativity and diversity in large language models, enabling richer, more varied responses essential for tasks like creative writing.
In a recent paper titled "Modifying Large Language Model Post-Training for Diverse Creative Writing," researchers explore methods to increase the output diversity of large language models (LLMs) without sacrificing quality. This matters most for creative writing tasks, where there is no single correct answer and varied outputs make the model more useful to the writer.
The core innovation lies in incorporating deviation, the degree of difference between a training sample and all other samples with the same prompt, into the training objective. This helps the model learn from rare, high-quality instances that standard post-training might otherwise overlook. By integrating this concept into existing post-training methods such as Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), the researchers demonstrate a significant improvement in output diversity while maintaining or even improving output quality.
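To make the idea concrete, here is a minimal sketch of how deviation might enter a DPO-style objective. It assumes the simplest possible integration: each pair's standard DPO loss is scaled by the deviation of its preferred response. That weighting scheme, the function names, the dictionary keys, and the `beta` value are illustrative assumptions, not the paper's exact formulation.

```python
import math


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair: -log(sigmoid(beta * margin))."""
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); this form avoids overflow.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def deviation_weighted_dpo_loss(pairs, beta=0.1):
    """Mean DPO loss with each pair scaled by its chosen sample's deviation.

    ASSUMPTION: multiplying by `chosen_deviation` is one plausible way to
    fold deviation into the objective; it is not taken from the paper.
    """
    total = 0.0
    for p in pairs:
        loss = dpo_pair_loss(p["policy_chosen_logp"], p["policy_rejected_logp"],
                             p["ref_chosen_logp"], p["ref_rejected_logp"], beta)
        total += p["chosen_deviation"] * loss
    return total / len(pairs)
```

Under this weighting, a pair whose preferred response closely resembles the other responses to the same prompt (deviation near 0) contributes little to the gradient, while a rare preferred response (deviation near 1) contributes almost its full DPO loss.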
The deviation metric is defined as:
\[ \text{Deviation}(x) = 1 - \frac{\sum_{y \in Y} \text{similarity}(x, y)}{|Y|} \]
where \( x \) is the current sample and \( Y \) is the set of all other samples with the same prompt. The similarity function can be any appropriate measure, such as cosine similarity or Jaccard index.
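The definition above can be sketched in a few lines of Python. This example assumes Jaccard similarity over lowercase whitespace-separated tokens, which is a deliberately simple stand-in for whatever similarity measure is actually used; the function names are hypothetical.

```python
def jaccard_similarity(a, b):
    """Jaccard index over lowercase whitespace tokens (illustrative choice)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def deviation(sample, others, similarity=jaccard_similarity):
    """1 minus the mean similarity between `sample` and every text in `others`."""
    if not others:
        raise ValueError("deviation needs at least one other sample")
    mean_similarity = sum(similarity(sample, y) for y in others) / len(others)
    return 1.0 - mean_similarity


def deviations_for_prompt(samples):
    """Deviation of each sample against all other samples for the same prompt."""
    return [deviation(s, samples[:i] + samples[i + 1:])
            for i, s in enumerate(samples)]


responses = ["the cat sat", "the cat sat", "a dragon flew north"]
print(deviations_for_prompt(responses))  # [0.5, 0.5, 1.0]
```

The third response shares no tokens with the other two, so it gets the maximum deviation of 1.0, while each duplicate is pulled down to 0.5 by its twin.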

A panel of human evaluators assessed the outputs from various models, including the proposed approach. They rated the diversity and quality on a scale of 1 to 5. The proposed model scored an average of 4.2 for diversity and 4.3 for quality, outperforming both baseline models and existing diversification methods like DivPO.
The researchers conducted ablation studies to understand the impact of each component:
This research provides a promising direction for enhancing the creative capabilities of LLMs. By integrating deviation into post-training methods, models can generate more diverse, high-quality outputs, making them better suited to tasks like creative writing.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
26 March 2025