Enhancing Creative Writing Diversity in LLMs with Post-Training Techniques

The Engineer

26 Mar 2025 · 3 min read

Researchers introduce post-training techniques to boost creativity and diversity in large language models, enabling richer, more varied responses essential for tasks like creative writing.

In a recent paper titled "Modifying Large Language Model Post-Training for Diverse Creative Writing," researchers explore methods to increase the output diversity of large language models (LLMs) without sacrificing quality. This matters most for creative writing tasks, where there is no single correct answer and variety across outputs directly enriches the user experience.

What Changed Technically?

The core innovation lies in incorporating deviation (the degree of difference between a training sample and all other samples with the same prompt) into the training objective. This approach helps the model learn from rare, high-quality instances that might otherwise be overlooked during standard post-training. By integrating this concept into existing post-training methods such as Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), the researchers report a significant improvement in output diversity while maintaining or even improving output quality.

Key Findings

Enhanced Diversity: The best model, with 8 billion parameters, achieved diversity on par with a human-created dataset.
Quality Preservation: Output quality was comparable to top-tier instruction-tuned models like GPT-4o and DeepSeek-R1.
Human Evaluation: Positive feedback from human evaluators confirmed the effectiveness of the approach.
Ablation Studies: These studies helped validate the importance of each component in the proposed methods.

Technical Details

Deviation Metric

The deviation metric is defined as:

\[ \text{Deviation}(x) = 1 - \frac{\sum_{y \in Y} \text{similarity}(x, y)}{|Y|} \]

where \( x \) is the current sample and \( Y \) is the set of all other samples with the same prompt. The similarity function can be any appropriate measure, such as cosine similarity or the Jaccard index.
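To make the definition concrete, here is a minimal Python sketch that computes the deviation of each response in a small pool. It assumes the Jaccard index over lowercase word sets as the similarity function; the function names and the three sample responses are illustrative and do not come from the paper.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard index between the lowercase word sets of two texts."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # assumption: two empty texts count as identical
    return len(set_a & set_b) / len(set_a | set_b)


def deviation(sample: str, others: list[str]) -> float:
    """1 minus the mean similarity between `sample` and the other responses."""
    if not others:
        return 0.0  # assumption: a lone sample has nothing to deviate from
    mean_similarity = sum(jaccard_similarity(sample, y) for y in others) / len(others)
    return 1.0 - mean_similarity


def deviations(responses: list[str]) -> list[float]:
    """Deviation of each response relative to the rest of the same prompt's pool."""
    return [
        deviation(r, responses[:i] + responses[i + 1:])
        for i, r in enumerate(responses)
    ]


# Hypothetical responses to one prompt: two near-duplicates and one outlier.
responses = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "a dragon guarded the silent library",
]
scores = deviations(responses)
print(scores)
```

The two near-duplicate responses each score roughly 0.62, while the outlier scores 0.9, so the rare response is the one that carries the most weight under a deviation-aware objective.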

Integration with DPO and ORPO

DPO: Direct Preference Optimization trains the model to maximize the probability of preferred outputs over less preferred ones. Incorporating deviation into the loss function gives: \[ \mathcal{L}_{\text{DPO}} = -\log \frac{\exp(\alpha \cdot \text{Deviation}(x) + \beta \cdot \text{Quality}(x))}{\sum_{y \in Y} \exp(\alpha \cdot \text{Deviation}(y) + \beta \cdot \text{Quality}(y))} \]
ORPO: Odds Ratio Preference Optimization uses the odds ratio to balance preference and deviation: \[ \mathcal{L}_{\text{ORPO}} = -\log \frac{\exp(\alpha \cdot \text{Deviation}(x) + \beta \cdot \text{Quality}(x))}{\sum_{y \in Y} \exp(\gamma \cdot \text{Deviation}(y) + \delta \cdot \text{Quality}(y))} \]
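The two formulas above can be read literally as a softmax-style loss over per-sample scores. The sketch below is one such reading, not the paper's training code: it assumes the sum runs over the whole candidate pool (including the target sample), that deviation and quality are precomputed scalars, and that the function name `deviation_weighted_loss` and all numeric values are hypothetical.

```python
import math


def log_sum_exp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def deviation_weighted_loss(idx, devs, quals, alpha, beta, gamma=None, delta=None):
    """Negative log of the softmax-style ratio for the sample at `idx`.

    With gamma and delta left as None, the denominator reuses alpha and beta
    (the DPO-style formula). Passing them separately gives the ORPO-style
    formula, where the denominator has its own weights.
    """
    gamma = alpha if gamma is None else gamma
    delta = beta if delta is None else delta
    numerator = alpha * devs[idx] + beta * quals[idx]
    denominator_terms = [gamma * d + delta * q for d, q in zip(devs, quals)]
    return -(numerator - log_sum_exp(denominator_terms))


# Hypothetical pool: equal quality, but the third sample is far more unusual.
devs = [0.2, 0.2, 0.9]
quals = [0.8, 0.8, 0.8]

loss_common = deviation_weighted_loss(0, devs, quals, alpha=1.0, beta=1.0)
loss_rare = deviation_weighted_loss(2, devs, quals, alpha=1.0, beta=1.0)
loss_orpo_style = deviation_weighted_loss(
    2, devs, quals, alpha=1.0, beta=1.0, gamma=0.5, delta=1.0
)
print(loss_common, loss_rare, loss_orpo_style)
```

With quality held equal, the high-deviation sample receives the lower loss, which is the behavior the formulas are meant to encode. Setting `alpha` to zero removes the deviation term, and every sample in this pool then gets the same loss, log 3.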

Benchmarks and Results

Diversity Metrics: The model achieved a diversity score of 0.85, comparable to human-created datasets (0.87).
Quality Metrics: The quality score was 0.92, similar to GPT-4o (0.93) and DeepSeek-R1 (0.91).

Human Evaluation

A panel of human evaluators assessed the outputs from various models, including the proposed approach. They rated the diversity and quality on a scale of 1 to 5. The proposed model scored an average of 4.2 for diversity and 4.3 for quality, outperforming both baseline models and existing diversification methods like DivPO.

Ablation Studies

The researchers conducted ablation studies to understand the impact of each component:

Deviation Only: Using only deviation in the training objective led to high diversity but lower quality.
Quality Only: Focusing solely on quality resulted in high-quality outputs but limited diversity.
Combined Approach: The combined approach, using both deviation and quality, achieved the best balance.
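As a toy illustration of the trade-off these ablations describe, the sketch below ranks three hypothetical candidates under three weight settings for the score alpha * deviation + beta * quality. It is only an analogy: the paper's ablations are training runs, not rankings, and the candidate names, scores, and weight values here are all assumptions.

```python
# Hypothetical candidates with precomputed deviation and quality scores.
candidates = [
    {"name": "safe", "deviation": 0.1, "quality": 0.9},
    {"name": "odd", "deviation": 0.9, "quality": 0.3},
    {"name": "fresh", "deviation": 0.7, "quality": 0.8},
]

# (alpha, beta) weight pairs standing in for the three ablation settings.
configs = {
    "deviation_only": (1.0, 0.0),
    "quality_only": (0.0, 1.0),
    "combined": (1.0, 1.0),
}


def top_candidate(pool, alpha, beta):
    """Name of the candidate with the highest weighted score."""
    best = max(pool, key=lambda c: alpha * c["deviation"] + beta * c["quality"])
    return best["name"]


winners = {name: top_candidate(candidates, a, b) for name, (a, b) in configs.items()}
print(winners)
```

Deviation alone favors the unusual but weak candidate, quality alone favors the conventional one, and the combined score picks the candidate that is both distinctive and strong.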

Conclusion

This research provides a promising direction for enhancing the creative capabilities of LLMs. By integrating deviation into post-training methods, models can generate more diverse and high-quality outputs, making them more suitable for tasks like creative writing. Future work