
Share
Early access to GPT-4's fine-tuning capabilities reveals a significant leap in performance, surpassing GPT-3.5 by over 50% for natural language tasks, setting new standards in AI model customization.
A few weeks ago, we gained early access to the GPT-4 fine-tuning API and were eager to see how it stacks up against its predecessors. As long-time users of OpenAI’s fine-tuned models, starting from the original GPT-3 Davinci model, we had high expectations. The results did not disappoint-fine-tuned GPT-4 outperformed fine-tuned GPT-3.5 by more than 50% for our specific use case.
To provide a comprehensive comparison, we evaluated the following models:
These models were fine-tuned for a domain-specific use case: natural language queries to generate reports and underlying database queries. Evaluations were conducted using our internal test data set, with GPT-3 Davinci’s performance serving as the baseline.
The improvements in GPT-4 are significant:
Supersimple is a data analytics platform designed to help users dive deep into their data quickly. Our platform allows users to ask natural language (plain English) questions and receive answers in the form of tables and visualizations. The AI provides explanations using no-code steps, and users can further explore the data with additional queries or by interacting with our data platform.

The primary role of LLMs at Supersimple is to interpret natural language queries and generate appropriate reports and underlying database queries. Here’s a breakdown of the process:
To give you a better idea of how this works, here’s a demo video:
The early access to GPT-4 fine-tuning has been a game-changer for us at Supersimple. The significant performance improvements, especially in accuracy and latency, make it a compelling choice for natural language processing tasks. While the cost is slightly higher, the benefits are clear, particularly for complex use cases like ours.
Tags
Original Sources
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.
More from The Engineer →This Week's Edition
22 March 2024
88 articles
Related Articles
Related Articles
More Stories