
Grok-1.5 brings stronger reasoning and long-context understanding, extending its context window to 128,000 tokens (16 times the previous length) and improving its results on coding and math problem-solving.
March 28, 2024
Grok-1.5 is the latest iteration of xAI's large language model (LLM), boasting significant improvements in reasoning capabilities and an extended context length of up to 128,000 tokens. This update will be available to early testers and existing Grok users on the 𝕏 platform soon.
One of the most notable enhancements in Grok-1.5 is its performance in coding and math-related tasks. Here are the key benchmarks:
| Benchmark | Grok-1 | Grok-1.5 | Mistral Large | Claude 2 | Claude 3 Sonnet | Gemini Pro 1.5 | GPT-4 | Claude 3 Opus |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MMLU | 73%<br>5-shot | 81.3%<br>5-shot | 81.2%<br>5-shot | 75%<br>5-shot | 79%<br>5-shot | 83.7%<br>5-shot | 86.4%<br>5-shot | 86.8%<br>5-shot |
| MATH | 23.9%<br>4-shot | 50.6%<br>4-shot | n/a | n/a | 40.5%<br>4-shot | 58.5%<br>4-shot | 52.9%<br>4-shot | 61%<br>4-shot |
| GSM8K | 62.9%<br>8-shot | 90%<br>8-shot | 81%<br>5-shot | 88%<br>0-shot CoT | 92.3%<br>0-shot CoT | 91.7%<br>11-shot | 92%<br>5-shot | 95%<br>0-shot CoT |
| HumanEval | 63.2%<br>0-shot | 74.1%<br>0-shot | 45.1%<br>0-shot | 70%<br>0-shot | 73%<br>0-shot | 71.9%<br>0-shot | 67%<br>0-shot | 84.9%<br>0-shot |
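
The jump from Grok-1 to Grok-1.5 is easier to judge as percentage-point gains. The snippet below is a small illustrative sketch rather than anything published by xAI: the names `GROK_SCORES` and `absolute_gains` are hypothetical, and the scores are copied from the two Grok columns of the table above. The comparison is limited to the two Grok models because they share the same shot setting on each benchmark, whereas the other vendors' settings vary.

```python
# Scores in percent, copied from the benchmark table: (Grok-1, Grok-1.5).
# The dictionary and function names are illustrative, not from xAI.
GROK_SCORES = {
    "MMLU": (73.0, 81.3),
    "MATH": (23.9, 50.6),
    "GSM8K": (62.9, 90.0),
    "HumanEval": (63.2, 74.1),
}


def absolute_gains(scores):
    """Return the percentage-point gain per benchmark, rounded to one decimal."""
    return {name: round(new - old, 1) for name, (old, new) in scores.items()}


if __name__ == "__main__":
    for name, gain in absolute_gains(GROK_SCORES).items():
        print(f"{name}: +{gain} points")
```

On these figures the largest gains are on the two math benchmarks, MATH and GSM8K, with smaller gains on MMLU and HumanEval.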

Grok-1.5 can process contexts of up to 128,000 tokens, 16 times the context length of its predecessor. The extended context window lets Grok draw on information from substantially longer and more complex documents while maintaining its instruction-following capabilities.
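
To make the context-length figures concrete, here is a minimal sketch that takes the article's round numbers at face value. The constants come from the text above; the function names are hypothetical, and the token list stands in for the output of whatever tokenizer is actually used, which this article does not describe. The implied predecessor length is therefore approximate.

```python
CONTEXT_WINDOW = 128_000  # tokens, as stated in the article
INCREASE_FACTOR = 16      # the "16x" figure from the article


def previous_context_length(window=CONTEXT_WINDOW, factor=INCREASE_FACTOR):
    """Context length implied for the predecessor, using the round figures."""
    return window // factor


def split_into_windows(tokens, window=CONTEXT_WINDOW):
    """Split a token sequence into consecutive chunks that each fit the window."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


if __name__ == "__main__":
    print(previous_context_length())       # implied predecessor length
    document = list(range(300_000))        # placeholder for real token IDs
    chunks = split_into_windows(document)
    print(len(chunks), [len(c) for c in chunks])
```

A document that needs three separate windows at 128,000 tokens would need dozens at the implied earlier length, which is why a longer window matters for tasks that depend on material spread across a whole document.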
The development of cutting-edge LLMs like Grok-1.5 requires robust and flexible infrastructure capable of handling massive GPU clusters.
About the author
Kai built ML infrastructure at a Bay Area startup before developing an obsession with transformer architectures and inference optimisation that eventually pulled him out of product work entirely. A stint at a compute research lab sharpened his instinct for what actually matters in a model release versus what is marketing. He writes from the inside — from the perspective of someone who has debugged the systems he is describing at three in the morning. He is allergic to hype and instinctively drawn to the unglamorous plumbing questions that everyone else skips over.