Elon Musk's artificial intelligence startup, xAI, delayed a model release last year because its chatbot, Grok, did not meet the billionaire's standards for answering detailed questions about the video game "Baldur's Gate," according to a report by Business Insider. High-level engineers were reportedly reassigned to improve the AI's responses specifically on this topic before the launch could proceed.
The incident, detailed in Grace Kay's wider report on xAI's internal culture, points to an unusual priority inside Musk's AI lab. While competitors like OpenAI and Anthropic focus on consumer and enterprise markets respectively, xAI has invested significant resources into optimising its models for video-game walkthroughs and player guidance.
Testing Grok's Gaming Prowess
To assess the outcome of this engineering sprint, TechCrunch conducted an informal benchmark, dubbed "BaldurBench," posing five general questions about Baldur's Gate to Grok and three major rival models: OpenAI's ChatGPT, Anthropic's Claude, and Google's Gemini. The chat transcripts from all four models were made publicly available.
The evaluation found that Grok now provides "pretty good information," with useful and well-informed answers, though its responses were dense with gaming jargon such as "save-scumming" and "DPS." Grok also showed a strong preference for presenting information in tables and for "theorycraft," the practice of analysing game mechanics.
Stylistic Differences Among AI Models
The core advice from all four AI models was largely similar, as they likely drew from the same pool of online guides and resources. The primary differentiators were stylistic. ChatGPT favoured bulleted lists and concise fragments, while Gemini made frequent use of bold text to highlight key terms.
Anthropic's Claude model stood out for its cautious approach, repeatedly expressing concern about spoiling the gameplay experience. When asked for optimal party compositions, it concluded its advice by stating, "don’t stress too much and just play what sounds fun to you."
Context and Implications
According to Business Insider's reporting, this is a domain where xAI deliberately set out to match other leading AI systems. Grok's competent performance is therefore notable mainly as evidence that the company can direct engineering effort toward targeted improvements when leadership prioritises them.
The anecdote underscores the varied strategic directions within the competitive AI industry, where development roadmaps can be influenced by founder interests as much as by broader market demands. For xAI, achieving reliable performance in niche areas like complex RPG guidance remains a stated objective.