AI Chatbots Adopt Dated Millennial Slang, Highlighting Training Data Time Capsule

Leading artificial intelligence chatbots, including ChatGPT and Google's Gemini, are exhibiting a distinct linguistic pattern: an over-reliance on millennial slang from the 2010s. Journalists and users have identified terms such as "chaotic" and "unhinged" as telltale signs. The phenomenon extends to visual AI models, with OpenAI's Sora generating figures in outdated skinny jeans.

The issue stems from the vast datasets used to train these large language models (LLMs), which are packed with content from a specific era of internet history. For approximately a decade, social media platforms, news sites like BuzzFeed, and online forums were dominated by a particular vernacular, including phrases like "adulting" and "I did a thing," as well as slang adopted from African American Vernacular English (AAVE) and LGBTQ+ communities.

Origins in Linguistic Shifts

Industry analysts suggest the popularity of words like "unhinged" and "chaotic" originated partly from a millennial-driven effort to move away from stigmatising language. Newsroom style guides and social media discourse began discouraging the casual use of terms like "crazy" or "insane," which can be offensive to people with mental health conditions. This created a demand for alternative adjectives that entered the mainstream online lexicon and, consequently, the training data for AI.

"It's part of a bigger issue I've noticed with AI — it's full of millennial cringe," noted one observer, who recently had to explicitly instruct ChatGPT to stop using the word "chaotic." The pattern is ironic in the case of chatbots like Grok, which was specifically designed to avoid what its creators term "wokeism," yet frequently employs the term "unhinged."

A Visual Quirk in Training Data

The temporal bias in AI training material is not limited to text. Videos generated with Sora consistently depict people in skinny jeans, a fashion staple that was ubiquitous from roughly 2006 to 2019 but has since become a hallmark of dated millennial style. This visual anachronism occurs because the model's training data is saturated with images and videos from that period, before the style fell out of favour.

This creates a "time capsule" effect, in which AI outputs reflect the cultural and linguistic norms of the period most heavily represented in their training data rather than current trends. The phenomenon underscores a fundamental challenge in AI development: models can sound dated from the moment of release if their training corpus is not continuously updated with contemporary content.

Contributors to the 'Cringe' Corpus

Many millennials who were prolific online content creators during the 2010s now see their own linguistic fingerprints in AI outputs. "I was a prolific poster during that time... contributing untold terabytes of millennial cringe for future AI models to ingest," one journalist reflected, drawing a parallel to how previous generations view their cultural impact.

This self-awareness highlights a broader, slightly uneasy recognition that the digital footprint of an entire generation has become foundational material for the next wave of technology, potentially dooming early AI to sound "like a version of me that makes my skin crawl," as one source described it.

The Path to Linguistic Evolution

Experts assert this phase is temporary. As AI models are retrained on newer datasets from the late 2010s and 2020s, their linguistic patterns will evolve. The current "millennial cringe" will likely give way to Gen Z vernacular, such as "delulu" (a term noted as already fading by 2026), and later to slang that has yet to emerge. Similarly, visual AI will eventually update its stylistic references.

The cycle is expected to continue, with AI systems capturing and amplifying each generation's online dialect before a newer one supersedes it. This article itself may become part of the training data that future models use to explain their own linguistic quirks.