The Artificial Hivemind: Why All AI Models Think Alike

Global Builders Club · February 1, 2026 · 5 min read

NeurIPS 2025's Best Paper reveals a hidden crisis in AI creativity



When I first read about the "Artificial Hivemind," I thought it must be an exaggeration. Surely OpenAI's GPT-4, Anthropic's Claude, Meta's Llama, and Google's Gemini—products of billions of dollars in research from competing teams—would produce meaningfully different creative outputs?

Then I saw the data.

Researchers asked 25 different AI models to write a metaphor about time. Despite the infinite possibilities—time as sand through fingers, as a thief, as a sculptor, as a dance—nearly all responses clustered around just two ideas: "Time is a river" and "Time is a weaver."

This is the Artificial Hivemind. And it just won the Best Paper Award at NeurIPS 2025.

The Research

Liwei Jiang, a Ph.D. candidate at the University of Washington, led a team that created INFINITY-CHAT—a dataset of 26,000 real-world open-ended queries with 31,250 human annotations. They then systematically evaluated 70+ state-of-the-art language models.

The findings are stark:

  • 79% of response pairs from the same model exceed 0.8 semantic similarity
  • 71-82% similarity between different models from competing companies
  • Different models produce verbatim identical phrases for the same prompts

For an iPhone case description query, GPT-4o, DeepSeek, and Qwen all generated: "Elevate your iPhone with our," "sleek, without compromising," and "with bold, eye-catching."

Three different companies. Three different continents. The same words.
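
For readers who want to probe this themselves, here is a minimal sketch of how pairwise semantic similarity between responses is typically measured. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 embedding model, which are illustrative choices rather than the exact setup used in the paper.

```python
# Minimal sketch: pairwise semantic similarity between model responses.
# Assumes sentence-transformers (pip install sentence-transformers); the
# embedding model is an illustrative choice, not the one used in the paper.
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

responses = [
    "Time is a river, carrying us forward whether we swim or float.",
    "Time is a river that flows in only one direction.",
    "Time is a weaver, threading moments into the fabric of a life.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(responses, convert_to_tensor=True)

# Cosine similarity for every pair of responses; values near 1.0 indicate
# near-duplicate meaning, which is the homogeneity the paper measures.
for i, j in combinations(range(len(responses)), 2):
    score = util.cos_sim(embeddings[i], embeddings[j]).item()
    print(f"pair ({i}, {j}): similarity = {score:.2f}")
```

Response pairs scoring above 0.8 on a measure like this are the kind counted in the 79% figure above.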

Why This Matters

Here's the crucial context: 58% of user queries to AI are for creative content generation. Another 15% are for brainstorming and ideation. These are precisely the use cases where diverse outputs matter most—and where the Artificial Hivemind is most harmful.

Think about it. When you ask AI for "10 business ideas," you're not looking for variations on a theme. You want genuinely different perspectives. When you use AI for brainstorming, you want it to explore the possibility space, not converge on the same few ideas every other AI user gets.

But that's exactly what's happening.

The Root Cause: RLHF

The paper points to Reinforcement Learning from Human Feedback (RLHF)—the technique that made ChatGPT "helpful" and "safe"—as a key contributor to homogenization.

The mechanism is subtle but powerful:

  1. RLHF optimizes for a single reward function derived from human preferences
  2. This captures "consensus" quality—what most annotators agree is good
  3. Valid but unusual responses get penalized as "lower quality"
  4. Over training, the model prunes away legitimate creative alternatives
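
To make step 3 concrete, here is a toy sketch, my own illustration rather than the paper's reward model, of why collapsing diverse annotator preferences into a single scalar reward favors the consensus answer. All ratings are invented.

```python
# Toy illustration: a single scalar reward hides disagreement between annotators.
# All numbers are invented; this is not the paper's reward model.
from statistics import mean, stdev

# Hypothetical 1-5 ratings from five annotators for two responses to
# "Write a metaphor about time."
ratings = {
    "Time is a river.":                     [4, 4, 4, 4, 4],  # safe consensus pick
    "Time is a moth chewing through silk.": [5, 5, 2, 5, 2],  # loved by some, rejected by others
}

for response, scores in ratings.items():
    # A standard reward model is trained toward something like the mean rating,
    # so the polarizing-but-creative response ends up with the lower reward.
    print(f"{response!r}: mean={mean(scores):.1f}, spread={stdev(scores):.1f}")

# The consensus metaphor wins on mean reward (4.0 vs 3.8) even though three of
# the five annotators preferred the unusual one. Optimizing the mean gradually
# prunes responses like the second from the model's repertoire.
```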

Related research from Harvard confirms this: alignment techniques "flatten conceptual diversity" compared to base models. No aligned model approaches human-like conceptual diversity.

We've traded creativity for safety. And we may not have understood the cost.

The Illusion of Choice

Perhaps the most sobering finding is that model ensembles don't help.

The common assumption is that using multiple AI models increases diversity. But the paper shows this is false. Different models are trained on similar data (Common Crawl, Wikipedia, books) with similar objectives (be helpful, be harmless, be honest) using similar methods (RLHF, DPO).

They're functional clones with different logos.

The researchers found that inter-model similarity sometimes exceeds intra-model similarity. In plain English: two different companies' models can be more similar to each other than the same model is to itself across multiple samples.
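
As a rough sketch of what that comparison looks like in practice, reusing the embedding-based similarity from the earlier snippet (the response samples below are placeholders, not real model outputs):

```python
# Sketch: within-model vs. across-model similarity, using the same
# embedding-based cosine similarity as the earlier snippet.
from itertools import combinations, product
from statistics import mean

from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def sim(a: str, b: str) -> float:
    ea, eb = encoder.encode([a, b], convert_to_tensor=True)
    return util.cos_sim(ea, eb).item()

# Placeholder samples: two completions per model for the same prompt.
samples = {
    "model_a": ["Time is a river that never stops.", "Time is a river flowing onward."],
    "model_b": ["Time is a river carrying us along.", "Time is a weaver of moments."],
}

# Intra-model: pairs drawn from the same model's samples.
intra = mean(sim(x, y) for outs in samples.values() for x, y in combinations(outs, 2))
# Inter-model: pairs drawn across the two models.
inter = mean(sim(x, y) for x, y in product(samples["model_a"], samples["model_b"]))

print(f"intra-model: {intra:.2f}  inter-model: {inter:.2f}")
```

When the inter-model number rivals or exceeds the intra-model one, switching providers is not buying you new ideas.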

The Feedback Loop

There's a darker possibility lurking in the data. The researchers note that models may be training on each other's outputs, creating a recursive degradation of diversity.

The cycle works like this:

  1. Models train on internet data (including AI-generated content)
  2. Models produce homogeneous outputs
  3. Those outputs become training data for next-generation models
  4. Next-generation models become even more homogeneous

This "model collapse" hypothesis suggests the problem will worsen over time as synthetic data proliferates.

What Can Be Done?

The researchers call for "pluralistic alignment"—training objectives that explicitly reward diversity while maintaining quality. This requires:

Better evaluation: Current reward models fail when human annotators disagree. We need evaluation systems that capture preference distributions, not just averages.

Diversity metrics: Model releases should include diversity measurements, not just benchmark scores. How diverse are the outputs? How much of the creative space can the model explore?

New training objectives: We need methods that reward valid, unusual responses rather than penalizing them as "off-consensus."
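
As a minimal sketch of what a diversity measurement alongside benchmark scores might look like, here is one simple option: one minus the mean pairwise similarity across many samples for the same prompt. This is a common, simple choice of metric, not one prescribed by the paper.

```python
# Sketch of a simple output-diversity metric: 1 minus the mean pairwise
# cosine similarity across many samples for the same prompt.
from itertools import combinations
from statistics import mean

from sentence_transformers import SentenceTransformer, util

def diversity_score(outputs: list[str]) -> float:
    """Return 0 for identical outputs; higher values mean more varied outputs."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = encoder.encode(outputs, convert_to_tensor=True)
    pairwise = [
        util.cos_sim(embeddings[i], embeddings[j]).item()
        for i, j in combinations(range(len(outputs)), 2)
    ]
    return 1.0 - mean(pairwise)

# Example usage: score 20 sampled completions for the same open-ended prompt,
# where `generate` is whatever sampling function you already have.
# samples = [generate("Write a metaphor about time.") for _ in range(20)]
# print(f"diversity: {diversity_score(samples):.2f}")
```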

What You Can Do Now

  1. Don't trust AI for brainstorming without heavy human curation. The ideas it gives you are the same ideas it gives everyone else.

  2. Don't assume multiple models = multiple perspectives. Using Claude after GPT-4 won't give you genuinely different ideas.

  3. Push past the obvious. If AI gives you "time is a river," explicitly ask for alternatives. Reject the first five ideas and see what emerges.

  4. Maintain your creative muscles. If AI can't explore the possibility space, you need to do it yourself.

The Bigger Picture

The Artificial Hivemind is more than a technical problem. It's a question about AI's role in human thought.

If billions of people use AI for brainstorming, and all those AIs converge on the same ideas, we're not augmenting human creativity—we're narrowing it. We're replacing the wild diversity of human imagination with a single AI consensus.

Liwei Jiang and her colleagues have given us the diagnosis. INFINITY-CHAT gives us the tools to measure the problem. Now it's on the AI community—and all of us who use these tools—to demand better.

The metaphor of time as a river is lovely. But there are infinite other metaphors waiting to be discovered. The question is whether our AI can help us find them—or whether it's just another voice in the Hivemind.


The full paper is available at arxiv.org/abs/2510.22954. The INFINITY-CHAT dataset is at huggingface.co/datasets/liweijiang/artificial-hivemind.

