This AI creativity study suggests top human creators still have the upper hand

Image Credit: Omar Lopez-Rincon

Generative AI has crossed another important threshold in creative performance, at least when it comes to average output. A large new research project comparing human creativity with modern language models shows that some AI systems can now outperform the typical person on standardized creativity tests. Still, the highest levels of creativity remain firmly human.

The study analyzed results from more than 100,000 participants and compared them with outputs generated by several large language models, including tools such as ChatGPT, Claude, and Gemini. According to the findings, certain AI models achieved higher scores than the median human participant on a widely used creativity benchmark.

That advantage fades quickly at the upper end of the scale. The most creative half of human participants outperformed every AI system tested, and the gap became even more pronounced among the top ten percent.

The takeaway is not that machines have surpassed human creativity, but that they are increasingly competent at clearing baseline creative tasks. Exceptional human creativity, however, still sets a level that current models struggle to reach.

The test that shaped the results

To measure creativity at scale, researchers relied on the Divergent Association Task, a short exercise that asks participants to produce ten words that are as unrelated to one another as possible. Responses earn higher scores when the selected words are more semantically distant from each other.
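
For intuition, that scoring can be sketched as the average pairwise semantic distance between the submitted words. The snippet below is an illustrative Python sketch rather than the study's actual scoring code: it assumes `embeddings` is any mapping from words to vectors (for example, pretrained GloVe vectors), and the multiplication by 100 is simply one common way to express the result as a score rather than a raw distance.

    from itertools import combinations
    import numpy as np

    def dat_score(words, embeddings):
        """Average pairwise cosine distance between word vectors, scaled by 100."""
        vectors = [np.asarray(embeddings[w], dtype=float) for w in words]
        distances = []
        for a, b in combinations(vectors, 2):
            cosine_similarity = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            distances.append(1.0 - cosine_similarity)  # cosine distance
        return 100.0 * float(np.mean(distances))

With ten words, that works out to 45 pairwise comparisons per participant, which helps explain why the task scales so easily to very large samples.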

Because the task is fast and language-focused, it allowed researchers to run comparisons across an unusually large dataset. It also plays to the strengths of language models, which can be tuned to generate wide-ranging vocabulary on demand.

This alignment helps explain why some AI systems performed well against average human scores. The task rewards breadth and divergence, qualities that large language models can deliver efficiently when prompted.

At the same time, the researchers note that the task captures only a narrow slice of creativity. It does not account for emotional resonance, contextual judgment, or whether an idea is appropriate for a specific audience or purpose. Those dimensions remain difficult to measure with simple benchmarks.

Where people still pull ahead

The most striking pattern in the data is not which group wins overall, but how widely performance spreads. AI systems tend to cluster around the middle, while human scores fan out more dramatically.

In practical terms, AI excels at generating options quickly. If the goal is to explore multiple directions or surface unexpected combinations, models can do that at speed and scale. What they struggle with is selection and refinement, deciding which idea is worth pursuing and shaping it intentionally within real constraints.

This distinction matters in creative work. Producing many plausible ideas is different from choosing the right one and developing it with purpose. That selective judgment is where top human creators continue to stand apart.

The findings also suggest caution in interpreting creativity leaderboards as verdicts on creative professions. The benchmark measures ideation range, not originality under pressure or the kind of insight that reshapes audience expectations.

What the results mean for everyday use

Beyond word association tests, the researchers also compared humans and AI on creative writing tasks such as haiku, plot summaries, and short narrative prompts. These exercises more closely resemble how people typically use AI tools in daily work.

Even in those scenarios, top human creators maintained an edge. While AI produced competent and often fluent results, it struggled to consistently match the nuance and intent found in high-level human writing.

For professionals already using AI in creative workflows, the study reinforces a practical approach. Treat AI as an ideation accelerator rather than a replacement. Let it expand the field of possibilities, then apply human judgment to decide what aligns with your voice, your goals, and the expectations of your audience.
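
As a loose illustration of that division of labor, the Python sketch below (with a placeholder `ask_model` function standing in for whichever LLM client you actually use) generates a wide field of candidates and deliberately leaves ranking and selection to a person.

    def ask_model(prompt: str) -> list[str]:
        # Placeholder: swap in a real API call to your model of choice.
        return [f"Candidate direction {i}" for i in range(1, 11)]

    def ideation_round(brief: str) -> list[str]:
        """Ask the model for many distinct options; keep selection human."""
        candidates = ask_model(f"Propose ten distinct directions for: {brief}")
        for number, idea in enumerate(candidates, start=1):
            print(f"{number}. {idea}")
        # No automatic ranking here: judging fit with voice, goals, and
        # audience is the step the study suggests humans still do best.
        return candidates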

The researchers also note that these comparisons reflect a snapshot in time. Model versions, training data, and tuning strategies change quickly, and follow-up studies will be needed to track how those shifts affect creative performance.
