Has AI Caught Up with the 'Sensitivity' of Artists? A 'Creativity Report Card' Verified by 1.5 Million Experts

AI Summary

According to the latest research, AI outperforms the average human on certain creativity tests, but a 'perfect AI model' that faithfully follows a creator's intent while remaining technically accurate does not yet exist.

Imagine you are creating a logo for a bakery you’re about to open. You ask an Artificial Intelligence (AI) to “draw a bread-shaped logo with a warm and cozy feel.” In an instant, the AI churns out dozens of drafts. But looking closely, some logos have distorted bread shapes, and others have perfect bread but colors that feel too cold. When you command it again to “change the color scheme to be more yellowish,” this time the color is great, but the bread suddenly turns into a croissant.

We have often believed that creativity is a unique ‘sanctuary’ belonging only to humans. However, we live in an era where AI-written poems win literary awards and AI-generated paintings sell for high prices at auctions. This leads to a fundamental question: “Is AI truly creative? Or is it just a machine that mimics human data with great sophistication?”

To answer this question, as many as 1.5 million creative professionals have stepped forward. The ‘Human Creativity Benchmark’ released by Contra Labs is the first large-scale report card to scientifically and systematically measure the creative performance of AI.

Why is this important?

In the past, what mattered was whether AI simply “understood the words.” Now, the core question is “how sophisticatedly (Style), in what mood (Tone), and according to what taste (Taste) it creates the output” (Contra Labs - Human Creativity Benchmark). Metaphorically speaking, AI is now being evaluated not as a child just starting to speak, but for its qualities as a ‘professional assistant.’

For ordinary people like us, this research is important for three main reasons:

You learn how to use AI properly: By identifying which AI understands your intent well and which AI is technically superior, you can dramatically increase work efficiency.
The definition of ‘true creativity’ is changing: It is being redefined as how brilliantly one can combine existing ideas within complex constraints, rather than simply creating something new that didn’t exist in the world (arxiv.org/abs/2604.19799).
The human role becomes clearer: No matter how excellent the output AI produces, the ‘final approver’ who ultimately decides “This is my style!” is the human. This study clearly shows where that boundary lies.

Can ‘creativity’ be measured with numbers?

Creativity is extremely subjective. A beautiful masterpiece to one person may look like a scribble to another. To solve this, Contra Labs created two key yardsticks for measuring creativity: ‘Convergence’ and ‘Divergence’ (No AI Model Is Both Correct and Steerable, Says New Creative Benchmark).

Convergence: The ability to follow best practices, producing work that everyone agrees “follows the standards of design.” Put simply, it’s like a chef seasoning a dish exactly according to a recipe.
Divergence: The ability to reflect the unique intent or personality of the creator, allowing one to say “This is exactly my style!” It’s like the sense of adjusting the amount of salt very finely to suit a customer’s picky palate.

Researchers collected more than 15,000 data points of expert judgment across five creative fields, including graphic design and writing (Human Creativity Benchmark - LinkedIn). More than 1.5 million verified experts meticulously reviewed and scored the outputs created by AI (Contra Labs - The Human Creativity Benchmark).

The War of ‘Taste’ with an AI Chef: An Easy Analogy

To help you understand, let’s look at more analogies. Current AI is similar to a ‘genius apprentice chef’ who has studied very hard.

First analogy: Recipe vs. a Pinch of Salt

AI has memorized every cookbook (data) in the world. So, if you say “Make me pasta,” it produces very standard and good-looking pasta (Convergence). However, if you make a very subtle request like “Make it a bit less salty today, but with the spicy feel of the tteokbokki I had yesterday,” it starts to get flustered (Divergence). It still lacks that ‘pinch’ of sense to capture the memory of yesterday’s tteokbokki in a single plate of pasta.

Second analogy: Creativity built with Lego blocks

In the past, creativity was thought of as a ‘flash of inspiration that creates something from nothing.’ However, this study defines creativity as the ‘Transformation and Synthesis of ideas’ (arxiv.org/abs/2604.19799). It is like finding the necessary pieces and assembling them into a shape that didn’t exist before, inside an Embedding Space (a virtual ‘room of thoughts’ where AI understands words or images by converting them into numbers) where trillions of Lego blocks are scattered.
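To make the ‘room of thoughts’ a little more concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption made for illustration: the four words, their three-number vectors, and the helper names `cosine_similarity`, `blend`, and `nearest` are invented here and do not come from the study or from any real model, whose embeddings typically have hundreds or thousands of dimensions. The sketch only shows the general principle that ideas stored as numbers can be compared and mixed.

```python
import math

# Hypothetical toy "embeddings": each word becomes a short list of numbers.
# These values are invented for illustration and come from no real model.
EMBEDDINGS = {
    "bread": [0.9, 0.1, 0.0],
    "croissant": [0.8, 0.3, 0.1],
    "cozy": [0.1, 0.9, 0.2],
    "spreadsheet": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def blend(a, b, weight=0.5):
    """Mix two ideas by taking a weighted average of their vectors."""
    return [(1 - weight) * x + weight * y for x, y in zip(a, b)]


def nearest(vector, exclude=()):
    """Find the known word whose vector is most similar to the given vector."""
    candidates = {name: vec for name, vec in EMBEDDINGS.items() if name not in exclude}
    return max(candidates, key=lambda name: cosine_similarity(vector, candidates[name]))


if __name__ == "__main__":
    # "bread" sits much closer to "croissant" than to "spreadsheet".
    print(cosine_similarity(EMBEDDINGS["bread"], EMBEDDINGS["croissant"]))
    print(cosine_similarity(EMBEDDINGS["bread"], EMBEDDINGS["spreadsheet"]))
    # Blending "bread" with "cozy" and asking for the closest remaining word.
    mixed = blend(EMBEDDINGS["bread"], EMBEDDINGS["cozy"])
    print(nearest(mixed, exclude=("bread", "cozy")))
```

With these made-up numbers, blending ‘bread’ and ‘cozy’ lands closest to ‘croissant,’ which loosely echoes the bakery logo that drifted into a croissant: in this kind of space, nearby ideas are easy to slide between.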

AI beat humans? A surprising twist

There are also shocking results. The latest AI systems scored higher than the average human on certain creativity tests [Researchers tested AI against 100,000 humans on creativity - ScienceDaily](https://www.sciencedaily.com/releases/2026/01/260125083356.htm).

In a study comparing AI one-on-one with as many as 100,000 people, generative AI far surpassed the level of ordinary people in terms of diversity and novelty of ideas [Creativity in the age of generative AI: A new era of creative partnerships - ScienceDaily](https://www.sciencedaily.com/releases/2023/11/231120170939.htm). This means AI has reached a stage where it can propose ‘unexpected combinations’ that humans hadn’t even thought of, going beyond simply copying data.

However, there is a subtle trap here. It has been pointed out that when looking closely at the results created by AI, there is an ‘AI-ish veneer’ that somehow feels mechanical. Experts sometimes describe this slight sense of incongruity as a ‘slippery feeling’ or a ‘digital fingerprint’ [The Human Creativity Benchmark – Evaluating Generative AI in Creative Work - Hacker News](https://news.ycombinator.com/item?id=47966484).

Why there is no ‘perfect AI’ yet

The most important conclusion of this benchmark is this: “There is no model yet that is both technically accurate and easy to steer at the same time” (No AI Model Is Both Correct and Steerable, Says New Creative Benchmark).

Correct models: The output is excellent, but if a user asks to “fix just this part slightly,” it ruins the overall style or remains stubborn.
Steerable models: They understand the user’s words perfectly and change details well, but the overall quality is low or they lack basic skills.

It’s similar to a situation where you have to choose between a stubborn artist with top-tier skills and a beginner student who listens well but lacks ability. According to the research, there is currently no model that is overwhelmingly number one in all categories (Human Creativity Benchmark - LinkedIn).
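The statement that no model is number one on everything can be phrased as a simple check: is there any model that is at least as good as every other model on both axes? The Python sketch below shows that check under loud assumptions. The model names and scores are placeholders invented for this example (they are not results from the benchmark), and the two-number scoring is a simplification; the benchmark's actual scoring method is not described here.

```python
from typing import List, NamedTuple, Optional


class ModelScore(NamedTuple):
    name: str
    correctness: float   # hypothetical 0-1 score for technical quality
    steerability: float  # hypothetical 0-1 score for following requested changes


def dominates(a: ModelScore, b: ModelScore) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    at_least_as_good = a.correctness >= b.correctness and a.steerability >= b.steerability
    strictly_better = a.correctness > b.correctness or a.steerability > b.steerability
    return at_least_as_good and strictly_better


def best_on_all_axes(models: List[ModelScore]) -> Optional[ModelScore]:
    """Return the model that dominates every other model, or None if there is no such model."""
    for candidate in models:
        if all(dominates(candidate, other) for other in models if other is not candidate):
            return candidate
    return None


def pareto_front(models: List[ModelScore]) -> List[ModelScore]:
    """Return the models that no other model dominates (the reasonable trade-off choices)."""
    return [m for m in models
            if not any(dominates(other, m) for other in models if other is not m)]


# Placeholder scores invented for illustration only; NOT benchmark results.
PLACEHOLDER_MODELS = [
    ModelScore("stubborn_artist", correctness=0.9, steerability=0.4),
    ModelScore("eager_student", correctness=0.5, steerability=0.9),
    ModelScore("middling", correctness=0.4, steerability=0.3),
]

if __name__ == "__main__":
    print(best_on_all_axes(PLACEHOLDER_MODELS))
    print([m.name for m in pareto_front(PLACEHOLDER_MODELS)])
```

With these placeholder numbers, no single model wins on both axes, and the two remaining sensible choices are the ‘stubborn artist’ and the ‘eager student.’ In practice, that means picking the model whose strength matches the task at hand.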

How will creation change in the future?

Creation is no longer a task where a human agonizes alone; it is evolving into a ‘Human-AI Co-Creation Process (HAI-CDP)’ (Exploring creativity in human–AI co-creation: a comparative study across design experience).

In this process, the ability most needed by humans is ‘evaluation and refinement.’ We must pick the gems out of the tens of thousands of ideas poured out by AI and refine them according to the MAYa principle.

What is the MAYa principle? It stands for Most Advanced Yet Acceptable, meaning “it should be at a level that is most advanced yet acceptable to people” (Human-AI Co-Creativity: Exploring Synergies Across Levels of Creative Collaboration). If AI creates something too bizarre, humans must pull it down to a ‘level understandable by the public,’ and if AI creates something too obvious, humans must increase its value by providing ‘new stimulation.’

However, there are also points of caution. If we rely too much on AI’s suggestions, there is a risk that we will stop thinking creatively ourselves (The paradox of creativity in generative AI: high performance, human-like bias, and limited differential evaluation). AI is just a helpful map showing us paths we haven’t taken; ultimately, the protagonist who walks that path and plants the flag at the destination is us.

Perspective of MindTickleBytes AI Reporter

The fact that AI has stood before the judgment of 1.5 million experts proves that creativity is no longer a mysterious realm. Future competitiveness lies not in ‘who draws better,’ but in ‘who can steer AI more sophisticatedly to assert their own taste.’ What is your unique ‘pinch of salt’? In the AI era, your firm taste will become your most powerful talent.

References

Contra Labs - Human Creativity Benchmark
[The Human Creativity Benchmark – Evaluating Generative AI in Creative Work Hacker News](https://news.ycombinator.com/item?id=47966484)
[2604.19799] Measuring Creativity in the Age of Generative AI: Distinguishing Human and AI-Generated Creative Performance in Hiring and Talent Systems

[Exploring creativity in human–AI co-creation: a comparative study across design experience - Frontiers](https://www.frontiersin.org/journals/computer-science/articles/10.3389/fcomp.2025.1672735/full)

The paradox of creativity in generative AI: high performance, human-like bias, and limited differential evaluation - PMC
Human-AI Co-Creativity: Exploring Synergies Across Levels of Creative Collaboration
No AI Model Is Both Correct and Steerable, Says New Creative Benchmark
Human Creativity Benchmark - LinkedIn
Contra Labs - The Human Creativity Benchmark
The Human Creativity Benchmark - Evaluating Generative AI in Creative Work
Human Creativity Benchmark [AI Agent Knowledge Base]
[Researchers tested AI against 100,000 humans on creativity ScienceDaily](https://www.sciencedaily.com/releases/2026/01/260125083356.htm)

[The paradox of creativity in generative AI: high performance, human-like bias, and limited differential evaluation - Frontiers](https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2025.1628486/full)

[Creativity in the age of generative AI: A new era of creative partnerships - ScienceDaily](https://www.sciencedaily.com/releases/2023/11/231120170939.htm)

Test Your Understanding

Q1. What are the two key categories used to evaluate AI model performance in this benchmark?

Speed and Accuracy
Convergence and Divergence
Text and Image

Researchers evaluated AI by dividing performance into 'Convergence,' the ability to follow best practices, and 'Divergence,' the ability to follow the taste and intent of individual creators.

Q2. According to the research, what is the biggest limitation currently shared by AI models?

Generation speed is too slow
They cannot recognize colors properly
There is no model that is both accurate and easy to steer

According to the report, a model that is both technically accurate (Correct) and sophisticatedly controlled according to the user's intent (Steerable) does not yet exist.

Q3. What is the principle applied by users when modifying AI outputs in human-AI collaboration?

Principle of Least Effort
MAYa Principle
Random Choice Principle

Users refine results by applying the MAYa principle, which states that AI outputs should be both advanced and accessible (acceptable) at the same time.