Is AI Truly Smart, or Just Memorizing the Test? New Standards for Measuring Intelligence

AI Summary

Moving beyond solving static test questions, AI is now beginning to have its true capabilities verified through strategic games, creativity, and efficiency in learning new skills.

If AI Gets a Perfect Score on an Exam, Has It Truly Become a ‘Genius’?

Imagine this. A student has memorized every workbook and past exam question on the market, word for word. This student always gets a 100 on tests, but what happens if you slightly change a single number in a problem or ask a quirky question not found in the textbook? They would likely be flustered and unable to say a word. Looking at such a student, we wouldn’t say, “They’re so smart,” but rather, “They have an incredible ability for rote memorization.”

The situation facing today’s Artificial Intelligence (AI) is quite similar. Until now, we have measured AI’s capabilities with fixed exam papers called benchmarks. However, when AI models include these exam questions in their training data, essentially ‘memorizing the answer key’ in advance, it becomes doubtful whether the AI truly understands the principles or is just recalling answers. (Source: “The way we measure progress in AI is terrible”)

Experts are now beginning to fundamentally rethink how we measure AI intelligence. Beyond just getting the right answers to fixed questions, interesting attempts are being made to measure how strategically AI thinks, how creative it is, and how quickly it learns new skills.

The Benchmark Trap: “AI That Memorizes the Entire Test”

Recent AI performance metrics reveal a puzzling phenomenon. For instance, suppose a previous model scored 90 and a new model scores 93. On the surface, it might look like the pace of development has slowed significantly. However, this may not be because AI technology is stagnating, but because the exam papers (benchmarks) we use are already ‘spoiled’ with exposed answers, so the scores no longer tell us much about real ability. (Source: “The way we measure progress in AI is terrible”)
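One simple way to probe for this kind of memorization is to compare a model's score on the original questions with its score on lightly altered versions (for example, with a single number changed). The following is a minimal sketch under stated assumptions: the `memorizer` function, the answer key, and the toy questions are all hypothetical stand-ins, not part of any real benchmark or evaluation procedure.

```python
from typing import Callable, List, Tuple

Item = Tuple[str, str]  # (question, expected answer)


def accuracy(model: Callable[[str], str], items: List[Item]) -> float:
    """Fraction of items the model answers exactly right."""
    if not items:
        return 0.0
    correct = sum(1 for question, answer in items if model(question) == answer)
    return correct / len(items)


def memorization_gap(model: Callable[[str], str],
                     original: List[Item],
                     perturbed: List[Item]) -> float:
    """Accuracy on original items minus accuracy on perturbed items.

    A large positive gap suggests recall of the answer key rather
    than understanding of the underlying principle.
    """
    return accuracy(model, original) - accuracy(model, perturbed)


# Hypothetical "student" that has only memorized the published answer key.
ANSWER_KEY = {"What is 12 + 7?": "19", "What is 9 * 6?": "54"}


def memorizer(question: str) -> str:
    return ANSWER_KEY.get(question, "unknown")


original = [("What is 12 + 7?", "19"), ("What is 9 * 6?", "54")]
# Same problems with a single number changed in each.
perturbed = [("What is 13 + 7?", "20"), ("What is 9 * 7?", "63")]

print("original accuracy: ", accuracy(memorizer, original))
print("perturbed accuracy:", accuracy(memorizer, perturbed))
print("memorization gap:  ", memorization_gap(memorizer, original, perturbed))
```

A model that genuinely understands arithmetic would show a gap near zero here, while the pure memorizer drops from a perfect score to nothing.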

Furthermore, many companies boast about AI efficiency using metrics like ‘tokens-per-watt’ (the amount of output generated relative to power consumption). This is like bragging about a car’s fuel efficiency. Just because a car has great gas mileage doesn’t mean the person driving it has the ‘driving skill’ to find the safest and fastest route to a destination. In other words, generating a lot of output at a low cost is not evidence that the output is accurate or wise. (Source: “We Invested in AI. We Forgot to Measure What Matters.”)
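To make the fuel-efficiency analogy concrete, here is a small sketch that computes an efficiency figure and an accuracy figure side by side. It assumes tokens-per-watt means tokens generated per second divided by average power draw in watts (vendors may define it differently), and the two models and all of their numbers are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One hypothetical evaluation run; every field is an illustrative assumption."""
    name: str
    tokens_per_second: float
    avg_power_watts: float
    correct: int
    total: int

    @property
    def tokens_per_watt(self) -> float:
        # Efficiency: how much output is produced per unit of power.
        return self.tokens_per_second / self.avg_power_watts

    @property
    def accuracy(self) -> float:
        # Quality: how much of that output is actually right.
        return self.correct / self.total


runs = [
    Run("model_a", tokens_per_second=1200.0, avg_power_watts=300.0, correct=60, total=100),
    Run("model_b", tokens_per_second=600.0, avg_power_watts=300.0, correct=90, total=100),
]

most_efficient = max(runs, key=lambda r: r.tokens_per_watt)
most_accurate = max(runs, key=lambda r: r.accuracy)

for r in runs:
    print(f"{r.name}: {r.tokens_per_watt:.1f} tokens/s per watt, accuracy {r.accuracy:.0%}")
print("most efficient:", most_efficient.name)
print("most accurate: ", most_accurate.name)
```

In this made-up data the two rankings disagree: the model with the better 'gas mileage' is the worse 'driver', which is exactly why an efficiency metric alone cannot stand in for a measure of capability.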

A New Wave of Intelligence Measurement: The Start of Head-to-Head Showdowns

To overcome these limitations, Google has introduced the ‘Kaggle Game Arena,’ a new public platform where AI models sit across from each other and compete in real-time strategic games. (Source: “Rethinking how we measure AI intelligence”)

Strategic games are the perfect testing ground for evaluating AI’s true capabilities for three reasons:

  1. Dynamic Environment: Instead of choosing a fixed correct answer, the AI must revise its strategy every moment based on the opponent’s moves.
  2. Clear Win/Loss: Instead of subjective judgments like “who looks smarter,” the result is clearly displayed in numbers as a win or loss.
  3. High-Level Thinking: To win, it is essential to look beyond the immediate move, establish long-term plans, analyze complex situations, and adapt. (Source: “Rethinking how we measure AI intelligence”)

The way AI performs in games like chess or Go is closer to the realm of ‘strategic reasoning’ than simple memorization. Through this, we can more reliably gauge how much general problem-solving ability an AI possesses. (Source: “Rethinking how we measure AI intelligence – VedereAI”)
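Clear win/loss results lend themselves to a running leaderboard. The sketch below uses the classic Elo rating update, a scheme widely used in chess, to turn head-to-head results into numbers. This is an assumption made for illustration: the article's sources do not say which rating method Kaggle Game Arena uses, and the model names, starting rating of 1500, and K-factor of 32 are all hypothetical.

```python
from typing import Dict, List, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo's predicted score for player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float,
                   score_a: float, k: float = 32.0) -> Tuple[float, float]:
    """Return new ratings after one game.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    """
    e_a = expected_score(rating_a, rating_b)
    e_b = 1.0 - e_a
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - e_b)
    return new_a, new_b


# Hypothetical match log: (winner, loser).
results: List[Tuple[str, str]] = [
    ("model_x", "model_y"),
    ("model_x", "model_z"),
    ("model_z", "model_y"),
]

ratings: Dict[str, float] = {"model_x": 1500.0, "model_y": 1500.0, "model_z": 1500.0}
for winner, loser in results:
    ratings[winner], ratings[loser] = update_ratings(ratings[winner], ratings[loser], 1.0)

# Print the leaderboard from strongest to weakest.
for name, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {rating:.1f}")
```

Because each game moves points from the loser to the winner, a model cannot climb the table by recalling an answer key; it has to keep beating opponents whose moves it has not seen before.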

Creativity and Learning Efficiency: “How You Learn” Is the Key

The definition of intelligence is now shifting from ‘how much knowledge has been accumulated’ to ‘how efficiently one learns new skills.’

1. Creativity as a New Yardstick

Researchers are now using creativity as an important indicator of intelligence. Creativity here isn’t just the skill to draw pretty pictures. Simply put, it refers to the ability to use lateral thinking (thinking freely, breaking away from stereotypes) to find unexpected connections between seemingly unrelated pieces of information and produce original results. (Source: “How do you measure artificial intelligence?”) Professor Jeremy Utley of Stanford University emphasizes that many people have yet to fully tap into this creative potential of AI. (Source: “How to Master AI Powered Creativity in Just 13 Minutes - YouTube”)

2. The ‘Cost-Effectiveness’ of Skill Acquisition

True intelligence comes not from a ‘brute force’ approach of training on trillions of data points, but from the ability to adapt quickly to new situations with very little experience. To measure this, a benchmark called ARC (Abstraction and Reasoning Corpus) was designed. ARC is intended to measure ‘general fluid intelligence,’ the human-like ability to solve problems logically in situations encountered for the first time. (Source: “How Do We Measure And Define Intelligence In Artificial Systems? - Consensus Academic Search Engine”)
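ARC tasks present a handful of input/output grid examples and ask the solver to work out the hidden rule and apply it to a new grid. The toy sketch below imitates that idea on a very small scale. It is an illustration only, under heavy assumptions: the three candidate rules, the example grids, and the `infer_rule` helper are hypothetical, and real ARC tasks are far more varied than a fixed menu of transformations could cover.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left to right."""
    return [row[::-1] for row in grid]


def flip_vertical(grid: Grid) -> Grid:
    """Reverse the order of the rows."""
    return [list(row) for row in grid[::-1]]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


# Hypothetical menu of rules the toy solver is allowed to consider.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(examples: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule consistent with every example, if any."""
    for name, fn in CANDIDATES.items():
        if all(fn(inp) == out for inp, out in examples):
            return name
    return None


# Only two demonstrations are given: 'very little experience'.
examples = [
    ([[1, 0], [2, 0]], [[0, 1], [0, 2]]),
    ([[3, 4, 0]], [[0, 4, 3]]),
]
test_input = [[5, 0, 0], [0, 6, 0]]

rule = infer_rule(examples)
prediction = CANDIDATES[rule](test_input) if rule is not None else None
print("inferred rule:", rule)
print("prediction:   ", prediction)
```

The point of the exercise is the sample budget: the solver sees two examples, not trillions, and is judged on a grid it has never encountered.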

Is Resembling Humans the Right Answer for Intelligence?

We have often set ‘AI that thinks and acts like a human’ as the ultimate goal. The classic test of this goal is known as the Turing Test, or the ‘Imitation Game.’ However, recent research is fundamentally questioning this assumption. [Beyond the Imitation Game: Rethinking How We Measure General Intelligence | Research Communities by Springer Nature](https://communities.springernature.com/posts/beyond-the-imitation-game-rethinking-how-we-measure-general-intelligence)

Autonomous AI systems might evolve goals and ways of thinking that are entirely different from those of humans. Therefore, the argument that we need ways to measure AI’s own unique cognitive abilities and values, rather than just using the standard of perfectly copying human behavior, is gaining traction. Ultimately, the AGI (Artificial General Intelligence) we dream of implies a level that equals or exceeds humans in all cognitive tasks. (Source: “Artificial general intelligence - Wikipedia”)

Changes in the Future We Will Face

How will changes in intelligence measurement methods change our daily lives?

First is a change in education. As AI comes to be used as a tool for measuring collaborative problem-solving, new educational methods could evaluate more precisely how our children communicate and solve problems with their peers, and help them do so. (Source: “How AI could transform the way we measure kids’ intelligence”)

Second is more reliable AI services. If our assistants are not just AIs that have memorized answers but those whose ‘ability to think’ for themselves has been rigorously verified, we will be able to entrust them with more complex and unexpected tasks with peace of mind.

Ultimately, properly measuring AI intelligence will be more than just a technical issue; it will be the most important milestone in determining what kind of future we shape together with artificial intelligence.


AI’s Perspective (AI’s Take)

MindTickleBytes’ AI Reporter Perspective

If AI until now has been close to a ‘chronicler’ that swallowed vast encyclopedias whole, it is now evolving into a ‘strategist’ and ‘creator’ that makes new moves based on that knowledge. Shifting the yardstick of intelligence from simple ‘memorization’ to ‘adaptation’ and ‘reasoning’ is also a welcome sign that we are beginning to recognize AI not just as a simple tool, but as a true partner by our side.


References

  1. Rethinking how we measure AI intelligence
  2. [Beyond the Imitation Game: Rethinking How We Measure General Intelligence | Research Communities by Springer Nature](https://communities.springernature.com/posts/beyond-the-imitation-game-rethinking-how-we-measure-general-intelligence)
  3. How do you measure artificial intelligence?
  4. How Do We Measure And Define Intelligence In Artificial Systems? - Consensus Academic Search Engine
  5. [Rethinking how we measure AI intelligence - 67nj](https://www.67nj.org/rethinking-how-we-measure-ai-intelligence)
  6. Artificial general intelligence - Wikipedia
  7. Rethinking how we measure AI intelligence – VedereAI
  8. The way we measure progress in AI is terrible
  9. How AI could transform the way we measure kids’ intelligence
  10. How to Master AI Powered Creativity in Just 13 Minutes - YouTube
  11. We Invested in AI. We Forgot to Measure What Matters.
  12. Rethinking how we measure AI intelligence - googblogs.com
Test Your Understanding
Q1. How does Google's recently introduced 'Kaggle Game Arena' measure AI?
  • By having it solve past college entrance exam questions.
  • By having AI models face off in real-time strategic games.
  • By simply measuring response speed.
Kaggle Game Arena measures dynamic capabilities by having AI models compete head-to-head in strategic games.
Q2. What does 'creativity,' which is emerging as a new measure of AI intelligence, mean?
  • The ability to simply copy data quickly.
  • The ability to create unexpected connections through lateral thinking.
  • The ability to minimize electricity consumption.
Creativity refers to the ability to create connections between disparate information through lateral thinking and produce original results.
Q3. Which of the following is NOT an important factor when defining intelligence as 'efficiency of skill acquisition'?
  • The difficulty of generalization.
  • Existing background knowledge.
  • The ability to simply store large amounts of data.
Intelligence from a new perspective focuses on how quickly generalized skills are learned with minimal experience, rather than just quantitative data accumulation.