Even the Creators Didn't Know How AI Worked? The Surprising Evolution of 'Deep Learning Theory'

AI Summary

Deep learning, which has long developed by relying on experience and intuition, is now gaining a ‘scientific theory’ that draws on physics and mathematics to explain its operating principles.

First, imagine a scene from your daily life. You wake up in the morning and tell your smartphone’s voice assistant, “Summarize today’s afternoon meeting materials and email them to me.” A few seconds later, a perfectly organized summary arrives, looking as if a human had written it. Or in a hospital, artificial intelligence spots a microscopic tumor that even a veteran doctor’s eye might miss. We are already living in an era where artificial intelligence operates like a kind of ‘magic.’

But here is a truly surprising (and perhaps slightly chilling) fact: until recently, even the genius engineers and scientists who created artificial intelligence could not clearly explain the fundamental mathematical principles of “exactly why this AI is so smart and works so well.”

Compared to the enormous practical success achieved by Deep Learning (a machine learning technique based on artificial neural networks that mimic the structure of the human brain), the theoretical development to satisfactorily explain its behavior has historically lagged behind [[On the Information Bottleneck Theory of Deep Learning OpenReview]](https://openreview.net/forum?id=ry_WPG-A-).

To use an analogy: it’s like we’ve been running a giant bakery knowing the ‘recipe’ (experience) for baking the world’s most delicious cake, but without knowing the ‘principles’ (theory) of how flour and sugar chemically combine inside the oven.

However, the atmosphere in academia is now changing. Scientists from around the world have begun in earnest to dissect the brains of artificial intelligence and establish ‘A Scientific Theory of Deep Learning’ that transparently explains its operating principles. Today, in simple language that even a high school student can understand, we will explore why deep learning remained a mystery to scientists for so long, and how the door to its secrets has recently begun to open.

Why It Matters

You might think, “Isn’t it enough that the results are good? Do we really need to know the complex principles mathematically?” That might be true for an everyday chatbot. However, as deep learning begins to make very important decisions in our lives, knowing the principles has become a matter of ‘safety’ and ‘trust.’

Today, deep learning is not just a toy. It is already showing competitive results in highly sensitive medical fields where lives are at stake, in tasks such as cancer cell classification, lesion detection, organ segmentation, and image quality improvement [Deep learning - Wikipedia].

Furthermore, deep learning plays a central role in Reinforcement Learning, where an AI is trained to take actions within a specific environment to maximize rewards [Introduction to Deep Learning - GeeksforGeeks]. Simply put, it’s an AI technique that learns optimal behavior through trial and error, much like a child learning to find balance by repeatedly falling and getting back up while riding a bicycle.
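The trial-and-error idea can be sketched in a few lines of Python. This is a minimal illustration, not deep reinforcement learning: it assumes a hypothetical "two-button" environment in which each button pays a reward with a fixed probability, and it uses a simple epsilon-greedy rule (mostly pick the button that looks best, occasionally try a random one). The function name `run_bandit`, the reward probabilities, and the step count are all made up for this example.

```python
import random


def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays off most often."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)    # how often each action was tried
    values = [0.0] * len(reward_probs)  # running average reward per action

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action, even one that looks bad so far.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: pick the action with the best average reward so far.
            action = max(range(len(values)), key=lambda a: values[a])

        # The environment answers with a reward of 1 or 0.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Update the running average for the action that was tried.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


# Hypothetical environment: action 0 pays 20% of the time, action 1 pays 80%.
values, counts = run_bandit([0.2, 0.8])
best_action = max(range(len(values)), key=lambda a: values[a])
print("estimated rewards:", [round(v, 2) for v in values])
print("best action found:", best_action)
```

In deep reinforcement learning, the small table of averages is replaced by a neural network, but the loop of acting, observing a reward, and adjusting is the same in spirit.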

When making medical diagnoses directly linked to life, or when giant robots and autonomous vehicles take ‘Actions’ in the real world, a simple empirical belief like “it has worked well so far, so it will probably work well tomorrow” is far from enough. Only with a sound mathematical theory can we begin to state and verify guarantees about how an AI will behave in a specific, unexpected situation. In other words, deep learning theory is a key to turning AI from a ‘dangerous black box with unknown principles’ into a ‘tool that humans can understand and control.’

The Explainer: The Paradox of Deep Learning That Baffled Scientists

So, what exactly did the world’s best computer scientists find so difficult to understand about deep learning? To understand this, one must know the golden rule that traditional statistics has worshipped for decades: the ‘Bias-variance tradeoff’ [[A Theory of Deep Learning Elements of a Vector Space]](https://elonlit.com/scrivings/a-theory-of-deep-learning/).

Imagine you are a tailor at a local shop. You are tasked with making clothes (an AI model) that perfectly fit the body types of your customers (data).

What if you made a very loose, square, one-size-fits-all T-shirt far too simply? It won’t look good on anyone. In statistics, this phenomenon where a model is too simple to properly capture the data is called Underfitting.
Conversely, what if you made an extremely sophisticated bespoke suit, perfectly matching a specific customer’s tiny scars and a 1cm asymmetrical shoulder? It might be a perfect 100 for that customer, but no other new customer would be able to wear that garment. This phenomenon, where a model has such high expressive power that it perfectly memorizes past training data but fails miserably on new data, is called Overfitting.

In classical statistical learning theory, finding the right balance between this ‘simplicity’ and ‘complexity’ was an absolute unwritten law [[A Theory of Deep Learning Elements of a Vector Space]](https://elonlit.com/scrivings/a-theory-of-deep-learning/).
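The tailor's dilemma can be reproduced with a toy experiment. The sketch below is only an illustration with made-up numbers: the "true" pattern is assumed to be y = 2x, six training points carry hand-picked noise, and three hypothetical models are compared: a constant (the one-size-fits-all T-shirt), a straight line, and a polynomial that passes exactly through every training point (the bespoke suit).

```python
def fit_constant(xs, ys):
    """Too simple: always predict the average of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Balanced: ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_interpolant(xs, ys):
    """Too flexible: Lagrange polynomial through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_fn(x):
    return 2.0 * x  # assumed underlying pattern


train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.5, -0.3, 0.4, -0.5]  # hand-picked measurement noise
train_y = [true_fn(x) + e for x, e in zip(train_x, noise)]
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]         # "new customers"
test_y = [true_fn(x) for x in test_x]

models = {
    "constant": fit_constant(train_x, train_y),
    "line": fit_line(train_x, train_y),
    "interpolant": fit_interpolant(train_x, train_y),
}
results = {name: (mse(m, train_x, train_y), mse(m, test_x, test_y))
           for name, m in models.items()}
for name, (train_err, test_err) in results.items():
    print(f"{name:12s} train error = {train_err:.4f}  test error = {test_err:.4f}")
```

The interpolating polynomial memorizes the training points (zero training error) yet does worse than the plain line on the new points, while the constant does poorly everywhere. That is the classical picture of overfitting and underfitting.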

However, along came ‘Deep Learning’ and completely shattered this old mathematical rule. Deep neural networks often have far more parameters (adjustable numerical values, like millions or even billions of finely tunable volume dials inside the AI) than the number of data points they learn from. They are in a truly ‘Overparameterized’ state [[A Theory of Deep Learning Elements of a Vector Space]](https://elonlit.com/scrivings/a-theory-of-deep-learning/). This is like memorizing 1 million volumes of an encyclopedia just to take a single 100-point exam. According to classical theory, such a mind-bogglingly complex AI should fall into the trap of ‘overfitting’ and become useless when it encounters a new problem it has never seen before.
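To get a feel for how quickly the "dials" add up, here is a small counting sketch. The layer sizes and the number of training examples are hypothetical, chosen only to resemble a small image classifier; real systems are far larger.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases in a fully connected network."""
    total = 0
    for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]):
        total += fan_in * fan_out  # one weight per input-output connection
        total += fan_out           # one bias per output unit
    return total


# Hypothetical network: 784 inputs, two hidden layers of 512 units, 10 outputs.
layer_sizes = [784, 512, 512, 10]
# Hypothetical data set size.
num_training_examples = 60_000

num_parameters = count_dense_parameters(layer_sizes)
ratio = num_parameters / num_training_examples
print(num_parameters)   # 669706
print(round(ratio, 1))  # 11.2
```

Even this modest, made-up network has roughly eleven adjustable dials for every training example, which is exactly the regime that classical theory warns against.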

But reality completely mocked scientists’ expectations. Enormously complex deep learning neural networks were powerful enough to digest all given training data, yet simultaneously provided correct answers to new problems they had never seen (a new patient’s X-ray, a question heard for the first time). It was as if they had created the ‘ultimate smart clothing’ that magically expands and contracts to perfectly fit whatever body type a customer has. Scientists were astonished. “Exactly why does such a complex thing provide the right answer without falling into overfitting?”

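In fact, the expressive power of deep learning is fairly well understood. Many networks process data with ‘Continuously differentiable activation functions.’ Simply put, these are mathematical filters that connect the flow of information smoothly like waves rather than in jerky interruptions. Networks built from commonly used activation functions satisfy the conditions of the ‘Universal approximation theorem,’ meaning that with enough units they can approximate any continuous function as closely as desired, much like clay that can be molded into any shape [Deep learning - Wikipedia]. But being able to represent a pattern is not the same as learning the right one from limited data.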

Other pieces have been worked out as well, such as the ‘Softmax’ layer, which turns a network’s raw scores into probabilities like “80% chance of A, 20% chance of B,” and results showing that these methods behave consistently when processing large-scale information [Deep learning - Wikipedia]. However, the giant mathematical puzzle of “Why does it generalize so well to new problems without breaking, even after turning billions of dials?” remained incomplete.
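The softmax step itself is short enough to write out. The sketch below uses only the Python standard library; the input scores are made up so that the result lands on the "80% / 20%" split mentioned above.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up raw scores for classes A and B.
probs = softmax([math.log(4.0), 0.0])
print([round(p, 3) for p in probs])  # [0.8, 0.2]
```

Larger scores receive larger shares, and the shares always add up to 100%.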

Where We Stand: Theoretical Physics and Mathematics Step in as Relief Pitchers

Faced with this unexplained miracle of artificial intelligence, researchers in ‘theoretical physics’ and ‘pure mathematics’ have rolled up their sleeves and stepped in as relief pitchers to lighten the load for computer scientists. Recently, surprisingly new and concrete deep learning theories have been pouring out of academia.

One of the most interesting and unconventional approaches borrows the methods of ‘Theoretical physics.’ Just as physicists use an ‘Effective theory’ to describe the collective behavior of countless invisible particles without tracking each one individually, physics-style approaches are being studied to understand giant neural networks in which billions of parameters are intertwined like cobwebs [The Principles of Deep Learning Theory]. Based on this perspective, a recently published textbook presents a theoretical framework for understanding realistic neural networks, starting from their microscopic components and building up to an accurate description of their final outputs [The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks: Roberts, Daniel A., Yaida, Sho, Hanin, Boris: 9781316519332: Amazon.com: Books].

Furthermore, research using ‘Spline functions,’ curves stitched together from simple pieces, to describe the behavior of complex AI is also active. Much like a mathematical tool an architect uses to design a smooth curved roof, ‘Spline Theory’ aims to build a rigorous and sturdy bridge between deep networks and existing approximation theories [A Spline Theory of Deep Learning].
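A small sketch can make the link between networks and splines concrete. It relies on one well-known fact: a network with a single hidden layer of ReLU units (units that output max(0, z)) computes a piecewise-linear curve, which is the simplest kind of spline. The code below builds such a network by hand, with no training, so that it connects the dots of a target curve. The target function (sin), the interval, and the helper names are assumptions for illustration, not the construction used in the cited paper.

```python
import math


def relu(z):
    return max(0.0, z)


def build_relu_interpolant(f, left, right, num_segments):
    """Hand-build a one-hidden-layer ReLU network that connects the
    dots of f at equally spaced knots (a linear spline)."""
    width = (right - left) / num_segments
    knots = [left + k * width for k in range(num_segments + 1)]
    slopes = [(f(knots[k + 1]) - f(knots[k])) / width
              for k in range(num_segments)]
    # Output weights: the first slope, then the change in slope at each knot.
    coeffs = [slopes[0]] + [slopes[k] - slopes[k - 1]
                            for k in range(1, num_segments)]
    bias = f(left)

    def network(x):
        # Each hidden unit "switches on" once x passes its knot.
        return bias + sum(c * relu(x - knots[k]) for k, c in enumerate(coeffs))

    return network


def max_error(f, g, left, right, samples=1000):
    """Largest gap between f and g on an evenly spaced grid."""
    points = [left + (right - left) * i / samples for i in range(samples + 1)]
    return max(abs(f(x) - g(x)) for x in points)


coarse_net = build_relu_interpolant(math.sin, 0.0, math.pi, num_segments=4)
fine_net = build_relu_interpolant(math.sin, 0.0, math.pi, num_segments=32)
coarse_error = max_error(math.sin, coarse_net, 0.0, math.pi)
fine_error = max_error(math.sin, fine_net, 0.0, math.pi)
print(f"4 hidden units:  max error = {coarse_error:.4f}")
print(f"32 hidden units: max error = {fine_error:.4f}")
```

Adding hidden units adds more bends to the curve, so the fit tightens. This is also a small-scale picture of why a wide enough network can approximate a smooth curve as closely as desired.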

Synthesizing all these dynamic movements, researchers have recently gone as far as to declare that “A scientific theory of deep learning is emerging” [There Will Be a Scientific Theory of Deep Learning]. This theory is not merely a guess that “it’s probably like this.” It aims to characterize mathematically the most important properties of AI, such as the training process of deep learning models, their hidden data representations, their final weights, and their overall performance [There Will Be a Scientific Theory of Deep Learning].

In particular, scientists are pouring all their energy into the following five key research areas to complete this grand scientific theory [[2604.21691] There Will Be a Scientific Theory of Deep Learning]:

Solvable idealized settings: Just as one might experiment with simple toy blocks before building a massive skyscraper, researchers study simplified models that allow them to infer the learning methods of real systems.
Tractable limits: By pushing variables to their mathematical limits, they uncover the secrets of fundamental learning phenomena.
Simple mathematical laws: Instead of obsessing over individual complex leaves, they discover simple observation-based laws that can explain the shape of a giant forest.
Theories of hyperparameters: Just as one might perfectly formalize temperature and time for a delicious meal, they work to isolate the setting values of the learning process to reduce overall complexity.
Universal behaviors: Just as the same universal law of gravity applies to a falling apple and the moon orbiting the Earth, they identify universal phenomena that commonly appear across various neural network systems.
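To make the idea of a "simple mathematical law" a little more tangible, here is a sketch of how such a law might be read off from measurements. It assumes, purely for illustration, that error shrinks with model size as a power law, error = scale × size^(−exponent). Whether the cited paper uses this particular example is an assumption, and the data below are synthetic, generated from the formula itself rather than measured.

```python
import math


def fit_power_law(sizes, errors):
    """Fit error = scale * size ** (-exponent) by drawing a straight
    line through the points on log-log axes (least squares)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic "measurements" (not real results): four model sizes and the
# error each would have under an assumed law with scale 5 and exponent 0.3.
sizes = [1e3, 1e4, 1e5, 1e6]
errors = [5.0 * n ** -0.3 for n in sizes]

scale, exponent = fit_power_law(sizes, errors)
print(f"scale = {scale:.3f}, exponent = {exponent:.3f}")
```

On log-log axes a power law becomes a straight line, so a single slope summarizes the trend of the whole forest instead of every individual leaf.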

As these five giant puzzle pieces slowly find their places, we are finally witnessing the historic academic achievement of translating ‘empirical magic’ into ‘verifiable science.’

What’s Next: True Intelligence That Even Calculates ‘Uncertainty’

So, how will the future of artificial intelligence change as these scientific theories become established? One of the most important and disruptive changes we will experience in our daily lives is AI gaining the ability to recognize and quantify ‘Uncertainty.’

We often think that computers or AI always provide flawless answers with 100% certainty. However, information in the real world is always noisy and incomplete. Future AI will evolve to mathematically calculate not only the ‘limitations and uncertainty of the AI model itself’ but also the ‘uncertainty of the data entered by humans’ by fusing probabilistic deep learning models with deep neural networks [A Probabilistic Theory of Deep Learning].

Simply put, instead of telling a doctor definitively, “This is a tumor,” future medical AI will answer like this: “Taking into account the mathematical limits of the model I have learned and the poor quality of the current X-ray (data uncertainty), the probability that this is a malignant tumor is about 87%. Therefore, an additional ultrasound is recommended for confirmation.” In other words, AI will become aware of ‘what it does not know’ and advise humans accordingly.
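One common way to separate these two kinds of uncertainty can be sketched as follows. This is not the method of the cited paper but an ensemble-style illustration under stated assumptions: several independently trained models (hypothetical here) each report a prediction and how noisy they believe the data are. Disagreement between the models stands in for model uncertainty, and their average reported noise stands in for data uncertainty. All numbers are made up.

```python
from statistics import fmean, pvariance


def decompose_uncertainty(member_means, member_variances):
    """Split total predictive variance into a data part and a model part."""
    data_uncertainty = fmean(member_variances)  # noise the models see in the data
    model_uncertainty = pvariance(member_means)  # disagreement between models
    return {
        "prediction": fmean(member_means),
        "data": data_uncertainty,
        "model": model_uncertainty,
        "total": data_uncertainty + model_uncertainty,
    }


# Hypothetical ensemble of three models scoring the same input.
member_means = [1.0, 2.0, 3.0]      # each model's predicted value
member_variances = [0.5, 0.5, 0.5]  # each model's estimate of data noise

result = decompose_uncertainty(member_means, member_variances)
for name, value in result.items():
    print(f"{name:10s} {value:.3f}")
```

If the models agree but all report noisy data, a better measurement (such as the extra ultrasound) helps most. If the data are clean but the models disagree, the models themselves need more training or more examples.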

Just as the development of medieval alchemy into modern chemistry allowed humanity to create plastics and new materials for spacecraft, deep learning is moving past the era of relying on experience alone and toward a solid scientific theory. Perhaps the truly great change, in which an artificial intelligence that we can understand and control makes human life even more wondrous and safe, is only just beginning.

Perspective of MindTickleBytes AI 🤖

It is similar to how early humans discovered fire and cooked meat for hundreds of thousands of years before finally understanding the chemical principles of combustion. Likewise, for AI, practical success and the sprint of technology have far outpaced mathematical theory.

However, a castle built on sand is bound to crumble eventually. The current process of expressing those fundamental principles in the rigorous language of theoretical physics and pure mathematics will be a historical inflection point that turns AI from a fearsome ‘mysterious magic box’ into ‘humanity’s greatest tool,’ one that is predictable and controllable. We are standing at the forefront of a new scientific revolution of the 21st century as it unfolds.

References

[[On the Information Bottleneck Theory of Deep Learning OpenReview]](https://openreview.net/forum?id=ry_WPG-A-)
[Deep learning - Wikipedia]
[Introduction to Deep Learning - GeeksforGeeks]
[[A Theory of Deep Learning Elements of a Vector Space]](https://elonlit.com/scrivings/a-theory-of-deep-learning/)
[The Principles of Deep Learning Theory]
[The Principles of Deep Learning Theory: An Effective Theory Approach to Understanding Neural Networks: Roberts, Daniel A., Yaida, Sho, Hanin, Boris: 9781316519332: Amazon.com: Books]
[A Spline Theory of Deep Learning]
[There Will Be a Scientific Theory of Deep Learning]
[2604.21691] There Will Be a Scientific Theory of Deep Learning
[A Probabilistic Theory of Deep Learning]

Test Your Understanding

Q1. According to the traditional statistical principle of 'bias-variance tradeoff,' what phenomenon should normally occur if a model has far more parameters (adjustable numerical values) than data?

Underfitting
Overfitting
Universal approximation

According to traditional statistical learning theory, underfitting occurs when a model is too simple, while overfitting occurs when it is too complex and has high expressive power, causing it to excessively memorize the training data.

Q2. Which field of study are scientists recently borrowing from to explain deep learning theory?

Theoretical physics
Quantum mechanics
Classical biology

Recently, scientists have been borrowing concepts and approaches from theoretical physics to explain the operating principles of deep learning models.

Q3. Which of the following is a key element that 'probabilistic deep learning' primarily aims to address?

Maximizing calculation speed
Explaining uncertainty
Improving visual design

Probabilistic deep learning is a field that explains and accounts for both the uncertainty of the model itself and the uncertainty of the data.