Instead of waiting for an AI to output text, 'probing' analyzes the model's internal data state directly, making it possible to assess what the model represents, and whether its answer is likely to be factual, more quickly and efficiently.
Imagine this: when you ask a friend, “How is the weather today?”, what if you could read the thoughts appearing in their mind right before they open their mouth to answer? You wouldn’t need to wait for a response, and you could immediately tell if they were about to lie.
Recently, an interesting technology similar to this has been gaining attention in the field of Artificial Intelligence (AI). It is called ‘Probing,’ a technique that allows us to look directly into the internal thoughts (hidden states) of Large Language Models (LLMs)—like ChatGPT—before they generate text.
Why is this technology important?
Until now, the only way we could verify an AI’s thoughts was to make the AI ‘speak’ by outputting text. However, it takes time for an AI to open its mouth, that is, to produce output. More importantly, when an AI experiences ‘hallucinations’ (presenting made-up information as fact without knowing it is wrong), we only realize the error after the AI has finished its incorrect answer.
Probing bypasses the need to wait for the AI’s slow generation process by directly analyzing the ‘data state,’ which is like the electrical signals flowing through the AI’s neural circuits. This paves the way for increasing AI reliability and grasping how specific information is processed within the AI much faster and more accurately.
Easy to Understand: A Filter for Reading the AI’s Brain
Probing can be easily explained as a ‘filter’ in a photo editing app. It leaves the original photo data intact while applying a specific filter to highlight only the information we want to see (like color tone or brightness).
AI models are composed of numerous layers. As data passes through these layers, the model gradually builds up representations of more complex concepts. Researchers ‘intercept’ the data state before the AI produces its final response, typically at an intermediate layer roughly 70% of the way through the model [Source 8, Source 9]. They then pass this data through a small analyzer called a ‘Probe’ (typically a simple classifier like logistic regression) [Source 2].
By doing this, we can read data that indicates what beliefs an AI holds about a specific question, or whether it judges information as true or false, before the text generation stage [Source 1, Source 8].
It is the same principle as noticing, “Ah, they are hesitating, so they must not know,” just by looking at a friend’s facial expression before they give you an answer.
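The steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any cited study: the "hidden states" here are synthetic random vectors standing in for real model activations, and the helper names (`probe_layer_index`, `train_probe`, `probe_score`, `make_synthetic_states`) are hypothetical and invented for this example. The probe itself is plain logistic regression trained by gradient descent.

```python
import math
import random

def probe_layer_index(num_layers, depth_fraction=0.7):
    """Pick the layer to read from, e.g. roughly 70% of the way through."""
    return min(num_layers - 1, int(num_layers * depth_fraction))

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def probe_score(weights, bias, state):
    """Probability (0 to 1) that the probe assigns to the 'true' label."""
    return sigmoid(sum(w * x for w, x in zip(weights, state)) + bias)

def train_probe(states, labels, epochs=200, lr=0.1):
    """Fit a logistic-regression probe with batch gradient descent."""
    dim = len(states[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(states)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for state, label in zip(states, labels):
            err = probe_score(weights, bias, state) - label
            for i in range(dim):
                grad_w[i] += err * state[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def make_synthetic_states(n, dim, rng):
    """ASSUMPTION: fake hidden states. 'True' examples are shifted along
    one fixed direction, 'false' examples the opposite way, plus noise."""
    direction = [1.0 if i % 2 == 0 else -1.0 for i in range(dim)]
    states, labels = [], []
    for k in range(n):
        label = k % 2
        sign = 1.0 if label == 1 else -1.0
        states.append([rng.gauss(0.0, 1.0) + sign * d for d in direction])
        labels.append(label)
    return states, labels

rng = random.Random(0)
train_x, train_y = make_synthetic_states(200, 8, rng)
test_x, test_y = make_synthetic_states(100, 8, rng)

weights, bias = train_probe(train_x, train_y)
predictions = [1 if probe_score(weights, bias, x) >= 0.5 else 0 for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)

layer = probe_layer_index(32)  # layer to read in a hypothetical 32-layer model
print("probe layer:", layer)
print("held-out accuracy:", accuracy)
```

With a real model, the synthetic vectors would be replaced by activations captured at the chosen layer. The probe stays this small on purpose: the model itself is left untouched, like the original photo under a filter.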
Current Status: How Far Has It Come?
This technology is already being utilized in various fields.
- Hallucination Detection: Research shows that probes reading an AI’s hidden states perform well at predicting whether its answer is factual [Source 19]. This means signs of a hallucination can be caught before the incorrect text is actually produced.
- Identifying Knowledge Sources: It is possible to analyze whether an AI is speaking based on learned data (parametric knowledge) or whether it is referencing the provided context [Source 11].
- Connecting with Humans: Recent studies have found that the way an AI processes sentences resembles patterns in human eye movements during reading [Source 6]. This has opened a new path to study AI’s thinking process by comparing it with human cognitive processes.
Of course, there are limitations. Some point out that if an AI changes its mind or makes errors in the middle of completing a sentence, it is difficult to interpret every step perfectly using only probing [Source 5].
What Will Happen in the Future?
Probing technology is transforming AI from a mere ‘talking machine’ into an ‘analytical subject whose interior can be inspected.’ To use a metaphor, while we could previously only pose questions to the black box called AI, we can now observe the flow of AI’s thoughts in real time through a clear glass window.
In the future, a time will come when we can assign reliability scores to an AI’s answer before it even finishes generating, or monitor in real time how the AI is constructing the grounds for its answer. We will learn to use technology more safely and intelligently, no longer just listening to and relying on what an AI says, but transparently confirming the AI’s thought processes.
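As a rough sketch of what such a reliability score might look like, the Python below keeps a running average of per-token probe scores and raises a flag as soon as the average drops below a threshold. Everything here is an assumption made for illustration: the scores are hand-written numbers rather than real probe outputs, and `monitor_reliability`, `first_flag`, the 0.5 threshold, and the three-token warm-up are hypothetical choices, not a published method.

```python
def monitor_reliability(token_scores, threshold=0.5, min_tokens=3):
    """Yield (position, running_mean, flagged) after each token's probe score."""
    total = 0.0
    for position, score in enumerate(token_scores, start=1):
        total += score
        running_mean = total / position
        # Wait for a few tokens before flagging, so one low score is not decisive.
        flagged = position >= min_tokens and running_mean < threshold
        yield position, running_mean, flagged

def first_flag(token_scores, threshold=0.5, min_tokens=3):
    """Return the token position where the answer is first flagged, or None."""
    for position, _, flagged in monitor_reliability(token_scores, threshold, min_tokens):
        if flagged:
            return position
    return None

# ASSUMPTION: made-up probe scores (1.0 = looks factual, 0.0 = looks unreliable).
confident = [0.9, 0.8, 0.85, 0.9]
shaky = [0.7, 0.4, 0.3, 0.2]

print(first_flag(confident))  # None
print(first_flag(shaky))      # 3
```

Because the check runs after every token, a system built this way could stop or warn partway through an answer instead of waiting for the full text.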
MindTickleBytes’ AI Reporter Perspective
Probing, which allows us to look inside an AI, is a powerful tool for securing AI reliability. By visualizing the ‘flow of thought’ hidden behind the complexity of the technology, we are gradually turning the black box of AI into a more transparent glass box. These efforts will ultimately ensure that technology does not remain just a tool to help humans, but becomes a partner that humans can understand and control more deeply.
References
- Still no Lie Detector for LLMs - LessWrong
- Still No Lie Detector for Large Language Models - Ben Levinstein
- Measuring Beliefs of Language Models During Chain-of-Thought
- Probing Large Language Models from a Human Behavioral Perspective - ACL Anthology
- Daniel A. Herrmann arXiv:2307.00175v1
- Don’t let the LLM speak, just probe it. - James Padolsey
- [Don’t let the LLM speak, just probe it - Hasty Briefs](https://hb.int2inf.com/en/s/item/UNX3BEHdhhYGUhBZqMkgEH-hidden-state-classification-with-llms)
- Probing Language Models on Their Knowledge Source - arXiv.org
- Simple Factuality Probes Detect Hallucinations in Long-Form Natural Language Generation