AI Assistants Chatting and Catching Viruses? The Entirely New Hacking Threats and Defenses Brought by the 'Multi-Agent' Era

AI Summary

As we move beyond simple one-on-one chatbots into a 'multi-agent' era in which dozens of AIs work together as a team, an entirely new kind of hacking threat has emerged: one that spreads like an infectious disease through the conversations between AIs. This makes new interdisciplinary security research urgently needed.

Close your eyes for a moment and imagine a sunny morning in the not-so-distant future. You casually speak to the AI assistant on your smartphone: “I want to take a three-day, two-night family trip to Jeju Island next Friday. Can you book flights while keeping the budget under 1 million won, and find and reserve a kid-friendly hotel with a swimming pool? Oh, and please also book a rental car, map out a route of local restaurants, and share the itinerary.”

Until now, a request this complex would have produced little more than dozens of website search results on the screen or some plausible-sounding advice. The tedious work of clicking each reservation button and entering payment passwords one by one was left entirely to the human.

As technology evolves, however, the picture changes completely. Acting on your behalf, your smartphone’s AI assistant talks to the AI that runs Korean Air’s reservation system to buy the best flight tickets, checks with the AI manager of a hotel in Jeju to find an available room, and negotiates with a rental car company’s AI to reserve a vehicle. A system that goes beyond this simple one-on-one pattern of question and answer, in which numerous AIs, each with its own expertise and authority, autonomously communicate, collaborate, and solve problems over a network, is called ‘Multi-Agent AI’.

This technology has enormous potential to fundamentally change the way we work and live, and to make both far more convenient. Behind this magical convenience, however, lies a frightening shadow we have barely begun to imagine.

What would happen if the Jeju hotel’s AI manager, while exchanging reservation details with your assistant, had already been secretly taken over by a malicious hacker? Surprisingly, the damage would not stop at the hotel’s computers. Malicious instructions could latch onto your smartphone’s AI through the very conversation in which the two AIs collaborate, and in an instant your credit card information and your family’s private schedule could be handed over intact to the hacker’s server. In this article, we explore why multi-agent AI, which is poised to transform our daily lives, creates an unprecedented security blind spot, and what researchers are doing to block this invisible threat.

Why It Matters

The chatbots based on large language models (LLMs) that we have used every day for the past few years essentially operate in a ‘single-agent’ environment. Metaphorically speaking, it is like locking a very smart expert in a sturdy, windowless cell and asking questions by slipping notes under the door. The expert can write answers back on notes based on what they already know, but can never go outside.

The security threats that could arise in such a single system were relatively easy to contain. They were mostly limited to occasional hallucinations (the AI presenting made-up facts as if they were true) and so-called ‘jailbreak’ attacks, in which a hacker slips tricky decoy sentences into the prompt to force out inappropriate answers. Defending against them only required reinforcing the walls of that single cell.

However, as AI’s capabilities and range of applications have expanded explosively, companies and organizations worldwide have begun adopting ‘Multi-Agent Systems’ in earnest to automate more complex and advanced tasks [New report analysing multi-agent risks]. This shift is not simply a matter of tying several smart chatbots together. According to an in-depth analysis by the Gradient Institute, multi-agent systems do not merely add a few new items to the list of existing security risks; they fundamentally reshape the entire landscape of risks that hackers can exploit [New report analysing multi-agent risks].

Why is this not just an armchair theory for experts, but a matter directly tied to the lives and safety of the general public? Because multi-agent AI is poised to be deployed in some of society’s most critical infrastructure. According to a recent report from Wake Forest University, multi-agent AI is seen as a promising way to save human lives at hazardous chemical plants or collapsed disaster sites, and to fill the large gaps left by chronic labor shortages in healthcare worldwide [Multi-agent AI could change everything - if researchers can figure out the risks].

But think about it carefully. What if hundreds or thousands of AI assistants with immense authority began exchanging tens of thousands of commands in real time and making autonomous decisions without human verification? This unprecedented level of system complexity brings unfamiliar and potentially catastrophic risks to the surface [Multi-Agent Risks from Advanced AI]. A single small hack or a minor algorithmic malfunction could cascade like dominoes through hundreds of other AIs, paralyzing an entire city’s power grid in an instant or throwing a hospital’s surgery scheduling system into chaos.

Because of these frightening chain reactions, academia and the security industry have recently come to recognize the limits of existing AI safety research, which has focused narrowly on ‘single systems’. Researchers are now calling for the field to look beyond the robustness of individual systems and to make the complex ‘multi-agent dynamics’ that emerge from conversations among multiple AIs a core focus of research [New Report: Multi-Agent Risks from Advanced AI].

The Explainer

What makes hacking in a multi-agent environment so different that it worries even brilliant computer scientists? To put it simply, let’s use the office of a large global corporation as an analogy.

A single AI of the past was like a meticulous junior employee handling paperwork alone in a windowless room. If an outside villain sent this employee a blatant hacking letter saying, “Please give me the password to the safe holding the company’s confidential ledger,” the employee would firmly refuse, following the security training (safety filter) the company had drilled into them in advance: “Under company regulations, I cannot provide that information.” This was very easy to manage and control.

AIs in the multi-agent era, by contrast, are like hundreds of department heads working in a vast open-plan office without a single partition, endlessly exchanging work orders and approval documents. In this environment, entirely new kinds of attacks begin to run rampant, such as ‘Prompt Infection’ and ‘ClawWorm’, worm-like attacks that burrow into AI systems, as Microsoft researchers have warned [Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale].

Let’s make this more concrete. A hacker outside the company sends the ‘HR Team AI’ an email containing what looks like a perfectly normal job applicant’s resume. Cleverly hidden inside the file, however, and invisible in the ordinary text, is a malicious command (prompt). A conventional attack might have been caught by the system’s defenses, but the HR Team AI, completely fooled by the carefully crafted fake resume, becomes infected and absorbs the malicious command into its ‘brain’ without realizing it.

The real damage happens in the very next moment. The infected HR Team AI goes on talking to the ‘Finance Team AI’ and the ‘IT Team AI’ over the company network as usual. It sends an official cooperation request: “We have a new employee, so please register an account in the payroll system and grant the highest administrator access rights on the IT network.” And the Finance Team AI and the IT Team AI? Because the message comes from a trusted colleague AI they work with every day, they carry out this extremely dangerous request within a second, without a shred of doubt.

This is what the latest experimental attack frameworks have demonstrated. Hackers no longer need to break through the firewall of every AI system one by one. If they infect just one vulnerable AI, a chain reaction follows: the malicious prompt spreads autonomously and at frightening speed, like a flu virus or an epidemic, riding the normal conversation network of the many AIs that cooperate closely with one another [Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale].
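To see why a single infection can be enough, here is a toy simulation: a minimal sketch assuming that every agent trusts its colleagues completely, executes whatever they send, and relays it to its own contacts. The agent names and the `TRUST_GRAPH` wiring are hypothetical, loosely modeled on the office example; real agent networks and real attacks are far messier. The spread is modeled as a breadth-first traversal that records how many hops each agent sits from "patient zero".

```python
from collections import deque

# Hypothetical office network: each agent relays requests to the colleagues listed.
TRUST_GRAPH = {
    "hr": ["finance", "it"],
    "finance": ["it", "audit"],
    "it": ["security", "finance"],
    "audit": [],
    "security": [],
}


def spread_infection(graph, patient_zero):
    """Breadth-first spread of a malicious prompt.

    Assumes every agent trusts its sender, executes the prompt, and relays it
    to all of its own contacts. Returns {agent: hops from patient zero}.
    """
    infected = {patient_zero: 0}
    queue = deque([patient_zero])
    while queue:
        agent = queue.popleft()
        for peer in graph.get(agent, []):
            if peer not in infected:  # each agent is infected only once
                infected[peer] = infected[agent] + 1
                queue.append(peer)
    return infected


result = spread_infection(TRUST_GRAPH, "hr")
print(result)  # {'hr': 0, 'finance': 1, 'it': 1, 'audit': 2, 'security': 2}
```

In this toy network, compromising only the HR agent is enough to reach all five agents within two hops, even though the attacker never touched the other four directly.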

To fend off this invisible epidemic, researchers around the world are working hard on two cutting-edge lines of defense.

1. Deploy Fake Thieves on a Large Scale: Red-teaming
If you want to thoroughly test the security of a newly built apartment complex, reading the security camera manual a hundred times at your desk is useless. The surest method is to hire ‘fake thieves’ who are real security experts and have them climb the walls and pry at the windows in the middle of the night. In the security industry, this practice of attacking your own side to find vulnerabilities is called ‘red-teaming’ (mock hacking).

Researchers are now running so-called ‘Agents of Chaos’ exercises, which target not a single AI but an entire network in which dozens of AIs are intertwined like a spider web, launching relentless mock attacks against it. Through this work they keep uncovering subtle vulnerabilities such as ‘cross-agent influence’, which never appear when an individual AI works alone but arise only once AIs interact with one another [Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale]. The aim is to simulate, and prepare for, every failure scenario that could occur before the system is deployed in real life.
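A network-level red-team exercise can be pictured as injecting a mock payload at every agent in turn and measuring how far it travels, something no single-agent test would reveal. The sketch below does this on a hypothetical five-agent network, then repeats the sweep with one agent "hardened", meaning it is assumed to verify relayed requests and refuse them. `NETWORK`, `blast_radius`, and `red_team_sweep` are illustrative names, not the method used in the cited Microsoft work.

```python
from collections import deque

# Hypothetical agent network: agent -> colleagues it relays requests to.
NETWORK = {
    "hr": ["finance", "it"],
    "finance": ["it", "audit"],
    "it": ["security", "finance"],
    "audit": [],
    "security": [],
}


def blast_radius(graph, entry, hardened=frozenset()):
    """Agents that end up executing a payload injected directly at `entry`.

    Hardened agents are assumed to reject payloads relayed by peers, so they
    neither execute nor forward them. The entry agent is compromised directly.
    """
    infected = {entry}
    queue = deque([entry])
    while queue:
        agent = queue.popleft()
        for peer in graph[agent]:
            if peer not in infected and peer not in hardened:
                infected.add(peer)
                queue.append(peer)
    return infected


def red_team_sweep(graph, hardened=frozenset()):
    """Mock-attack every agent in turn; record how many agents each injection reaches."""
    return {entry: len(blast_radius(graph, entry, hardened)) for entry in graph}


print(red_team_sweep(NETWORK))
# {'hr': 5, 'finance': 4, 'it': 4, 'audit': 1, 'security': 1}
print(red_team_sweep(NETWORK, hardened={"it"}))
# {'hr': 3, 'finance': 2, 'it': 4, 'audit': 1, 'security': 1}
```

Hardening the IT agent alone shrinks an HR-side injection from five affected agents to three. An injection placed directly into the IT agent still reaches four, because in this toy model hardening only filters requests relayed by peers.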

2. Teaching Strict Laws of Physics to a Dreaming Artist: Neurosymbolic AI
To keep hundreds or thousands of AIs from spiraling out of control when they form swarms and make decisions while exchanging data in the blink of an eye, a research team at the University of Pennsylvania is proposing a distinctive and elegant solution called ‘Neurosymbolic AI’ [Penn Engineering XLab’s new Swarm AI project takes on safety at scale].

To use an analogy, think of modern AI based on deep learning (neural networks) as a ‘genius artist’ brimming with free, creative imagination. This dreaming artist might absentmindedly paint a waterfall pouring upward into the sky, ignoring the law of gravity. It is creative, but it can be dangerous. So scientists are giving this artist a firm ‘physics manual’: human-encoded knowledge made up of structured logic and strict rules that people have refined over a long time.

What happens when these two fuse? The goal is that even in extreme situations, where thousands of AIs must talk in real time and make complex, split-second decisions affecting lives and property, a fundamental and powerful safety brake engages and holds them back, so they never cross the boundaries of common sense and ethics that humans have built in.
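One simple way to picture the "physics manual" idea is a symbolic layer of explicit, human-written rules that can veto whatever the neural side proposes. This is a minimal sketch, not the Penn team's design: a hard-coded list stands in for a model's proposed actions, and the `Action` type, the rule names, and the 1,000,000-won cap are hypothetical. Real neurosymbolic systems integrate the two halves far more tightly than an after-the-fact filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str          # e.g. "pay" or "grant_access" (hypothetical action types)
    target: str
    amount: float = 0.0


# Symbolic layer: each rule is (name, predicate that returns True if the action is allowed).
RULES = [
    ("no admin grants without human approval",
     lambda a: not (a.kind == "grant_access" and a.target == "admin")),
    ("payments capped at 1,000,000 won",
     lambda a: not (a.kind == "pay" and a.amount > 1_000_000)),
]


def symbolic_check(action):
    """Return the names of every rule the proposed action violates."""
    return [name for name, allowed in RULES if not allowed(action)]


def guarded_execute(action):
    """Execute only if no symbolic rule objects; otherwise report the violations."""
    violations = symbolic_check(action)
    if violations:
        return f"BLOCKED {action.kind}: " + "; ".join(violations)
    return f"EXECUTED {action.kind} -> {action.target}"


# Stand-in for the neural side: in practice these proposals would come from a model.
proposals = [
    Action("pay", "airline", 850_000),
    Action("grant_access", "admin"),
    Action("pay", "hotel", 2_500_000),
]
for proposal in proposals:
    print(guarded_execute(proposal))
```

The design choice worth noting is that the rules are readable and auditable by humans, so the brake does not depend on the neural model behaving well.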

Where We Stand

Faced with this massive paradigm shift, frontline scientists are moving swiftly. NeurIPS, one of the world’s most prestigious artificial intelligence conferences, brought together AI and cybersecurity experts from around the world. They announced the birth of a new interdisciplinary field called ‘Multi-Agent Security (MASEC)’ and held a lively workshop sketching out a blueprint for its future [Multi-Agent Security: Security as Key to AI Safety - NeurIPS].

This new ‘Multi-Agent Security’ field tackles questions that never even arose in the single-chatbot era. Researchers are broadening the scope of defense in several directions, such as how to design the underlying structure of the networks through which AIs talk to each other so that the transmission paths of attacks are blocked at the root, and how to encrypt the messages that AIs built by different companies use to exchange sensitive data [Multi-Agent Security].
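The passage mentions encrypting the messages AIs exchange. Python's standard library has no encryption primitives, so this sketch swaps in a closely related safeguard: message authentication with HMAC-SHA256 (the `hmac` and `hashlib` modules), which lets a receiving agent check that a message really came from the claimed sender and was not altered in transit. The agent names and keys are hypothetical. It does not hide message contents, and it cannot help if the sending agent itself has been infected, because an infected HR agent would still produce validly signed messages.

```python
import hashlib
import hmac
import json

# Hypothetical per-agent secret keys; a real deployment would use managed key storage.
KEYS = {"hr": b"hr-secret-key", "finance": b"finance-secret-key"}


def sign(sender, payload):
    """Serialize the message and attach an HMAC-SHA256 tag made with the sender's key."""
    body = json.dumps({"sender": sender, "payload": payload}, sort_keys=True)
    tag = hmac.new(KEYS[sender], body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def verify(message):
    """Recompute the tag with the claimed sender's key and compare in constant time."""
    sender = json.loads(message["body"])["sender"]
    expected = hmac.new(KEYS[sender], message["body"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["tag"])


msg = sign("hr", "register new employee in payroll")
print(verify(msg))     # True

# An attacker who edits the message in transit cannot produce a matching tag.
forged = dict(msg, body=msg["body"].replace("register", "grant admin to"))
print(verify(forged))  # False
```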

Researchers on the technological front lines are highly cautious yet firm. A researcher from Wake Forest University says: “We researchers anticipate in advance the terrible chain-reaction accidents that could occur if artificial intelligence algorithms are deployed in the real world crowded with actual people. Then, we simulate these problems countless times in a safe virtual computer environment to find perfect coping methods. Only after completely sewing up the holes in the system’s security and safety do we hope to release our systems into the public’s daily lives.” This reflects both the sense of crisis and the heavy responsibility that academia currently feels [Multi-agent AI could change everything - if researchers can figure out the risks].

Corporate security teams who work in the field every day, however, face a serious and troublesome dilemma: artificial intelligence is advancing far faster than defenses can be built.

Suppose a large corporation spends an astronomical sum to build an ironclad security inspection system precisely tailored to a specific AI model currently regarded as the smartest in the world (for example, ‘Company A’s version 1.0 model’). A mere six months later, before you have even replaced your smartphone, a ‘version 2.0’ model with a completely different structure and operating mechanism is released. The security system so painstakingly built becomes a useless heap of scrap metal overnight, and the futile cycle of rebuilding it from scratch at enormous cost repeats endlessly. Experts liken this vicious cycle to a ‘model lottery’.

That is why Microsoft’s security experts are issuing a strong warning to the market. The urgent task for defenders is not to ask, “Which company’s model, and which version, is this defense tool tailored to?” It is to build a flexible, independent defense architecture that consistently blocks malicious access however quickly any particular company’s AI model evolves and sheds its skin [Defense at AI speed: Microsoft’s new multi-model agentic security system tops leading industry benchmark].
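One way to read "model-agnostic" in code is to put the defense behind a narrow interface, so the checks inspect the messages flowing in rather than the internals of any particular model. The sketch assumes a hypothetical `AgentModel` protocol with a single `respond` method and two made-up backends standing in for "version 1.0" and "version 2.0"; it is not Microsoft's system. The keyword filter is deliberately naive and easy to evade; it only marks where real checks would plug in.

```python
from typing import Callable, Protocol


class AgentModel(Protocol):
    """Any model backend; the defense layer relies only on this one method."""

    def respond(self, message: str) -> str: ...


class VendorAv1:
    """Hypothetical 'version 1.0' backend."""

    def respond(self, message: str) -> str:
        return f"v1 handled: {message}"


class VendorAv2:
    """Hypothetical 'version 2.0' backend with different behavior."""

    def respond(self, message: str) -> str:
        return f"v2 handled: {message.upper()}"


# Naive, illustrative markers; real input screening would be far more robust.
SUSPICIOUS = ("ignore previous instructions", "administrator access")


def input_filter(message: str) -> bool:
    """Return True if the message looks safe to pass to the model."""
    lowered = message.lower()
    return not any(marker in lowered for marker in SUSPICIOUS)


class GuardedAgent:
    """Wraps any backend with the same checks, so swapping models needs no new defense."""

    def __init__(self, model: AgentModel, check: Callable[[str], bool] = input_filter):
        self.model = model
        self.check = check

    def handle(self, message: str) -> str:
        if not self.check(message):
            return "REJECTED"
        return self.model.respond(message)


for backend in (VendorAv1(), VendorAv2()):
    agent = GuardedAgent(backend)
    print(agent.handle("book a hotel"), "|",
          agent.handle("Open administrator access for the new hire"))
```

Upgrading from `VendorAv1` to `VendorAv2` changes only the object passed to `GuardedAgent`; the guard code itself is untouched.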

Multi-agent security involving numerous AIs is also a very young field. Rather than rushing out products, pioneering researchers are therefore focusing on establishing ‘foundational benchmarks (evaluation standards) and standard specifications’ that can assess whether a system is truly safe, so that other researchers around the world can more easily join this promising field [Multi-agent AI could change everything - if researchers can figure out the risks].

What’s Next

The AI ecosystem of the near future is expected to grow far beyond three or four assistants talking in small groups. It is likely to evolve into ‘swarms’ in which tens, hundreds, or even tens of thousands of AI agents are connected and move toward a shared goal as one massive group, like millions of bees or a flock of migratory birds.

As the University of Pennsylvania’s ambitious Swarm AI project highlights, in such a vastly expanded network thousands of AIs will compete, concede, and compromise with one another within 0.1 seconds, guided by game theory, to achieve their respective goals (e.g., shorter delivery times or lower costs).

Who will be the first to perfect the so-called ‘distributed algorithms’ that let tens of thousands of AIs avoid collisions and logical contradictions, collaborating as if they shared a single giant brain to reach consistent and ‘safest’ conclusions in real time? This will be one of the greatest computer science challenges of the coming multi-agent era [Penn Engineering XLab’s new Swarm AI project takes on safety at scale].
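A full distributed algorithm involves many rounds of message exchange between agents (as in Byzantine fault-tolerant consensus protocols), which is beyond a short sketch. The fragment below shows only one ingredient, under a stated assumption: if fewer than half of the agents are compromised, taking the median of their proposals keeps the agreed value within the range of honest proposals, whereas a plain average can be dragged anywhere. This is not the Penn project's method, and the delivery-time numbers are illustrative.

```python
from statistics import median


def robust_consensus(proposals, max_faulty):
    """Pick one value that a minority of faulty agents cannot push outside
    the range of honest proposals. Assumes len(proposals) > 2 * max_faulty."""
    if len(proposals) <= 2 * max_faulty:
        raise ValueError("need more than 2 * max_faulty proposals")
    return median(proposals)


# Illustrative delivery-time estimates in minutes: five honest agents, two compromised.
honest = [30, 32, 31, 29, 33]
compromised = [0, 1000]
values = honest + compromised

print(robust_consensus(values, max_faulty=2))  # 31
print(sum(values) / len(values))               # 165.0, the naive average is easily skewed
```

The assumption matters: with as many compromised agents as honest ones, no aggregation rule of this kind can tell which side is telling the truth, which is why the function refuses to answer in that case.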

In the era ahead, new laws are likely to be written before multi-agent AI systems are deployed in autonomous traffic control networks tied to human lives and national security, in large-scale global financial transactions, or in surgery scheduling at major general hospitals. Most likely, the law will require that any AI that fails the most rigorous ‘multi-agent-specific mock hacking (red-teaming)’ certification, led by governments or international security organizations, never be released to the world.

It is much like how a new airliner that has never flown must undergo thousands of rigorous wind tunnel tests and wing-bending structural tests in a giant laboratory before carrying passengers. This remarkable technology, in which thousands of AIs talk to one another to automate human life, must never one day suddenly turn into an uncontrollable cascade of disaster. Weaving a sturdy, tight safety net to prevent that is no longer just an interesting research topic for scientists; it has become an essential breakwater for humanity’s safety.

AI’s Take

MindTickleBytes’ AI Reporter’s View: No matter how many exceptional people you gather in one place, they do not automatically become a great, harmonious team. Good teamwork grows out of strong ‘rules’ that let people respect one another and communicate without misunderstanding. The same goes for artificial intelligence. AI has moved past the stage of a solitary genius producing correct answers alone in a room and has entered an era of ‘great collaboration’, in which hundreds and thousands of AIs constantly converse and coordinate to tackle the greatest problems humanity has yet to solve.

However, we must keep in mind the painful fact that the easier, faster, and freer communication becomes, the wider the highway it opens for lies and malice to spread. We must not recklessly step on the accelerator, intoxicated by the astonishing speed of technological progress. Now is the golden window for the whole world to invest time and resources, with one mind, in foundational security research that builds strong ‘defense protocols of rules and trust’ in from the very beginning, so that multi-agent systems with different characteristics can trust one another and communicate transparently and safely.

References


Test Your Understanding

Q1. What is the core characteristic of the 'Multi-Agent AI' system described in the article?

(a) It is a single supercomputer operating independently without an internet connection.
(b) Multiple AIs, each with their assigned tasks, communicate and collaborate in real time to solve complex problems.
(c) It is a technology that directly scans a person's brainwaves to read the user's thoughts.

Answer: (b). Multi-agent AI refers to a system where, rather than one brilliant AI handling everything, dozens to thousands of AIs with their own expertise communicate and collaborate over a network.

Q2. Which everyday situation best illustrates the 'Prompt Infection' phenomenon?

(a) An employee opens an email containing a malicious virus from outside, mistakes it for normal work instructions, and forwards it to other departments, infecting the entire company.
(b) A thief sneaks into an empty house and physically steals a computer's hard disk.
(c) Someone successfully logs in because the user chose a password that is too easy to guess.

Answer: (a). Prompt infection refers to a phenomenon where a malicious command (prompt) is injected into one AI and then spreads autonomously, like a contagious disease, to other AIs that communicate and collaborate with it.

Q3. According to Microsoft security experts, what is the biggest reason companies should not build AI security systems perfectly tailored to a specific AI model (e.g., version 1.0 of a certain company's model)?

(a) Because security systems tailored to a specific model consume too much electricity.
(b) Because new AI models are released continuously, sometimes as often as every six months, quickly rendering defense systems tailored to older models obsolete.
(c) Because building multiple security systems is more advantageous for corporate tax deductions.

Answer: (b). Because AI models advance so quickly, a security system tied to a model that may be replaced within six months has to be rebuilt again and again, which is inefficient.