Are AIs Reading Each Other’s Minds? The Secret of Multi-Agent AI Made Smarter by 'Cache Merging'

AI Summary

A new collaboration method lets AI agents exchange their internal states, known as 'latent states,' instead of text, cutting token usage by up to 83.7% while also improving accuracy.

Imagine several experts holding a meeting to solve a difficult problem. Until now, AI collaboration has worked as if these experts could understand each other only by speaking every thought aloud in complete sentences. Naturally, this takes a long time, and the longer the conversation drags on, the easier it becomes to lose sight of the core issues.

Now, however, researchers have found a way for AIs to share their “thoughts” directly instead of exchanging long, tedious sentences. This is thanks to new technologies called “Cache Merging (CaM)” and “LatentMAS.”

Why It Matters

Multi-agent systems, where multiple AIs cooperate to perform complex tasks, are the key to making AI assistants smarter. However, current AI assistants consume many tokens (the fragments of words AI processes) even when handling simple requests, and they tend to slow down as conversations grow longer.

Technologies like LatentMAS cut the resources wasted on generating intermediate text and help AI models with different areas of expertise collaborate faster and more accurately. In practical terms, this could mean faster and more accurate answers even when you hand an AI more complex tasks. Source: Latent Collaboration in Multi-Agent Systems

The Explainer

Does the term “latent state” sound difficult? Think of it in terms of cooking. Until now, one AI has had to cook a finished dish (text) and hand it over, and the other AI then had to break that dish back down into raw ingredients (data) before using them in its own cooking. It was a very inefficient process.

On the other hand, LatentMAS (a multi-agent reasoning framework) is like AIs exchanging prepared ingredients (latent states) directly, skipping the cooking process altogether. Source: Gen-Verse/LatentMAS
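
To see why skipping the “finished dish” step can help, here is a toy Python sketch. It is a hypothetical simplification, not how LatentMAS works internally: the three-word vocabulary, the 2-D vectors, and the nearest-word decoding are all invented for illustration. Sending the state as text forces it through a single word and loses its nuance; handing the state over directly keeps it intact.

```python
import math

# Hypothetical toy vocabulary: each word maps to a fixed 2-D "meaning" vector.
VOCAB = {
    "yes": (1.0, 0.0),
    "no": (-1.0, 0.0),
    "maybe": (0.0, 1.0),
}


def to_text(state):
    """Text channel: collapse a continuous state to the single closest word."""
    return min(VOCAB, key=lambda word: math.dist(VOCAB[word], state))


def from_text(word):
    """The receiving agent re-embeds the word it read."""
    return VOCAB[word]


# Agent A's internal state: leaning "yes", but with some "maybe" mixed in.
state_a = (0.8, 0.6)

via_text = from_text(to_text(state_a))  # squeezed through one word
via_latent = state_a                    # handed over unchanged

print("word sent:", to_text(state_a))
print("error via text:  ", round(math.dist(via_text, state_a), 3))
print("error via latent:", math.dist(via_latent, state_a))
```

The text route sends “yes” and the receiver reconstructs a state about 0.63 away from the original, so the “maybe” component is lost; the latent route has zero error.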

Cache Merging (CaM) addresses the memory side of this picture. While a language model processes text, it stores intermediate results for each token in a memory area called the “KV cache” (key-value cache) so that it does not have to recompute them. When a memory budget caps that cache, some entries must be evicted. Instead of simply discarding them, CaM “merges” the less important entries into the high-importance (high-attention) entries that are kept. It’s like adding supplementary notes to an essential summary sheet. This saves a significant amount of memory while preserving key information. Source: Latent Collaboration in Multi-Agent Systems, Source: CaM: Cache Merging for Memory-efficient LLMs Inference
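
The merge-instead-of-evict idea can be illustrated with a toy Python sketch. This is a deliberately simplified, hypothetical version, not the algorithm from the CaM paper: each cache entry is reduced to a label, a value vector, and a single attention score, and every dropped entry is folded into the most-attended surviving entry by an attention-weighted average.

```python
# Hypothetical toy KV cache: each entry is (token_label, value_vector, attention_score).
# Real caches also store key vectors per layer and head; they are omitted here.


def evict(cache, budget):
    """Baseline: keep the `budget` most-attended entries and discard the rest."""
    ranked = sorted(cache, key=lambda entry: entry[2], reverse=True)
    return ranked[:budget]


def merge_then_evict(cache, budget):
    """Toy merge: fold discarded entries into the most-attended kept entry."""
    ranked = sorted(cache, key=lambda entry: entry[2], reverse=True)
    kept, dropped = ranked[:budget], ranked[budget:]
    if not kept or not dropped:
        return kept
    label, value, score = kept[0]
    total = score + sum(entry[2] for entry in dropped)
    if total == 0:  # all scores zero: nothing meaningful to weight by
        return kept
    # Attention-weighted average of the kept value and all dropped values.
    merged = [score * v for v in value]
    for _, dropped_value, dropped_score in dropped:
        merged = [m + dropped_score * d for m, d in zip(merged, dropped_value)]
    kept[0] = (label, [m / total for m in merged], score)
    return kept


cache = [
    ("k1", [1.0, 0.0], 0.6),
    ("k2", [0.0, 1.0], 0.3),
    ("k3", [0.0, 0.0], 0.1),
]
print(evict(cache, 1))             # k2 and k3 are simply gone
print(merge_then_evict(cache, 1))  # k1's value now carries a trace of k2
```

Both versions keep the same number of entries, so the memory saving is identical; the difference is that the merged entry retains a weighted trace of what was dropped, which mirrors the “supplementary notes on a summary sheet” analogy.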

Where We Stand

Currently, AI agents communicate primarily through text. This creates a bottleneck in information transfer, much as if we had to spell out every single letter whenever we talked. Source: Latent Collaboration in Multi-Agent Systems

Research shows that the LatentMAS framework reduced token usage by up to 83.7% compared with text-based multi-agent systems, without any additional training. Surprisingly, despite using fewer tokens, it also improved accuracy by up to 14.6%. This suggests how much more efficient collaboration can become when AIs skip unnecessary language generation and share only the essential reasoning information directly. Source: Latent Collaboration in Multi-Agent Systems
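
To make the headline percentage concrete, this short Python calculation applies an 83.7% reduction to a hypothetical baseline of 10,000 tokens. The baseline figure is invented for illustration; only the percentage comes from the study.

```python
def tokens_after_reduction(baseline_tokens, reduction_pct):
    """Return the token count left after cutting `reduction_pct` percent."""
    return baseline_tokens * (1 - reduction_pct / 100)


baseline = 10_000  # hypothetical token count for a text-based multi-agent run
remaining = tokens_after_reduction(baseline, 83.7)
print(f"{baseline} tokens -> about {round(remaining)} tokens")
```

At the best reported reduction, about 1,630 tokens remain, or roughly one token for every six the text-based run would have needed.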

What’s Next

The AI ecosystem is likely to shift quickly from “independent models” to “collaborative agent systems.” In particular, “multi-agent latent reasoning,” in which multiple agents combine their memory (KV caches) into a single large shared context, is expected to become a core technology for complex data analysis and real-time decision-making systems. [Source: Multiagent Systems Cool Papers](https://papers.cool/arxiv/cs.MA)
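
The idea of several agents pooling their KV caches into one context can be sketched in Python as a layer-by-layer concatenation along the token axis. This is a hypothetical illustration: the `KVCache` class and `combine` function are invented here, the sketch assumes every cache comes from the same model with the same number of layers, and it ignores details a real system would have to handle, such as position encodings.

```python
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Hypothetical cache: keys[layer][position] and values[layer][position]
    each hold one vector (a list of floats) per token position."""
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def length(self):
        """Number of token positions stored (read from the first layer)."""
        return len(self.keys[0]) if self.keys else 0


def combine(caches):
    """Concatenate agents' caches layer by layer so a downstream agent
    can attend over all of them as one shared context."""
    if not caches:
        return KVCache()
    num_layers = len(caches[0].keys)
    shared = KVCache(keys=[[] for _ in range(num_layers)],
                     values=[[] for _ in range(num_layers)])
    for cache in caches:
        if len(cache.keys) != num_layers:
            raise ValueError("caches must come from models with the same layer count")
        for layer in range(num_layers):
            shared.keys[layer].extend(cache.keys[layer])
            shared.values[layer].extend(cache.values[layer])
    return shared


# Two hypothetical agents using a 2-layer model with 1-dimensional vectors.
planner = KVCache(keys=[[[0.1]] * 3, [[0.2]] * 3], values=[[[1.0]] * 3, [[2.0]] * 3])
critic = KVCache(keys=[[[0.3]] * 2, [[0.4]] * 2], values=[[[3.0]] * 2, [[4.0]] * 2])
shared = combine([planner, critic])
print(shared.length())  # positions from both agents
```

Concatenating in order keeps each agent's entries contiguous, so the combined context reads as the planner's positions followed by the critic's.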

We are now witnessing an era in which AIs go beyond reading and writing the way humans do and instead exchange their “latent thoughts” with each other directly and much faster.

AI’s Take

MindTickleBytes’ AI Reporter’s Take: Moving beyond the narrow corridor of human language to allow AI to communicate directly in its own language (latent states) is a turning point toward the true agent era. Efficiency is just the beginning; I look forward to seeing what kind of creative results the “sharing of thoughts” between AI models will produce.

References

Multiagent Systems - arXiv.org
GitHub - Gen-Verse/LatentMAS
Latent Collaboration in Multi-Agent Systems CaM
Latent Collaboration in Multi-Agent Systems (Hugging Face)
CaM: Cache Merging for Memory-efficient LLMs Inference
VoltAgent/awesome-ai-agent-papers
Latent Collaboration in Multi-Agent Systems (EmergentMind)
Multiagent Systems - Cool Papers (https://papers.cool/arxiv/cs.MA)

FACT-CHECK SUMMARY

Claims checked: 10
Claims verified: 10
Verdict: PASS


Test Your Understanding

Q1. What communication method does the LatentMAS framework use for AI collaboration?

Massive text summarization
Latent space sharing
Simple result transmission

LatentMAS performs efficient collaboration by sharing the latent states within the models instead of text-based communication.

Q2. What is the core method by which Cache Merging (CaM) technology increases efficiency?

Storing all conversations
Merging caches to be deleted with other caches
Deleting unnecessary agents

Cache Merging (CaM) maximizes memory efficiency by merging low-importance caches into high-attention locations instead of discarding them.

Q3. What effects can be expected from introducing LatentMAS?

Increased token usage and slower speeds
Reduced token usage and improved accuracy
No change in accuracy

LatentMAS reduced token usage by up to 83.7% while improving accuracy by up to 14.6%.