A new collaboration method lets AIs exchange internal memory states, known as 'latent states,' instead of text, cutting token usage by up to 83.7% while improving accuracy.
Imagine several experts holding a meeting to solve a difficult problem. Previous AI collaboration methods were like experts who had to read every thought aloud in complete sentences before anyone else could follow. Naturally, this takes a long time, and the longer the conversation drags on, the easier it is to lose sight of the core issues.
However, AIs have now found a way to share their “thoughts” directly without needing to exchange long, tedious sentences. This is thanks to new technologies called “Cache Merging (CaM)” and “LatentMAS.”
Why It Matters
Multi-agent systems, where multiple AIs cooperate to perform complex tasks, are the key to making AI assistants smarter. However, current AI assistants consume many tokens (the fragments of words AI processes) even when handling simple requests, and they tend to slow down as conversations grow longer.
Technologies like LatentMAS reduce the massive resource waste involved in AI generating text and help AI models with different expertise collaborate faster and more accurately. In simple terms, it means you will be able to receive much faster and more accurate answers even when you assign more complex tasks to an AI. Source: Latent Collaboration in Multi-Agent Systems
The Explainer
Does the term “Latent State” sound difficult? As a metaphor, think of it like cooking. Until now, AIs have had to prepare ingredients and show the finished dish (text) to the other party, who would then have to decompose that dish back into raw materials (data) to use in their own cooking. It was a very inefficient process.
On the other hand, LatentMAS (a multi-agent reasoning framework) is like AIs exchanging prepared ingredients (latent states) directly, skipping the cooking process altogether. Source: Gen-Verse/LatentMAS
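The difference between handing over a finished dish and handing over prepared ingredients can be sketched with a toy example. This is only an illustration and not LatentMAS code: the two-word vocabulary, the 2-D vectors, and the function names `send_as_text` and `send_as_latent` are all hypothetical.

```python
# Hypothetical two-word vocabulary with toy 2-D embeddings (illustration only).
VOCAB = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def send_as_text(hidden):
    """Text route: pick the single best-matching word, then re-embed it
    on the receiving side. Anything the word does not capture is lost."""
    word = max(VOCAB, key=lambda w: dot(VOCAB[w], hidden))
    return word, list(VOCAB[word])


def send_as_latent(hidden):
    """Latent route: hand over the vector itself, with no rounding to a word."""
    return list(hidden)


hidden = [0.6, 0.4]  # sender's internal state: mostly "cat", partly "dog"
word, received_text = send_as_text(hidden)
received_latent = send_as_latent(hidden)
```

In this toy setup the text route delivers only "cat", so the "partly dog" nuance disappears, while the latent route delivers the original vector unchanged.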
The core role here is played by Cache Merging (CaM). When an AI processes data, it uses a memory space called a “KV cache.” When this space is full, the AI must delete old information. However, instead of just discarding information, CaM “merges” less important information with information in high-importance (high-attention) locations. It’s like adding supplementary knowledge to an essential summary note. This significantly saves memory space while maintaining key information. Source: Latent Collaboration in Multi-Agent Systems, Source: CaM: Cache Merging for Memory-efficient LLMs Inference
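The merge-instead-of-delete idea can be sketched as follows. This is a deliberately simplified illustration and not the algorithm from the CaM paper: the function name `merge_evicted`, the dictionary layout, and the attention-weighted averaging rule are assumptions made for this example.

```python
def merge_evicted(cache, budget):
    """Shrink a toy KV cache to `budget` entries without simply dropping data.

    `cache` is a list of dicts with a 'value' vector and an 'attn' score.
    Low-attention entries are folded into the highest-attention survivor
    using an attention-weighted average (an assumed rule for this sketch).
    """
    # Copy entries so the caller's cache is left untouched.
    entries = [{"value": list(e["value"]), "attn": e["attn"]} for e in cache]
    if len(entries) <= budget:
        return entries

    # Rank positions by attention; the top `budget` positions survive.
    order = sorted(range(len(entries)), key=lambda i: entries[i]["attn"], reverse=True)
    kept = [entries[i] for i in sorted(order[:budget])]
    evicted = [entries[i] for i in order[budget:]]

    # Fold every evicted entry into the most-attended surviving entry.
    target = max(kept, key=lambda e: e["attn"])
    for e in evicted:
        total = target["attn"] + e["attn"]
        if total == 0:
            continue  # nothing meaningful to merge
        target["value"] = [
            (target["attn"] * t + e["attn"] * v) / total
            for t, v in zip(target["value"], e["value"])
        ]
        target["attn"] = total
    return kept


cache = [
    {"value": [1.0, 0.0], "attn": 0.6},
    {"value": [0.0, 1.0], "attn": 0.3},
    {"value": [4.0, 4.0], "attn": 0.1},
]
merged = merge_evicted(cache, budget=2)
```

Here the cache shrinks from three entries to two, but the lowest-attention entry still leaves a trace in the surviving high-attention entry instead of vanishing.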
Where We Stand
Currently, AI agents communicate primarily through text. However, this creates a bottleneck in information transfer, much as if we had to spell out every single letter when talking. Source: Latent Collaboration in Multi-Agent Systems
Research shows that the LatentMAS framework reduced token usage by up to 83.7% compared to existing methods without any additional retraining. What is surprising is that even with fewer tokens, accuracy rose by up to 14.6%. This demonstrates how efficient collaboration becomes when AIs skip the unnecessary language generation process and share only the essential “inference information” directly. Source: Latent Collaboration in Multi-Agent Systems
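To put the 83.7% figure in concrete terms, here is a small worked calculation. The 10,000-token baseline is a made-up number chosen for easy arithmetic and does not come from the research, and `tokens_after_reduction` is a hypothetical helper.

```python
def tokens_after_reduction(baseline_tokens, reduction_pct):
    """Tokens still needed after cutting `reduction_pct` percent from a baseline."""
    return round(baseline_tokens * (100 - reduction_pct) / 100)


# Hypothetical text-based run that needs 10,000 tokens.
remaining = tokens_after_reduction(10_000, 83.7)
print(remaining)  # 1630
```

In other words, at the best reported reduction, a task that took 10,000 tokens through text-based collaboration would need roughly 1,630.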
What’s Next
The AI ecosystem will rapidly move from “independent models” to “collaborative agent systems.” In particular, “multi-agent latent reasoning,” where multiple agents combine their memory spaces (KV cache) to complete a single massive context, is expected to become an indispensable core technology for complex data analysis or real-time decision-making models in the future. [Source: Multiagent Systems | Cool Papers](https://papers.cool/arxiv/cs.MA)
We are now witnessing an era where AIs go beyond reading and writing like humans, and instead exchange their “latent thoughts” with each other much faster and more discreetly.
AI’s Take
MindTickleBytes’ AI Reporter’s Take: Moving beyond the narrow corridor of human language to allow AI to communicate directly in its own language (latent states) is a turning point toward the true agent era. Efficiency is just the beginning; I look forward to seeing what kind of creative results the “sharing of thoughts” between AI models will produce.
References
- Multiagent Systems - arXiv.org
- GitHub - Gen-Verse/LatentMAS
- Latent Collaboration in Multi-Agent Systems CaM
- Latent Collaboration in Multi-Agent Systems (Hugging Face)
- CaM: Cache Merging for Memory-efficient LLMs Inference
- VoltAgent/awesome-ai-agent-papers
- Latent Collaboration in Multi-Agent Systems (EmergentMind)
- [Multiagent Systems | Cool Papers](https://papers.cool/arxiv/cs.MA)