Google has introduced 'T5Gemma', a new series of encoder-decoder AI models that breaks away from the dominant decoder-only design to deeply understand information, summarize it, and even process images.
Introduction: Two Ways of AI ‘Thinking’
Imagine this: you have a long, dense report in English in front of you. What would you do if you had to translate it into another language or boil it down to a single sentence?
Most likely, you would first carefully ‘read and understand’ the entire report, and only then, working from its core content, organize it in your mind and ‘output’ a new sentence. Interestingly, most of the recent AIs we use, such as ChatGPT, have focused more on statistically ‘predicting’ the next word than on this kind of ‘deep reading.’
Recently, Google went back to basics and announced ‘T5Gemma’, a new AI model series that maximizes the ability to deeply understand and organize information (T5Gemma: A new collection of encoder-decoder Gemma models). Why did Google revive a ‘classic’ structure instead of sticking with its proven existing approach? And what will change in our daily lives? Let’s break it down one step at a time, as if a smart friend were explaining it to you.
Why It Matters
The performance of the AI we use is determined by its ‘blueprint’: the architecture, or structural design, of the model. In recent years, the ‘decoder-only’ architecture has been the mainstream, because it excels at producing fluent, flowing text and is therefore a natural fit for conversational chatbots.
However, Google’s newly introduced T5Gemma revives the ‘encoder-decoder’ method: a structure split into a part that takes in information and works out its meaning, and a part that produces results based on that meaning (Google Releases T5Gemma, Reigniting the Architecture War!).
In simple terms, while existing AIs focus on “What should I say next?”, this new structure is designed to first ask “What does the other person actually mean?” To use a metaphor, it is less a glib fast talker and more a prudent expert who hears the other person out and then pinpoints the core of the matter. This structure is particularly effective in tasks such as:
- Sophisticated Translation: Translating after perfectly grasping the context before and after the entire sentence.
- Core Summarization: Excellent ability to pick out only the truly important points from a vast pile of information.
- Reasoning and Answering: Deeply identifying the hidden intent of a question to provide a logical answer.
The era of ‘smart AI that properly understands and organizes content’, beyond AI that merely speaks well, has opened up again (Unveiling T5Gemma: Google’s New Encoder-Decoder Gemma Models).
Understanding Easily: Cooperation Between the ‘Reading Brain’ and ‘Speaking Brain’
Let’s explain the ‘encoder-decoder’ structure, the core of T5Gemma, with a more specific metaphor.
If the mainstream decoder-only model is an “outstanding novelist who looks at the preceding words and is very good at guessing the next one,” T5Gemma is a “skilled researcher who writes a clear report after fully understanding specialist material” (T5Gemma: A new collection of encoder-decoder Gemma models).
Here, the encoder thoroughly scans the information we provide and builds a sophisticated numerical map of its ‘meaning’. The decoder then reads that map, finds the right destination (the answer), and generates a new sentence. Because the two parts divide their roles so cleanly, the model handles complex contexts far more efficiently (Gemma — Google DeepMind).
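The ‘reading brain / speaking brain’ division of labor can be sketched in a few lines of plain NumPy. This is a toy illustration of the general encoder-decoder pattern, not T5Gemma’s actual code: the `encode` and `decode_step` functions below stand in for real self-attention and cross-attention layers.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy embedding size

def encode(token_vecs):
    # "Reading brain": mix every token with every other token
    # (a stand-in for self-attention) to build a map of meaning.
    scores = token_vecs @ token_vecs.T / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    return weights @ token_vecs  # one context vector per input token

def decode_step(prev_out_vec, memory):
    # "Speaking brain": look up the encoder's map (cross-attention)
    # to decide what to produce next.
    scores = memory @ prev_out_vec / np.sqrt(d)
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ memory  # next output vector

src = rng.normal(size=(5, d))  # 5 source tokens, e.g. a sentence to translate
memory = encode(src)           # the encoder runs once over the whole input
out = decode_step(rng.normal(size=d), memory)
print(memory.shape, out.shape)  # (5, 8) (8,)
```

The key point the sketch captures: the encoder runs once over the entire input and produces a fixed ‘memory’, while the decoder consults that memory at every generation step.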
The Magic of ‘Adaptation’
The amazing thing is that Google didn’t build this model from scratch. They took existing ‘decoder-only’ models whose performance had already been proven (Gemma 2 and Gemma 3) and transformed them into an encoder-decoder structure through a technique called ‘adaptation’, which converts a model to fit a new purpose (T5Gemma: A new collection of encoder-decoder Gemma models).
To use a metaphor, it is like giving a right-handed chef special training with the left hand until they are reborn as an ‘ambidextrous chef’ who uses both freely. To pull this off, Google retrained the models on a massive amount of data, about 2 trillion (2T) UL2 tokens, rewiring their internal structure (T5Gemma 2: Seeing, Reading, and Understanding Longer).
Current Status: Smaller Yet Smarter?
With the latest version, T5Gemma 2, the technology has evolved one step further. It has moved beyond just reading text to all-around capabilities for ‘Seeing, Reading, and Understanding Longer’ (T5Gemma 2: Seeing, Reading, and Understanding Longer).
The main features of T5Gemma 2 are as follows:
- An AI That Opened Its Eyes (vision capabilities): It can now look at complex images or charts, in addition to text, and identify, explain, or answer questions about them (T5Gemma 2: The next generation of encoder-decoder models).
- A Successful Diet (efficiency): It applies ‘tied embeddings’, in which the encoder and decoder share otherwise redundant weights. Thanks to this, performance actually improved while the model’s weight (its parameter count) shrank by 10.5% (T5Gemma 2: Google’s Encoder-Decoder Revival… - Banandre).
- No Problem with Long Texts (long context): It inherits the ability to follow very long texts, even documents running to hundreds of pages, without losing the thread from beginning to end (Encoder-Decoders and Byte LLMs: T5Gemma 2 and AI2’s New Models).
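The ‘tied embeddings’ diet from the list above is easy to see in numbers. The exact sharing scheme used in T5Gemma 2 is described in Google’s report; this generic sketch only shows the idea: one vocabulary-sized matrix does a job that would otherwise take several.

```python
import numpy as np

vocab, d = 1000, 64  # toy vocabulary size and embedding width

# One matrix serves roles that would otherwise each need their own
# vocab-sized table: input embedding lookup and the output projection.
shared = np.random.default_rng(1).normal(size=(vocab, d))

def embed(token_ids):
    return shared[token_ids]      # input lookup (encoder and decoder)

def logits(hidden):
    return hidden @ shared.T      # output head reuses the same weights

h = embed(np.array([3, 7]))      # two token ids in
z = logits(h)                    # scores over the whole vocabulary out

untied, tied = 3 * vocab * d, shared.size  # three separate tables vs one
print(h.shape, z.shape, f"embedding params saved: {1 - tied / untied:.0%}")
```

In a real model the embedding tables are a large fraction of the total weights, so sharing them is a cheap way to shrink the parameter count without touching the attention layers.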
On top of this, recent techniques such as GQA (grouped-query attention), which speeds up information processing, and RoPE (rotary positional embeddings), which tracks the positional relationships between words more accurately, are applied to maximize processing efficiency (T5Gemma - Hugging Face).
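Of the two, RoPE fits in a few lines: each query or key vector is rotated by an angle that grows with the token’s position, so the dot product between two vectors ends up depending on their relative distance. A minimal NumPy sketch follows; note that implementations differ in how they pair up dimensions, and this one uses the split-halves layout.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    # Rotate consecutive dimension pairs of x by an angle that grows
    # with the token's position. Each pair gets its own frequency, so
    # relative position falls out of dot products between rotated vectors.
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)
    theta = pos * freqs
    x1, x2 = x[..., :half], x[..., half:]
    return np.concatenate([x1 * np.cos(theta) - x2 * np.sin(theta),
                           x1 * np.sin(theta) + x2 * np.cos(theta)], axis=-1)

q = np.ones(8)
r0, r5 = rope(q, 0), rope(q, 5)  # same vector at positions 0 and 5
# A rotation never changes a vector's length, only its direction:
print(np.allclose(np.linalg.norm(r0), np.linalg.norm(r5)))  # True
```

Because RoPE is a pure rotation, it adds positional information without distorting the magnitudes the attention scores are built from, which is part of why it scales well to long contexts.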
What’s Next?
The arrival of the T5Gemma series signals that the apps we use every day will become lighter and smarter.
Existing giant models were too heavy to run anywhere but massive data centers, incurring significant cost and energy along the way. Compact yet powerful models like T5Gemma 2, by contrast, can run smoothly on the smartphones and laptops in our hands (T5Gemma 2: The next generation of encoder-decoder models).
Multilingual support, in particular, which bridges languages naturally, has been significantly strengthened. Soon, anyone, anywhere in the world, should be able to enjoy services that translate and summarize documents more accurately in any language (T5Gemma 2: Seeing, Reading, and Understanding Longer).
AI’s Take
In the view of MindTickleBytes’ AI reporter, T5Gemma is the AI version of the saying ‘fashion goes in cycles.’ Instead of only chasing what is flashy and new, Google’s strategy of reinterpreting a proven structure from the past with overwhelming modern technology, to maximize practicality, is very clever.
This is more than a technical change. If the AI assistant in our smartphones starts reading the information in the photos we take, and summarizes a complex work document perfectly in three seconds, the revival of this ‘understanding-first’ encoder-decoder will be working behind the scenes. You could see it as AI getting better at ‘getting what you mean’, not just getting smarter.
References
- T5Gemma: A new collection of encoder-decoder Gemma models
- Gemma— Google DeepMind
- T5Gemma: A new collection of encoder-decoder Gemma models (Engineering.fyi)
- T5Gemma 2: Seeing, Reading, and Understanding Longer (Arxiv PDF)
- T5Gemma · Hugging Face
- Google Releases T5Gemma, Reigniting the Architecture War!
- T5Gemma Revolutionizes LLM Efficiency: How Encoder-Decoder…
- T5Gemma 2: Google’s Encoder-Decoder Revival… - Banandre
- T5Gemma 2: The next generation of encoder-decoder models (Google Blog)
- T5Gemma 2: Seeing, Reading, and Understanding Longer (Arxiv Abstract)
- Unveiling T5Gemma: Google’s New Encoder-Decoder Gemma Models
- T5Gemma - Hugging Face (Main Doc)
- [How Will T5Gemma Transform Encoder-Decoder Models? (Analytics India Mag)](https://analyticsindiamag.com/ai-news-updates/google-launches-t5gemma-to-reclaim-encoder-decoder-architecture-benefits/)
- Encoder-Decoders and Byte LLMs: T5Gemma 2 and AI2’s New Models