Is AI's 'Brain Structure' Changing? The Reality of Google's Newly Released T5Gemma

[Image: the Google logo combined with abstract graphics representing the encoder-decoder architecture]
AI Summary

Google has introduced 'T5Gemma', a new series of encoder-decoder AI models that breaks away from the recently dominant decoder-only structure to deeply understand information, summarize it, and even process images.

Introduction: Two Ways of AI ‘Thinking’

Imagine you have a thick, difficult English report in front of you. What would you do if you had to translate it into another language or summarize it in a single sentence?

Most likely, you would first meticulously 'read and understand' the entire report, and then, holding its core content in mind, organize your thoughts before 'outputting' a new sentence. Interestingly, most of the AIs we have used so far, such as ChatGPT, focus more on statistically 'predicting' the next word than on this kind of 'deep reading.'

Recently, Google went back to basics and announced 'T5Gemma', a new AI model series that maximizes the ability to deeply understand and organize information (T5Gemma: A new collection of encoder-decoder Gemma models). Why did Google revive a 'classic structure' while setting aside its successful existing approach? What changes will this bring to our daily lives? Let's break it down one by one, as if a smart friend were explaining it to you.

Why It Matters

The performance of the AI we use is determined by its 'blueprint': the architecture (the structural design of the AI). In recent years, the 'decoder-only' architecture has been the mainstream, because it excels at making sentences flow smoothly, which makes it well suited to conversational chatbots.

However, Google's newly introduced T5Gemma revives the 'encoder-decoder' method: a structure divided into a part that takes in information and identifies its meaning, and a part that produces output based on that meaning (Google Releases T5Gemma, Reigniting the Architecture War!).

In simple terms, while existing AIs focused on "What should I say next?", this new structure is designed to first ask, "What does the other person really mean?" To use a metaphor, it is less a fast, eloquent talker and more a prudent expert who listens to the end and gets to the heart of the matter. This structure is particularly effective in tasks such as:

  • Sophisticated Translation: Translating after perfectly grasping the context before and after the entire sentence.
  • Core Summarization: Excellent ability to pick out only the truly important points from a vast pile of information.
  • Reasoning and Answering: Deeply identifying the hidden intent of a question to provide a logical answer.

The era of 'smart AI that properly understands and organizes content', beyond AI that merely speaks well, has opened up again (Unveiling T5Gemma: Google's New Encoder-Decoder Gemma Models).

Understanding Easily: Cooperation Between the ‘Reading Brain’ and ‘Speaking Brain’

Let’s explain the ‘encoder-decoder’ structure, the core of T5Gemma, with a more specific metaphor.

If the decoder-only model, which has been the mainstream, is an "outstanding novelist who looks at the preceding words and is very good at guessing the next word," then T5Gemma is a "skilled researcher who writes a clear report after perfectly understanding specialized material" (T5Gemma: A new collection of encoder-decoder Gemma models).

Here, the encoder thoroughly scans the information we give it and creates a sophisticated map of its 'meaning' in numbers. Then, the decoder reads that map, finds the correct destination (the answer), and produces a new sentence. Because the two parts clearly divide their roles, the model is much more efficient at understanding complex contexts (Gemma — Google DeepMind).
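The division of labor described above can be sketched as a toy program. This is not the real T5Gemma: the "meaning map" here is just a word-count dictionary standing in for the encoder's numeric representation, and the "summary" is simply the most frequent words. It only illustrates the shape of the pipeline: encoder reads everything first, decoder works only from the map.

```python
# Toy sketch of the encoder-decoder division of labor (illustrative only).

def encode(source_text):
    """Encoder: read the whole input and build a 'map' of its meaning.
    Here the map is just word frequencies; the real model uses vectors."""
    meaning_map = {}
    for word in source_text.lower().split():
        meaning_map[word] = meaning_map.get(word, 0) + 1
    return meaning_map

def decode(meaning_map, top_k=3):
    """Decoder: look only at the meaning map (never the raw text) and
    produce a new output, here a crude 'summary' of prominent words."""
    ranked = sorted(meaning_map.items(), key=lambda kv: (-kv[1], kv[0]))
    return " ".join(word for word, _ in ranked[:top_k])

report = "the model reads the report then the model writes a summary"
summary = decode(encode(report))
print(summary)
```

Note that `decode` never sees `report` directly; everything it knows must pass through the encoder's map, which is the defining trait of this architecture.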

The Magic of ‘Adaptation’

The amazing thing is that Google didn't create this model completely from scratch. They took existing 'decoder-only' models (Gemma 2 or Gemma 3) whose performance had already been verified and transformed them into an encoder-decoder structure through a special technique called 'Adaptation' (converting a model to fit a specific purpose) (T5Gemma: A new collection of encoder-decoder Gemma models).

To use a metaphor, it is similar to giving a right-handed chef special training to use their left hand as well, so they are reborn as an 'ambidextrous chef' who uses both hands freely. To achieve this, Google used a massive amount of data snippets, about 2 trillion (2T) UL2 tokens, to retrain and rearrange the model's brain structure (T5Gemma 2: Seeing, Reading, and Understanding Longer).
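Conceptually, adaptation starts a new encoder-decoder model from an old decoder-only checkpoint instead of from random numbers. The sketch below is a loose illustration of that idea, not Google's actual procedure: the weight names and the handling of cross-attention (which has no pretrained counterpart and is marked here as freshly initialized) are assumptions for the example.

```python
# Conceptual sketch of 'adaptation': reuse a pretrained decoder-only
# model's weights to initialize BOTH halves of an encoder-decoder model,
# then continue training. Weight values are toy stand-ins for tensors.

pretrained_decoder_only = {
    "embeddings": [0.1, 0.2, 0.3],
    "layer_0":    [0.4, 0.5],
    "layer_1":    [0.6, 0.7],
}

def adapt_to_encoder_decoder(weights):
    """Build an encoder-decoder checkpoint by copying the pretrained
    weights into both the encoder and decoder stacks. The cross-attention
    linking the two halves is new, so it starts untrained here."""
    return {
        "encoder": {name: list(w) for name, w in weights.items()},
        "decoder": {name: list(w) for name, w in weights.items()},
        "cross_attention": "freshly initialized",
    }

model = adapt_to_encoder_decoder(pretrained_decoder_only)
# The adapted model is then trained further (e.g., on UL2-style
# objectives) so the two halves learn their new, separate roles.
```

The payoff of this trick is that neither half has to relearn language from scratch; only the new connections and the new division of labor need training.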

Current Status: Smaller Yet Smarter?

With the latest version, T5Gemma 2, the technology has evolved one step further. It has moved beyond just reading text to possess all-around capabilities for 'Seeing, Reading, and Understanding Longer' (T5Gemma 2: Seeing, Reading, and Understanding Longer).

The main features of T5Gemma 2 are as follows:

  1. AI That Opened Its Eyes (Vision capabilities): It can now look at complex images or charts, in addition to text, to identify their content, explain them, or answer questions (T5Gemma 2: The next generation of encoder-decoder models).
  2. Successful Dieting (Efficiency): It applies 'tied embeddings', where the encoder and decoder share redundant information. Thanks to this, performance actually improved while the AI's weight (number of parameters) shrank by 10.5% (T5Gemma 2: Google's Encoder-Decoder Revival… - Banandre).
  3. No Problem with Long Sentences (Long-context): It inherited the ability to understand very long texts, even documents reaching hundreds of pages, without losing the flow from beginning to end (Encoder-Decoders and Byte LLMs: T5Gemma 2 and AI2's New Models).
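
To see why tying embeddings saves weight, consider that an embedding table maps every vocabulary token to a vector, so it alone can hold hundreds of millions of parameters. The arithmetic below is a back-of-the-envelope sketch; the vocabulary size, width, and other-parameter count are illustrative assumptions, not T5Gemma 2's real configuration (so the percentage it prints differs from the article's 10.5%).

```python
# Back-of-the-envelope arithmetic for 'tied embeddings'.
# All numbers are illustrative assumptions, not real T5Gemma 2 values.

vocab_size = 256_000          # tokens in the vocabulary (assumed)
hidden_dim = 2_048            # embedding width (assumed)
other_params = 1_500_000_000  # all non-embedding weights (assumed)

embedding_table = vocab_size * hidden_dim  # parameters in one table

# Untied: encoder and decoder each keep their own copy of the table.
untied_total = other_params + 2 * embedding_table
# Tied: both halves share a single table.
tied_total = other_params + 1 * embedding_table

saved = untied_total - tied_total
print(f"parameters saved: {saved:,} "
      f"({saved / untied_total:.1%} of the untied model)")
```

The saving is exactly one full embedding table, which is why the technique matters most for models with large vocabularies relative to their overall size.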

In addition, recent techniques such as GQA (Grouped Query Attention), which speeds up information processing, and RoPE (Rotary Positional Embeddings), which tracks the positional relationships between words more accurately, have been applied to maximize processing efficiency (T5Gemma - Hugging Face).
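The core idea of GQA can be shown with simple counting: instead of giving every query head its own key/value head, several query heads share one, which shrinks the key/value cache the model must keep per token. The head counts and sizes below are illustrative assumptions, not T5Gemma's real configuration.

```python
# Counting sketch of Grouped Query Attention (GQA).
# Head counts and sizes are illustrative, not T5Gemma's real config.

num_query_heads = 16
num_kv_heads = 4      # each KV head serves a *group* of 4 query heads
head_dim = 128
seq_len = 1_000       # tokens whose keys/values are cached

def kv_cache_size(kv_heads):
    # keys + values: one vector of head_dim floats per head per token
    return 2 * kv_heads * head_dim * seq_len

full_attention = kv_cache_size(num_query_heads)  # MHA: one KV per query head
grouped = kv_cache_size(num_kv_heads)            # GQA: shared KV heads
print(f"KV cache shrinks to {grouped / full_attention:.0%} of the MHA size")
```

A smaller cache means less memory traffic per generated token, which is where the speed-up in practice comes from.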

What’s Next?

The arrival of the T5Gemma series signals that the apps we use every day will become lighter and smarter.

Existing giant models were too heavy to run locally and had to go through massive data centers, incurring significant cost and energy along the way. Compact yet powerful models like T5Gemma 2, however, can run smoothly on the smartphones and laptops in our hands (T5Gemma 2: The next generation of encoder-decoder models).

In particular, multilingual support, which naturally bridges different languages, has been significantly strengthened. Soon, everyone, anywhere in the world, will be able to enjoy services that more accurately translate and summarize documents in any language (T5Gemma 2: Seeing, Reading, and Understanding Longer).

AI’s Take

In the view of MindTickleBytes' AI reporter, T5Gemma is the AI version of the saying 'fashion goes in cycles.' Rather than simply chasing whatever is flashy and new, Google's strategy of reinterpreting a proven structure from the past with overwhelming modern technology to maximize practicality is very clever.

This is not just a technical change. If the AI assistant in our smartphones starts reading the information in the photos we take and perfectly summarizing complex work documents in just 3 seconds, the revival of this 'understanding'-focused encoder-decoder will be working in the background. You could see it as AI becoming better at 'getting what you mean', not merely becoming smarter.


References

  1. T5Gemma: A new collection of encoder-decoder Gemma models
  2. Gemma— Google DeepMind
  3. T5Gemma: A new collection of encoder-decoder Gemma models (Engineering.fyi)
  4. T5Gemma 2: Seeing, Reading, and Understanding Longer (Arxiv PDF)
  5. T5Gemma · Hugging Face
  6. Google Releases T5Gemma, Reigniting the Architecture War!
  7. T5Gemma Revolutionizes LLM Efficiency: How Encoder-Decoder…
  8. T5Gemma 2: Google’s Encoder-Decoder Revival… - Banandre
  9. T5Gemma 2: The next generation of encoder-decoder models (Google Blog)
  10. T5Gemma 2: Seeing, Reading, and Understanding Longer (Arxiv Abstract)
  11. Unveiling T5Gemma: Google’s New Encoder-Decoder Gemma Models
  12. T5Gemma - Hugging Face (Main Doc)
  13. How Will T5Gemma Transform Encoder-Decoder Models? (Analytics India Mag)
  14. Encoder-Decoders and Byte LLMs: T5Gemma 2 and AI2’s New Models

FACT-CHECK SUMMARY

  • Claims checked: 21
  • Claims verified: 21
  • Verdict: PASS

Test Your Understanding

Q1. Which existing models are the T5Gemma series based on?

  • GPT-4
  • Gemma 2 and Gemma 3
  • Llama 3

T5Gemma is based on the Gemma 2 architecture, while the latest version, T5Gemma 2, was created by converting Gemma 3 models.

Q2. What was the secret to reducing the number of 'parameters' by 10.5% in the T5Gemma 2 model?

  • Reducing data size
  • Sharing information between the encoder and decoder (tied embeddings)
  • Giving up language support

By using 'tied embeddings' between the encoder and decoder to share redundant information, the size was reduced without sacrificing performance.

Q3. What new capability does T5Gemma 2 have compared to its previous version?

  • Music composition ability
  • Vision capabilities to see and read images
  • Game-playing ability

T5Gemma 2 possesses vision-language capabilities, allowing it to see and understand images, and it strengthens the ability to grasp long contexts.