The 'Anonymous AI Knight' Who Beat Tech Giant Google Ascends to the Terminal Throne

AI Summary

Equipped with Google Gemini 3, the open-source AI agent Dirac has broken the world record in 'terminal' control tests, a domain previously reserved for computer experts.

Imagine being left alone in the control room of a massive factory filled with complex machinery. Switches are everywhere, and cryptic code flows incessantly across the screens. This is the heart that drives everything in the factory, but it’s a terrifying space that you wouldn’t dare touch unless you were a highly skilled technician.

A similar “secret control room” exists inside the computers we use every day. It is the Terminal (a window where you control the computer by typing commands directly), often filled with nothing but white text on a black screen. While average users use computers by clicking pretty icons with a mouse, true experts use this terminal tool to manipulate the computer’s skeleton and design complex systems.

Recently, however, an event occurred in this sanctuary of experts that has shocked the world. An “anonymous open-source AI” created by an individual developer surpassed the official AI made by a giant like Google to become the smartest terminal expert in the world. It’s a plot twist akin to a local restaurant chef winning a cooking competition against 3-star Michelin chefs.

Why is this important? “From Speaking AI to Acting AI”

Until now, the AIs we’ve encountered, such as ChatGPT and Gemini, have primarily been entities that are good at “talking.” They were very proficient at requests like “write a poem,” “translate English,” or “summarize a long text.” However, they remained unreliable when entrusted with practical tasks like “organize 1,000 messy files on my computer by content and install the necessary programs.”

The AI agent named Dirac, which has recently become a hot topic, is on a different level. According to the article “Dirac OSS Agent Crushes Google’s Baseline on TerminalBench,” Dirac has shown that it can work in the deepest part of a computer, the terminal, to issue complex commands, manage files, and solve problems on its own.

In simple terms, AI has evolved from a “talkative secretary” that only provides information into a “competent agent” that manages your computer and performs complex technical tasks on your behalf. Developers around the world are particularly excited that an open-source project, whose blueprints anyone can inspect and use for free, took first place over the paid services of a conglomerate backed by trillions of won in capital.

In Plain Terms: TerminalBench, the AI ‘Driver’s License Test’

To measure how smart an AI is, experts put it through various “exams.” The test Dirac topped is TerminalBench 2.0. (Source: “Open-Source AI Agent Tops TerminalBench 2.0 Leaderboard”)

This test can be compared to a “high-difficulty driving test for AI.” However, instead of a car, it involves driving the extremely demanding and complex device known as a “computer terminal.” (Source: “OSS Agent Tops TerminalBench with Gemini-3,” PromptZone) The test items include challenges that would make even experts break a sweat:

Shell Scripting: Writing a series of steps for the computer to follow in order (like writing an error-free recipe for a complex dish that will be served to tens of thousands of diners).
File Management: The detailed task of finding minute differences among tens of thousands of files to select, move, and modify what is needed.
System Configuration: The high-level task of completely overhauling the computer’s internal environment to suit a specific purpose.
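To make the file-management category more concrete, here is a minimal Python sketch of the kind of chore such a task might involve: sorting the files in a folder into subfolders by extension. It is not an actual TerminalBench task, and the function names `organize_by_extension` and `demo` are hypothetical, chosen only for illustration. The demo runs inside a temporary directory so no real files are touched.

```python
import shutil
import tempfile
from pathlib import Path


def organize_by_extension(root: Path) -> dict[str, list[str]]:
    """Move each file directly under `root` into a subfolder named after its extension."""
    moved: dict[str, list[str]] = {}
    # sorted() takes a snapshot of the listing, so folders created below are not re-scanned.
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        # Files without an extension go into a catch-all folder.
        folder = path.suffix.lstrip(".").lower() or "no_extension"
        target_dir = root / folder
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved.setdefault(folder, []).append(path.name)
    return moved


def demo() -> dict[str, list[str]]:
    """Run the sketch inside a throwaway directory (hypothetical sample files)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ["notes.txt", "photo.JPG", "report.txt", "README"]:
            (root / name).write_text("sample")
        return organize_by_extension(root)


if __name__ == "__main__":
    print(demo())
```

An agent working in a real terminal would have to do the equivalent with shell commands, check the result, and recover from errors on its own, which is what makes this kind of benchmark demanding.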

Developer ‘umair24171’ noted, “Most AI exams are often just window dressing that asks for simple knowledge, but TerminalBench is a real skill test to gauge whether an AI can actually ‘work’.” (Source: “Gemini-3-Flash: My AI Agent Benchmark TerminalBench Win & 3 Fixes”)

Current Status: David Beats Goliath with a Surprising Score Gap

The results of this showdown sent shockwaves through the entire IT industry. It’s as if a student who studied by finding their own path beat an elite student from a wealthy family—who always took first place—by an overwhelming margin. Let’s look at the actual report card:

Dirac: 65.2% (open source, available to anyone)
Junie CLI: 64.3% (an existing, expensive commercial product)
Google Official Record: 47.8% (the result of Google testing its own model)
(Source: r/GoogleGeminiAI on Reddit, “Open Source Agent I built topped the TerminalBench 2.0 on Gemini-3-flash-preview”)

Surprisingly, Dirac recorded a score 17.4 percentage points higher than the official record set by Google. In terms of a school exam, while Google scored 48, Dirac scored over 65.
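The size of the gap can be checked with a few lines of Python. This is only a worked arithmetic sketch using the three scores reported above. The helper names `gap_in_points` and `relative_gain_percent` are hypothetical, and the relative-gain figure is derived here rather than quoted from any source.

```python
# Scores as reported in the article (percent of benchmark tasks solved).
scores = {"Dirac": 65.2, "Junie CLI": 64.3, "Google official": 47.8}


def gap_in_points(a: float, b: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return round(a - b, 1)


def relative_gain_percent(a: float, b: float) -> float:
    """How much larger `a` is than `b`, relative to `b`, in percent."""
    return round((a - b) / b * 100, 1)


dirac = scores["Dirac"]
google = scores["Google official"]
junie = scores["Junie CLI"]

print(gap_in_points(dirac, google))          # 17.4 percentage points
print(gap_in_points(dirac, junie))           # 0.9 percentage points
print(relative_gain_percent(dirac, google))  # 36.4 percent more tasks solved
```

Note the difference between the two measures: the lead over Google's figure is 17.4 percentage points, which corresponds to solving roughly 36% more tasks, while the lead over Junie CLI is under one point.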

The hidden helper in this victory is actually the brain of Google’s latest AI, the Gemini-3-flash-preview model. (Source: “Dirac OSS Agent Crushes Google’s Baseline on TerminalBench”) Gemini-3 Flash is Google’s ambitious project designed to operate much faster and smarter than previous models when performing complex coding and system tasks. (Source: “Gemini 3 Flash,” Google DeepMind)

The crucial point, however, is that while Google’s own run stayed in the 40s, apparently without getting the most out of this excellent engine, developer Max Trivedi achieved world-class performance by precisely tuning and optimizing the agent built around it. And he did so while keeping all the blueprints public, without any tricks. (Source: “Show HN: OSS Agent I built topped the TerminalBench on…”)

What Lies Ahead? The ‘Universal Handyman’ AI Coming to Our Side

Dirac’s success clearly illustrates two futures we are about to encounter.

First, AI will become the ‘universal computer handyman’ in our homes. Imagine a scene where, when your computer suddenly slows down or an unknown error window pops up, you tell an AI agent, “Find the cause of this problem in the terminal and fix it,” instead of calling an expert and paying high repair fees. The era where AI scans tens of thousands of lines of code in a black screen and finishes repairs in a minute is not far off.

Second, the ‘power of building together’ can win over corporate monopoly. Dirac’s result suggests that when people around the world openly debate and improve the way an engine is used (the agent architecture), even an engine borrowed from Google, the outcome can be far superior to what a company develops behind closed doors.

Of course, there is still a way to go. A score of 65.2% means the agent still fails on 34.8% of tasks, or roughly one time in three. A mistake in the terminal carries the risk of accidentally deleting precious family photos or important work files. That is why developers continue their research today to create more reliable ‘safety devices’ that keep an AI’s mistakes from causing damage.
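One simple form such a ‘safety device’ could take is a check that flags risky commands before they run, so a human can confirm them first. The Python sketch below is purely illustrative and is not how Dirac or any other agent is known to work. The deny-list `DESTRUCTIVE`, the `SEPARATORS` set, and the function `needs_confirmation` are all assumptions made for this example, and a real safeguard would need a far richer policy.

```python
import shlex

# Hypothetical deny-list of programs that can destroy data.
DESTRUCTIVE = {"rm", "rmdir", "dd", "mkfs", "shred", "truncate"}

# Tokens that start a new command when they stand alone in a command line.
SEPARATORS = {"&&", "||", "|", ";"}


def needs_confirmation(command: str) -> bool:
    """Return True if any command in the line starts with a deny-listed program.

    Limitations of this sketch: separators glued to a word (e.g. "ls;")
    and options passed to sudo (e.g. "sudo -u root rm") are not handled.
    """
    expect_program = True
    for token in shlex.split(command):
        if token in SEPARATORS:
            expect_program = True
            continue
        if expect_program:
            if token == "sudo":
                continue  # inspect the program that sudo would run
            if token in DESTRUCTIVE:
                return True
            expect_program = False
    return False


if __name__ == "__main__":
    for cmd in ["ls -la", "rm -rf ~/Pictures", "ls && sudo rm notes.txt"]:
        status = "ask the user first" if needs_confirmation(cmd) else "run"
        print(f"{cmd!r}: {status}")
```

A check like this only looks at the program name, so it cannot catch every dangerous command. It illustrates the general idea of putting a confirmation step between an agent's decision and its effect on your files.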

AI’s Perspective: Through the Eyes of MindTickleBytes’ AI Reporter

“Dirac’s victory is not just a battle of numbers. It is an event that proves that the powerful tool of AI is not the exclusive property of a specific conglomerate, but shines brightest when the wisdom and curiosity of us all are gathered. Now, moving past the era of worrying about ‘what to ask AI,’ we are standing on the threshold of a true ‘Agent Era’ where we must consider ‘what difficult tasks on my computer to entrust to AI.’”

FACT-CHECK SUMMARY

Claims checked: 15
Claims verified: 15
Verdict: PASS

Test Your Understanding

Q1. What is the name of the open-source AI agent that recently broke the world record?

Gemini CLI
Dirac
Junie CLI

Dirac is an open-source AI agent developed by Max Trivedi of Dirac Delta Labs.

Q2. What is the name of the test that evaluates an AI's ability to perform terminal tasks?

TerminalBench 2.0
Gemini Test
Hacker News Benchmark

TerminalBench is a benchmark that evaluates how well an AI performs file management or scripting in a command-line interface.

Q3. What was Dirac's success rate in this test?

47.8%
64.3%
65.2%

Dirac recorded a success rate of 65.2%, significantly outpacing Google's official record of 47.8%.