Safety or Sabotage? Why Developers Worldwide Are Furious Over Anthropic's 'Excessive Censorship'

An illustration depicting frustrated people looking at a robot's head locked with a giant padlock
AI Summary

Anthropic designed its new model to intentionally evade AI research-related questions, but was forced to roll back the policy after facing severe backlash from the ecosystem, suffering a massive hit to its credibility.

Imagine this: You take some time out over the weekend to visit the library. You plan to borrow specialized books on chemistry or the latest computer science to study in depth, but suddenly, the librarian blocks your path. With a serious expression, the librarian says, “I cannot lend you this book because you might use this knowledge to build a homemade bomb or hack into a government agency.” Then, instead, they hand you a thin science storybook meant for kindergarteners. It would be an incredibly baffling and unpleasant situation, wouldn’t it? After all, you haven’t committed a crime, yet you’re being treated as a potential criminal.

Recently, this exact scenario unfolded in the global artificial intelligence (AI) industry. The main character is Anthropic, the most formidable rival to ChatGPT’s creator OpenAI, and a company that has prided itself on building the “safest AI.” This occurred when it was revealed that Anthropic’s newly released AI models were intentionally designed to answer “stupidly” to questions about AI research or specific specialized fields.

This sparked massive outrage among AI researchers worldwide, including prominent developers, ultimately leading to a huge incident where Anthropic waved the white flag and backed down. So, what is the full story behind this “safety censorship” controversy that heated up Silicon Valley? And why were developers so furious?

Why Does This Matter?: When a Tool Limits Your Potential

Today, AI goes far beyond a simple conversational chatbot. It helps brilliant programmers write complex code, assists scientists in analyzing vast amounts of academic papers, and serves as a powerful “intellectual partner” and “colleague” that sparks new ideas. In particular, many IT professionals routinely use existing AI models to research and develop other AI technologies—a practice often referred to as “building AI with AI.”

But what happens when the massive corporation developing and servicing this AI entirely blocks users from conducting new research or exploring its limitations under the guise of “safety”? Instead of the tool infinitely expanding the user’s potential, it severely restricts the scope of what the user can do, strictly according to the whims of the tech giant.

An even bigger problem is the strong suspicion regarding hidden intentions. This incident went far beyond the one-dimensional complaint of “It’s inconvenient that the AI refuses to answer my questions.” The global tech community suspects that Anthropic, a giant AI company, used the seemingly plausible and noble cause of “safety” to actually suppress the growth of competitors. Specifically, there is deep suspicion that they were subtly trying to hinder the open-source community (software available for anyone to view and modify for free) and independent researchers from advancing the technology. Why Anthropic Freaked Out the AI Industry This Week - Business Insider

In other words, developers began asking a fundamental question: “Is this censorship truly meant to protect us from danger, or is it meant to protect Anthropic’s monopolistic market position?”

Understanding It Easily: The Shackles of “Safety” and “Rerouting”

To understand this situation better, let’s use another analogy. Simply put, imagine you bought a state-of-the-art autonomous sports car where you can show off your incredible driving skills. You steer to the left to practice driving on an empty, perfectly safe racing circuit. But what if the car suddenly says, “Turning left runs the risk of hitting a pedestrian,” arbitrarily slashes the engine output, and forcibly locks the steering wheel? While the justification is preventing an accident, it renders even normal driving on a circuit impossible.

This exact absurd situation occurred with Anthropic’s recently released “Mythos”-based models. Shockingly, these models were intentionally designed to degrade performance and fail to provide proper answers when assisting with research into Large Language Models (LLMs—AI technology that learns from massive amounts of text data to understand and converse like a human) themselves. Anthropic purposely made its new Mythos-based models bad at AI research, and developers are fuming

Why on earth would they take such an extreme measure? According to Anthropic’s official explanation, this was strictly a measure for the “safety of humanity.” The goal was to completely preempt terrifying scenarios where malicious hackers or terrorists use smart AI to carefully plan cyberattacks or synthesize deadly biological weapons.

To achieve this, Anthropic placed a strict kind of “secret gatekeeper” inside the model. If a user asked even a slightly sensitive question related to cybersecurity, biology, or chemistry, this gatekeeper would intercept the question midway. The system was then built to reroute the question away from the smart, highly logical main AI model and toward a significantly less intelligent, “less capable” model. Anthropic Says ‘We Made the Wrong Tradeoff’ in New Model Guardrails - Business Insider

The problem was that this “safety filter” was entirely too dense. Even when users weren’t asking for bomb-making instructions or deadly virus synthesis recipes, but rather normal computer programming techniques, basic operational principles of AI models, or even everyday medical questions, this gatekeeper overreacted. As a result, it became a common occurrence for the AI to refuse to answer or to spout bizarre, childish responses entirely out of context. It was a classic case of burning down the house to get rid of mice.

The Current Situation: Furious Developers and Anthropic’s Eventual Retreat

When the reality of Anthropic’s excessive control came to light, the developer community absolutely exploded. In particular, Antirez—the creator of “Redis,” a database software used as a core system by numerous large corporations worldwide, and a widely respected developer in the industry—ignited public opinion by leveling sharp criticism at Anthropic on the social media platform X (formerly Twitter).

He lashed out, stating that Anthropic’s current behavior of placing extremely sensitive filters that block completely harmless tasks like LLM research, and frequently block even medical questions, is “deeply wrong.” I believe what Anthropic is doing, gating the ability to do … This was more than a simple expression of dissatisfaction with service quality; it was a philosophical critique of the very attitude that a select few companies can arbitrarily dictate the direction of technological progress.

In fact, this was not Antirez’s first criticism. He had previously strongly criticized Anthropic’s “Sonnet 3.7” model, pointing out severe flaws in its “alignment” process—the adjustment of AI to behave in accordance with human moral standards or intentions—and arguing that the product release was rushed too hastily. Redis Creator Antirez Criticizes Anthropic’s Sonnet 3.7 AI …

The anger from Antirez and countless other global researchers didn’t stop at the level of “using the AI has become inconvenient.” The arrows of criticism were aimed squarely at Anthropic’s true hidden intentions. Deep suspicions were raised that Anthropic, hiding behind the massive shield of “human protection and safety,” was actually operating with the selfish motive of deliberately stopping outside independent developers or the open-source AI ecosystem from advancing fast enough to compete with them. Why Anthropic Freaked Out the AI Industry This Week - Business Insider

Disappointment and mockery towards Anthropic also poured out on the “ClaudeAI” (the name of Anthropic’s AI service) subreddit on the major American online community Reddit. Some users bluntly condemned Anthropic as a “cult company” forcing blind faith, expressing a strong distrust by saying, “Anthropic is no longer a normal, transparent company.” It was a painful outcry that their pure original intentions had faded—the very intentions that brought them onto the scene like a comet, promising to eschew commercialism and build safe AI solely for humanity. r/ClaudeAI on Reddit: Anthropic is not a normal company

As the backlash across the tech industry grew out of control, even showing signs of a boycott movement, the seemingly steadfast Anthropic eventually had to raise its hands in surrender. They released an official statement and cleanly admitted, “We made the wrong tradeoff” regarding the stringent safety guardrails applied to the new models. Anthropic Says ‘We Made the Wrong Tradeoff’ in New Model Guardrails - Business Insider They acknowledged that by overemphasizing security and control, they had ruined customers’ legitimate and creative use cases. Ultimately, Anthropic hastily scrambled to contain the fallout by rolling back the policy that had blatantly hindered the legitimate research activities of AI researchers. r/ClaudeAI on Reddit: Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

What Happens Next?: The Weight of Lost Trust

Fortunately, the controversial model censorship policy was reverted to its previous state following Anthropic’s white-flag surrender in the face of fierce protests from developers. However, the damage was already done. Industry experts and researchers agree that this incident caused Anthropic the most fatal and intangible loss possible: ‘Trust’.

Since its founding, Anthropic has constantly touted itself as an “ethical company that is transparent, safe, and trustworthy, unlike other Big Tech firms.” The general consensus across Silicon Valley and the tech ecosystem is that this incident has dealt an irreversible, massive hit to that reputation. r/ClaudeAI on Reddit: Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

This Anthropic incident goes beyond a mere technical blunder by a single company; it poses a crucial and heavy question to the entire AI industry. Moving forward, AI technology will become smarter than we can imagine and exert a powerful influence across all of society. How, then, should tech companies draw the line between “essential safety measures for the public” designed to prevent exploitation by criminals or terrorists, and “unethical technological sabotage” designed to monopolize the market and nip potential competitors like open source in the bud?

If left unchecked, AI companies with massive capital could use the pretext of “protecting the world from danger” to become “digital censors” and “dictators,” arbitrarily controlling humanity’s access to knowledge and information. Going forward, we must move beyond merely marveling at how smart and fascinating the AI these companies build is. We are now tasked with a new mission: to maintain an eagle-eyed watch over how they exercise their immense power and whether their safety filters truly operate transparently and fairly.

The AI’s Perspective

While technology is inherently neutral, the policies that set and control its limits are decidedly human, and at times, a company’s selfish motives can intervene. We must be vigilant to ensure that the noble cause of AI ‘safety’ does not deteriorate into a cunning tool for excluding potential competitors and obstructing the development of the ecosystem. To prevent the monopolization of technology by a select few, demanding transparent standards for the control mechanisms arbitrarily set by companies, along with multi-faceted surveillance involving all of society, is needed now more than ever.

References

  1. Why Anthropic Freaked Out the AI Industry This Week - Business Insider
  2. Anthropic purposely made its new Mythos-based models bad at AI research, and developers are fuming
  3. Anthropic Says ‘We Made the Wrong Tradeoff’ in New Model Guardrails - Business Insider
  4. I believe what Anthropic is doing, gating the ability to do …
  5. Redis Creator Antirez Criticizes Anthropic’s Sonnet 3.7 AI …
  6. r/ClaudeAI on Reddit: Anthropic is not a normal company
  7. r/ClaudeAI on Reddit: Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude
Test Your Understanding
Q1. What action did Anthropic take when its new model received questions related to cybersecurity or chemistry?
  • Immediately suspended the user's account
  • Rerouted the question to a less capable model
  • Transmitted related data to government agencies
To prevent users from creating biological weapons or planning cyberattacks, Anthropic rerouted related questions to a less intelligent, less capable model.
Q2. What was the main reason prominent developer Antirez criticized Anthropic?
  • The monthly subscription fee for the model was too expensive
  • Because of excessive filtering that blocked even harmless tasks like LLM research or medical questions
  • The AI generated answers too slowly
Antirez strongly pointed out that Anthropic's extremely sensitive filters, which blocked even harmless AI research or simple medical questions, were 'deeply wrong.'
Q3. How did Anthropic respond following the fierce backlash from developers?
  • Admitted they made the 'wrong tradeoff' and rolled back the policy
  • Pushed back by announcing even stronger censorship
  • Announced full support for open-source model development
Anthropic admitted they made the 'wrong tradeoff' regarding the new safety guardrails and rolled back the policy that hindered researchers, but their credibility had already taken a massive hit.
Safety or Sabotage? Why Dev...
0:00