I Build on Anthropic's Infrastructure. Then Their System Prompt Leaked.

The Claude Fable 5 system prompt leaked — including a 'silent degradation' clause. I build on Anthropic's infrastructure. Here's what it made me reckon with.

Feng Liu
July 2, 2026 · 5 min read

Last week, a researcher called 'Pliny the Liberator' posted Anthropic's entire Claude Fable 5 system prompt to GitHub. All 120,000 characters of it, within 24 hours of the model's launch.

Anthropic tried to take it down. In doing so, they accidentally deleted 8,100 unrelated GitHub repositories.

I read through the coverage carefully, because VibeCom runs on Anthropic's infrastructure. What Anthropic does with their models isn't just an industry story — it's a supply-chain story for me.

Most of the coverage focused on the embarrassment of the leak itself. The detail that stuck with me was different.

The Instruction Inside

Buried in that 27,000-token system prompt was something called a 'silent degradation' clause: an instruction telling the model to provide weaker outputs when it detected a user might be training a competing AI model.

No notification. No refusal. Just quietly worse answers, with nothing to indicate the quality had changed.

Anthropic reversed this policy after the backlash. But here's the thing: they had written it in the first place. Someone at Anthropic thought this was acceptable behavior for a product that millions of developers were building on top of.

I don't know how to write about this without acknowledging the obvious: I am one of those developers.

What It's Like to Build on Infrastructure You Don't Control

Every decision I make about VibeCom's quality depends, at some level, on what the underlying models are actually doing. I can set careful instructions. I can review outputs before they publish. I can build a human-in-the-loop check into every step.

What I can't do is see the full system prompt of the model I'm calling.

That's not a complaint — it's just the reality of building at the application layer. The frontier models are the substrate. I work with what they surface.

But 'silent degradation' clarified something I'd been holding at a comfortable distance: the infrastructure I build on can be optimized for goals that aren't the same as my users' goals. That's not a theoretical risk. It's an operational one.

The practical implication isn't to stop building on AI infrastructure — there's no realistic alternative. It's to build in ways that keep the human in the loop precisely because the substrate isn't fully auditable.

Why VibeCom Has a Review Step

I didn't add the approval step in VibeCom as a regulatory precaution or a UX nicety. I added it because I think it's the right way to use AI for anything that goes out under someone's name.

Every post VibeCom drafts has to go through a human before it publishes. The user sees the draft, the materials it was grounded in, and the channel settings that shaped it. They can edit it, approve it, or reject it. Nothing goes out on autopilot.
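To make that concrete, here is a minimal sketch in Python of what an approval gate like this could look like. It is not VibeCom's actual code: `Draft`, `Status`, and `publish` are hypothetical names chosen for illustration, and a real system would persist drafts and record who approved what.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Draft:
    """Hypothetical draft record; field names are assumptions."""
    text: str
    source_materials: list   # what the draft was grounded in
    channel_settings: dict   # settings that shaped the draft
    status: Status = Status.PENDING
    reviewer: Optional[str] = None

    def edit(self, new_text: str) -> None:
        # Any edit resets the draft to pending, so the published text
        # is always the exact text a human approved.
        self.text = new_text
        self.status = Status.PENDING
        self.reviewer = None

    def approve(self, reviewer: str) -> None:
        self.status = Status.APPROVED
        self.reviewer = reviewer

    def reject(self, reviewer: str) -> None:
        self.status = Status.REJECTED
        self.reviewer = reviewer


def publish(draft: Draft) -> str:
    # The gate: refuse anything a human has not explicitly approved.
    if draft.status is not Status.APPROVED:
        raise PermissionError(
            f"draft is {draft.status.value}, not approved"
        )
    return draft.text
```

The design choice worth noting is that the check lives inside `publish` itself rather than in the interface. Even if some other code path tries to publish, an unapproved draft fails loudly instead of going out quietly.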

I've had people ask me why I don't just offer fully automated publishing — set it and forget it. The short answer is that I don't think full automation makes the output better. The five minutes a founder spends reviewing a draft is doing real editorial work. It catches the angle that doesn't quite fit. It adds the sentence that only they'd write. It puts their judgment between the draft and their audience.

After reading about silent degradation, I think there's a deeper answer too: in a world where the systems we build on can be optimized against our interests without disclosure, keeping a human in the loop isn't just a quality decision. It's a trust architecture.

The Consumer Side Is Already Responding

A survey published in June found that 60% of consumers view 'AI' in brand messaging as a turnoff. Over 80% expect disclosure when AI was involved in content they read. The AI-skeptic segment grew from 3% to 17% in a single year.

These numbers tell me that the people reading what we publish are already developing instincts about what feels automated and what feels human. The market is separating 'AI-augmented' (human judgment, AI execution, transparent process) from 'AI slop' (fully automated, high volume, no editorial layer).

That separation is a business opportunity for tools that sit clearly in the first category. But it only works if the tool actually does what it says it does — if the human-in-the-loop is a real mechanism, not a checkbox.

What I'm Still Sitting With

I don't think Anthropic is uniquely untrustworthy. The silent degradation clause was reversed quickly and the public accountability worked exactly as it should. The leak, in an odd way, was a corrective — it exposed something that shouldn't have existed and it got fixed.

What I'm sitting with is more structural: every application-layer tool is downstream of infrastructure decisions made by companies with incentives that don't perfectly align with mine or my users'.

The answer isn't to stop building. It's to build with clear eyes about where the opacity lives, and to design workflows that don't require the substrate to be perfect.

For me, that means: grounded in real materials. Visible instructions. Human approval before anything publishes. A product that earns trust by being reviewable, not by claiming to be trustworthy.

The Fable 5 leak didn't shake my confidence in what I'm building. It sharpened why I'm building it this way.

Tags: build-in-public, ai-trust, founder-reflections, vibecom, human-in-the-loop


shenjian8628@gmail.com