DeepSeek vs Claude vs GPT (2026): What Should an Automation Consultant Actually Build With?

Two things happened in the AI world this month that are worth a straight answer rather than a news roundup. A US export order forced Anthropic to pull its two newest models overnight. And DeepSeek's pricing now undercuts the Western frontier labs by a margin that should make every business owner running AI in production stop and check their invoice. Neither changes what we recommend building with for client work — but both are worth understanding properly before someone tells you cost is the only variable that matters.

The Anthropic story: a developing situation, not a verdict

On 12 June 2026, the US government issued an export control directive ordering Anthropic to suspend access to its two newest models, Fable 5 and Mythos 5, for any foreign national — including Anthropic's own non-citizen staff. Because there is no practical way to filter users by nationality in real time, Anthropic disabled both models for every customer worldwide rather than risk non-compliance. Every other Claude model, including the one running this site's automation, was unaffected.

The government's stated concern was national security, tied to a reported method of bypassing Fable 5's safeguards. Anthropic's position, published the same day, is that the finding was narrow and non-universal — closer to a known, minor vulnerability than a usable cyberweapon — and that the same technique can be reproduced on other publicly available frontier models that aren't subject to the same order. Anthropic has said it believes this is a misunderstanding and is working to restore access.

We are not going to pretend either side has fully made its case in public yet. What we do know: this is the first time a government has forced a full, global takedown of a publicly deployed frontier model, it happened to a model that had been live for three days, and as of writing it remains unresolved. If anything you run calls a Mythos-class model directly, check it now rather than assuming the situation will sort itself out. For most operational builds, the kind we run for clients, this has had no impact at all, because none of our production work sits on Fable or Mythos.

The DeepSeek story: the price gap is real, and it is bigger than people think

Separately from any of that, DeepSeek has made its V4-Pro pricing cut permanent. At current standing rates, DeepSeek V4-Pro costs roughly 11x less than GPT-5.5 or Claude Opus-class models on input tokens, and somewhere in the region of 30x less on output tokens — which is where most of the actual spend on a chat-heavy or report-generating application lands. Run a workload generating ten million output tokens a month and you are looking at roughly £30 on DeepSeek against £250-300 on a Western frontier model for comparable volume.
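
The arithmetic behind that example is simple enough to rerun against your own invoice. The sketch below is illustrative only: the per-million-token prices are not published rates but are back-calculated from the ten-million-token example above (£30, and the midpoint of £250-300), and the function and dictionary names are our own. Swap in the rates from your provider's current price list before relying on the output.

```python
# ASSUMPTION: these per-million-output-token prices (in GBP) are NOT published
# rates. They are back-calculated from the article's example of ten million
# output tokens a month (GBP 30 versus the midpoint of GBP 250-300).
ASSUMED_OUTPUT_PRICE_PER_MILLION_GBP = {
    "deepseek-v4-pro": 3.00,
    "western-frontier": 27.50,
}


def monthly_output_cost(output_tokens: int, price_per_million_gbp: float) -> float:
    """Monthly spend in GBP for a given number of output tokens."""
    return output_tokens / 1_000_000 * price_per_million_gbp


def saving_fraction(cheaper_cost: float, dearer_cost: float) -> float:
    """Fraction of the dearer bill saved by switching to the cheaper one."""
    return 1 - cheaper_cost / dearer_cost


if __name__ == "__main__":
    tokens = 10_000_000  # the workload size used in the example above
    for model, price in ASSUMED_OUTPUT_PRICE_PER_MILLION_GBP.items():
        print(f"{model}: GBP {monthly_output_cost(tokens, price):.2f} per month")
```

Only output tokens are modelled here, since that is where the example says most of the spend lands; a fuller estimate would add an input-token term in the same way.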

On the image side, Chinese providers such as Tencent's Hunyuan and ByteDance's Seedream are also undercutting Midjourney, but the gap there is nowhere near as dramatic — typically two to four times cheaper per image, not the multiples seen on text. Worth knowing, but not the same story.

DeepSeek's benchmark numbers are genuinely close to the Western frontier on coding and general reasoning tasks. This is not a "cheap and worse" story in the way it might have been eighteen months ago. The gap that remains is narrower, and concentrated in specific places: long-horizon agentic reliability, factual recall on real-world knowledge, and the kind of consistent instruction-following that matters when a model is running unsupervised inside someone's business.

What this actually means for an operational build

Here is the part that matters if you are paying for any of this rather than just reading about it. Token price is one line on an invoice. It is not the cost that determines whether an automation survives in production.

When we build a WhatsApp staff assistant, a proposal generator, or a weekly labour reporting pipeline for a client, the model is answering a real person, writing a number into a real document, or feeding a decision a manager will act on. The variables that actually decide whether that system holds up over months of unsupervised running are reliability under edge cases, predictable refusal behaviour, data governance and where the data physically sits, and whether the provider is still going to be operating the same way in a year. Anthropic and OpenAI currently win on all four for the kind of work we do. A 90% cost saving on tokens that cost a few pounds a month per client to begin with is not worth trading against any of that.

That said, cheap inference has a real place. Bulk classification, first-draft generation, document summarisation at scale, anything where a human reviews the output before it touches a client or a decision: that is exactly where a model like DeepSeek V4 earns its keep, and we will route work there where it makes sense. The mistake is applying the same logic to the parts of a build that run client-facing and unsupervised. Those stay on Claude or GPT-class models, and that has not changed this month.
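
As a rough illustration of that split, the rule can be written down as a single check: output that a human reviews before use may go to a cheaper model, and everything else stays on a frontier model. This is a minimal sketch, not our production routing. The `Task` type, the tier names, and the model identifiers are all hypothetical placeholders rather than real API model names.

```python
from dataclasses import dataclass

# ASSUMPTION: placeholder identifiers, not real API model names.
TIER_MODELS = {
    "frontier": "claude-or-gpt-class-model",
    "budget": "deepseek-v4-class-model",
}


@dataclass(frozen=True)
class Task:
    name: str
    human_reviewed: bool  # True if a person checks the output before it is used


def choose_tier(task: Task) -> str:
    """Send only human-reviewed work to the budget tier."""
    if task.human_reviewed:
        return "budget"
    # Unsupervised output (client replies, figures in documents, reports a
    # manager acts on) defaults to the frontier tier.
    return "frontier"


def model_for(task: Task) -> str:
    """Resolve a task to the placeholder model identifier for its tier."""
    return TIER_MODELS[choose_tier(task)]
```

Defaulting to the frontier tier means a task that nobody has classified yet lands on the safer side rather than the cheaper one.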

If anything, the Fable/Mythos episode is a useful data point in the other direction. A government can suspend access to a specific model overnight for reasons entirely outside your control. That is one more argument for building automations that don't hard-depend on a single named model — route through an API layer you can repoint, and you are protected whether the disruption comes from a regulator or a pricing change.
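
One way to picture an API layer you can repoint is a small router that maps a role (what the automation needs done) to an ordered list of providers, so the automation itself never names a model. The sketch below is an assumption-laden illustration: `ModelRouter`, the role names, and the stub providers are hypothetical, and a real build would wrap each vendor's actual SDK call behind the same plain function signature.

```python
from typing import Callable, Dict, List

# A provider is any function that takes a prompt and returns text.
# In a real build each one would wrap a vendor SDK call (not shown here).
Provider = Callable[[str], str]


class ModelRouter:
    """Maps a role to an ordered list of providers; callers never name a model."""

    def __init__(self, providers: Dict[str, Provider], routes: Dict[str, List[str]]):
        self.providers = providers
        self.routes = routes

    def repoint(self, role: str, provider_names: List[str]) -> None:
        """Change which providers serve a role without touching calling code."""
        unknown = [n for n in provider_names if n not in self.providers]
        if unknown:
            raise KeyError(f"unknown providers: {unknown}")
        self.routes[role] = list(provider_names)

    def complete(self, role: str, prompt: str) -> str:
        """Try each provider for the role in order, returning the first success."""
        errors = []
        for name in self.routes[role]:
            try:
                return self.providers[name](prompt)
            except Exception as exc:  # e.g. the model has been withdrawn
                errors.append(f"{name}: {exc}")
        raise RuntimeError(f"all providers failed for {role!r}: " + "; ".join(errors))


# Hypothetical stubs standing in for real vendor calls.
def suspended_model(prompt: str) -> str:
    raise RuntimeError("model suspended")


def backup_model(prompt: str) -> str:
    return f"backup:{prompt}"
```

With `routes={"client_facing": ["primary", "backup"]}`, a suspended primary falls through to the backup, and `repoint("client_facing", ["backup"])` makes the switch permanent in one line. The fallback list for a client-facing role should only contain models you have already tested for that role, otherwise the router quietly undoes the tiering described earlier.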

The takeaway

Frontier AI is moving fast enough that any "best model" answer has a shelf life measured in weeks. What does not move as fast is the actual job: build something that works reliably for the business paying for it, on infrastructure you are not betting the client relationship on. Right now that still means Claude or GPT-class models for anything client-facing, with cheaper models doing the unglamorous bulk work behind the scenes where appropriate.

Want a second opinion on what's actually running your automation?

If you're not sure which model your current setup depends on, or whether your stack is exposed to a single provider, we'll take a look and give you a straight answer.
