Tech · 1d ago

Creator Teortaxes says China's GLM model nearly matches Claude Opus despite lacking a multi-billion-dollar compute budget

Beff (e/acc) attributes the rapid gains to model distillation

#331

Original post

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex · #501 in Tech

GLM blows a big hole in my thesis that the Chinese have a catastrophic disadvantage in high-quality data. It's too close to Opus. They did this without spending billions. I don't know how. Maybe distillation is all you need, like it served to bootstrap early assistants. Wild.

4:48 PM · Jun 19, 2026 · 177.1K Views

Sentiment

Positive users praise the GLM model for closing the gap with Claude Opus through efficient distillation, while negative users respond with insults or fear that distillation is an economic threat to frontier AI development.

Pos: 37.5%

Neg: 62.5%

26 comments with sentiment.

Posts from X

Most Activity

Views 26.5K · Bookmarks 46 · Likes 348 · Retweets 15 · Replies 12

Beff (e/acc)@beffjezos

Anthropic is too addicted to big revenue to stop offering their API globally

Chinese labs are encouraged to do whatever it takes to catch up

The distillations will continue as long as the models improve.

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

1d · 26.5K views · 348 likes · 46 bookmarks

kalomaze@kalomaze

@teortaxesTex *sigh* value estimation smell real?

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

1d6.2K6513

Teknium 🪽@Teknium

@teortaxesTex Wonder who proposed that idea 😳

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

15h2.6K362

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@beffjezos I suspect that it's impossible to stop short of Fable-style lockdown. Extraction of the reward profile delta seems to be absurdly more efficient than copying RL-d behavior with SFT capability lift. You can bootstrap with a few tens of thousands of innocuous interactions.

1d1.5K232

Chris Paxton@chris_j_paxton

@teortaxesTex Better RL environments for coding? you shouldn't really need better data at some point, if you can get the RL working well enough

also I wonder if this is sanctions helping them. more gaming gpus which are not as good at training but better for RL

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

1d2.3K182

Shannon Sands@max_paperclips

@Youssofal_ @teortaxesTex yeah man, they totally "distilled Fable" and trained a model in the 1 day it was available or whatever

19h44224

Pit Schultz@pitsch

China isn't relying on raw data hoards. Instead, they're running a Grokipedia-on-steroids flywheel: farming frontier models for synth reasoning chains, zwiki-style repo-to-wiki pipelines, agent swarm trajectories, and relentless distill plus RL loops. They turned teacher models into an orders-of-magnitude wider synthetic encyclopedia, then bootstrapped hard. Post-training efficiency is closing the gap fast. Data moats were never eternal.

20h37352

0xSero@0xSero

@teortaxesTex @chiefofautism They have a program where you can paste a repo or search a repo and zai would create an entire wiki on it. It was part of their MCPs/offering last year.

I don’t see it on the internet anymore, but I had a post about it that blew up. @grok it was about pi as well

1d26021

bling@blingdivinity

@teortaxesTex distillation is more powerful than ever in the llm as judge rl regime. a very chinese strategy, and they do it best. they let the westoids do the data legwork, exercise their "taste". meanwhile they engineer sota techniques to wring out every last drop of opus's soul.

1d74341

Pit Schultz@pitsch

@teortaxesTex @bookwormengr Why couldn’t China labs train a base model on terabytes of shadow libraries, distill it into an untraceable teacher model that generates unlimited structured/annotated data from an expanded Wikidata graph to build a verifiable 'super wiki' for further annotation and training?

19h15551

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@0xSero @chiefofautism zwiki?

1d7012

0xSero@0xSero

@teortaxesTex @chiefofautism nevermind it's this

1d601

am.will@LLMJunky

@teortaxesTex they didnt spend billions

as long as you completely ignore the hundreds of billions of dollars in subsidies the chinese government has poured into AI, energy, vouchers, infra, etc

1d5481

Aedmar Skýjárn@AedmarSkyjarn

Didn’t you see the many news reports about Chinese entities running American frontier models at massive scale, in violation of the terms of service of American AI companies, for the purpose of collecting training data?

You have very strong opinions about AI, but it seems to be one-sided here.

18h16221

kalomaze@kalomaze

an example of this is a task that takes paragraphs, sorts sentences into some ~random order, asks the lm to reconstruct a valid one

there's vastly many more wrong orders than correct orders, so pinning to the original order is sane - but the original order isn't uniquely determined by the input in a lot of cases

the value estimator, in principle, learns a way to assign relative points that hedge as best as is possible given the incompressible constraints. that's probably the true value (heh) in explicit estimation; not some vague """variance reduction""" for """sample efficiency""" in the sense that the canon literature likes to invoke (else robotics people could just crank GRPO group size), but the smooth manifold geometry with inherently relative characteristics

1d1785
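
The sentence-reordering task kalomaze describes can be made concrete with a small sketch. The Python below is illustrative only and is not taken from the thread: `make_task` and `pairwise_order_score` are hypothetical names, sentences are assumed to be unique, and the pairwise score is a hand-written stand-in for "relative points", not a learned value estimator.

```python
import random
from itertools import combinations


def make_task(sentences, seed=0):
    """Shuffle sentences; return (shuffled, perm) with shuffled[i] == sentences[perm[i]]."""
    rng = random.Random(seed)
    perm = list(range(len(sentences)))
    rng.shuffle(perm)
    return [sentences[i] for i in perm], perm


def exact_match_score(predicted, reference):
    """All-or-nothing reward: 1.0 only when the original order is reproduced."""
    return 1.0 if list(predicted) == list(reference) else 0.0


def pairwise_order_score(predicted, reference):
    """Partial credit: fraction of sentence pairs kept in the reference's relative order.

    Assumes every sentence is unique and appears in both lists.
    """
    position = {sentence: i for i, sentence in enumerate(predicted)}
    # combinations() yields (a, b) with a before b in the reference order.
    pairs = list(combinations(reference, 2))
    if not pairs:
        return 1.0
    agreeing = sum(1 for a, b in pairs if position[a] < position[b])
    return agreeing / len(pairs)
```

With four sentences there are 24 possible orders. Swapping only the first two scores 5/6 under the pairwise rule but 0 under exact match, which is one simple way to give near-miss orderings graded rather than binary credit.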

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@LLMJunky The long and short of what you're claiming is that your companies are trash and so is your government, so you deserve to lose

1d851

cat cool@catcool477399

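@atlantis2point0 @teortaxesTex @bookwormengr The share of good data is too small. American AI can directly use the purely human-written, high-quality code that the US and Europe have accumulated over decades, including engineering and financial software. China has none of this, and at the same time many large Chinese companies deploy their own large models to keep their code from leaking. Data generated by everyday PC use is usually more valuable than data from phones, but PCs make up a far smaller share of devices in China than in Europe and the US.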

17h2121

GDP@bookwormengr

Beijing has 6 world-class universities with more than 40k grads, post grads and PhD students “EACH”. And that is Beijing alone. Z AI, MiniMax, ByteDance SEED etc are all based there. That cluster publishes the world's most books and research papers across all subjects.

I always find laughable the idea that they cannot prepare datasets for pre- and post-training.

1d854

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)@teortaxesTex

@murchiston would be nice if that really helped with writing GEMM kernels

17h3582

am.will@LLMJunky

@TeleCat88 @teortaxesTex @grok how many people died in china as a result of starvation, political purges, repression, and executions during the same time period of the vietnam war

and then compare that to how many died in the vietnam war on both sides.

23h18