GLM blows a big hole in my thesis that the Chinese have a catastrophic disadvantage in high-quality data. It's too close to Opus. They did this without spending billions. I don't know how. Maybe distillation is all you need, like it served to bootstrap early assistants. Wild.
Creator Teortaxes says China's GLM model nearly matches Claude Opus despite lacking a multi-billion-dollar compute budget
Beff (e/acc) attributes the rapid gains to model distillation
Positive users praise the GLM model's performance closing the gap with Claude Opus via efficient distillation, while negative users respond with insults or fear distillation as an economic threat to frontier AI development.
Anthropic is too addicted to big revenue to stop offering their API globally
Chinese labs are encouraged to do whatever it takes to catch up
The distillations will continue as long as the models improve.
@teortaxesTex *sigh* value estimation smell real?
@teortaxesTex Wonder who proposed that idea 😳
@beffjezos I suspect that it's impossible to stop short of Fable-style lockdown. Extraction of the reward profile delta seems to be absurdly more efficient than copying RL-d behavior with SFT capability lift. You can bootstrap with a few tens of thousands of innocuous interactions.
@teortaxesTex Better RL environments for coding? you shouldn't really need better data at some point, if you can get the RL working well enough
also I wonder if this is sanctions helping them. more gaming gpus which are not as good at training but better for RL

@Youssofal_ @teortaxesTex yeah man, they totally "distilled Fable" and trained a model in the 1 day it was available or whatever

China isn't relying on raw data hoards. Instead, they're running a Grokipedia-on-steroids flywheel: farming frontier models for synth reasoning chains, zwiki-style repo-to-wiki pipelines, agent swarm trajectories, and relentless distill plus RL loops. They turned teacher models into an orders-of-magnitude wider synthetic encyclopedia, then bootstrapped hard. Post-training efficiency is closing the gap fast. Data moats were never eternal.

@teortaxesTex @chiefofautism They have a program where you can paste a repo or search for a repo, and zai would create an entire wiki on it. It was part of their MCPs/offering last year.
I don’t see it on the internet anymore, but I had a post about it that blew up. @grok it was about pi as well

@teortaxesTex distillation is more powerful than ever in the llm as judge rl regime. a very chinese strategy, and they do it best. they let the westoids do the data legwork, exercise their "taste". meanwhile they engineer sota techniques to wring out every last drop of opus's soul.

@teortaxesTex @bookwormengr Why couldn’t Chinese labs train a base model on terabytes of shadow libraries, distill it into an untraceable teacher model that generates unlimited structured/annotated data from an expanded Wikidata graph, and use that to build a verifiable 'super wiki' for further annotation and training?

@0xSero @chiefofautism zwiki?

@teortaxesTex @chiefofautism nevermind it's this

@teortaxesTex they didn't spend billions
as long as you completely ignore the hundreds of billions of dollars in subsidies the chinese government has poured into AI, energy, vouchers, infra, etc

Didn’t you see the many news reports about Chinese entities running American frontier models at massive scale, in violation of the terms of service of American AI companies, for the purpose of collecting training data?
You have very strong opinions about AI, but it seems to be one-sided here.

An example of this is a task that takes paragraphs, sorts the sentences into some ~random order, and asks the LM to reconstruct a valid one. There are vastly more wrong orders than correct ones, so pinning to the original order is sane, but in a lot of cases the original order isn't uniquely determined by the input. The value estimator, in principle, learns a way to assign relative points that hedge as well as possible given the incompressible constraints. That's probably the true value (heh) in explicit estimation: not some vague """variance reduction""" for """sample efficiency""" in the sense the canon literature likes to invoke (else robotics people could just crank GRPO group size), but the smooth manifold geometry with inherently relative characteristics.
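To make the sentence-reordering task above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration, not the poster's setup: the function names are hypothetical, the reward is a simple hand-written pairwise-agreement score against the original order (a fixed stand-in, not a learned value estimator), and the last function is the usual group-relative baseline associated with GRPO.

```python
import random
from itertools import combinations
from statistics import mean, pstdev


def make_task(sentences, seed=0):
    """Return a shuffled list of original sentence indices (hypothetical helper).

    order[i] is the original position of the i-th sentence shown to the model.
    """
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)
    return order


def pairwise_reward(predicted):
    """Fraction of sentence pairs placed in the original relative order.

    `predicted` lists original indices in the order the model proposes.
    This pins the reward to the original order (an assumption), but gives
    partial credit instead of an all-or-nothing exact match.
    """
    pairs = list(combinations(range(len(predicted)), 2))
    if not pairs:
        return 1.0
    agree = sum(1 for i, j in pairs if predicted[i] < predicted[j])
    return agree / len(pairs)


def group_relative_advantages(rewards):
    """Center and scale rewards within one sampled group (GRPO-style baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


if __name__ == "__main__":
    # Three hypothetical model answers for a four-sentence paragraph.
    candidates = [[0, 1, 2, 3], [1, 0, 2, 3], [3, 2, 1, 0]]
    rewards = [pairwise_reward(c) for c in candidates]
    print(rewards)
    print(group_relative_advantages(rewards))
```

A fixed score like this still penalizes every valid ordering that differs from the original, which is the ambiguity the post points at; the post's argument is that a learned value estimator could hedge across such orderings where a hand-written rule cannot.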

@LLMJunky The long and short of what you're claiming is that your companies are trash and so is your government, so you deserve to lose

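@atlantis2point0 @teortaxesTex @bookwormengr The share of good data is too small. American AI can directly use the purely human-written, high-quality code that the US and Europe have accumulated over decades; even engineering and financial software is usable. China has none of that, and on top of that many large Chinese companies deploy their own large models to keep their code from leaking. Data generated from everyday PC use is usually more valuable than data from phones, but PCs make up a far smaller share of China's development than in Europe and the US.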

Beijing has 6 world-class universities with more than 40k grads, postgrads and PhD students “EACH”. And that is Beijing alone. Z AI, MiniMax, ByteDance SEED etc. are all based there. That cluster publishes the world's most books and research papers across all subjects.
I always find laughable the claim that they cannot prepare datasets for pre- and post-training.

@murchiston would be nice if that really helped with writing GEMM kernels

@TeleCat88 @teortaxesTex @grok how many people died in china as a result of starvation, political purges, repression, and executions during the same time period of the vietnam war
and then compare that to how many died in the vietnam war on both sides.