Narration: the data efficiency black hole.
00:00:00 – What is really driving AI progress?
00:03:11 – Comparing human vs AI sample efficiency
00:08:46 – Does sample efficiency matter?
Also on pod and YouTube feed.
LLMs require up to a million times more training data.
Positive users praise discussions of the sample efficiency gap between humans and LLMs as insightful and the next frontier, while negative users criticize frontier labs for relying on brute-force data scaling instead.
Current LLMs are outrageously data inefficient (and hence also compute inefficient) - this will be the next frontier https://www.dwarkesh.com/p/the-sample-efficiency-black-hole-2
"AI needs 1,000,000 more data than us."
Appreciate the @mercor_ai shoutout 🚀
behold, the costs of human sample efficiency
@dwarkesh_sp Indeed. Have people sent Machine Studying and its definitions of expertise and intelligence your way yet?
Continual learning is widely discussed right now, but mostly as improving on the job or avoiding catastrophic forgetting. But it has a different, difficult, and already urgent form:
Given nothing but a corpus of documents, how should AI systems develop expertise in a new, unfamiliar domain? We call this problem Machine Studying.

@dwarkesh_sp "Data efficiency" comes from having a better algorithm
That's why algorithms are the most important element in AI

@dwarkesh_sp If rubrics are the path to agi we’re in trouble

At some point, you would need to scale compute so much more that not squeezing out more signal from each rollout becomes untenable. That's the regime we might be entering with agents that use compaction, where one rollout can easily pass 1m tokens.

GRPO then is just taking the compute scaling even further and not bothering to even re-use information from past rollouts by learning a value model for finer-grained credit assignment.
This has, surprisingly, worked quite well, but it may be coming to an end soon.
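
For readers who have not seen it spelled out, here is a rough sketch of the group-relative advantage that stands in for a learned value model in GRPO-style training. It is an illustration under assumptions, not any particular lab's implementation: the function name, the `eps` constant, and the use of the population standard deviation are choices made for this example.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Score each rollout against the other rollouts for the same prompt.

    No value model is involved: the group mean acts as the baseline, so
    every token in a rollout ends up with the same advantage.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical group of four rollouts for one prompt, with binary rewards.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every rollout gets the same reward carries no signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

The second call shows the cost being described: when all rollouts in a group score the same, every advantage is zero and the compute spent on that group produces no gradient signal.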

@willccbb @hallerite Real OGs remember https://www.amazon.com/Probability-Finance-Its-Only-Game/dp/0471402265

Spot on. Learning a value model indeed re-uses past rollouts, but there's an even more direct way that's been absent in LLM RL: replay. Some recent work (https://arxiv.org/abs/2604.08706) showed it can work well. Imo, figuring out what to store and how to sample from the buffer holds the key for many open questions
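
To make "what to store and how to sample" concrete, here is a minimal replay buffer sketch for rollouts. It is not the method from the linked paper: the class name, the stored fields, FIFO eviction, and uniform sampling are all placeholder assumptions, and they are exactly the choices the open questions are about.

```python
import random
from collections import deque

class RolloutReplayBuffer:
    """Hypothetical fixed-size store of past rollouts."""

    def __init__(self, capacity, seed=0):
        # Assumption: the oldest rollouts are evicted first (FIFO).
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, prompt, completion, reward):
        self.buffer.append((prompt, completion, reward))

    def sample(self, k):
        # Assumption: uniform sampling without replacement.
        k = min(k, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)

buf = RolloutReplayBuffer(capacity=3)
for i in range(5):
    buf.add(f"prompt-{i}", f"completion-{i}", float(i % 2))

batch = buf.sample(2)
stored_prompts = [prompt for prompt, _, _ in buf.buffer]
```

Swapping the eviction rule or the sampling distribution (for example, weighting by reward or by how stale a rollout is relative to the current policy) only changes the marked lines.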

Incidentally, @dwarkesh_sp also just uploaded something about sample efficiency. I think having a mental model about AI that is heavily informed by information theory helps here.
https://www.youtube.com/watch?v=4pG3SJQPAwk

As a matter of fact, using SFT to bootstrap the base model instead of doing RL directly on top of the base model is already using a more sample efficient method that is heavily biased to cut down on the variance.

@hallerite my first real RL nerdsnipe was adversarial bandits

So while we can indeed scale compute and probably mostly continue doing so for a bit, it's definitely worth it to already do research on how we can get more signal out of our rollouts.

In any case, it seems to me like now is a good time to think about sample efficiency again, and for that it probably makes sense to develop a first-principles understanding of what makes a method more or less sample efficient.

@dwarkesh_sp > can AIs, which do not have human-level sample efficiency, nonetheless solve the remaining research problems on the way to human-like intelligence and learning
I think not

Further, as models improve at judging or assigning credit and at other interesting auxiliary methods, one big research problem will probably be finding a good allocation for one's compute budget. GRPO then is just one extreme, and likely not the optimal, allocation strategy.

@dwarkesh_sp For the love of god please try to define out of distribution
In doing so you will find enlightenment

@dwarkesh_sp The important question is will it keep making progress this way and surpass humans.