
Dwarkesh Patel argues that the massive sample-efficiency gap between LLMs and humans is a critical data bottleneck for AI

LLMs require up to a million times more training data than humans.


Original post

Dwarkesh Patel @dwarkesh_sp

Narration: the data efficiency black hole.

00:00:00 – What is really driving AI progress?
00:03:11 – Comparing human vs AI sample efficiency
00:08:46 – Does sample efficiency matter?

Also on pod and YouTube feed.

10:14 AM · Jun 19, 2026 · 132.2K Views

Sentiment

Positive commenters praise the discussion of the sample-efficiency gap between humans and LLMs as insightful and as the next frontier, while negative commenters criticize frontier labs for relying on brute-force data scaling instead.

Positive: 75.7%

Negative: 24.3%

Based on 19 comments scored for sentiment.


Related links

The data black hole at the center of AI

dwarkesh.com

Posts from X


Kevin Patrick Murphy @sirbayes

Current LLMs are outrageously data inefficient (and hence also compute inefficient); this will be the next frontier: https://www.dwarkesh.com/p/the-sample-efficiency-black-hole-2

Views 149.1K · Likes 283 · Bookmarks 306 · Replies 10


Brendan (can/do) @BrendanFoody

"AI needs 1,000,000 more data than us."

Appreciate the @mercor_ai shoutout 🚀


Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) @teortaxesTex

behold, the costs of human sample efficiency


Omar Khattab @lateinteraction

@dwarkesh_sp Indeed. Have people sent Machine Studying and its definitions of expertise and intelligence your way yet?

Jacob X. Li @jacobli99

Continual learning is widely discussed right now, but mostly as improving on the job or avoiding catastrophic forgetting. But it has a different, difficult, and already urgent form:

Given nothing but a corpus of documents, how should AI systems develop expertise in a new, unfamiliar domain? We call this problem Machine Studying.


The Innovation Game (𝔦, 𝔦) @tigfoundation

@dwarkesh_sp "Data efficiency" comes from having a better algorithm

That's why algorithms are the most important element in AI


Garrett Lord @GarrettLord

@dwarkesh_sp If rubrics are the path to agi we’re in trouble


hallerite @hallerite

At some point, you would need to scale compute so much more that not squeezing out more signal from each rollout becomes untenable. That's the regime we might be entering with agents that use compaction, where one rollout can easily pass 1M tokens.

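A back-of-envelope way to see the "signal per rollout" point, assuming (as a simplification, not something the post specifies) that each rollout gets a single binary pass/fail reward: such a reward carries at most 1 bit, so over a 1M-token rollout it supplies at most about a millionth of a bit per generated token. The helper names are hypothetical.

```python
import math

def binary_entropy_bits(p):
    """Entropy, in bits, of a pass/fail reward that succeeds with probability p (never more than 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a reward whose outcome is certain carries no information
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def max_reward_bits_per_token(p_success, rollout_tokens):
    """Upper bound on the information one scalar outcome reward supplies per generated token."""
    return binary_entropy_bits(p_success) / rollout_tokens

print(max_reward_bits_per_token(0.5, 1_000_000))   # 1e-06
print(max_reward_bits_per_token(0.99, 1_000_000))  # far smaller: the outcome is nearly certain
```

The bound is largest when success is a coin flip; as a policy gets reliably right (or reliably wrong) on a task, each additional rollout tells it even less.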

hallerite @hallerite

GRPO then is just taking the compute scaling even further and not bothering to even re-use information from past rollouts by learning a value model for finer-grained credit assignment.

This has, surprisingly, worked quite well, but it may be coming to an end soon.

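A minimal sketch of the group-relative advantage GRPO uses in place of a learned value model, assuming scalar outcome rewards for a group of rollouts of the same prompt and normalization by the population standard deviation (implementations differ on this and on the epsilon). The function names are hypothetical. Note that every token in a rollout receives the same advantage, which is the coarse credit assignment the post contrasts with a value model.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: each rollout's reward normalized against its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]

def per_token_advantages(advantage, num_tokens):
    """No finer-grained credit assignment: every token in the rollout gets the same value."""
    return [advantage] * num_tokens

# Four rollouts for one prompt: reward 1 = correct, 0 = incorrect.
advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
# If every rollout in the group gets the same reward, all advantages are zero:
# those rollouts cost compute but contribute no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

By contrast, a PPO-style setup subtracts a learned per-token value estimate, which lets different tokens in the same rollout receive different credit at the cost of training a critic.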

Mark Saroufim @marksaroufim

@willccbb @hallerite Real OGs remember https://www.amazon.com/Probability-Finance-Its-Only-Game/dp/0471402265


Ilija Lichkovski @carnot_cyclist

Spot on. Learning a value model indeed re-uses past rollouts, but there's an even more direct way that's been absent in LLM RL: replay. Some recent work (https://arxiv.org/abs/2604.08706) showed it can work well. Imo, figuring out what to store and how to sample from the buffer holds the key to many open questions.

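A minimal sketch of the replay idea, assuming rollouts are stored as (prompt, completion, reward) records in a fixed-capacity FIFO buffer and sampled uniformly. The class and field names are hypothetical, and uniform sampling is only one choice; what to store and how to sample (for example, prioritizing by reward or advantage) is exactly the open question raised in the post.

```python
import random
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Rollout:
    prompt: str
    completion: str
    reward: float

class ReplayBuffer:
    """Fixed-capacity FIFO store of past rollouts; the oldest entries are evicted first."""

    def __init__(self, capacity, seed=0):
        self._items = deque(maxlen=capacity)  # deque drops the oldest item when full
        self._rng = random.Random(seed)

    def add(self, rollout):
        self._items.append(rollout)

    def sample(self, k):
        """Uniformly sample up to k distinct stored rollouts to mix into the next update."""
        return self._rng.sample(list(self._items), min(k, len(self._items)))

    def __len__(self):
        return len(self._items)

buffer = ReplayBuffer(capacity=3)
for i in range(5):
    buffer.add(Rollout(prompt=f"p{i}", completion=f"c{i}", reward=float(i % 2)))
batch = buffer.sample(2)  # two of the three most recent rollouts
```

Because old rollouts were generated by an earlier policy, reusing them in an on-policy objective generally needs some off-policy correction (such as importance weighting); this sketch covers only storage and sampling.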

hallerite @hallerite

Incidentally, @dwarkesh_sp also just uploaded something about sample efficiency. I think having a mental model about AI that is heavily informed by information theory helps here.

https://www.youtube.com/watch?v=4pG3SJQPAwk


hallerite @hallerite

As a matter of fact, using SFT to bootstrap the base model instead of doing RL directly on top of the base model is already using a more sample efficient method that is heavily biased to cut down on the variance.

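A toy illustration of the variance point, assuming a one-step "policy" that is a softmax over 4 candidate answers and a reward of 1 only for the correct answer (both hypothetical simplifications of real LLM training). The SFT gradient for a given demonstration is deterministic, while a single-sample REINFORCE estimate (no baseline) is noisy and, in expectation, equals the SFT gradient scaled down by the probability the policy already assigns to the correct answer.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sft_gradient(logits, target):
    """Gradient of log softmax(logits)[target] w.r.t. the logits: onehot(target) - p.
    Given the demonstration, this has no sampling noise."""
    grad = -softmax(logits)
    grad[target] += 1.0
    return grad

def reinforce_gradients(logits, target, n_samples, rng):
    """n_samples independent single-sample REINFORCE estimates (no baseline),
    with reward 1 when the sampled answer equals target and 0 otherwise."""
    p = softmax(logits)
    actions = rng.choice(len(p), size=n_samples, p=p)
    rewards = (actions == target).astype(float)
    onehots = np.eye(len(p))[actions]
    # Row i is reward_i * grad log p(action_i) = reward_i * (onehot(action_i) - p)
    return rewards[:, None] * (onehots - p)

rng = np.random.default_rng(0)
logits = np.zeros(4)  # uniform toy policy over 4 answers
target = 2
g_sft = sft_gradient(logits, target)
g_rl = reinforce_gradients(logits, target, 100_000, rng)
p_target = softmax(logits)[target]
print("SFT gradient:              ", g_sft)
print("REINFORCE mean / p_target: ", g_rl.mean(axis=0) / p_target)  # close to the SFT gradient
print("REINFORCE per-dim variance:", g_rl.var(axis=0))              # strictly positive
```

The scaling by the probability of the correct answer is why RL from a weak base model can be slow: when correct answers are rarely sampled, most estimates are zero, and SFT sidesteps this by supplying the target directly.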

will brown @willccbb

@hallerite my first real RL nerdsnipe was adversarial bandits


hallerite @hallerite

So while we can indeed scale compute and probably mostly continue doing so for a bit, it's definitely worth it to already do research on how we can get more signal out of our rollouts.


hallerite @hallerite

In any case, it seems to me like now is a good time to think about sample efficiency again, and for that it probably makes sense to develop a first-principles understanding of what makes a method more or less sample efficient.


Mithil Vakde @evilmathkid

@dwarkesh_sp > can AIs, which do not have human-level sample efficiency, nonetheless solve the remaining research problems on the way to human-like intelligence and learning

I think not


hallerite @hallerite

Further, as models get better at judging and assigning credit, and as other interesting auxiliary methods improve, one big research problem will probably be finding a good allocation of one's compute budget. GRPO is then just one extreme, and likely not the optimal allocation strategy.


Super Dario @inductionheads

@dwarkesh_sp For the love of god please try to define out of distribution

In doing so you will find enlightenment


Dhruva Pandey @Dhruvapandey

@dwarkesh_sp The important question is whether it will keep making progress this way and surpass humans.
