
GPT-3 Just Dropped and I Have Thoughts

nlp · gpt-3 · deep-learning · opinion

I was scrolling through Twitter during my chai break when I first saw the GPT-3 demos. Someone had it writing React components from natural language. Another person had it generating SQL queries. A third person had it writing poetry that was... actually good?

I spent the next hour going down the rabbit hole. Then I closed my laptop, looked at the 3MB model I was trying to squeeze onto a Raspberry Pi, and had a minor existential crisis.

What GPT-3 Gets Right

Few-shot learning actually works. Give it 2-3 examples of what you want and it figures out the pattern. No fine-tuning, no training data pipeline, no retraining. For prototyping and exploration, this is huge.
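To make "few-shot" concrete, here's roughly what one of those prompts looks like under the hood. The English-to-French framing is just an illustrative pattern I picked; the point is that the "training data" lives entirely in the prompt string:

```python
# Sketch of a few-shot prompt: the model sees a handful of
# input/output pairs and is asked to continue the pattern.
# No gradient updates, no pipeline -- just string formatting.

examples = [
    ("cheese", "fromage"),
    ("bread", "pain"),
]

def build_few_shot_prompt(examples, query):
    """Format example pairs plus a new query as a single prompt string."""
    lines = [f"English: {en}\nFrench: {fr}" for en, fr in examples]
    lines.append(f"English: {query}\nFrench:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(examples, "apple")
print(prompt)
```

The completion the model returns after that trailing "French:" is the answer. That's the whole workflow.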

Language understanding is deep. It grasps context, nuance, and style in ways that feel qualitatively different from GPT-2. The jump from 1.5B to 175B parameters wasn't just incremental. Something changed.

The API model makes AI accessible. You don't need ML expertise to use GPT-3. Send text in, get text out. That opens up AI to developers, designers, writers, people who'd never train a model themselves.
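To show how thin that interface is, here's a sketch of assembling a completion request by hand. The endpoint path and payload fields are my assumptions from the beta docs, not guaranteed client code:

```python
import json
import os
import urllib.request

# Assumed endpoint and payload shape -- a sketch of the workflow,
# not official client code.
API_URL = "https://api.openai.com/v1/engines/davinci/completions"

def build_completion_request(prompt, max_tokens=64):
    """Assemble the HTTP request for a completion call (without sending it)."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ.get('OPENAI_API_KEY', '')}",
    }
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(API_URL, data=body, headers=headers)

req = build_completion_request("Write a haiku about Raspberry Pis.")
print(req.get_full_url())
```

That's it. No GPUs, no checkpoints, no CUDA version mismatches. Which is exactly the appeal, and exactly the dependency.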

What Concerns Me

Only accessible through an API. You can't download GPT-3. You can't run it yourself. You can't inspect its weights or fine-tune it. One company controls access. That's a very different world from open-source ML.

The cost is hidden. Each API call costs money. A busy application could easily run up thousands in API fees. And you're dependent on OpenAI's pricing, uptime, and content policies.
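Some napkin math on what "busy" means, using the $0.06-per-1K-tokens figure floating around for the largest engine (the traffic numbers are hypothetical and the pricing assumption is mine):

```python
# Back-of-envelope API cost for a moderately busy application.
# Pricing and traffic figures are assumptions, not quotes.
price_per_1k_tokens = 0.06     # largest engine, widely quoted
requests_per_day = 100_000
tokens_per_request = 500       # prompt + completion combined

daily_tokens = requests_per_day * tokens_per_request   # 50M tokens/day
daily_cost = daily_tokens / 1000 * price_per_1k_tokens
print(f"${daily_cost:,.0f}/day")  # $3,000/day
```

A hundred thousand requests a day is not a big product. Three thousand dollars a day is a big bill.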

It's a black box. When GPT-3 generates something wrong or biased, you can't debug it the way you'd debug a model you trained. You can't retrain it on better data. You just prompt it differently and hope for the best.

Environmental cost. Training GPT-3 required an estimated 3.14 × 10²³ FLOPs of compute. The carbon footprint of models this size is a real concern that the industry needs to take more seriously.
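That figure lines up with the standard back-of-envelope estimate of ~6 FLOPs per parameter per token (forward plus backward pass), given the ~300B training tokens reported in the paper. Treat this as a sanity check, not exact accounting:

```python
# Sanity check on the training-compute figure using the common
# approximation: total FLOPs ~= 6 * parameters * training tokens.
params = 175e9   # 175B parameters
tokens = 300e9   # ~300B training tokens, per the GPT-3 paper
flops = 6 * params * tokens
print(f"{flops:.2e}")  # 3.15e+23
```

Close enough to the reported 3.14 × 10²³ that the estimate clearly wasn't pulled from thin air.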

The Edge Engineer's Perspective

The thing is, I spend my days making models smaller. Quantizing, pruning, distilling. Figuring out how to get useful inference out of hardware that costs less than a dinner at a nice restaurant. GPT-3 is the exact opposite philosophy: throw everything at the problem, cost be damned.
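For contrast, here's a toy version of one of those tricks: symmetric int8 post-training quantization, which cuts weight storage 4x in exchange for a little precision. Real toolchains do per-channel scales and calibration passes; this is just the core idea:

```python
# Toy symmetric int8 post-training quantization of a weight tensor.
# Production tools use per-channel scales and calibration data;
# this shows the core trade: 4x smaller weights, bounded error.
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 values plus a single scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("size: %dKB -> %dKB" % (w.nbytes // 1024, q.nbytes // 1024))
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale/2
```

The error bound falls straight out of the rounding step: no weight moves by more than half a quantization step. That's the kind of guarantee you reason about all day when the target is a Pi, and the kind of question that doesn't even parse for a 175B-parameter model behind an API.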

Both approaches have their place. But the models that actually change everyday computing will be the ones that run on your phone, in your browser, on a Raspberry Pi. Not the ones behind a $0.06/1K-token API.

My Take

GPT-3 is a research milestone and a product. It's going to spawn a thousand startups. And it proved that scaling laws are real in a way that's impossible to ignore.

But I'd be lying if I said the demos didn't make me stop and stare for a while. Wild times for NLP.