When we were raising money for ATLAS, I often told investors that my cofounder and I were probably the most skeptical GenAI founders they would meet. Sam Altman’s recent hyperbolic claim at Stanford University that OpenAI has “scientific certainty that GPT-5 will be better than GPT-4” only fuels this skepticism.
I’m impressed by OpenAI (we use their models), and I’m sure Sam is a nice guy, but I cannot imagine that whatever evidence leads them to believe GPT-5 will be better than GPT-4 could amount to “scientific certainty.”
I’m not even sure what he means by “scientific certainty,” but here’s a guess: OpenAI has previously discussed modeling the relationship between parameter count and LLM performance, and they have successfully predicted the performance of new models using those scaling curves. Here’s the thing: how well that curve has fit past data says very little about its predictive power as they keep scaling parameter count.
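To make that concrete, here’s a toy sketch. The numbers are invented and the two functional forms are generic shapes from the public scaling-law literature, not whatever OpenAI actually fits internally. The point is that two different curves can match the same past observations almost equally well and still disagree about the very next model size:

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical illustration only: invented data, generic functional forms.

def power_law(n, a, b):
    # Loss falls forever: L(N) = a * N^(-b)
    return a * n ** (-b)

def saturating_power_law(n, a, b, c):
    # Loss floors at an irreducible term: L(N) = a * N^(-b) + c
    return a * n ** (-b) + c

# Invented (parameter count, loss) pairs for four past model generations.
n_obs = np.array([1e8, 1e9, 1e10, 1e11])
loss_obs = np.array([3.50, 2.73, 2.27, 2.00])

p1, _ = curve_fit(power_law, n_obs, loss_obs, p0=[15.0, 0.08], maxfev=20000)
p2, _ = curve_fit(saturating_power_law, n_obs, loss_obs,
                  p0=[100.0, 0.2, 1.5], maxfev=20000)

# Both curves reproduce the observed range to within a few percent...
print("power law fit:  ", power_law(n_obs, *p1).round(2))
print("saturating fit: ", saturating_power_law(n_obs, *p2).round(2))

# ...and then disagree at the very next model size. Fit quality on past
# data cannot tell you which curve (if either) is right out here.
n_next = 1e12
print("power law at 1e12:  %.2f" % power_law(n_next, *p1))
print("saturating at 1e12: %.2f" % saturating_power_law(n_next, *p2))
```

Both fits look great in-sample; they just quietly encode different assumptions about whether loss keeps falling or hits a floor, and only the next training run can settle which extrapolation to trust.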
The problem of extrapolating from existing data like this (the problem of induction) is at least as old as David Hume, and thoughtful scientists and philosophers of science have wrestled with it ever since. In some ways, being “certain” about this kind of extrapolation feels unscientific.
Regardless of what OpenAI’s fancy curve predicts, at the end of the day, GPT-5 is just another data point that new versions of that curve will have to fit. It’s an experiment, not a “scientific certainty,” and the use of that term makes me worry that “science” is being pressed into the service of OpenAI’s marketing.