Why forecasting AI performance is tricky: the following 4 trends fit the observed data equally well
I was trying to replicate a forecast found on AI 2027 and thought it'd be worth pointing out that any number of trends could fit the AI performance gains we've observed so far, and at this juncture we can't use goodness of fit to differentiate between them.

Here's a breakdown of what you're seeing: The blue line roughly coincides with AI 2027's "benchmark-and-gaps" approach to forecasting when we'll have a super coder. The line at 1.5 is where a model would supposedly beat 95% of humans on the same task (although that's a bit of a stretch, given that they're using the max score obtained across multiple runs by the same model, not a mean or median). Green and orange are the same type of logistic curve with different carrying capacities chosen. As you can see, assumptions made abo...
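To make the goodness-of-fit point concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption for illustration: the six data points are synthetic (they are not the data behind the chart or AI 2027's numbers), the carrying capacities 1.5, 3.0 and 10.0 are arbitrary, and the helper names are hypothetical. It also takes a shortcut: each curve is fitted by linear regression on transformed values (log for the exponential, logit for a logistic with a fixed ceiling) rather than by nonlinear least squares, and R^2 is then measured on the original scale.

```python
import math

# Hypothetical early-stage scores (NOT the post's or AI 2027's data):
# six points sampled from a logistic curve with ceiling 3.0 and midpoint t=10.
ts = [0, 1, 2, 3, 4, 5]
ys = [3.0 / (1.0 + math.exp(-0.5 * (t - 10))) for t in ts]


def linear_fit(xs, zs):
    """Ordinary least squares for z = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x, mean_z = sum(xs) / n, sum(zs) / n
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxz / sxx
    return mean_z - b * mean_x, b


def r_squared(observed, predicted):
    """Coefficient of determination on the original (untransformed) scale."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot


def fit_exponential(ts, ys):
    """Fit y = exp(a + b*t) by regressing log(y) on t."""
    a, b = linear_fit(ts, [math.log(y) for y in ys])
    return lambda t: math.exp(a + b * t)


def fit_logistic(ts, ys, cap):
    """Fit y = cap / (1 + exp(-(a + b*t))) for a FIXED ceiling `cap`
    by regressing logit(y / cap) on t. Requires 0 < y < cap."""
    a, b = linear_fit(ts, [math.log(y / (cap - y)) for y in ys])
    return lambda t: cap / (1.0 + math.exp(-(a + b * t)))


models = {"exponential": fit_exponential(ts, ys)}
for cap in (1.5, 3.0, 10.0):  # arbitrary carrying capacities
    models[f"logistic, cap={cap}"] = fit_logistic(ts, ys, cap)

# For each model: (R^2 on the observed points, extrapolated value at t=15).
results = {
    name: (r_squared(ys, [f(t) for t in ts]), f(15))
    for name, f in models.items()
}
for name, (r2, forecast) in results.items():
    print(f"{name:22s} R^2={r2:.4f}  value at t=15: {forecast:.2f}")
```

On this toy data all four curves should score an R^2 very close to 1 on the observed points, while their extrapolations at t=15 differ by more than an order of magnitude. The fit statistic says almost nothing about which ceiling (if any) is right; that choice comes from the assumptions, not from the data.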