Interesting take... I think that, organizationally, buying more compute ends up being the more popular choice because of internal incentive structures.
In other words, if you're not OpenAI or another big player with money to burn and you have a fixed budget, buying more compute has a neat little graph associated with it (even if it's made-up) and no one will fault you for following the graph.
On the other hand, investing in clever architecture is a lot riskier because you can't guarantee you will produce the leapfrog insights. But I agree that the ones who are in a position to take such risks definitely should be investing more in them.
Looks like you just coined the Tick-Tock model for AI 🤩
Per Dwarkesh's recent interview, Sutton's position is now that experience, even more than data, is key:
https://www.youtube.com/watch?v=21EYKqUsPfg
https://www.ft.com/content/8192467e-e9d7-4c0a-ab0d-59bd6351a1bb
“AI is at the cusp of a new period in which experience will become the dominant medium of improvement and ultimately dwarf the scale of human data used in today’s systems. The pace of progress driven solely by supervised learning from human data is demonstrably slowing, signalling the need for a new approach.”
- from Sutton's *Welcome to the Era of Experience* paper
I recommend acknowledging this thread: https://x.com/andrewgwils/status/1953814226188841417.
Hadn't actually seen that post! The catalyst was a series of conversations with other researchers over the last several months. But, thank you for the pointer — I'll reach out to Andrew. Based on the discussion thread, we seem quite philosophically aligned.
Or...you could just find a second internet of hidden data. That's why the tech bros put Trump into power. They wanted access to all the data hidden away in the US government. All of it already digitised in incompatible siloes and obsolete formats, going back five or six decades. That is an awful lot of data that no one has ever seen. Think about all those satellites and submarines. That is what Palantir is for: to restructure all that data into a single searchable database. That is what McNamara assumed he could do in the 1960s, but it failed because there wasn't enough compute and only a tiny group of academics understood information theory, all of whom were on contracts to the Department of Defence.

Remember, the internet was originally a military project. Everyone just assumed that the military element had simply faded away because it was no longer mentioned. When the whole world installed stolen copies of Windows 98, all the world's data was accessed by the NSA, because there was no security at all, like zero, zip, nada. You might need to read that again: ALL THE WORLD'S DATA. China and Russia did not realise what had been done to them for 20 years. They had built their whole infrastructure on the American version of the Belt and Road, and they were pwned. That is why they are conducting open warfare on the internet. They want revenge, but it's decades too late to change the outcome.

So Big Compute is getting a second chance at making it to the top of the mountain, their own private mountain that they don't have to share with anybody, using free money, and incidentally creating another moon race/Star Wars moment as it tempts China and Russia to bankrupt themselves by diverting their scarce resources into a game that is totally owned by the USA. Welcome to the casino, guys!
Self-learning, open-ended agents
> After filtering for quality and duplicates, you’re staring at a pool of ~10T useful pre-training tokens.3
Here you're citing the older Epoch AI paper, which was supplanted by the one whose figure you show. The 100-1000T range *is* the updated estimate, though only ~100T of it is text. Citing the older estimate as the filtered amount is misleading.
I think the optimality of Chinchilla scaling laws is sometimes exaggerated. They only hold when everything else is kept constant, and there are plenty of other levers to try. E.g., increasing context length during pretraining would quickly eat more compute on the same data.
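For intuition on the context-length point, here's a rough back-of-the-envelope sketch using the standard Kaplan-style approximation (training FLOPs per token ≈ 6N plus an attention term that grows with context length). The 7B/1T figures and layer shapes below are purely illustrative, not taken from the post:

```python
# Rough approximation (Kaplan et al. 2020 style):
# training FLOPs per token ≈ 6*N (dense matmuls)
#                          + 6 * n_layer * n_ctx * d_model (attention, grows with context)
# All model shapes here are made up for illustration.

def train_flops(n_params, n_tokens, n_layer, d_model, n_ctx):
    """Approximate total training FLOPs for a dense transformer."""
    per_token = 6 * n_params + 6 * n_layer * n_ctx * d_model
    return per_token * n_tokens

# Hypothetical ~7B-parameter model trained on the same 1T tokens at two context lengths.
N, D = 7e9, 1e12
short_ctx = train_flops(N, D, n_layer=32, d_model=4096, n_ctx=4_096)
long_ctx = train_flops(N, D, n_layer=32, d_model=4096, n_ctx=32_768)
print(f"4k context:  {short_ctx:.2e} training FLOPs")
print(f"32k context: {long_ctx:.2e} training FLOPs (+{(long_ctx / short_ctx - 1) * 100:.0f}%)")
```

Same token budget, roughly 50% more compute in this toy setup, just from the longer attention window.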
👍