The diminishing returns are kind of insane if you compare the performance and hardware requirements of a 7B model and a 100B model. In some cases the smaller model can even perform better, because it’s more focused and its hallucinations are less subtle, so they’re easier to spot.
Something is going to have to fundamentally change before we see any big improvements, because I don’t see further scaling ever producing AGI, or even fixing the hallucinations and logic errors these models make.
In some ways it’s a bit like the crypto/blockchain speculators saying it was going to change the world, when in reality the vast majority of proposed applications would have been better implemented with a simple centralized database.