Here’s why turning to AI to train future AIs may be a bad idea
ChatGPT, Gemini, Copilot and other AI tools whip up impressive sentences and paragraphs from as little as a single line of text prompt. To generate those words, the underlying large language models were trained on reams of text written by humans and scraped from the internet. But now, as generative AI tools flood the internet with synthetic content, that content is being used to train future generations of those AIs. If this continues unchecked, it could be disastrous, researchers say.
Training large language models on their own generated data could lead to model collapse, University of Oxford computer scientist Ilia Shumailov and colleagues argued recently in Nature.
Model collapse sounds startling, but it doesn’t mean generative AIs would simply quit working. Instead, the tools’ responses would drift further and further from their original training data. Though sometimes biased, that original data is a decent representation of reality. But as the tools train on their own generated data, the small errors they make add up, and their content ultimately loses the nuance of diverse perspectives and morphs into gibberish.
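The compounding-error dynamic the researchers describe can be illustrated with a toy statistical sketch (not from the article): a simple model is repeatedly refit on samples drawn from the previous generation's fit, so each generation's estimation error carries forward. The distribution names and sample sizes below are illustrative assumptions, not details from the Nature paper.

```python
# Toy illustration of model collapse: each "generation" is fit only on
# data generated by the previous generation's model. Finite-sample
# estimation errors compound, so the fitted distribution drifts from the
# original data and, over many generations, tends to lose its tails.
import numpy as np

rng = np.random.default_rng(0)

# Generation 0 trains on "human" data: the original distribution.
real_data = rng.normal(loc=0.0, scale=1.0, size=200)
mu, sigma = real_data.mean(), real_data.std()
print(f"generation 0: mean={mu:+.3f}, std={sigma:.3f}")

# Later generations train only on synthetic data from the prior model.
for generation in range(1, 21):
    synthetic = rng.normal(loc=mu, scale=sigma, size=200)
    mu, sigma = synthetic.mean(), synthetic.std()
    print(f"generation {generation}: mean={mu:+.3f}, std={sigma:.3f}")
```

In this sketch the printed mean and standard deviation wander away from the original values with each generation; the same compounding of small errors, at vastly larger scale, is what the researchers warn could degrade language models trained on their own output.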