Project analyzing human language usage shuts down because ‘generative AI has polluted the data’
Robyn Speer, the creator of wordfreq, writes:

The wordfreq data is a snapshot of language that could be found in various online sources up through 2021. There are several reasons why it will not be updated anymore.

Generative AI has polluted the data

I don’t think anyone has reliable information about post-2021 language usage by humans. The open Web (via OSCAR) was one of wordfreq’s data sources. Now the Web at large is full of slop generated by large language…