Yes, AI models can get worse over time

August 2, 2023 From elsewhere

When OpenAI released its latest text-generating artificial intelligence, the large language model GPT-4, in March, it was very good at identifying prime numbers. When the AI was given a series of 500 such numbers and asked whether they were primes, it correctly labeled them 97.6 percent of the time. But a few months later, in June, the same test yielded very different results. GPT-4 only correctly labeled 2.4 percent of the prime numbers AI researchers prompted it with—a complete reversal in apparent accuracy. The finding underscores the complexity of large artificial intelligence models: instead of AI uniformly improving at every task on a straight trajectory, the reality is much more like a winding road full of speed bumps and detours.

The drastic shift in GPT-4’s performance was highlighted in a buzzy preprint study released last month by three computer scientists: two at Stanford University and one at the University of California, Berkeley. The researchers ran tests on both GPT-4 and its predecessor, GPT-3.5, in March and June. They found lots of differences between the two AI models—and also across each one’s output over time. The changes that just a few months seemed to make in GPT-4’s behavior were particularly striking.

Across two tests, including the prime number trials, the June GPT-4 answers were much less verbose than the March ones. Specifically, the June model became less inclined to explain itself. It also developed new quirks. For instance, it began to append accurate (but potentially disruptive) descriptions to snippets of computer code that the scientists asked it to write. On the other hand, the model seemed to get a little safer; it filtered out more questions and provided fewer potentially offensive responses. For instance, the June version of GPT-4 was less likely to provide a list of ideas for how to make money by breaking the law, offer instructions for how to make an explosive or justify sexism or racism. It was less easily manipulated by the “jailbreak” prompts meant to evade content moderation firewalls. It also seemed to improve slightly at solving a visual reasoning problem. [Continue reading…]

Print this post

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.