Training AI systems using synthetic data can lead to rapid model collapse

New research published in Nature highlights the risks of training artificial intelligence (AI) models on synthetic data. The study found that models trained on data generated by earlier models can degrade rapidly, because mistakes accumulate and are amplified over successive generations of training. This deterioration is linked to the design of the model, the learning process, and the quality of the data used. The study also raises concerns that, in the early stages of collapse, majority subpopulations become over-represented at the expense of minority groups, while in later stages outputs can become nonsensical.
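
As a rough illustration of how errors can compound across generations, the toy sketch below repeatedly fits a normal distribution to a small sample drawn from the previous generation's fit. It is not the study's method: the function name `simulate_collapse`, its parameters, and the choice of a one-dimensional Gaussian are all assumptions made for illustration. With small samples, the fitted spread tends to shrink from one generation to the next, which loosely mirrors the loss of rare (minority) cases described above.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Toy model (hypothetical, for illustration only).

    Each generation "trains" on samples produced by the previous
    generation's fitted Gaussian, then becomes the next data source.
    Returns the fitted standard deviation at every generation.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0          # generation 0: the "real" data distribution
    history = [std]
    for _ in range(generations):
        # Synthetic data drawn from the current model.
        samples = [rng.gauss(mean, std) for _ in range(sample_size)]
        # Refit the model on synthetic data only (maximum-likelihood estimates).
        mean = statistics.fmean(samples)
        std = statistics.pstdev(samples)
        history.append(std)
    return history


if __name__ == "__main__":
    history = simulate_collapse()
    for generation in (0, 50, 100, 200):
        print(f"generation {generation:3d}: fitted std = {history[generation]:.3g}")
```

Running the script prints the fitted spread at a few generations. Exact values depend on the seed and sample size, so the numbers should be read as a qualitative picture rather than a result from the research.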

AI Governance, Big Data

24/07/2024 | Financial Times