Generative AI companies bypassing web protocol to scrape data

Reuters reports that multiple artificial intelligence (AI) companies are bypassing a common web standard that publishers use to block the scraping of their content for generative AI systems. Content licensing startup TollBit informed publishers about the issue in a letter seen by Reuters on Friday. According to the letter, the AI companies are evading attempts to block their web crawlers through the widely accepted Robots Exclusion Protocol (robots.txt).
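For readers unfamiliar with the protocol, the following is a minimal sketch of how a compliant crawler might consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt rules, the crawler name ExampleAIBot, and the example.com URL are hypothetical and do not come from the Reuters report. The protocol is voluntary: nothing in it technically prevents a crawler from skipping this check, which is the behaviour the letter describes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: it blocks one named
# AI crawler from the whole site and allows every other user agent.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines,
    # so no network request is needed for this illustration.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/news/article"  # placeholder URL
    for agent in ("ExampleAIBot", "SomeOtherAgent"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, page) else "blocked"
        print(f"{agent}: {verdict}")
```

A well-behaved crawler would run a check like this and skip any URL that is not allowed; a crawler that ignores the file is not stopped by it.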

In related news, DIGIT News reports that seven data licensing companies have established the Dataset Providers Alliance (DPA), a trade group that promotes responsible and ethical methods of gathering data for training AI and machine learning (ML) systems. DPA members have committed to fostering cooperation, creating best practices, and advocating for the rights of content creators in the rapidly evolving AI and ML landscape. The initiative comes in response to a surge in copyright lawsuits targeting AI companies, with data protection advocates pushing for swift legislation to regulate the mass collection of data for training AI models.




21/06/2024 | Reuters