Websites face losing battle to block AI web scraping

29/07/2024 | Media

A report in 404 Media reveals that hundreds of websites have been inadvertently blocking the wrong bots in their efforts to prevent AI company Anthropic from scraping their content.

The problem stems from outdated instructions in robots.txt files, which fail to account for new AI bots operating under different names. The anonymous operator of Dark Visitors, a website that tracks web crawlers and scrapers (many of them AI-operated), described the situation as indicative of how complex managing robots.txt files has become. The ever-changing bot landscape makes it difficult for website owners to stay current, leading to unintended consequences such as blocking legitimate crawlers and search engines.
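To illustrate the mechanics: a robots.txt file blocks a crawler only by the exact user-agent token it names, so rules written for a retired agent name do nothing against a renamed successor. The sketch below is a hypothetical example of the pattern the report describes; the agent tokens shown (ANTHROPIC-AI, CLAUDE-WEB, ClaudeBot) are those reportedly associated with Anthropic, but site owners should verify current names rather than rely on this list.

```
# Hypothetical robots.txt illustrating the mismatch described in the report.
# These rules target agent names reportedly no longer in active use:
User-agent: ANTHROPIC-AI
Disallow: /

User-agent: CLAUDE-WEB
Disallow: /

# The crawler reportedly in active use is unaffected by the rules above,
# because robots.txt matches on the exact User-agent token. Blocking it
# would require its own entry, e.g.:
User-agent: ClaudeBot
Disallow: /
```

This exact-token matching is why a robots.txt file that was accurate a year ago can silently stop working: each renamed or newly launched bot needs its own entry.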


What is this page?

You are reading a summary article on the Privacy Newsfeed, a free resource for DPOs and other professionals with privacy or data protection responsibilities, helping them stay informed of industry news in one place. The information here is a brief snippet relating to a single piece of original content, or to several articles on a common topic or thread. The main contributor is listed in the top left-hand corner, just beneath the article title.

The Privacy Newsfeed monitors over 300 global publications, from which more than 4,350 summary articles have been posted to the online archive, dating back to the beginning of 2020. A weekly roundup is available by email every Friday.
