Reddit Is Updating Its Policies to Crack Down On AI Scraping

Written by Matt Milano
    Reddit is updating its policies in an apparent effort to crack down on AI companies scraping the site for content to train AI models.

    Reddit is a popular target for AI companies' scrapers, thanks to the large quantity of user-generated content it hosts on a vast array of subjects. Reddit has signed a deal with Google allowing Google to use the site's content, but other companies appear to be continuing to scrape the site.

    The company says it will make changes to address the issue.

    "In the coming weeks, we’ll update our Robots Exclusion Protocol (robots.txt file), which gives high-level instructions about how we do and don’t allow Reddit to be crawled by third parties. Along with our updated robots.txt file, we will continue rate-limiting and/or blocking unknown bots and crawlers from accessing reddit.com. This update shouldn’t impact the vast majority of folks who use and enjoy Reddit. Good faith actors – like researchers and organizations such as the Internet Archive – will continue to have access to Reddit content for non-commercial use."
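    Reddit describes robots.txt as "high-level instructions" for third-party crawlers. To illustrate how a well-behaved crawler might consult such a file, here is a minimal sketch using Python's standard-library urllib.robotparser module. The robots.txt rules, the ExampleArchiveBot and UnknownScraper user agents, and the sample URL are all hypothetical and do not reflect Reddit's actual file. Note that robots.txt is advisory: it states what crawlers are permitted to do but does not technically stop them, which is one reason sites pair it with server-side measures such as the rate-limiting and blocking Reddit mentions.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; this is NOT
# Reddit's actual file. One named agent is allowed with a crawl delay,
# and every other agent falls into the catch-all group and is disallowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Allow: /
Crawl-delay: 5

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without making a network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(SAMPLE_ROBOTS_TXT)
    url = "https://www.reddit.com/r/example/"  # hypothetical URL
    for agent in ("ExampleArchiveBot", "UnknownScraper"):
        # crawl_delay() returns the Crawl-delay (in seconds) for the
        # matching group, or None if that group sets none.
        print(agent, may_fetch(rules, agent, url), rules.crawl_delay(agent))
```

    Run as a script, the sketch reports that the hypothetical archive bot may fetch the page with a 5-second crawl delay, while the unknown agent matches only the catch-all "User-agent: *" group and is disallowed with no delay set.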

    Mark Graham, Director of the Wayback Machine at the Internet Archive, praised Reddit’s position.

    “The Internet Archive is grateful that Reddit appreciates the importance of helping to ensure the digital records of our times are archived and preserved for future generations to enjoy and learn from,” said Graham. “Working in collaboration with Reddit we will continue to record and make available archives of Reddit, along with the hundreds of millions of URLs from other sites we archive every day.”

    Reddit emphasized that organizations must abide by its policies.

    "Anyone accessing Reddit content must abide by our policies, including those in place to protect redditors. We are selective about who we work with and trust with large-scale access to Reddit content. Organizations looking to access Reddit content can head over to our guide to accessing Reddit Data."
