Perplexity AI Reacts To Reddit Lawsuit Over Data Access


Perplexity has responded to Reddit’s lawsuit over data access, addressing concerns about fair use and compliance.

Reddit recently filed a lawsuit in federal court in New York against Perplexity AI and three data-scraping companies, accusing them of evading access controls to obtain Reddit content on a massive scale.

The complaint alleges that Perplexity relied on intermediaries, namely SerpApi, Oxylabs UAB, and AWMProxy, to circumvent Reddit’s security measures and scrape its data, including Reddit content that appears in Google search results.

Perplexity’s Defence and Claims

Perplexity has publicly defended itself, saying that it only summarizes and cites Reddit discussions and does not train its AI models on Reddit content.

Perplexity stated:

“We summarize Reddit discussions, and we cite Reddit threads in answers, just like people share links to posts here all the time.”

However, Reddit’s complaint challenges this distinction. It alleges that Perplexity obtained content that was available only to Google’s crawler, and that this content appeared in Perplexity’s results within just a few hours.

The filing also says that after Reddit sent a cease-and-desist letter in May 2024 asking Perplexity to stop scraping data from its servers, Perplexity’s access to Reddit content rose by 40% instead of declining. According to Reddit, this deepened its suspicion that the company was acting unlawfully.

The lawsuit portrays Perplexity and the scraping companies as “would-be bank robbers” who bypass security measures to access valuable user content without authorization.

The dispute reflects a broader conflict between AI developers, who are eager to use massive amounts of online content for training, and platform owners, who want to protect their rights over that data.

Accusations From Other Publishers

Other publishers, such as Forbes and Wired, have made similar accusations against Perplexity. They point to the republishing of exclusive content and to the use of questionable methods to evade websites’ no-crawl directives.

Wired reported that Perplexity used undisclosed IP addresses and spoofed user-agent strings to get around Wired’s robots.txt file.

Based on tests it conducted in August, Cloudflare later said that Perplexity was using “stealth, undeclared crawlers” that ignored no-crawl directives.

Perplexity’s Response to the Controversy

In past controversies, Perplexity has blamed the problems on early-stage product issues and has pledged to improve content attribution.

In this case, Perplexity frames Reddit’s lawsuit as part of a larger negotiation over access to training data and states its position firmly. It pledges not to give in to extortion, writing:

“We summarize Reddit discussions… We won’t be extorted, and we won’t help Reddit extort Google.”

Why This Case Matters

The outcome of this lawsuit matters to both the AI industry and content platforms. It raises key legal questions about whether technological measures designed to protect online content can be bypassed without legal consequence.

A ruling in Reddit’s favor could lead to stricter rules on how AI assistants access and cite forum content. Such a ruling could restrict the flow of information and set a precedent for licensing agreements.

A ruling in Perplexity’s favor, by contrast, could encourage wider use of publicly accessible online forums for AI training. That would affect how digital content can be reused under existing copyright law.

What Remains Unclear

The lawsuit claims Perplexity obtained Reddit content from at least one scraping vendor. However, the public filings do not identify which vendor supplied specific data, nor do they describe the terms of those transactions.

This lack of detail leaves open questions about the scope of the data acquisition and the sources involved.

Bottom Line

The court battle between Reddit and Perplexity offers key insight into the changing nature of content ownership, AI data use, and digital rights management. Its outcome could reshape the rules on data access, licensing, and the ethics of AI development.

Mohsin Pirzada
Mohsin Pirzada is a freelance writer and editor with over 7 years of experience in SEO content writing, digital…