Cloudflare Blocks and Delists Perplexity for Deceptive Crawling Behavior

Cloudflare Delists And Blocks Perplexity From Crawling Websites

Cloudflare has blocked and delisted Perplexity from crawling sites due to deceptive crawling behavior and violations of the robots.txt protocol.

Cloudflare announced that it has delisted Perplexity’s crawler for using an aggressive, misleading crawling strategy. The action follows a number of user complaints and an internal investigation, which found that Perplexity used methods designed to bypass standard crawling safeguards.

Cloudflare Verified Bots Program

Cloudflare’s Verified Bots program is designed to ensure that crawlers adhere to certain guidelines, such as obeying robots.txt, the file that tells web crawlers which parts of a site they may access. Bots that adhere to these guidelines are whitelisted, which lets them access websites protected by Cloudflare. Perplexity breached these rules by not honoring robots.txt and by employing tactics to hide the identity of its crawler.

Stealth Crawling Tactics Uncovered

Cloudflare’s investigation found that Perplexity used rotating IP addresses and fake user agents to hide its crawlers. When it was denied access to a website, Perplexity switched to IP addresses that were not officially associated with it, including addresses on a different ASN (Autonomous System Number). This allowed Perplexity to bypass restrictions and continue crawling without being detected.
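Rotating to unlisted addresses matters because crawler operators commonly publish the IP ranges their bots use, so a site can check where a request really comes from. The following is a minimal sketch of that check, not Cloudflare’s actual implementation. The ranges are hypothetical placeholders taken from the RFC 5737 documentation blocks, not Perplexity’s real ranges, and the function name is an assumption made for illustration.

```python
import ipaddress

# Hypothetical published ranges (RFC 5737 documentation blocks).
# These are placeholders, not any real crawler's address ranges.
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_from_published_range(ip: str) -> bool:
    """Return True if the source IP falls inside a published crawler range."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in PUBLISHED_RANGES)


if __name__ == "__main__":
    # A request from a listed address passes; one from an unlisted address does not.
    print(is_from_published_range("192.0.2.10"))
    print(is_from_published_range("203.0.113.5"))
```

A crawler that moves to addresses outside its published ranges will fail this kind of check, but it also stops looking like a known crawler at all, which is why an IP check alone does not catch it.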

Perplexity’s declared bots identify themselves with the following user agents:

  • PerplexityBot
  • Perplexity-User
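A site that wants to opt out of these declared crawlers would typically name them in robots.txt. The sketch below uses Python’s standard urllib.robotparser module to show how a compliant crawler would interpret such a file. The robots.txt content, the example.com URL and the ExampleBot name are hypothetical and only serve as an illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site (illustrative rules only).
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"  # hypothetical URL
    for agent in ("PerplexityBot", "Perplexity-User", "ExampleBot"):
        print(agent, can_fetch(agent, url))
```

Note that robots.txt is a voluntary protocol: it only has an effect when the crawler reads the file and chooses to obey it.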

Apart from rotating IP addresses, Perplexity also used user agent spoofing. By pretending to be a legitimate browser such as Chrome on a Mac, Perplexity attempted to avoid detection by filters that identify known bots. This technique is called “spoofing”: a bot disguises itself as an ordinary user to get around detection.

According to Cloudflare, Perplexity used the following user agent string:

“Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36”
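To see why this defeats name-based filtering, consider a simple filter that looks for the declared bot names in the User-Agent header. This is a minimal sketch with an assumed function name, and the declared-bot header used in the demo is an illustrative string rather than Perplexity’s exact one. The spoofed string is the one quoted above.

```python
# Tokens of the declared crawlers named in the article.
DECLARED_TOKENS = ("PerplexityBot", "Perplexity-User")

# The generic Chrome-on-macOS string that Cloudflare reported.
SPOOFED_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)


def matches_declared_bot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a declared bot token."""
    lowered = user_agent.lower()
    return any(token.lower() in lowered for token in DECLARED_TOKENS)


if __name__ == "__main__":
    # Illustrative header for a declared bot (not an exact real-world string).
    declared = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
    print(matches_declared_bot(declared))    # the filter catches this request
    print(matches_declared_bot(SPOOFED_UA))  # the spoofed request slips through
```

Because the spoofed header is indistinguishable from a real browser’s, a filter like this cannot catch it, so detection has to rely on other signals than the header itself.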

Cloudflare’s Response

In response to these tactics, Cloudflare officially delisted Perplexity from its Verified Bots program, blocking its crawlers from accessing websites that are protected by Cloudflare. Cloudflare’s announcement emphasized the importance of trust on the web, noting that bots must be transparent, respect website preferences, and adhere to clear guidelines.

Cloudflare announced that Perplexity is delisted as a verified bot and will be blocked:

“The Internet as we have known it for the past three decades is rapidly changing, but one thing remains constant: it is built on trust. There are clear preferences that crawlers should be transparent, serve a clear purpose, perform a specific activity, and, most importantly, follow website directives and preferences. Based on Perplexity’s observed behavior, which is incompatible with those preferences, we have de-listed them as a verified bot and added heuristics to our managed rules that block this stealth crawling.”

Key Takeaways

Violation of the Cloudflare Verified Bots Policy: Perplexity violated Cloudflare’s policies by not following fundamental crawling guidelines, including adhering to robots.txt instructions.

Stealth Crawling Techniques: Perplexity rotated IP addresses and used fake user agents to evade blocks and continue crawling.

User Agent Spoofing: Perplexity’s bots were disguised as genuine browser traffic to elude detection.

Cloudflare’s Response: Delisting Perplexity and blocking its crawlers sends a clear warning against deceptive crawling practices.

SEO Implications: Owners of websites protected by Cloudflare should check whether Perplexity’s crawlers are being blocked and, if they want those crawlers to have access, allow it through the Cloudflare dashboard.

Perplexity’s Response

In a rebuttal, Perplexity argued that Cloudflare was mischaracterizing its actions, saying that its AI assistants, which fetch data in response to user requests, should not be considered malicious bots. Perplexity argued that Cloudflare’s stance discriminates against automated tools that serve legitimate user needs.

“When companies like Cloudflare mischaracterize user-driven AI assistants as malicious bots, they’re arguing that any automated tool serving users should be suspect—a position that would criminalize email clients and web browsers, or any other service a would-be gatekeeper decided they don’t like.

This controversy reveals that Cloudflare’s systems are fundamentally inadequate for distinguishing between legitimate AI assistants and actual threats. If you can’t tell a helpful digital assistant from a malicious scraper, then you probably shouldn’t be making decisions about what constitutes legitimate web traffic.”

Perplexity’s stance raises an important issue: how to separate legitimate AI bots from malicious ones. It also highlights the difficulty of building tools that can reliably tell one from the other. Cloudflare’s decision to block Perplexity underscores the need for transparent and robust bot detection methods in an era of AI-driven tools.

What This Means for SEOs

For SEO professionals, this incident is a reminder that bot management keeps changing and that balancing user-friendly automation with ethical web practices remains an ongoing challenge.

Mohsin Pirzada
Mohsin Pirzada is a freelance writer and editor with over 7 years of experience in SEO content writing, digital…