Should I Block ChatGPT from Crawling My Site?

Rion Haber | Oct 29, 2023

OpenAI recently announced that it will allow websites to block ChatGPT from crawling their content. The move responds to growing concern that creators cannot protect their intellectual property from being recycled by chat engines, often without any permission or credit at all. It may prompt as many questions as answers, though. How will allowing or blocking chat engines from seeing a person or organization’s content impact their ability to get found in the future? In this post, we’ll explore whether allowing ChatGPT to scrape your website makes sense for you.

What are Web Crawlers?

If you’re familiar with digital marketing, you’ve probably heard of web crawlers; if not, they’re tools that inspect content across the web, from personal blogs to business applications. Crawlers can be used for many tasks, but the most familiar is often SEO. Consumer- and business-facing websites allow search engines like Google and Bing to map their site’s metadata, information architecture, and content because doing so makes them easier for users to find. As a result, we rely on search engines as a relatively agnostic way to navigate the incredibly wide array of ideas that comprises the internet.

The large language models (LLMs) powering platforms like ChatGPT use web crawlers a little differently. They examine the same website properties as search engines. Instead of returning sources based on a site’s content, however, they ingest and aggregate each source’s content and use generative AI to deliver it in a more conversational style. This process of converting content from multiple experts and delivering it back to the user as an “original work” has prompted questions about sustainability. If there’s no benefit to exposing your expertise online, what’s the point?

That’s why the announcement that ChatGPT would allow websites to block crawlers from including their unique content in results prompted questions. For instance, will preventing access to your website lower or eliminate the chance of being referred by a chat engine? Can you provide access while still protecting sensitive IP? Does subject matter expertise delivered at the top of the funnel benefit referrals at the bottom? While generative AI is still largely in its infancy, it represents a serious threat for people and organizations that now use subject matter experts to get found online – prompting questions about how to block ChatGPT from accessing proprietary content.

For chat platform providers like OpenAI, the threat may be even more serious. Chat engines’ process of aggregating and repackaging content sidesteps a vast creator economy that’s been at the heart of the internet since it became a part of our modern lexicon. Without the fresh content provided by a “knowledge for traffic” economy, LLMs may cannibalize themselves – a possibility suggested by recent reports that ChatGPT is getting “dumber when blocked by sources of high-quality and relevant content like the newspapers”. The only logical conclusion is that chat engines need to expand to include value-based arbitrage or they will wither and die on the vine.

It's All About Context

So, should you allow chat engines to crawl your site? In short, “It depends”. In their current iteration, chat engines push many of the boundaries of IP law. In fact, at the time of this post, several suits are challenging OpenAI’s right to repurpose original content. Others address topics ranging from whether AI-generated content is copyrightable at all to its potential use in election interference. All of this is to say that what may be sound advice at this stage of generative AI’s journey could prove ineffective at the next. The best we can do is provide short-term answers to long-term questions and be aware that volatility in the space is likely.

For the current snapshot in time, and now that OpenAI is releasing tools that allow publishers to block its web crawlers, it may be best to turn the question on its head: not “Should I allow chat engines to crawl my site?”, but “How much of my site should I allow chat engines to crawl?”

Content about your business, the products and services you sell, your history, testimonials, and FAQs is unique to you and may help chat engines distinguish and recommend your business without presenting much threat of being aggregated into a larger search category that would otherwise net you business. On the other hand, if you earn traffic by answering common questions through long-tail content, it may pay to think more conservatively, leaving your site open to search engines but not chat engines for the time being.

Not every business should approach the question of whether to allow chat engines to crawl their site in the same way, though. First, despite the doomsayers, chat engines aren’t necessarily impacting search the way many thought they would; at present, chat has become more complementary than disruptive, and the two may coexist. Second, if you can’t be found by chat engines, you can’t be referred by them. So being seen by chat engines, and seen early, may prove impactful down the road – the same as it was with search. Finally, chat engines are still unrefined and make up answers (a phenomenon known as “hallucinating”). That creates a gap between user expectation and experience that can be a brand killer.

Because of that, the answer to whether you should allow ChatGPT to crawl your website may have more to do with your unique business model, your tolerance for experimentation, and your strategy for getting found through chat (chat engine optimization) than with any single “correct” approach.

You can use these guidelines as a starting point for your strategy:

Allow ChatGPT Crawlers to Access Your Site

If your website is non-commercial or contains relatively static and non-proprietary information about your mission, services, team, etc., it’s probably okay to allow ChatGPT to index your site’s full range of content. Web properties of this kind generally offer enough business information to make them recognizable, without exposing trade secrets or other types of sensitive or value-based content.

Block ChatGPT Crawlers from Accessing Your Site

If your website thrives on analysis of content or news based on subject matter expertise, you might want to consider blocking chat engine crawlers from analyzing any part of your site for the time being. While the absorption rate for news and analysis content is unknown, we’ve seen examples of rapid ingestion in early testing of ChatGPT’s Bing extension. 

Allow ChatGPT Crawlers to Access Some of Your Site

If your website offers a mix of business information and subject matter expertise (the most common scenario), it may be smartest to approach blocking on a page-by-page basis. Allow access to critical information about what makes your organization unique, while blocking access to proprietary thought-leadership content or knowledge. For example, this post blocks ChatGPT crawlers.
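
The three approaches above can be sketched as robots.txt rules and checked before publishing. This is a minimal sketch, not a definitive setup: it assumes OpenAI's crawler identifies itself with the user agent token GPTBot (the token OpenAI published alongside its announcement), and the paths /about/ and /blog/, the example.com domain, and the helper can_crawl are placeholders invented for illustration. It uses Python's standard-library robots.txt parser to confirm what each rule set would permit.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rule sets, one per approach described above.
# "GPTBot" is assumed to be the crawler's user agent token; the paths
# are placeholders and should be replaced with your own site structure.
ALLOW_ALL = """\
User-agent: GPTBot
Allow: /
"""

BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

PARTIAL = """\
User-agent: GPTBot
Allow: /about/
Disallow: /blog/
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    site = "https://example.com"  # placeholder domain
    rule_sets = {"allow all": ALLOW_ALL, "block all": BLOCK_ALL, "partial": PARTIAL}
    for name, rules in rule_sets.items():
        for path in ("/about/team", "/blog/some-post"):
            allowed = can_crawl(rules, "GPTBot", site + path)
            print(f"{name:10} {path:16} allowed={allowed}")
```

Because each rule set names only GPTBot, other crawlers such as search engines are unaffected by these rules. Note that robots.txt is a request that well-behaved crawlers honor rather than an enforcement mechanism, and that Python's parser applies rules in file order, which may differ from how a given crawler resolves overlapping Allow and Disallow lines.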

The Future of Chat Engines

By the time anyone had heard of ChatGPT, it had already scraped billions of websites and was doing a good job of answering relatively complex queries about the world around us. For that reason, some say that the damage is done; the ability to block ChatGPT will only lead to exclusion from its results. Staying relevant, however, means that these models will need to take in new, high-quality data on a regular basis. So, when deciding whether to allow ChatGPT to crawl your site, it may be less important to focus on the technology’s challenges or benefits than on how we can align our digital marketing strategy with them to expose audiences to the right content at the right time to make a purchasing decision.

Had you asked us in 1997 what the web was going to look like in 30 years, we likely would have pointed to very advanced forms of Netscape. By 2006 that company’s market share had dropped from over 90% to less than 1%. What we can’t tell yet is what chat will evolve into. Early indicators point to something more akin to a deeply personalized assistant than a search engine. Assuming that we manage not to throw the baby out with the bathwater, new legislation may arise that protects creators while giving searchers a more effective, impactful, and easy to use discovery experience. When integrated into tools like IoT and mobile devices, the technology will become a vital part of any digital marketing strategy.

