OpenAI introduced the option to block GPTBot on August 7th, 2023. We conducted a comprehensive study of the top 24,000 global sites by traffic to find out how these sites have responded.

Our findings reveal that 2.4% of the top 24,000 sites chose to block GPTBot, while 8.6% of the top 1,000 did, a rate 3.58 times higher. Dive into our analysis of 24,000 top global websites to uncover more insights! (Note: This study is updated every week.)

Websites That Blocked GPTBot – Statistics

  • Only 2.4% of the top 24,000 websites worldwide have blocked GPTBot
  • 8.6% of the top 1,000 global websites have blocked GPTBot
  • 9% of the top 100 global websites have blocked GPTBot
  • The top 1,000 sites have blocked GPTBot 3.58 times more frequently than the top 24,000 websites
  • Reuters.com was the first to block GPTBot, only a day after its launch
  • Leading websites like Amazon, Quora, and NYTimes blocked it within a week

2.4% of the Top 24,000 Websites Worldwide Blocked GPTBot

Status | Number of Websites | Percentage (%)
Blocked | 606 | 2.4
Not Blocked | 24,182 | 97.6

606 of the top 24,000 websites (2.4%) have blocked OpenAI’s GPTBot, including prominent sites such as Amazon, NYTimes, and Quora.

8.6% of the Top 1,000 Websites Worldwide Blocked GPTBot

Status | Number of Websites | Percentage (%)
Blocked | 86 | 8.6
Not Blocked | 914 | 91.4

8.6% (86 websites) have blocked GPTBot, while 91.4% (914 websites) have not. The blocking rate among the top 1,000 websites is therefore 3.58 times that of the top 24,000 (8.6% ÷ 2.4%).

9% of the Top 100 Websites Worldwide Blocked GPTBot

Status | Number of Websites | Percentage (%)
Blocked | 9 | 9
Not Blocked | 91 | 91

The share of sites blocking GPTBot rises with popularity: 2.4% of the top 24,000, 8.6% of the top 1,000, and 9% of the top 100. The top 1,000 sites block GPTBot 3.58 times as often as the top 24,000, which suggests that the most popular websites are much more proactive about protecting their content.
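The percentages and the 3.58 ratio can be recomputed from the counts in the three tables. The short sketch below does that arithmetic; the group labels and the `block_rate` helper are illustrative names, not part of the study. Note that the blocked and not-blocked counts for the top-24,000 group sum to 24,788, and that dividing the unrounded rates gives a ratio of about 3.52, while dividing the rounded headline percentages (8.6 ÷ 2.4) gives 3.58.

```python
# Counts as reported in the three "Blocked / Not Blocked" tables.
GROUPS = {
    "Top 24,000": (606, 24_182),
    "Top 1,000": (86, 914),
    "Top 100": (9, 91),
}


def block_rate(blocked: int, not_blocked: int) -> float:
    """Percentage of sites in a group that block GPTBot."""
    return 100 * blocked / (blocked + not_blocked)


rates = {name: block_rate(*counts) for name, counts in GROUPS.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}% blocked")

# Ratio of the rounded headline percentages vs. ratio of the unrounded rates.
print(f"Ratio of rounded rates:   {8.6 / 2.4:.2f}")
print(f"Ratio of unrounded rates: {rates['Top 1,000'] / rates['Top 24,000']:.2f}")
```

Either way, the top 1,000 sites block GPTBot roughly three and a half times as often as the top 24,000.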

Popular Websites That Blocked GPTBot

Top 10 Websites That Blocked GPTBot

Popularity Rank | Website
3 | http://amazon.com/
23 | http://nytimes.com/
28 | http://thesaurus.com/
31 | http://cnn.com/
32 | http://healthline.com/
37 | http://dictionary.com/
52 | http://medicalnewstoday.com/
58 | http://quora.com/
68 | http://vocabulary.com/
101 | http://kbb.com/

Our Methodology:

Selection

We identified the top 24,000 websites by search traffic using data from the marketing tool SEMrush.

Study

After identifying the top 24,000 websites, we used a Python automation script to fetch each domain’s robots.txt file and check whether it blocks GPTBot.
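The study’s own script is not published. The following is a minimal sketch of the same kind of check, assuming that “blocked” means the site’s robots.txt denies GPTBot access to the site root, and using Python’s standard-library `urllib.robotparser`. The function names `gptbot_blocked` and `fetch_robots_txt` are hypothetical. Under this assumption, a blanket `User-agent: *` / `Disallow: /` rule also counts as blocking GPTBot; the study may have counted only rules that name GPTBot explicitly.

```python
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser


def gptbot_blocked(robots_txt: str) -> bool:
    """Return True if the robots.txt text denies GPTBot access to the site root."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    # can_fetch() uses the first group whose User-agent matches "GPTBot",
    # falling back to a "User-agent: *" group if no specific group matches.
    return not parser.can_fetch("GPTBot", "/")


def fetch_robots_txt(domain: str, timeout: float = 10.0) -> str:
    """Download https://<domain>/robots.txt; return "" if the file is missing."""
    url = f"https://{domain}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return ""  # no robots.txt means no crawl restrictions
        raise
```

For example, `gptbot_blocked(fetch_robots_txt("example.com"))` returns `True` only if that site’s robots.txt disallows GPTBot from its root. A crawl over thousands of domains would also need rate limiting and handling for 401/403 responses, timeouts, and connection errors.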

Detailed Data

  • Analysis date: 28th August, 2023
  • Total websites analysed: 24,000
  • Websites that blocked GPTBot: 606


(Note: Our data is based on search traffic.)

Top 100 Websites That Blocked GPTBot

Popularity Rank | Website | GPTBot Blocked
3 | http://amazon.com/ | Yes
23 | http://nytimes.com/ | Yes
28 | http://thesaurus.com/ | Yes
31 | http://cnn.com/ | Yes
32 | http://healthline.com/ | Yes
37 | http://dictionary.com/ | Yes
52 | http://medicalnewstoday.com/ | Yes
58 | http://quora.com/ | Yes
68 | http://vocabulary.com/ | Yes
101 | http://kbb.com/ | Yes
116 | http://ikea.com/ | Yes
126 | http://wikihow.com/ | Yes
136 | http://reuters.com/ | Yes
145 | http://basketball-reference.com/ | Yes
151 | http://aa.com/ | Yes
153 | http://airbnb.com/ | Yes
159 | http://abcnews.go.com/ | Yes
189 | http://businessinsider.com/ | Yes
190 | http://pbs.org/ | Yes
193 | http://bankrate.com/ | Yes
198 | http://pro-football-reference.com/ | Yes
201 | http://baseball-reference.com/ | Yes
203 | http://usmagazine.com/ | Yes
223 | http://theathletic.com/ | Yes
225 | http://eater.com/ | Yes
246 | http://spanishdict.com/ | Yes
253 | http://theverge.com/ | Yes
275 | http://insider.com/ | Yes
277 | http://nationalgeographic.com/ | Yes
279 | http://bloomberg.com/ | Yes
281 | http://autotrader.com/ | Yes
292 | http://trulia.com/ | Yes
326 | http://mashable.com/ | Yes
348 | http://archiveofourown.org/ | Yes
355 | http://shutterstock.com/ | Yes
368 | http://hellomagazine.com/ | Yes
372 | http://medscape.com/ | Yes
387 | http://vulture.com/ | Yes
392 | http://allure.com/ | Yes
397 | http://masterclass.com/ | Yes
400 | http://aarp.org/ | Yes
405 | http://www.tumblr.com/ | Yes
407 | http://economictimes.com/ | Yes
416 | http://nymag.com/ | Yes
420 | http://tvtropes.org/ | Yes
425 | http://polygon.com/ | Yes
443 | http://radiotimes.com/ | Yes
447 | http://abcya.com/ | Yes
457 | http://stackexchange.com/ | Yes
470 | http://foursquare.com/ | Yes
486 | http://pcmag.com/ | Yes
494 | http://stackoverflow.com/ | Yes
514 | http://popsugar.com/ | Yes
532 | http://vogue.com/ | Yes
542 | http://flightaware.com/ | Yes
563 | http://bonappetit.com/ | Yes
564 | http://wallethub.com/ | Yes
569 | http://wired.com/ | Yes
573 | http://glamour.com/ | Yes
577 | http://disney.com/ | Yes
606 | http://coursera.org/ | Yes
611 | http://onceuponachef.com/ | Yes
641 | http://newyorker.com/ | Yes
664 | http://teacherspayteachers.com/ | Yes
669 | http://alamy.com/ | Yes
672 | http://vanityfair.com/ | Yes
721 | http://ingles.com/ | Yes
724 | http://wowhead.com/ | Yes
746 | http://axios.com/ | Yes
747 | http://theatlantic.com/ | Yes
751 | http://timesofindia.com/ | Yes
754 | http://rtings.com/ | Yes
766 | http://rxlist.com/ | Yes
785 | http://teenvogue.com/ | Yes
823 | http://medicinenet.com/ | Yes
856 | http://healthgrades.com/ | Yes
862 | http://vox.com/ | Yes
867 | http://gq.com/ | Yes
875 | http://inspiredtaste.net/ | Yes
878 | http://architecturaldigest.com/ | Yes
934 | http://vistaprint.com/ | Yes
957 | http://psychcentral.com/ | Yes
959 | http://game8.co/ | Yes
964 | http://thrillist.com/ | Yes
968 | http://cntraveler.com/ | Yes
987 | http://dotesports.com/ | Yes
1006 | http://theringer.com/ | Yes
1019 | http://nextdoor.com/ | Yes
1042 | http://bbcgoodfood.com/ | Yes
1052 | http://thecut.com/ | Yes
1054 | http://thepointsguy.com/ | Yes
1086 | http://ggrecon.com/ | Yes
1096 | http://nydailynews.com/ | Yes
1106 | http://abc.com/ | Yes
1113 | http://arstechnica.com/ | Yes
1138 | http://pitchfork.com/ | Yes
1193 | http://epicurious.com/ | Yes
1202 | http://chicagotribune.com/ | Yes
1211 | http://j-14.com/ | Yes
1227 | http://nytcrosswordanswers.org/ | Yes

How to Detect GPTBot’s Activity on Your Website?

GPTBot, OpenAI’s web crawler for collecting content that may be used to train its models, can be identified by its user-agent token “GPTBot” and the following full user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

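To check whether it has visited your site, search your server access logs for the “GPTBot” token. Because any client can send this user-agent string, a match is strong but not conclusive evidence; OpenAI also publishes the IP address ranges GPTBot crawls from, which you can use to verify a request.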
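A small Python sketch of this log search, assuming your server writes the common “combined” access-log format used by Apache and nginx. The function name `gptbot_hits` and its optional `trusted_networks` parameter are hypothetical. Because the user-agent string can be spoofed, the optional filter counts only requests from given IP ranges; fill it with the ranges OpenAI publishes for GPTBot, not the documentation range `203.0.113.0/24` used in the tests.

```python
import ipaddress
import re
from collections import Counter
from typing import Iterable, Optional

# Assumes the "combined" log format:
# ip - user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def gptbot_hits(
    lines: Iterable[str], trusted_networks: Optional[Iterable[str]] = None
) -> Counter:
    """Count requests per path whose user-agent contains the "GPTBot" token.

    If trusted_networks (CIDR strings) is non-empty, only requests whose
    client IP falls inside one of those networks are counted.
    """
    networks = [ipaddress.ip_network(net) for net in (trusted_networks or [])]
    hits: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "GPTBot" not in match.group("ua"):
            continue
        if networks:
            try:
                ip = ipaddress.ip_address(match.group("ip"))
            except ValueError:
                continue  # first field is not an IP address; skip the line
            if not any(ip in net for net in networks):
                continue  # claims to be GPTBot but comes from an untrusted IP
        hits[match.group("path")] += 1
    return hits
```

For example, `with open("access.log") as f: print(gptbot_hits(f).most_common(10))` lists the ten paths GPTBot requested most often; `access.log` is a placeholder for your own log file.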

How to Manage GPTBot’s Access?

You can easily control GPTBot’s access to your site. To block it completely, add this to your robots.txt file:

User-agent: GPTBot
Disallow: /

To allow access to some directories while blocking others, use rules like these:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/

This way, you decide which parts of your site GPTBot may crawl; paths that match no rule remain allowed. These rules work only because GPTBot honors robots.txt, which OpenAI states it does.
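Before deploying rules like these, you can preview how a robots.txt parser interprets them. The sketch below uses Python’s standard-library `urllib.robotparser`; the `gptbot_access` helper and the sample paths are hypothetical. One caveat: Python’s parser applies the first matching rule in file order, whereas RFC 9309 specifies the most specific (longest) match, so results can differ from a given crawler’s behavior when Allow and Disallow paths overlap. The example rules here do not overlap.

```python
from urllib.robotparser import RobotFileParser

# The partial-access rules from the example above; adjust to your own paths.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_access(robots_txt: str, paths: list[str]) -> dict[str, bool]:
    """Map each path to True if the rules let GPTBot fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch("GPTBot", path) for path in paths}


if __name__ == "__main__":
    sample_paths = ["/directory-1/page.html", "/directory-2/page.html", "/about"]
    for path, allowed in gptbot_access(ROBOTS_TXT, sample_paths).items():
        print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

With these rules, `/directory-1/page.html` and `/about` are reported as allowed and `/directory-2/page.html` as blocked.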

Conclusion

The vast majority (97.6%) of the top 24,000 websites globally have not implemented blocks against OpenAI’s GPTBot. A small fraction (2.4%) have actively chosen to do so.

The top 1,000 sites have blocked GPTBot 3.58 times as often as the top 24,000 global websites, which suggests that the biggest websites are more cautious about how their data is used.

Regular Updates

Our team continuously monitors and analyses the web to keep this study up to date, and we aim to refresh the statistics almost every week.

If you have questions or need further information, please feel free to contact us.