OpenAI introduced the option to block GPTBot on August 7th, 2023. We then conducted a comprehensive study of the top 24,000 global sites by traffic to find out how these sites have responded.
Our findings reveal that 2.4% of the top 24,000 sites chose to block GPTBot, but 8.6% of the top 1,000 blocked it, a rate 3.58 times higher. Dive into our analysis of 24,000 top global websites to uncover more insights! (Note: This study is updated every week.)
Websites That Blocked GPTBot – Statistics
- Only 2.4% of the top 24,000 websites worldwide have blocked GPTBot
- 8.6% of the top 1,000 global websites have blocked GPTBot
- 9% of the top 100 global websites have blocked GPTBot
- The top 1,000 sites have blocked GPTBot 3.58 times more frequently than the top 24,000 websites
- Reuters.com was the first to block GPTBot, only a day after its launch
- Leading websites like Amazon, Quora, and NYTimes blocked it within a week
2.4% of the Top 24,000 Websites Worldwide Blocked GPTBot
Status | Number of Websites | Percentage (%) |
---|---|---|
Blocked | 606 | 2.4% |
Not Blocked | 24182 | 97.6% |
606 (2.4%) of the top 24,000 websites have blocked OpenAI's GPTBot, including prominent sites such as Amazon, NYTimes and Quora.
8.6% of the Top 1,000 Websites Worldwide Blocked GPTBot
Status | Number of Websites | Percentage (%) |
---|---|---|
Blocked | 86 | 8.6% |
Not Blocked | 914 | 91.4% |
Of the top 1,000 websites, 86 (8.6%) have blocked GPTBot, while 914 (91.4%) have not. The blocking rate among the top 1,000 websites is 3.58 times higher than among the top 24,000.
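For reference, the 3.58 multiple appears to be the ratio of the two rounded blocking rates from the tables above:

$$
\frac{8.6\%}{2.4\%} \approx 3.58
$$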
9% of the Top 100 Websites Worldwide Blocked GPTBot
Status | Number of Websites | Percentage (%) |
---|---|---|
Blocked | 9 | 9% |
Not Blocked | 91 | 91% |
As noted above, the top 1,000 sites have blocked GPTBot 3.58 times more frequently than the top 24,000 global websites. This is a large gap, and it suggests that the most-visited websites are more proactive in protecting their data.
Popular Websites That Blocked GPTBot
Top 10 Websites That Blocked GPTBot
Our Methodology
Selection
We identified the top 24,000 websites using the data from the popular marketing tool SEMrush.
Study
After identifying the top 24,000 websites, we used a Python automation script to fetch each domain's robots.txt file and check whether it blocks GPTBot.
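The script itself is not shown in this study. A minimal sketch of such a check, using Python's standard-library `urllib.robotparser`, might look like the following. It assumes (our assumption, not the study's stated rule) that a site "blocks" GPTBot when a robots.txt group explicitly names GPTBot and that group disallows the homepage; the helper names and the `example.com` URL are hypothetical.

```python
import urllib.request
from urllib.robotparser import RobotFileParser


def fetch_robots_txt(domain: str, timeout: float = 10.0) -> str:
    """Download https://<domain>/robots.txt (hypothetical helper).

    Raises urllib.error.HTTPError / URLError on failure; the caller decides
    how to count sites without a reachable robots.txt.
    """
    url = f"https://{domain}/robots.txt"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def names_gptbot(robots_txt: str) -> bool:
    """True if any User-agent line (ignoring comments) mentions GPTBot."""
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0]  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent" and "gptbot" in value.lower():
            return True
    return False


def is_gptbot_blocked(robots_txt: str) -> bool:
    """Assumed definition: GPTBot is named explicitly and may not fetch '/'."""
    if not names_gptbot(robots_txt):
        return False
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The host is a placeholder; only the path "/" is checked against the rules.
    return not parser.can_fetch("GPTBot", "https://example.com/")
```

A live check would be `is_gptbot_blocked(fetch_robots_txt("example.com"))`. Under this definition a blanket `User-agent: *` plus `Disallow: /` is not counted as a GPTBot block, because the site did not single out GPTBot; drop the `names_gptbot` test if you want to count those sites too.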
Detailed Data
- Analysis date: 28th August, 2023
- Total websites analysed: 24,000
- Websites that blocked GPTBot: 606
(Note: Our data is based on search traffic.)
Top 100 Websites That Blocked GPTBot
How to Detect GPTBot's Activity on Your Website?
GPTBot, OpenAI's web crawler, can be identified by its "GPTBot" user-agent token, which appears in its full user-agent string:
```
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
```
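To check whether GPTBot has visited your site, search your server access logs for this string (or just the "GPTBot" token). A match means a request identified itself as GPTBot; because user-agent strings can be spoofed, you can confirm such requests against the IP address ranges that OpenAI publishes for GPTBot.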
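As an illustration, the sketch below counts requests whose user-agent contains the "GPTBot" token and groups them by requested path. It assumes an access log in the common Apache/Nginx "combined" format; the function name, the regex and the sample lines (which use reserved documentation IP addresses, not real GPTBot addresses) are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request field of a combined-format log line,
# e.g. "GET /blog/post-1 HTTP/1.1", and captures the path.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*"')


def gptbot_hits(log_lines: Iterable[str]) -> Counter:
    """Count requests per path for log lines that mention the GPTBot token."""
    hits: Counter = Counter()
    for line in log_lines:
        if "GPTBot" not in line:
            continue
        match = REQUEST_RE.search(line)
        hits[match.group(1) if match else "<unparsed>"] += 1
    return hits


SAMPLE_LOG = [
    '203.0.113.5 - - [28/Aug/2023:10:00:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [28/Aug/2023:10:01:00 +0000] "GET /about HTTP/1.1" 200 256 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.5 - - [28/Aug/2023:10:02:00 +0000] "GET /blog/post-1 HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)"',
]

print(gptbot_hits(SAMPLE_LOG))  # Counter({'/blog/post-1': 2})
```

To scan a real log, pass an open file object instead of the sample list, for example `gptbot_hits(open(path))` with your own log path, and call `.most_common(10)` on the result to see the most-crawled pages.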
How to Manage GPTBot’s Access?
You can easily control GPTBot’s access to your site. To block it completely, add this to your robots.txt file:
```
User-agent: GPTBot
Disallow: /
```
To let GPTBot into some directories while keeping it out of others, modify your robots.txt like this:
```
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
```
This way, you decide exactly where GPTBot can go, making sure it interacts with your content the way you prefer.
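Before deploying rules like these, you can sanity-check them offline. The sketch below feeds the example rules to Python's standard-library `urllib.robotparser` and asks whether GPTBot may fetch a few sample URLs; `example.com` and the sample paths are placeholders. One caveat: `urllib.robotparser` applies the first matching rule in file order, whereas RFC 9309 (and Google) use the most specific (longest) match, so results can differ when Allow and Disallow paths overlap. They do not overlap in this example.

```python
from urllib.robotparser import RobotFileParser

# The partial-access rules from the robots.txt example.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def gptbot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the rules let the GPTBot user-agent token fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", url)


for path in ("/directory-1/page", "/directory-2/page", "/other/page"):
    allowed = gptbot_can_fetch(ROBOTS_TXT, "https://example.com" + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

This prints `allowed` for `/directory-1/page`, `blocked` for `/directory-2/page` and `allowed` for `/other/page`: paths that no rule matches are allowed by default.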
Conclusion
The vast majority (97.6%) of the top 24,000 websites globally have not implemented blocks against OpenAI’s GPTBot. A small fraction (2.4%) have actively chosen to do so.
The top 1,000 sites have blocked GPTBot 3.58 times more frequently than the top 24,000 global websites. This suggests that the biggest websites are more cautious about their data.
Regular Updates
Our team continuously monitors and analyses the web to keep this study up to date, and we update the statistics and information roughly every week.
If you have questions or need further information, please feel free to contact us.