After OpenAI introduced the option to block GPTBot on August 7th, we conducted a comprehensive study of the top 1000 UK sites by traffic to find out how they have responded.

Our findings reveal that 11% of the top 100 sites chose to block GPTBot, compared with 5.7% of the top 1000. Dive into our analysis of 1000 top UK websites to uncover more insights! (Note: This study is updated weekly)

Websites That Blocked GPTBot – Statistics

  • Only 5.7% of the top 1000 popular UK websites have blocked GPTBot
  • 11% of the top 100 UK websites have blocked GPTBot
  • The top 100 sites block GPTBot at 1.93 times the rate of the top 1000 websites (11% vs 5.7%)
  • The number of global websites that have blocked GPTBot is 1.56 times the number among the top 1000 UK websites
  • Leading websites like Amazon, BBC, and Healthline blocked it within a week

11% of the Top 100 UK Websites Blocked GPTBot

Status | Number of Websites | Percentage (%)
Blocked | 11 | 11%
Not Blocked | 89 | 89%

11 of the top 100 websites (11%) have blocked OpenAI’s GPTBot, including prominent websites such as BBC, Ikea, and NewsNow.

5.7% of the Top 1,000 UK Websites Blocked GPTBot

Status | Number of Websites | Percentage (%)
Blocked | 57 | 5.7%
Not Blocked | 943 | 94.3%

Of the top 1,000 websites, 5.7% (57 websites) have blocked GPTBot, while 94.3% (943 websites) have not. Notably, the blocking rate among the top 1,000 websites is 1.93 times lower than among the top 100 (5.7% vs 11%).
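The 1.93 figure follows directly from the two block rates. A minimal sketch of the arithmetic, using only the counts reported in the two tables above (the function and variable names are illustrative, not part of the study's tooling):

```python
# Counts reported in the study's tables for the top 100 and top 1,000 sites.
blocked_top_100, total_top_100 = 11, 100
blocked_top_1000, total_top_1000 = 57, 1000


def block_rate(blocked: int, total: int) -> float:
    """Return the share of sites that block GPTBot, as a percentage."""
    return 100 * blocked / total


rate_100 = block_rate(blocked_top_100, total_top_100)
rate_1000 = block_rate(blocked_top_1000, total_top_1000)

# How many times higher the top-100 block rate is than the top-1,000 rate.
ratio = rate_100 / rate_1000

print(f"Top 100 block rate:   {rate_100:.1f}%")
print(f"Top 1,000 block rate: {rate_1000:.1f}%")
print(f"Ratio: {ratio:.2f}")
```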

The Number of Global Websites that Blocked GPTBot is 1.56 Times that of the Top 1000 UK Websites

Global/UK | Number of websites blocked
Global | 86
UK | 57

It is interesting to note that the number of top 1,000 UK sites that blocked GPTBot is 1.56 times lower than the number of global websites that did so. This is a sizeable difference, and it suggests that top global websites are more proactive in blocking GPTBot than UK websites.

Popular UK Websites that Blocked GPTBot

Top 10 Websites that had Blocked GPTBot

Website | GPTBot Disallowed
amazon.co.uk | Yes
bbcgoodfood.com | Yes
thesaurus.com | Yes
healthline.com | Yes
ikea.com | Yes
nytimes.com | Yes
dictionary.com | Yes
medicalnewstoday.com | Yes
radiotimes.com | Yes
newsnow.co.uk | Yes

Our Methodology:

Selection

We identified the top 1000 UK websites using the data from the popular marketing tool SEMrush.

Study

After identifying the top 1000 websites, we used a Python automation script to inspect each domain’s robots.txt file and determine whether it blocks GPTBot.
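The script itself is not published here, so the following is only a minimal sketch of how such a check could work, using Python's standard urllib.robotparser module. The function names, the custom User-Agent header, and the definition of "blocked" (GPTBot may not fetch the site root) are assumptions for illustration. Note that under this definition a site that disallows all crawlers via a wildcard rule also counts as blocking GPTBot, which may differ from how the study counted.

```python
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser


def blocks_gptbot(robots_txt: str, path: str = "/") -> bool:
    """Return True if the given robots.txt text disallows GPTBot for `path`.

    Assumption: "blocked" means GPTBot may not fetch the site root.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("GPTBot", path)


def fetch_robots_txt(domain: str, timeout: float = 10.0) -> str:
    """Download https://<domain>/robots.txt and return it as text.

    The User-Agent value is a made-up identifier for this sketch.
    """
    request = Request(
        f"https://{domain}/robots.txt",
        headers={"User-Agent": "robots-check-sketch/0.1"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Offline demonstration on two hand-written robots.txt bodies.
blocking_rules = "User-agent: GPTBot\nDisallow: /\n"
permissive_rules = "User-agent: *\nAllow: /\n"

print(blocks_gptbot(blocking_rules))
print(blocks_gptbot(permissive_rules))
```

To run this against live sites, one would call blocks_gptbot(fetch_robots_txt(domain)) for each domain in the list and handle network errors (urlopen raises subclasses of OSError) by recording the site as unknown rather than as blocked or not blocked.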

Detailed Data

  • Analysis date: 29th August, 2023
  • Total websites analysed: 1000
  • Websites that blocked GPTBot: 57

For a complete list of websites that blocked GPTBot, click here.

(Note: Our data is based on search traffic.)

Top 100 Websites that had Blocked GPTBot

Website | GPTBot Disallowed
amazon.co.uk | Yes
bbcgoodfood.com | Yes
thesaurus.com | Yes
healthline.com | Yes
ikea.com | Yes
nytimes.com | Yes
dictionary.com | Yes
medicalnewstoday.com | Yes
radiotimes.com | Yes
newsnow.co.uk | Yes
vocabulary.com | Yes
hellomagazine.com | Yes
amazon.com | Yes
quora.com | Yes
airbnb.co.uk | Yes
metro.co.uk | Yes
cnn.com | Yes
wikihow.com | Yes
reuters.com | Yes
gardenersworld.com | Yes
mashable.com | Yes
glamourmagazine.co.uk | Yes
economictimes.com | Yes
bloomberg.com | Yes
tvtropes.org | Yes
nationalgeographic.com | Yes
businessinsider.com | Yes
shutterstock.com | Yes
pcmag.com | Yes
theathletic.com | Yes
insider.com | Yes
masterclass.com | Yes
olivemagazine.com | Yes
lonelyplanet.com | Yes
gq-magazine.co.uk | Yes
revolut.com | Yes
theverge.com | Yes
cntraveller.com | Yes
archiveofourown.org | Yes
washingtonpost.com | Yes
pbs.org | Yes
alamy.com | Yes
allure.com | Yes
vanityfair.com | Yes
usmagazine.com | Yes
timesofindia.com | Yes
vogue.co.uk | Yes
vulture.com | Yes
www.tumblr.com | Yes
stackexchange.com | Yes
polygon.com | Yes
foursquare.com | Yes
abcnews.go.com | Yes
fbref.com | Yes
distractify.com | Yes
stackoverflow.com | Yes
rtings.com | Yes
medicinenet.com | Yes
vistaprint.co.uk | Yes
fragrantica.com | Yes
theweathernetwork.com | Yes
glamour.com | Yes
wired.com | Yes
flightaware.com | Yes
spanishdict.com | Yes
newyorker.com | Yes
vogue.com | Yes
popsugar.com | Yes
inspiredtaste.net | Yes
ggrecon.com | Yes
psychcentral.com | Yes
medscape.com | Yes
madeformums.com | Yes
teenvogue.com | Yes
londonxlondon.com | Yes
wowhead.com | Yes
countryfile.com | Yes
houseandgarden.co.uk | Yes
abc.net.au | Yes
boardgamegeek.com | Yes
pitchfork.com | Yes
tatler.com | Yes
bikeradar.com | Yes
coursera.org | Yes
simplypsychology.org | Yes
androidauthority.com | Yes
theatlantic.com | Yes
bonappetit.com | Yes
economist.com | Yes
vox.com | Yes
disney.co.uk | Yes
thetrendspotter.net | Yes
dotesports.com | Yes
thecut.com | Yes
sbnation.com | Yes
mft.nhs.uk | Yes
j-14.com | Yes
bustle.com | Yes
wired.co.uk | Yes
onceuponachef.com | Yes

How to Detect GPTBot’s Activity on any Website?

GPTBot, OpenAI’s web crawler, can be identified by its “GPTBot” user-agent token and its full user-agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

To check if it visited your site, simply search for this signature in your server logs. If you find it, GPTBot was there.
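As a rough illustration of that log search, the sketch below filters log lines for the GPTBot token. The sample lines, IP addresses, and the function name are hypothetical, and the regular expression is only one reasonable way to match the token; real server log formats vary, so treat this as a starting point rather than a definitive tool.

```python
import re
from typing import Iterable, List

# Matches the "GPTBot/<version>" token from the user-agent string, e.g. "GPTBot/1.0".
GPTBOT_PATTERN = re.compile(r"GPTBot/\d+(?:\.\d+)*", re.IGNORECASE)


def find_gptbot_hits(log_lines: Iterable[str]) -> List[str]:
    """Return every log line whose text contains the GPTBot token."""
    return [line.rstrip("\n") for line in log_lines if GPTBOT_PATTERN.search(line)]


# Hypothetical access-log lines, invented for this example only.
sample_log = [
    '192.0.2.10 - - [29/Aug/2023:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; '
    'GPTBot/1.0; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [29/Aug/2023:10:00:05 +0000] "GET /about HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = find_gptbot_hits(sample_log)
print(f"GPTBot requests found: {len(hits)}")
```

Because the function accepts any iterable of strings, an open log file can be passed to it directly, for example inside a with open(...) block pointing at your own access log.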

How to Manage GPTBot’s Access?

You can easily control GPTBot’s access to your site. To block it completely, add this to your robots.txt file:

User-agent: GPTBot
Disallow: /

To allow access to certain parts, modify your robots.txt like this:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/

This way, you decide exactly where GPTBot can go, making sure it interacts with your content the way you prefer.
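Before deploying rules like these, it can help to check how a standards-following parser reads them. The sketch below feeds the partial-access example above to Python's standard urllib.robotparser module and tests a few paths; the specific page paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The partial-access rules from the example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths: one under each listed directory, one covered by no rule.
for path in ("/directory-1/page.html", "/directory-2/page.html", "/elsewhere/"):
    verdict = "allowed" if parser.can_fetch("GPTBot", path) else "blocked"
    print(f"{path}: {verdict}")
```

Paths that match no rule are allowed by default, so the third path is reported as allowed. To keep GPTBot out of everything except /directory-1/, the Disallow line would need to cover the rest of the site.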

Conclusion

The vast majority (94.3%) of the top 1,000 UK websites have not blocked OpenAI’s GPTBot. A small fraction (5.7%) have actively chosen to do so.

The top 1,000 UK sites have blocked GPTBot 1.56 times less often than the top 1,000 global websites. This suggests that the biggest websites are more cautious about their data.

Regular Updates

Our team continuously monitors and analyses the web to keep the information in this study up to date. We update the statistics almost every week.

If you have questions or need further information, please feel free to contact us.