How to Block User Agent ChatGPT, Claude, Perplexity and Others on Blogger
As website owners, we've certainly noticed over the years that AI has changed how people read and what makes blog articles appealing. Many readers now take the more practical route of typing a prompt and getting an immediate answer. It's practical because the answer is already summarized, with the unnecessary words, what we used to call small talk, stripped out.
Blogging used to feel like chatting with a human because of that small talk (back around 2009), but over time, interactions have changed... :) Shorter, more concise, and clearer content became popular (this was even before generative AI took off). As website owners, we naturally followed suit to maintain traffic.
Every step feels like a race: on one hand, we have to survive; on the other, we're very enthusiastic about a field but lack the skills. Hehe, there's something that always keeps us going: the love of sharing knowledge, because sharing firmly embeds that knowledge in ourselves. I feel that, rather than just sitting through a lecture, taking notes, finishing, and then forgetting, applying what we learn, writing it down, or sharing it in other ways really deepens our understanding.
Sometimes I think it would be nice if life were secure, so we could study and gain knowledge in peace, and more...
Yes, that's how it is. And of course, website owners today also face web crawlers that are more structured and legal.. haha. It's like a group that has the resources but no content yet. If we don't block it, GPTBot, OpenAI's crawler, gets access to our website content and learns from it by crawling the internet. GPTBot extracts data from the sites it crawls, and OpenAI uses that data to train its language models, which is what lets ChatGPT generate text, translate languages, write various kinds of creative content, and answer users' questions informatively.
If we don't want our content to become AI training data, we have to update the robots.txt of the website or blog.
Most websites use robots.txt. It is a plain text file that uses a series of instructions to tell web crawlers which pages or directories on our site they can and cannot access.
To control what the ChatGPT web crawler can use, add a group like the following to your robots.txt file. The User-agent line names the crawler, and the Disallow and Allow lines say which paths it may not and may access. To handle more than one user agent, add a separate group for each.
Example:
# Model-training crawler. Opt out if you don't want your content crawled for GPT-4o or GPT-5.
User-agent: GPTBot
Disallow: /private/ # GPTBot may not crawl the /private/ folder
Allow: / # everything else may be crawled
So, if we want to allow everything, just use Allow: /
Whereas if we want to block everything, just use Disallow: /
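Before pasting rules into the blog, it can help to check how a parser reads them. Here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `allowed` and the sample paths are made up for illustration, and real crawlers may interpret edge cases a little differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this article: block /private/, allow the rest.
PARTIAL_RULES = """\
User-agent: GPTBot
Disallow: /private/ # not allowed in private folder
Allow: / # everything else is fine
"""

# The "block everything" variant.
BLOCK_ALL_RULES = """\
User-agent: GPTBot
Disallow: /
"""


def allowed(rules: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


print(allowed(PARTIAL_RULES, "GPTBot", "/private/notes.html"))  # False
print(allowed(PARTIAL_RULES, "GPTBot", "/2024/01/post.html"))   # True
print(allowed(BLOCK_ALL_RULES, "GPTBot", "/2024/01/post.html")) # False
# A crawler that has no group of its own is not affected by GPTBot's rules.
print(allowed(BLOCK_ALL_RULES, "Googlebot", "/2024/01/post.html"))  # True
```

The last line shows why one group is not enough: rules written for GPTBot say nothing to any other crawler.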
To do this on Blogger, first find the custom robots.txt setting,
then activate it.
After that, click on custom robots.txt, and fill in the rules you want.
Then, save it.
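Once the rules are saved, the file is served at `/robots.txt` on the blog's root, so it can be checked from outside. The sketch below assumes a placeholder address (`example.blogspot.com`) and two hypothetical helper names, and the second helper needs a network connection, so treat it as an illustration rather than a finished tool.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(blog_url: str) -> str:
    """Build the robots.txt address for a blog from any URL on that blog."""
    return urljoin(blog_url, "/robots.txt")


def is_blocked_live(blog_url: str, agent: str, path: str = "/") -> bool:
    """Download the live robots.txt and report whether `agent` is blocked from `path`.

    This makes a real HTTP request, so it only works while online.
    """
    parser = RobotFileParser()
    parser.set_url(robots_url(blog_url))
    parser.read()
    return not parser.can_fetch(agent, urljoin(blog_url, path))


# Placeholder blog address, replace it with your own.
print(robots_url("https://example.blogspot.com/2024/01/post.html"))
# prints https://example.blogspot.com/robots.txt

# Example call (commented out because it needs network access):
# print(is_blocked_live("https://example.blogspot.com/", "GPTBot"))
```

Opening that robots.txt address in a browser is the quickest manual check; the function is only handy if you want to test several user agents at once.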
However, we know there are a lot of crawling bots out there. robots.txt was originally used for other purposes, mainly to keep search engine crawlers away from certain pages, and with so many user agents crawling automatically in the AI era, GPTBot is just one of many to deal with if you really don't want your content crawled and used as training material...
Here are some other AI crawler user agents:
| Service/Company | User-Agent | Information | robots.txt User-agent line |
|---|---|---|---|
| OpenAI | GPTBot | Crawls public content to train GPT models. | User-agent: GPTBot |
| OpenAI (ChatGPT plugins, API) | ChatGPT-User | Fetches pages when a ChatGPT user asks for them, not for automatic crawling. | User-agent: ChatGPT-User |
| Anthropic (Claude) | ClaudeBot | Anthropic's crawler for public web content. | User-agent: ClaudeBot |
| Google AI (Gemini) | Google-Extended | A control token rather than a separate bot: crawling is still done by Googlebot, and Google-Extended controls whether the content may be used for Gemini. | User-agent: Google-Extended |
| Common Crawl | CCBot | Builds the Common Crawl dataset, a large data source that many AI models, reportedly including GPT and Claude, have used. | User-agent: CCBot |
| Perplexity AI | PerplexityBot | The official bot from Perplexity.ai, used to crawl the web while answering questions. | User-agent: PerplexityBot |
| You.com | YouBot or youBot | Used by the You.com search engine and its AI. | User-agent: YouBot |
| Neeva AI (inactive, but was once in use) | Neevabot | From the Neeva search engine before it was acquired. | User-agent: Neevabot |
| Amazon (AI / Alexa) | Amazonbot | Can be used for Alexa or internal model training. | User-agent: Amazonbot |
| Apple (Siri / AI) | Applebot | Used by Siri and Apple's AI search features. | User-agent: Applebot |
| DuckDuckGo | DuckDuckBot | Used for indexing and AI summarization. | User-agent: DuckDuckBot |
| Meta (Facebook) | facebookexternalhit | Used for link preview scraping and AI content retrieval. | User-agent: facebookexternalhit |
| Bing / Copilot (Microsoft) | bingbot, BingPreview, msnbot | Used by Bing search and Copilot AI. | User-agent: bingbot and User-agent: BingPreview (one per line, not comma-separated) |
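The method is the same if you don't want to be crawled: give each user agent its own Disallow: / rule. Keep in mind that some bots in the table, such as bingbot, DuckDuckBot, and Applebot, also power regular search, so blocking them can affect how the blog shows up in those search engines too.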
User-agent: GPTBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: ClaudeBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: PerplexityBot
Disallow: /
User-agent: YouBot
Disallow: /
User-agent: Neevabot
Disallow: /
User-agent: Amazonbot
Disallow: /
User-agent: Applebot
Disallow: /
User-agent: DuckDuckBot
Disallow: /
User-agent: facebookexternalhit
Disallow: /
User-agent: bingbot
Disallow: /
User-agent: BingPreview
Disallow: /
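Since every blocked crawler gets the same two lines, the block list can also be generated instead of typed by hand. This is only a sketch: `AI_AGENTS` is a shortened sample of the user agents listed in this article, and `build_block_rules` is a hypothetical helper name. The result is checked with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# A sample of the user agents from this article; extend it as needed.
AI_AGENTS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "Google-Extended",
    "CCBot",
    "PerplexityBot",
]


def build_block_rules(agents: list[str]) -> str:
    """Return robots.txt text with one 'Disallow: /' group per user agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


rules = build_block_rules(AI_AGENTS)
print(rules)

# Sanity check: every listed agent is blocked, an unlisted one is not.
parser = RobotFileParser()
parser.parse(rules.splitlines())
blocked = [agent for agent in AI_AGENTS if not parser.can_fetch(agent, "/")]
print(blocked == AI_AGENTS)               # True
print(parser.can_fetch("Googlebot", "/"))  # True
```

The printed text can be pasted into the custom robots.txt field as-is. Googlebot stays allowed here because it has no group of its own in the generated rules.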
But yeah, back to reality, bro: as for me, well, what else can I do? That's just how it is now, right? And even if the training data doesn't come from us, there are plenty of other sources that can be used to build a better model... Actually, the point of this article isn't only about keeping our content from becoming AI training material. It's more that if we build a website with areas that aren't meant to be crawled, we can tell AI bots not to crawl them and turn them into training material. In other words, we can use robots.txt to keep a user agent out of something private or something that shouldn't be crawled. Just remember that robots.txt is a request that well-behaved crawlers follow, not a lock, so truly sensitive information still needs real access control.
Otherwise, let's just take the lesson from it as mutual symbiosis...
Including Google too, right? 😄😄😄


