How to Block User Agent ChatGPT, Claude, Perplexity and Others on Blogger

As website owners, we've certainly noticed over the years that AI has transformed the user experience and the appeal of blog articles. Many readers have turned to the more practical approach of simply entering a prompt and receiving an immediate answer... yes, it's practical because the answer is already summarized, without the unnecessary words, or what used to be called small talk.


Blogging used to feel like chatting with a human because of that small talk (back around 2009), but over time, interactions have changed... :) Shorter, more concise, and clearer content became popular (this was before generative AI took off). As website owners, we naturally followed suit to maintain traffic.


Every step feels like a race: on one hand, we have to survive; on the other, we're very enthusiastic about a field but lack the skills. Hehe, there's something that always keeps us going: the love of sharing knowledge... because sharing firmly embeds that knowledge within ourselves. I feel that, rather than just attending a lecture, taking notes, and then forgetting it all, applying the material, writing it down, or sharing it in other ways really deepens our understanding.


Sometimes I think it would be nice if life were secure, so we could study and gain knowledge in peace, and more...


Yes, that's how it is. And of course, website owners are now also faced with web crawlers that are more structured and legal.. haha, it's like a group that has the resources but no content of its own yet. So if we don't block it, OpenAI's GPTBot will access our website content and learn from it by crawling the internet. GPTBot extracts data from the websites it crawls, and that data is used to train OpenAI's language models, which is what lets ChatGPT generate text, translate languages, write various kinds of creative content, and answer users' questions in an informative way.


If we want to block our content from becoming AI training data, then we have to update the robots.txt of the website or blog.


Most websites use a robots.txt file. It is a plain text file that uses a series of instructions to tell web crawlers which pages or directories on our website they can and cannot access.


To block the ChatGPT web crawler from using your website content, add a User-agent line and a Disallow line to your robots.txt file. Disallow: / tells the crawler that it's not allowed to access any pages or posts on your website, while a narrower path, as in the example below, blocks only that folder. If you want to block more than one user agent, add a separate group for each one.



Example:

# Model-training crawler. Opt out if you don't want your content used to train models such as GPT-4o or GPT-5.

User-agent: GPTBot
Disallow: /private/          # the private folder may not be crawled
Allow: /                     # everything else may be crawled

So, if we want to allow everything, just write Allow: /
Whereas if we want to block everything, just write Disallow: /
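
Before pasting rules like these into Blogger, it can help to run them through a parser and see how a crawler would read them. The sketch below is only an illustration and is not part of Blogger: it assumes Python's standard `urllib.robotparser` module, and the helper name `is_allowed` and the sample paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example above (comments left out for brevity).
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for path in ("/private/notes.html", "/2024/01/my-post.html"):
        print(path, is_allowed(ROBOTS_TXT, "GPTBot", path))
```

With these rules, the /private/ path should come back as blocked for GPTBot and the ordinary post as allowed. Python's parser applies the first rule that matches, while crawlers that follow RFC 9309 pick the most specific match; for this small file both approaches give the same answer.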



To do this on Blogger, open the custom robots setting:

[Image: robot custom]

then activate it:

[Image: robot custom crawl]

After that, click on custom robots.txt and fill in the rules:

[Image: block ai agent]

Then, save it.
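
Once saved, the rules should be served at /robots.txt on the blog's own address. Here is a minimal sketch for checking that from outside, assuming Python's standard library. The address example.blogspot.com and the helper names `robots_url` and `live_check` are placeholders invented for this example, and `live_check` needs network access.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(blog_url: str) -> str:
    """Return the robots.txt address for the site that blog_url belongs to."""
    return urljoin(blog_url, "/robots.txt")


def live_check(blog_url: str, user_agent: str, path: str = "/") -> bool:
    """Download the live robots.txt and report whether user_agent may fetch path.

    Needs network access; point it at your own blog address.
    """
    parser = RobotFileParser(robots_url(blog_url))
    parser.read()  # fetches and parses the file
    return parser.can_fetch(user_agent, urljoin(blog_url, path))


if __name__ == "__main__":
    # Hypothetical address; replace it with your own blog.
    print(robots_url("https://example.blogspot.com/2024/01/post.html"))
```

Calling `live_check("https://your-blog.blogspot.com/", "GPTBot")` with your real address should return False once a Disallow: / rule for GPTBot is live.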



However, we know that there are a lot of crawling bots floating around. robots.txt was originally used for other purposes, mainly to keep search crawlers away from certain pages, and with so many user agents crawling automatically in the AI era, GPTBot is just one of many you would need to list if you really don't want your content crawled and used as training material...


Here are some other AI crawler user agents:


Service/Company | User-Agent | Information | robots.txt line
--- | --- | --- | ---
OpenAI | GPTBot | Used to train GPT models; accesses public content. | User-agent: GPTBot
OpenAI (ChatGPT plugins, API) | ChatGPT-User | Used when a user requests a page via ChatGPT. | User-agent: ChatGPT-User
Anthropic (Claude) | ClaudeBot | Used by Claude for public crawling. | User-agent: ClaudeBot
Google AI (Gemini) | Google-Extended | Control token for Google's AI use; crawling is still done by the regular Googlebot, but AI use can be controlled with Google-Extended. | User-agent: Google-Extended
Common Crawl | CCBot | Large data source used by many AI models, including GPT, Claude, etc. | User-agent: CCBot
Perplexity AI | PerplexityBot | The official bot from Perplexity.ai, used to crawl the web while answering questions. | User-agent: PerplexityBot
You.com | YouBot or youBot | Used by the search engine and its AI. | User-agent: YouBot
Neeva AI (inactive, but was once in use) | Neevabot | From the Neeva search engine before it was acquired. | User-agent: Neevabot
Amazon (AI / Alexa) | Amazonbot | Can be used for Alexa or internal model training. | User-agent: Amazonbot
Apple (Siri / AI) | Applebot | Used by Siri and Apple's AI search features. | User-agent: Applebot
DuckDuckGo | DuckDuckBot | Used for indexing and AI summarization. | User-agent: DuckDuckBot
Meta (Facebook) | facebookexternalhit | Used for link preview scraping and AI content retrieval. | User-agent: facebookexternalhit
Bing / Copilot (Microsoft) | bingbot, BingPreview, msnbot | Used by Bing and Copilot AI. | User-agent: bingbot and User-agent: BingPreview


The method is the same if you don't want to be crawled: add a Disallow: / group for each user agent.


User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: Neevabot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot
Disallow: /

User-agent: DuckDuckBot
Disallow: /

User-agent: facebookexternalhit
Disallow: /

User-agent: bingbot
Disallow: /

User-agent: BingPreview
Disallow: /
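
Typing one group per bot by hand gets repetitive, so a list like this can also be generated and checked with a few lines of code. This is just a sketch under stated assumptions: it uses Python's standard `urllib.robotparser`, the helper names `build_block_rules` and `blocked_agents` are invented for the example, and `AI_AGENTS` holds only a few of the user agents from the table.

```python
from urllib.robotparser import RobotFileParser

# A few user agents from the table above; extend the list as needed.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "ClaudeBot", "Google-Extended",
    "CCBot", "PerplexityBot",
]


def build_block_rules(agents, path="/"):
    """Build one 'User-agent / Disallow' group per agent, blank-line separated."""
    groups = [f"User-agent: {agent}\nDisallow: {path}" for agent in agents]
    return "\n\n".join(groups) + "\n"


def blocked_agents(robots_txt, agents, path="/"):
    """Return the agents that are NOT allowed to fetch path under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, path)]


if __name__ == "__main__":
    rules = build_block_rules(AI_AGENTS)
    print(rules)  # paste this text into the custom robots.txt box
    print("Blocked:", blocked_agents(rules, AI_AGENTS))
```

Passing a narrower path, for example `build_block_rules(AI_AGENTS, path="/private/")`, keeps the listed bots out of one folder only and leaves the rest of the site crawlable. Agents that are not listed, such as the regular Googlebot, are not affected by these groups.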



But yeah, back to reality, bro: if it's me, well, what else can I do, that's just how it is now, right? And even if the data doesn't come from us, there is still plenty of other content that can be used as training material to make a better model... Actually, the point of this article is not so much to keep our content from becoming AI training material. It's more that, if we build a website that holds sensitive information, we can prevent AI crawler bots from crawling it and turning it into training material... in other words, we can use robots.txt so that a user agent cannot access something private or something that shouldn't be crawled.


Otherwise, let's just take the lesson from it and treat it as a mutually beneficial symbiosis...


Including Google too, right? 😄😄😄






Article originally from MyShorTTips Stuff. Do not copy or sell. Protected by law. Thanks for reading.