I regularly use a couple of AI tools to create content, less for blue-sky generative efforts than for polishing my own writing or content supplied by clients. Recently, however, I’ve found myself being protective about LLM/AI appropriation of specific content on specific client sites. Opting in to ChatGPT for my own purposes shouldn’t be a tacit endorsement that my clients have to abide by.
For whatever reason, I’d made the process out to be a lot more labor-intensive and baroque than it really is. All you need to do is create or update the robots.txt file at the root of your site with the following:
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
You can download a turnkey .txt file of the above here.
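If you want to sanity-check the rules before uploading them, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `blocked_agents` helper and the `https://example.com/` URL are my own placeholders for illustration, not part of any crawler's tooling, and a crawler's real matching logic may differ from what this parser does.

```python
from urllib.robotparser import RobotFileParser

# The same rules as above, pasted in as a string.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
User-agent: CCBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
"""

AI_AGENTS = [
    "GPTBot",
    "CCBot",
    "ChatGPT-User",
    "Google-Extended",
    "Omgilibot",
    "FacebookBot",
]


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return {agent: True/False}, True meaning the agent may NOT fetch url.

    Hypothetical helper: the URL is only a placeholder for a page on your site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    for agent, blocked in blocked_agents(ROBOTS_TXT, AI_AGENTS).items():
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Agents that are not listed, such as a regular search crawler, should still come back as allowed, which is the point: this opts out of the AI crawlers without touching ordinary search indexing.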
Microsoft Bing
Bing doesn’t offer a separate robots.txt user agent for its AI features, so limiting AI use there takes a meta tag in each page’s head instead:
<meta name="bingbot" content="nocache">
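Because the tag has to be present on every page, it is easy to miss one. As a rough sketch, assuming you already have a page's HTML as a string, the standard-library `html.parser` can confirm the tag is there. The `BingbotMetaFinder` class and `bing_directives` function are hypothetical names I made up for this example.

```python
from html.parser import HTMLParser


class BingbotMetaFinder(HTMLParser):
    """Collects the directives from any <meta name="bingbot"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "bingbot":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def bing_directives(html):
    """Return the set of bingbot meta directives found in an HTML string."""
    finder = BingbotMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="bingbot" content="nocache"></head></html>'
    print("nocache" in bing_directives(page))
```

Fetching the pages is left out on purpose; feed it whatever HTML your CMS or build step produces.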
Shout out to the following for the quick education in LLM blocking: