I regularly use a couple of AI resources to create content, less for blue-sky generative efforts than for polishing up my own writing or content supplied by clients. Recently, however, I’ve found myself being protective about LLM/AI appropriation of specific content on specific client sites. Opting in to using ChatGPT for my own purposes shouldn’t be a tacit endorsement that my clients have to abide by.
For whatever reason, I’d made out the process to be a lot more labor-intensive and baroque than it really is. All you need to do is create or update your robots.txt file with the following:
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /
You can download a turnkey .txt file of the above here.
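If you'd like to confirm the rules parse the way you expect before uploading them, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper names build_robots_txt and blocked_agents, and the example.com URL, are my own placeholders for illustration. Keep in mind this only checks how a standards-following parser reads the file; it can't prove that any particular crawler actually honors it.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the robots.txt rules in this post.
AI_CRAWLERS = [
    "GPTBot",
    "ChatGPT-User",
    "Google-Extended",
    "CCBot",
    "Omgilibot",
    "FacebookBot",
]


def build_robots_txt(agents):
    """Return robots.txt text that disallows the whole site for each agent.

    Hypothetical helper: one User-agent/Disallow group per crawler,
    separated by blank lines.
    """
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Map each agent name to True if the rules block it from `url`.

    The URL is a placeholder; swap in a page from your own site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    rules = build_robots_txt(AI_CRAWLERS)
    for agent, blocked in blocked_agents(rules, AI_CRAWLERS + ["Googlebot"]).items():
        print(f"{agent}: {'blocked' if blocked else 'allowed'}")
```

Adding an ordinary search crawler such as Googlebot to the check is a handy sanity test: it should still come back as allowed, since the file only names the AI-related agents.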
Bing is the exception. Instead of a robots.txt rule, it relies on a meta tag placed in each page’s head:
<meta name="bingbot" content="nocache">
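Because the meta tag has to be present on every page rather than in one central file, it's easy to miss a template. Below is a rough sketch, again in standard-library Python, that scans a page's HTML for a bingbot meta tag and reports its directives. The class name RobotsMetaFinder and the function bing_directives are hypothetical names I made up for this example, and the sample HTML strings are stand-ins for your own pages.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives from <meta name="bingbot" content="..."> tags."""

    def __init__(self, bot_name="bingbot"):
        super().__init__()
        self.bot_name = bot_name.lower()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value can be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != self.bot_name:
            return
        # The content attribute may hold several comma-separated directives.
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.append(directive)


def bing_directives(html):
    """Return the list of bingbot meta directives found in `html`."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


if __name__ == "__main__":
    tagged = '<html><head><meta name="bingbot" content="nocache"></head></html>'
    untagged = "<html><head><title>No tag here</title></head></html>"
    print(bing_directives(tagged))
    print(bing_directives(untagged))
```

An empty list means the page has no bingbot tag at all, which is the case worth flagging when you loop this over a site's pages.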
Shout out to the following for the quick education in LLM blocking: