
Tom Finley • 11.13.2023

Block LLM/AI User Agents


I regularly use a couple of AI tools to create content, less for blue-sky generative work than for polishing my own writing or content supplied by clients. Recently, though, I’ve become protective about LLM/AI appropriation of specific content on specific client sites. Opting in to ChatGPT for my own purposes shouldn’t be a tacit endorsement that my clients have to abide by.

For whatever reason, I’d assumed the process would be far more labor-intensive and baroque than it really is. All you need to do is create or update the robots.txt file at the root of your site with the following:

# Block known LLM/AI crawlers site-wide; one group per user agent.
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: FacebookBot
Disallow: /

You can download a turnkey .txt file of the above here.
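
If you want to sanity-check rules like the ones above before uploading them, Python's standard-library urllib.robotparser can parse robots.txt text and report whether a given user agent may fetch a URL. The following is only a minimal sketch: the helper names (build_robots_txt, is_blocked) and the example.com URL are my own placeholders, and a passing check only shows that the rules parse as intended, not that any crawler actually honors them.

```python
from urllib.robotparser import RobotFileParser

# The six user agents listed in the robots.txt rules above.
BLOCKED_AGENTS = [
    "GPTBot",
    "ChatGPT-User",
    "Google-Extended",
    "CCBot",
    "Omgilibot",
    "FacebookBot",
]


def build_robots_txt(agents):
    """Return robots.txt text with one 'Disallow: /' group per agent."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    return "\n\n".join(groups) + "\n"


def is_blocked(robots_txt, agent, url="https://example.com/some-post/"):
    """Return True if `agent` may not fetch `url` under these rules.

    The URL is a hypothetical placeholder; only its path matters here.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


robots_txt = build_robots_txt(BLOCKED_AGENTS)
for agent in BLOCKED_AGENTS:
    print(agent, "blocked:", is_blocked(robots_txt, agent))

# A crawler that is not listed should still be allowed.
print("Googlebot blocked:", is_blocked(robots_txt, "Googlebot"))
```

Generating the file from a list also makes it harder to paste the same user agent in twice when the list grows.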

Microsoft Bing

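Bing handles this with a meta tag in the page’s head rather than a dedicated robots.txt user agent. As I understand Microsoft’s announcement, nocache limits Bing Chat and model training to a page’s URL, title, and snippet, while noarchive keeps the page out of both entirely.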

<meta name="bingbot" content="nocache">
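
To confirm that a page really ships this tag, one option is to scan its HTML with Python's standard-library html.parser. This is a rough sketch under a few assumptions: the class and function names are hypothetical, the sample markup is made up for illustration, and it only looks at meta tags named bingbot.

```python
from html.parser import HTMLParser


class BingbotMetaFinder(HTMLParser):
    """Collect directives from <meta name="bingbot" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "bingbot":
            return
        # The content attribute may hold several comma-separated directives.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def bingbot_directives(html):
    """Return the set of bingbot meta directives found in `html`."""
    finder = BingbotMetaFinder()
    finder.feed(html)
    return finder.directives


# Hypothetical page markup, for illustration only.
sample = '<html><head><meta name="bingbot" content="nocache"></head><body></body></html>'
print(bingbot_directives(sample))  # {'nocache'}
```

In practice you would feed it the HTML of a live page (for example, saved from your browser's view-source) instead of the sample string.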

Shout-out to the following for the quick education in LLM blocking:

  • How to Block ChatGPT From Accessing Your Website – Joe Youngblood (www.joeyoungblood.com)
  • Google allows sites to opt out of training its LLMs for GenAI (www.coywolf.news): Sites that don’t want their content used by Google to train its large language models for generative AI can now opt out by adding a new user agent, Google-Extended, to their robots.txt file.
  • How to Block LLM Crawlers (Like ChatGPT’s Bot) in 2023 (www.privacyjournal.net): Here’s how to block LLM crawlers, like ChatGPT’s data-scraping bots, so they can’t use content from your website to train a large language model.
  • How to Block or Include Your Website Content in Microsoft Bing Chat (www.maginative.com): Now, publishers can make informed decisions about how their content is utilized in Bing Chat and during the training phases of Microsoft’s generative AI foundation models.
  • https://www.cyberciti.biz/web-developer/block-openai-bard-bing-ai-crawler-bots-using-robots-txt-file/
  • The robots.txt file in Yoast SEO • Yoast (yoast.com): The robots.txt file tells a search engine where it is allowed to go on your site. This article explains how the robots.txt file works with Yoast SEO.

© 2024 Tom Finley · A PRÜF Creative brew, based on the Bright Mode Theme by Brian Gardner
