I used AI to generate a robots.txt that blocks AI
I’m so fucking lazy. I’ve never really messed with robots.txt. I’m kinda new to personal blogging; back in the 2000s I had a tech blog, but back then I wanted everything to index my site. I even submitted my link to a bunch of off-brand search engines to get my name out there.
But now there are AI bots scraping websites like iFixit literally a million times a day.
I don’t hate AI. I’m not an AI doomer. But that’s fucking silly.
Maybe it’s just me, or maybe Google is terrible these days, but all I could find were blog posts six months old or more that just listed individual scrapers, which are probably outdated anyway. So I asked Claude 3.5 to generate a robots.txt file that blocks all known AI scrapers, and I think it did alright! Here’s what I got:
(edit: updated on 7/30/2024 for clarity)
# Block AI Crawlers

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: Claude-Web
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: FacebookBot
Disallow: /

# Allow Search Engine Crawlers

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: Baiduspider
Allow: /

User-agent: Yandexbot
Allow: /

# Default rule

User-agent: *
Allow: /
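
By the way, if you want to sanity-check a file like this before you deploy it, Python’s standard-library urllib.robotparser will parse it and tell you what each crawler is allowed to fetch. A minimal sketch, using a shortened copy of the file above and a placeholder URL:

import urllib.robotparser

# A shortened copy of the robots.txt above, just to show the idea.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: *
Allow: /
"""

# Parse the rules from a string instead of fetching them over HTTP.
parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/some-post"  # placeholder URL

print(parser.can_fetch("GPTBot", url))         # False: blocked AI crawler
print(parser.can_fetch("Googlebot", url))      # True: allowed search crawler
print(parser.can_fetch("SomeRandomBot", url))  # True: falls through to the default rule

If the first line prints False and the other two print True, the groups are parsing the way you intended.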