Duck Duck Go has a block AI search filter in their images. But it isn't 100% but still handy. When I switched it on I noticed the results drop but it still brings up that ugly arse fake baby peacock so it isn't perfect but could be handy. Don't turn it on assuming it will remove all the slop.
Anya is live and ready to show you everything. Watch her strip, dance, and perform exclusive shows just for you. Interact in real-time and make your fantasies come true.
✓ Live Streaming✓ Interactive Chat✓ Private Shows✓ HD Quality✓ Free Actions
Free to watch • No registration required • HD streaming
Hey, @dragonydreams is fabulous! She figured out how to lock multiple works on AO3! (I have 414, it was gonna take me FOREVER)... but now I can do them a fandom at a time.
Warrning... I tried to do ALL of mine at once and it didn't work--it timed out... so do a fandom or two at a time.
After logging into AO3:
1. Go to Profile
2. At the bottom, click on Edit My Works
3. Select Fandom
4. Click Edit button at top (or bottom)
5. Then at the Visibility settings area check off Only Registered Users. Scroll to bottom of page and click Update All Works
The Silmarillion Writers' Guild just recently opened its draft AI policy for comment, and one thing people wanted was for us, if possible, to block AI bots from scraping the SWG website. Twelve hours ago, I had no idea if it was possible! But I spent a few hours today researching the subject, and the SWG site is now much more locked down against AI bots than it was this time yesterday.
I know I am not the only person with a website or blog or portfolio online that doesn't want their content being used to train AI. So I thought I'd put together what I learned today in hopes that it might help others.
First, two important points:
I am not an IT professional. I am a middle-school humanities teacher with degrees in psychology, teaching, and humanities. I'm self-taught where building and maintaining websites is concerned. In other words, I'm not an expert but simply passing on what I learned during my research today.
On that note, I can't help with troubleshooting on your own site or project. I wouldn't even have been able to do everything here on my own for the SWG, but thankfully my co-admin Russandol has much more tech knowledge than me and picked up where I got lost.
Step 1: Block AI Bots Using Robots.txt
If you don't even know what this is, start here:
About /robots.txt
How to write and submit a robots.txt file
If you know how to find (or create) the robots.txt file for your website, you're going to add the following lines of code to the file. (Source: DataDome, How ChatGPT & OpenAI Might Use Your Content, Now & in the Future)
User-agent: CCBot
Disallow: /
AND
User-agent: ChatGPT-User
Disallow:Â /
Step Two: Add HTTPS Headers/Meta Tags
Unfortunately, not all bots respond to robots.txt. Img2dataset is one that recently gained some notoriety when a site owner posted in its issue queue after the bot brought his site down, asking that the bot be opt-in or at least respect robots.txt. He received a rather rude reply from the img2dataset developer. It's covered in Vice's An AI Scraping Tool Is Overwhelming Websites with Traffic.
Img2dataset requires a header tag to keep it away. (Not surprisingly, this is often a more complicated task than updating a robots.txt file. I don't think that's accidental. This is where I got stuck today in working on my Drupal site.) The header tags are "noai" and "noimageai." These function like the more familiar "noindex" and "nofollow" meta tags. When Russa and I were researching this today, we did not find a lot of information on "noai" or "noimageai," so I suspect they are very new. We used the procedure for adding "noindex" or "nofollow" and swapped in "noai" and "noimageai," and it worked for us.
Header meta tags are the same strategy DeviantArt is using to allow artists to opt out of AI scraping; artist Aimee Cozza has more in What Is DeviantArt's New "noai" and "noimageai" Meta Tag and How to Install It. Aimee's blog also has directions for how to use this strategy on WordPress, SquareSpace, Weebly, and Wix sites.
In my research today, I discovered that some webhosts provide tools for adding this code to your header through a form on the site. Check your host's knowledge base to see if you have that option.
You can also use .htaccess or add the tag directly into the HTML in the <head> section. .htaccess makes sense if you want to use the "noai" and "noimageai" tag across your entire site. The HTML solution makes sense if you want to exclude AI crawlers from specific pages.
Here are some resources on how to do this for "noindex" and "nofollow"; just swap in "noai" and "noimageai":
HubSpot, Using Noindex, Nofollow HTML Metatags: How to Tell Google Not to Index a Page in Search (very comprehensive and covers both the .htaccess and HTML solutions)
Google Search Documentation, Block Search Indexing with noindex (both .htaccess and HTML)
AngryStudio, Add noindex and nofollow to Whole Website Using htaccess
Perficient, How to Implement a NoIndex Tag (HTML)
Finally, all of this is contingent on web scrapers following the rules and etiquette of the web. As we know, many do not. Sprinkled amid the many articles I read today on blocking AI scrapers were articles on how to override blocks when scraping the web.
This will also, I suspect, be something of a game of whack-a-mole. As the img2dataset case illustrates, the previous etiquette around robots.txt was ignored in favor of a more complicated opt-out, one that many site owners either won't be aware of or won't have time/skill to implement. I would not be surprised, as the "noai" and "noimageai" tags gain traction, to see bots demanding that site owners jump through a new, different, higher, and possibly fiery hoop in order to protect the content on their sites from AI scraping. These folks serve to make a lot of money off this, which doesn't inspire me with confidence that withholding our work from their grubby hands will be an endeavor that they make easy for us.