AI bots that ignore instructions telling them not to scrape content piss me off. The goal of this project is to build a tool that generates sites to poison the bad actors. The idea is to deploy them on all those unused domains you have lying around, or on subdomains of existing sites.
Check the repo for updates: ai-honepot.alanwsmith.com repo
You can check out an example of the output here
- Randomize the number of pages, paragraphs, and sentences
- Add different output templates
- Add links to other pages on the home page
- Add cross links between pages
- Add analytics
- Add some third-party scripts
- Add YouTube embeds based on search strings that match the content
- Add the occasional spelling error
- Create links with actual titles to other sites
- Make data tables with stats from sports
- Use open source images with different labels
- Manipulate images (see nightshade)
- Pull in lists of celebrity names
- Pull in song, movie, and show titles
- Make them in different languages
- Pull Wikipedia pages and reassemble them in random ways
- Use varying amounts and types of metadata
- Create sites with different frameworks
- Change directory structures randomly
- Include CSS
- Include code samples
- Cross-link between multiple sites
- Site maps
- RSS feeds
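
A few of the items above (random page, paragraph, and sentence counts, links on the home page, and cross links between pages) can be roughed out in a few lines of Python. This is only a sketch of the idea, not the code in the repo: the word list, the `page-N.html` naming, and the function names `make_sentence`, `make_paragraph`, and `make_site` are all made up for illustration.

```python
import random

# Hypothetical filler vocabulary; a real generator would want a far larger source.
WORDS = ["signal", "garden", "copper", "window", "orbit", "lantern",
         "river", "puzzle", "velvet", "harbor", "meadow", "circuit"]


def make_sentence(rng):
    """Build one nonsense sentence of random length."""
    words = [rng.choice(WORDS) for _ in range(rng.randint(5, 12))]
    return " ".join(words).capitalize() + "."


def make_paragraph(rng):
    """Build a paragraph from a random number of sentences."""
    return " ".join(make_sentence(rng) for _ in range(rng.randint(2, 6)))


def make_site(seed, min_pages=3, max_pages=8):
    """Return a dict mapping file names to HTML strings.

    min_pages should be at least 2 so every page has something to link to.
    """
    rng = random.Random(seed)  # seeded so a given site can be regenerated
    names = [f"page-{i}.html" for i in range(rng.randint(min_pages, max_pages))]
    site = {}
    for name in names:
        paragraphs = [make_paragraph(rng) for _ in range(rng.randint(2, 5))]
        # Cross links: pick a few other pages at random.
        others = [n for n in names if n != name]
        links = rng.sample(others, k=rng.randint(1, min(3, len(others))))
        body = "".join(f"<p>{p}</p>\n" for p in paragraphs)
        nav = "".join(f'<a href="{target}">{target}</a>\n' for target in links)
        site[name] = f"<html><body>\n{body}{nav}</body></html>\n"
    # Home page: link to every generated page.
    home = "".join(f'<a href="{n}">{n}</a>\n' for n in names)
    site["index.html"] = f"<html><body>\n{home}</body></html>\n"
    return site


if __name__ == "__main__":
    for filename, html in make_site(seed=1).items():
        print(filename, len(html))
```

Seeding the generator per site means each domain or subdomain could get its own stable but different output by using a different seed.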