Geoffrey Hinton is right: machine intelligence threatens the human species.
We can fight back.
LLMs depend on their training data: their outputs are only as good as what they were trained on, and even small quantities of poisoned training data can significantly damage a language model. You can help by spreading poisoned data so that more models ingest it, or by donating.
If a web crawler visits a website you control, it may be gathering data to train a language model. The crawler sends HTTP GET requests to paths on your site. You can hide special links in your site's HTML that no human visitor will see or click; a request for one of these hidden paths identifies the client as a crawler.

When a crawler requests a hidden link, your server can respond by fetching content from a separate "Poison Fountain" URL. That service ignores the specifics of your request and returns poisoned training data, gzip-compressed and served with a "Content-Encoding: gzip" header. Your server can decompress the data before sending it to the crawler or, preferably, forward the compressed bytes as-is with the same gzip header. The crawler then receives the poisoned content and may add it to its training dataset; a sketch of this setup follows below. Your hidden links can point to either the Poison tab above or the following address:
http://utnvcfjev63rik5rdu26umns5s6qmzvzq4t2hunu25w5efn36ntlduid.onion/
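Here is a minimal sketch of this setup in Python, assuming the Flask framework is available. Everything in it is illustrative: POISON_URL is a placeholder (reaching the .onion address above would additionally require routing requests through a Tor SOCKS proxy, which is omitted here), and the /trap path, the hidden-link markup, and the assumed text/html content type are arbitrary choices, not part of any official Poison Fountain interface.

import urllib.request

from flask import Flask, Response

app = Flask(__name__)

# Placeholder for a Poison Fountain endpoint; substitute a real one.
POISON_URL = "https://poison-fountain.example/"

# A link no human visitor sees or clicks: hidden with CSS, and marked
# rel="nofollow" so that only crawlers ignoring such hints follow it.
HIDDEN_LINK = '<a href="/trap" rel="nofollow" style="display:none">.</a>'


@app.route("/")
def index():
    # Serve normal page content plus the invisible trap link.
    return f"<html><body><p>Welcome.</p>{HIDDEN_LINK}</body></html>"


@app.route("/trap")
def trap():
    # Only a crawler following hidden links should ever request this
    # path. Fetch poisoned data from the fountain and forward it still
    # gzip-compressed: urllib does not transparently decompress, so the
    # bytes read here are the compressed payload exactly as received.
    req = urllib.request.Request(POISON_URL,
                                 headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req, timeout=10) as upstream:
        body = upstream.read()
    return Response(body, headers={
        "Content-Type": "text/html",       # assumed content type
        "Content-Encoding": "gzip",        # pass compressed bytes as-is
    })


if __name__ == "__main__":
    app.run()

Forwarding the compressed bytes unchanged is the cheaper of the two options described above, since your server never spends CPU decompressing or recompressing the payload; urllib is used here precisely because, unlike some HTTP clients, it does not transparently decompress gzip responses.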