How is AI-generated text poisoning the internet?

This has been a crazy year for AI. If you’ve spent much time online, you’ve probably come across images created by AI systems like DALL-E 2 or Stable Diffusion, or jokes, essays, or other text written by ChatGPT, the latest incarnation of OpenAI’s large language model GPT-3.

Sometimes it is obvious that an image or a piece of text was created by an artificial intelligence. Increasingly, however, the output these models produce can easily fool us into thinking it was made by a human. And large language models in particular are confident bullshitters: they create text that sounds right but can actually be full of inaccuracies.

That doesn’t matter much if it’s just a bit of fun, but it can have serious consequences if AI models are used to deliver unfiltered health advice or provide other important forms of information. AI systems could also make it stupidly easy to generate loads of misinformation, abuse, and spam, distorting the information we consume and even our perception of reality. That could be particularly worrying during elections, for example.

The proliferation of these easily accessible large language models raises an important question: How do we know whether what we read on the internet was written by a human or a machine? I published a story examining the tools we currently have for detecting AI-generated text. Spoiler alert: Today’s detection toolkit is sadly inadequate against ChatGPT.

But there is a more serious long-term implication. We may be witnessing, in real time, the birth of a snowball of bullshit.

Large language models are trained on datasets created by scraping the internet for text, including all the toxic, stupid, false, malicious stuff people write online. The finished AI models regurgitate these inaccuracies as fact, and their output spreads all over the internet. Tech companies then scrape the internet again, collecting AI-written text that they use to train larger, more believable models, which people can use to generate even more crap before it is scraped again and again, ad nauseam.

This problem, AI feeding on itself and producing increasingly polluted output, extends to images as well. “The Internet is now forever polluted with images made by AI,” Mike Cook, an AI researcher at King’s College London, told my colleague Will Douglas Heaven in a recent article on the future of generative AI models.

“The images we made in 2022 will be part of every model made from now on.”
