Until recently, if you came across several paragraphs of text on a consistent topic with consistent subjects, you could assume that text was written or structured by a human being. That is no longer true.
Over the past year, AI researchers have designed computer programs that can generate multi-paragraph stories that remain fairly coherent throughout. As we explain in the video above, these programs create passages of text that seem like they were written by someone who is fluent in the language but faking their knowledge; the kind of person whose overconfidence might make you question your own intuition that what they’re saying … makes no sense.
I’m not sure we needed an automated version of that person, but here it is:
Depending on how you look at it, this technology is a powerful bullshit machine or a promising tool for artists.
So far, the creative
The field of Natural Language Processing (NLP) didn’t exactly set out to create a fake news machine. Rather, this is the byproduct of a line of research into massive language models – machine learning programs that build vast statistical maps of the correlations between words. They look at a sample of text and guess the next word based on how frequently that word appeared in similar contexts in the training data.
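The core idea — count how often each word follows a given context, then guess the most frequent continuation — can be sketched in a few lines. This is a toy frequency model, not how modern neural language models actually work, but it illustrates the "statistical map" described above; the sample text is made up.

```python
from collections import Counter, defaultdict

# Toy training corpus (a stand-in for the massive datasets real models use).
training_text = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Build a statistical map: how often does each word follow each context word?
next_word_counts = defaultdict(Counter)
for context, word in zip(training_text, training_text[1:]):
    next_word_counts[context][word] += 1

def predict_next(context):
    """Guess the next word: the one seen most often after `context`."""
    counts = next_word_counts[context]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("sat"))  # "on" — the only word ever seen after "sat"
```

A real language model replaces these raw counts with a neural network that generalizes to contexts it has never seen verbatim, but the objective — predict the next word from context — is the same.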
That sounds simple, but it’s an incredibly challenging task. The models need to account for the fact that the same word can have different meanings depending on the context. They need to sort out which pronouns refer to which nouns. And they need to keep track of long-range dependencies — words whose meanings hinge on other words that are relatively far away. Because most earlier computer models focused only on the immediate context, they couldn’t continue a consistent idea or story.
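A long-range dependency is easy to see in a toy example. In the hypothetical sentence below, the correct verb form depends on a noun nine words back, not on the words right before it — so a model that only looks at the immediate context gets it wrong. The `agree_with` rule here is a deliberately crude stand-in for illustration.

```python
# Subject-verb agreement as a long-range dependency: the verb must agree
# with "keys", even though "hallway" sits right next to the blank.
sentence = ["The", "keys", "to", "the", "old", "cabinet", "in", "the", "hallway"]

def agree_with(word):
    # Crude toy rule: words ending in "s" are plural and take "are".
    return "are" if word.endswith("s") else "is"

local_guess = agree_with(sentence[-1])  # only sees "hallway" -> "is" (wrong)
correct = agree_with(sentence[1])       # true subject "keys"  -> "are"
print(local_guess, correct)
```

A model limited to a small window around the blank would make the `local_guess` mistake; tracking the true subject across the whole sentence is exactly what older models struggled with.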
That has changed for two reasons. First, researchers developed a new type of neural network architecture, the “transformer,” which allows for a more efficient use of computing power. The result is that the models can access more contextual information about each word and therefore make more plausible sentence predictions.
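One way to picture how these models access more context is attention, the mechanism inside transformer-style networks: each word’s representation becomes a weighted average of every word in the passage, with weights based on similarity, so distant words are as reachable as adjacent ones. Below is a minimal pure-Python sketch of that idea; the word vectors are made-up stand-ins, not trained values.

```python
import math

# Made-up 2-dimensional word vectors (real models learn these from data).
vectors = {
    "the": [0.1, 0.9],
    "cat": [0.8, 0.2],
    "sat": [0.7, 0.3],
}
words = list(vectors)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention_weights(query_word):
    """Softmax over similarity scores between one word and all words."""
    scores = [dot(vectors[query_word], vectors[w]) for w in words]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# The weights form a probability distribution over the whole context —
# near and far words alike — which is what lets transformers track
# long-range dependencies.
weights = attention_weights("cat")
print(round(sum(weights), 6))  # the weights always sum to 1.0
```

Note that "cat" attends most strongly to the words most similar to it ("cat" itself, then "sat"), regardless of where they sit in the sequence.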
Second, these models and their training datasets have grown enormously, improving performance across a number of language tasks. Text generation is a key part of language translation, chatbots, question-answering, and summarization. The problem is that in their simplest form, when they’re prompted to do open-ended generation, language models are indifferent to the truth. That’s what makes them creative, but it also puts them on the wrong side of the battle against trolls, propagandists, and con artists online.
Bots roam the internet in huge numbers, primarily deceiving other computers. Now, with a decent handle on our language, they have new ways of deceiving humans directly. The recent advances in language modeling mean that voice assistants will get better, chatbots will get better, and businesses will have better ways of analyzing documents. But for the places where humans gather online to talk to other humans, the internet will get a little worse.
You can find this video and all of Vox’s videos on YouTube. And join the Open Sourced Reporting Network to help us report on the real consequences of data, privacy, algorithms, and AI.

Open Sourced is made possible by the Omidyar Network. All Open Sourced content is editorially independent and produced by our journalists.