AI Eye: AIs trained on AI content go MAD, is Threads a loss leader for AI data?

18 July 2023

Cointelegraph, by Andrew Fenton

ChatGPT eats cannibals

ChatGPT hype is starting to wane, with Google searches for “ChatGPT” down 40% from their April peak, while web traffic to OpenAI’s ChatGPT website fell almost 10% over the past month.

This is only to be expected, but GPT-4 users are also reporting that the model seems considerably dumber (though faster) than it was previously.

One theory is that OpenAI has broken it up into multiple smaller models trained for specific areas that can act in tandem, but not quite at the same level as the original.

But a more intriguing possibility may also be playing a role: AI cannibalism.

The web is now swamped with AI-generated text and images, and this synthetic output gets scraped up as training data for new models, creating a negative feedback loop. The more AI-generated data a model ingests, the worse its output gets in terms of coherence and quality. It’s a bit like making a photocopy of a photocopy: the image gets progressively worse.

While GPT-4’s official training data cuts off in September 2021, it clearly knows about plenty that happened after that date, which suggests fresher web data, increasingly polluted with AI output, has made its way into the model. OpenAI also recently shuttered its web browsing plugin.

A new paper from scientists at Rice and Stanford universities gives the issue a cute acronym: Model Autophagy Disorder, or MAD.

“Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease,” they said.

Essentially, the models progressively lose the more unique but less well-represented data points and harden their outputs around less varied data. The good news is this gives the AI industry a reason to keep humans in the loop, provided we can work out a way to identify and prioritize human-made content for the models. That’s one of OpenAI boss Sam Altman’s plans for his eyeball-scanning blockchain project, Worldcoin.
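
The dynamic is easy to reproduce in miniature. Below is a minimal toy simulation in Python (an illustration of the idea only, not the paper’s actual experiments): the “model” is just a Gaussian fitted to its training data, and each generation retrains purely on samples drawn from the previous fit, with a little cherry-picking of “good” samples thrown in. The spread of the data, a crude stand-in for diversity, collapses within a few generations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: "real" data, a stand-in for human-written content.
data = rng.normal(loc=0.0, scale=1.0, size=10_000)

for gen in range(1, 11):
    # "Train" a generative model: here, just fit a Gaussian to the data.
    mu, sigma = data.mean(), data.std()

    # Retrain the next generation entirely on the model's own output,
    # keeping only "good" samples near the mean -- the kind of sampling
    # bias the paper found accelerates the collapse in diversity.
    samples = rng.normal(mu, sigma, size=20_000)
    data = samples[np.abs(samples - mu) < 1.5 * sigma][:10_000]

    print(f"generation {gen}: std = {data.std():.3f}")  # shrinks each loop
```

Because each pass clips the tails before refitting, the measured spread shrinks steadily every generation; without an injection of fresh “human” data, the distribution narrows toward a point, mirroring the quality/diversity decline the researchers describe.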

Is Threads just a loss leader to train AI models?

Twitter clone Threads is a bit of a weird move by Mark Zuckerberg, as it cannibalizes users from Instagram. The photo-sharing platform makes up to $50 billion a year, but Threads stands to make around a tenth of that, even in the unrealistic scenario where it grabs 100% of Twitter’s market share. Big Brain Daily’s Alex Valaitis predicts it will either be shut down or folded back into Instagram within 12 months, and argues the real reason it was launched now “was to have more text-based content to train Meta’s AI models on.”

ChatGPT was trained on huge volumes of data from Twitter, but Elon Musk has taken various unpopular steps to prevent that from happening in the future (charging for API access, rate limiting, etc.).

Zuck has form in this regard, as Meta’s image recognition AI software SEER was trained on a billion photos posted to Instagram. Users agreed to that in the privacy policy, and more than a few have noted the Threads app collects data on everything possible, from health data to religious beliefs and race. That data will inevitably be used to train AI models such as Facebook’s LLaMA (Large Language Model Meta AI).

Musk, meanwhile, has just launched an OpenAI competitor called xAI that will mine Twitter’s data for its own LLM.

Various permissions required by social apps (CounterSocial)

Religious chatbots are fundamentalists

Who would have guessed that training AIs on religious texts and speaking in the voice of God would turn out to be a terrible idea? In India, Hindu chatbots masquerading as Krishna have been consistently advising users that killing people is OK if it’s your dharma, or duty.

At least five chatbots trained on the Bhagavad Gita, a 700-verse scripture, have appeared in the past few months, but the Indian government has no plans to regulate the tech, despite the ethical concerns.

“It’s miscommunication, misinformation based on religious text,” said Mumbai-based lawyer Lubna Yusuf, co-author of The AI Book. “A text gives a lot of philosophical value to what they are trying to say, and what does a bot do? It gives you a literal answer and that’s the danger here.”

AI doomers versus AI optimists

The world’s foremost AI doomer, decision theorist Eliezer Yudkowsky, has given a TED Talk warning that superintelligent AI will kill us all. He’s not sure how or why, because he believes an AGI will be so much smarter than us that we won’t even understand how and why it’s killing us, like a medieval peasant trying to understand the operation of an air conditioner. It might kill us as a side effect of pursuing some other objective, or because “it doesn’t want us making other superintelligences to compete with it.”

He points out that “Nobody understands how modern AI systems do what they do. They are giant inscrutable matrices of floating point numbers.” He does not expect “marching robot armies with glowing red eyes” but believes that a “smarter and uncaring entity will figure out strategies and technologies that can kill us quickly and reliably and then kill us.” The only thing that could stop this scenario from occurring is a worldwide moratorium on the tech backed by the threat of World War III, but he doesn’t think that will happen.

In his essay “Why AI will save the world,” A16z’s Marc Andreessen argues this sort of position is unscientific: “What is the testable hypothesis? What would falsify the hypothesis? How do we know when we are getting into a danger zone? These questions go mainly unanswered apart from ‘You can’t prove it won’t happen!'”

Marc Andreessen: Future of the Internet, Technology, and AI (video)
