GenAI companies will soon pay a premium for human-generated content. It's a question of supply & demand.
That's my paradoxical prediction after reading this excellent article in The New York Times.
GenAI providers such as OpenAI and Google need lots of raw human content--not AI-generated content--to train their models effectively. But they're making such content scarce, creating an existential risk for themselves.
And when something gets scarce, the price goes up. Definitely agree with Insights that this article is essential reading. Here's how the Times describes the problem facing GenAI companies:
"The internet is becoming awash in words and images generated by artificial intelligence.
"Sam Altman, OpenAI’s chief executive, wrote in February that the company generated about 100 billion words per day — a million novels’ worth of text, every day, an unknown share of which finds its way onto the internet.
"A.I.-generated text may show up as a restaurant review, a dating profile or a social media post. And it may show up as a news article, too: NewsGuard, a group that tracks online misinformation, recently identified over a thousand websites that churn out error-prone A.I.-generated news articles.
"In reality, with no foolproof methods to detect this kind of content, much will simply remain undetected.
"All this A.I.-generated information can make it harder for us to know what’s real. And it also poses a problem for A.I. companies.
"As they trawl the web for new data to train their next models on — an increasingly challenging task — they’re likely to ingest some of their own A.I.-generated content, creating an unintentional feedback loop in which what was once the output from one A.I. becomes the input for another.
"In the long run, this cycle may pose a threat to A.I. itself. Research has shown that when generative A.I. is trained on a lot of its own output, it can get a lot worse."
My take:
The result is that GenAI companies will pay rising prices for real, human-made training data such as news reporting, writing, graphic design, "how-to" videos, and industry analysis (OK, I'm a little biased on that last one 😮).
If they don't, their own products will fail.
Some licensing agreements are already in place, and more will come.
Perhaps we’ll see the rise of clearinghouses that aggregate and authenticate human-sourced content, then bundle and sell it to AI companies on behalf of the creators.
What do you think? Would love to hear from AI gurus and big thinkers out there.