
AI meets GIGO

Joel Eissenberg | June 17, 2025 9:25 am

I’m a frequent user of what passes for AI on Google. I used to just Google keywords, but now I Google questions, and Google AI answers them. Usually, they’re simple questions like “When did ___ happen?” or “How old is ___?” For anything esoteric that I care about, I can follow up with my own search. So far, I’ve only caught Google AI in one mistake.

But thanks to internet pollution caused by ChatGPT, that could change:

“The rapid rise of ChatGPT — and the cavalcade of competitors’ generative models that followed suit — has polluted the internet with so much useless slop that it’s already kneecapping the development of future AI models.

“As the AI-generated data clouds the human creations that these models are so heavily dependent on amalgamating, it becomes inevitable that a greater share of what these so-called intelligences learn from and imitate is itself an ersatz AI creation.

“Repeat this process enough, and AI development begins to resemble a maximalist game of telephone in which not only is the quality of the content being produced diminished, resembling less and less what it’s originally supposed to be replacing, but in which the participants actively become stupider. The industry likes to describe this scenario as AI ‘model collapse.’”

My guess is that software will be written to (a) filter out ChatGPT-generated text, (b) append reliable references to search results and/or (c) score the results for reliability. Meanwhile, caveat lector.

ChatGPT is polluting the database for AI

4 Comments

Bill Haskell says:

June 17, 2025 at 10:14 am

Joel:

What I have noticed is that AI may not answer questions that might be controversial. Some topics are off limits to it. It appears to be artificially smart enough to avoid such questions. That avoidance makes me laugh, because it is a very human kind of avoidance: dodging when an answer may lead to a negative reaction or to being labeled as slanted.

rc weakley says:

June 17, 2025 at 8:40 pm

@joel,

Yep – you nailed it. Gives new meaning to built-in obsolescence.

Kaleberg says:

June 17, 2025 at 9:11 pm

It’s called “model collapse”. There was an interesting paper on it last summer: “AI models collapse when trained on recursively generated data”.

“We discover that indiscriminately learning from data produced by other models causes ‘model collapse’—a degenerative process whereby, over time, models forget the true underlying data distribution, even in the absence of a shift in the distribution over time. We give examples of model collapse for GMMs, VAEs and LLMs. We show that, over time, models start losing information about the true distribution, which first starts with tails disappearing, and learned behaviours converge over the generations to a point estimate with very small variance. Furthermore, we show that this process is inevitable, even for cases with almost ideal conditions for long-term learning, that is, no function estimation error. We also briefly mention two close concepts to model collapse from the existing literature: catastrophic forgetting arising in the framework of task-free continual learning and data poisoning maliciously leading to unintended behaviour.”

Note that an early sign of the problem is the tails vanishing as the variance shrinks: interesting or unconventional information and solutions become harder to find. Also note the problem of deliberate poisoning, a form of adversarial training. That is going to come with the introduction of advertising and the LLM equivalent of SEO.
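
For anyone who wants to watch the variance shrink, here is a toy sketch in Python. It is not the paper's experiment, just a one-dimensional stand-in I am assuming for illustration: each "model" is a Gaussian fitted to a small sample drawn from the previous model, with no fresh human data ever mixed back in. The function name `simulate_collapse` and its parameters are made up for this example.

```python
import random
import statistics


def simulate_collapse(generations=200, sample_size=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Hypothetical toy model: returns the fitted standard deviation
    at each generation.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # the "true" distribution, standing in for human data
    sigmas = []
    for _ in range(generations):
        # Draw a finite training set from the current model...
        samples = [rng.gauss(mu, sigma) for _ in range(sample_size)]
        # ...then fit the next model to that synthetic data alone.
        mu = statistics.fmean(samples)
        sigma = statistics.pstdev(samples)
        sigmas.append(sigma)
    return sigmas


if __name__ == "__main__":
    sigmas = simulate_collapse()
    for gen in (0, 49, 99, 199):
        print(f"generation {gen + 1:3d}: fitted sigma = {sigmas[gen]:.3g}")
```

With a small sample at each step, the fitted spread tends to drift toward zero over the generations, so later "models" only ever produce values near a single point. That is the tail loss described in the quote, in miniature.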
- Joel Eissenberg says:
  
  June 18, 2025 at 6:20 am
  
  @Kaleberg,
  
  Yes, model collapse is specifically mentioned in the link and in my pull quote. Thanks for the thorough explication.