• ToadOfHypnosis@lemm.ee · 3 days ago

    So AI already demands that power, water for cooling, and other natural resources be ramped up and consumed. Now this creates a second wasteful AI to do the same, in an endless loop, so that the first AI just keeps spinning its wheels and wasting resources until it's discovered. The idea makes sense from a pure “stop unauthorized crawling” perspective, but damn, we just have no solutions that don’t accelerate climate impact. This planet is just going to turn into an oven to cook us.

    • floofloof@lemmy.ca (OP) · edited · 2 days ago

      “No real human would go four links deep into a maze of AI-generated nonsense,” Cloudflare explains. “Any visitor that does is very likely to be a bot, so this gives us a brand-new tool to identify and fingerprint bad bots.”

      It sounds like there may be a plan to block known bots once they have used this tool to identify them. Over time this would reduce the amount of AI slop they need to generate for the AI trap, since bots already fingerprinted would not be served it. Since AI generators are expensive to run, it would be in Cloudflare’s interests to do this. So while your concern is well placed, in this particular case there may be a surge of energy and water usage at first that tails off once more bots are fingerprinted.
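The depth heuristic quoted above can be sketched roughly like this. Cloudflare hasn't published its implementation, so every name here (`MazeTracker`, `DEPTH_THRESHOLD`, the per-client counter) is hypothetical; the only grounded detail is the "four links deep" threshold from the quote:

```python
from collections import defaultdict

# Hypothetical sketch of the depth heuristic: maze pages are reachable
# only via generated links, so any client that keeps following them is
# very likely a bot.
DEPTH_THRESHOLD = 4  # "four links deep", per the quote

class MazeTracker:
    def __init__(self, threshold: int = DEPTH_THRESHOLD):
        self.threshold = threshold
        self.depth = defaultdict(int)  # client id -> maze pages fetched

    def record_maze_hit(self, client_id: str) -> bool:
        """Record one maze-page fetch; return True once the client is flagged."""
        self.depth[client_id] += 1
        return self.depth[client_id] >= self.threshold

tracker = MazeTracker()
for _ in range(3):
    assert not tracker.record_maze_hit("crawler-123")  # three hops: not yet flagged
assert tracker.record_maze_hit("crawler-123")          # fourth hop: fingerprinted
```

Once a client is flagged this way, the fingerprint could be used to serve it cheap static decoys (or block it) instead of freshly generated content, which is the cost tail-off described above.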

      • rottingleaf@lemmy.world · 2 days ago

        “No real human would go four links deep into a maze of AI-generated nonsense,”

        Me looking for porn, red-eyed, swearing at the screen.

        • Singletona082@lemmy.world · 2 days ago

          …real.

          ‘Four links deep’

          HEY NOW! Sometimes stuff just gets interesting!

          ‘Into a maze of AI-Generated Nonsense.’

          And sometimes that interesting is porn related!

      • turmacar@lemmy.world · edited · 2 days ago

        The problem is that they’re now attempting anti-fingerprinting tactics. A lot of the AI crawlers used to identify themselves as Amazon/OpenAI/etc., but they stopped because they were being blocked. Now they come from random IPs with random or obfuscated user-agent strings.

        This is a legal problem not a technological one.
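The old self-identification made blocking trivial, which is exactly why it stopped working. A minimal sketch of why user-agent filtering fails once crawlers obfuscate (the bot tokens below are real published crawler names, but the matching logic is purely illustrative):

```python
# Illustrative only: substring matching on self-declared User-Agent
# strings, which worked while crawlers identified themselves.
KNOWN_BOT_TOKENS = ("GPTBot", "CCBot", "Amazonbot", "ClaudeBot")

def is_declared_bot(user_agent: str) -> bool:
    ua = user_agent.lower()
    return any(token.lower() in ua for token in KNOWN_BOT_TOKENS)

# A cooperative crawler announces itself and is caught:
assert is_declared_bot("Mozilla/5.0 (compatible; GPTBot/1.1)")

# An obfuscated crawler presents a stock browser string and sails
# through -- which is why behavioral traps (like the maze) exist at all:
assert not is_declared_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0")
```

Behavioral fingerprinting sidesteps this because it keys on what the client does, not what it claims to be; the legal question of whether obfuscated crawling is permitted is, as noted, a separate fight.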

    • piecat@lemmy.world · 2 days ago

      It’s definitely an arms race. Another possible outcome is that crawling gets too expensive to be cost-effective and slows down that way.

    • rottingleaf@lemmy.world · edited · 2 days ago

      There are solutions. I’ve just skimmed a paper on attacks on Kademlia, and the recommendations there would apply here too. The problems look different on the surface, but both stem from the network having no admission control.

      All this tomfoolery about “oh horror, how do we solve this” exists because bot farms, recommendation systems, and ad networks have proven very convenient and profitable, and nobody wants to scratch that ecosystem in favor of friend-to-friend (f2f) services. So they want to remove one side of the coin but keep the other.

  • RejZoR@lemmy.ml · 3 days ago

    This is AI poisoning. Blocking a crawler just stops it from learning; feeding it bullshit poisons its knowledge and makes it hallucinate.

    I also wonder how AI crawlers can tell what wasn’t already generated by AI. They risk what I call “inbreeding” knowledge, training on the AI hallucinations of the past.

    When the whole AI craze began, basically everything online was human-made. Not anymore. It’ll only get worse if you ask me.

    • CheeseNoodle@lemmy.world · 3 days ago

      The scary part is that even humans don’t really have a proper escape mechanism for this kind of misinformation. Sure, we can spot AI a lot of the time, but there are also situations where we can’t. That leaves us trusting only the people we already knew before AI, and growing more and more distrustful of information in general.

      • theangryseal@lemmy.world · 2 days ago

        Holy shit, this.

        I’m constantly worried that what I’m seeing/hearing is fake. It’s going to get harder and harder to find older information on the internet too.

        Shit, it’s crept outside of the internet actually. Family buys my kids books for Christmas and birthdays and I’m checking to make sure they aren’t AI garbage before I ever let them look at it because someone bought them an AI book already without realizing it.

        I don’t really understand what we hope to get from all of this. I mean, not really. Maybe it gets to a point where it can truly be trusted, but I just don’t see how.

        • Flagstaff@programming.dev · 2 days ago

          I don’t really understand what we hope to get from all of this.

          Well, even among the most moral devs, the garbage output wasn’t intended, and no one could have predicted the pace at which it’s been developing. So all this is driving a real need for in-person communities and regular contact—which is at least one great result, I think.

    • JustARegularNerd@lemmy.dbzer0.com · 3 days ago

      Kind of. They’re actually trying to avoid this according to the article:

      “The company says the content served to bots is deliberately irrelevant to the website being crawled, but it is carefully sourced or generated using real scientific facts—such as neutral information about biology, physics, or mathematics—to avoid spreading misinformation (whether this approach effectively prevents misinformation, however, remains unproven).”

      • Muad'dib@sopuli.xyz · 2 days ago

        That sucks! What’s the point of putting an AI in a maze if you’re not going to poison it?

    • floofloof@lemmy.ca (OP) · 2 days ago

      Some of these LLMs introduce very subtle statistical patterns (watermarks) into their output so it can be recognized as machine-generated. So it is possible in principle (though I’m not sure how computationally feasible it is at crawl scale) to avoid ingesting whatever carries these patterns. But there will also be plenty of AI content that is not deliberately marked this way, which would be harder to filter out.
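As a toy illustration of the kind of statistical pattern meant here, one published family of schemes pseudorandomly splits the vocabulary into a “green” and “red” list at each step (keyed on the previous token) and biases generation toward green tokens; a detector then checks whether the green fraction is statistically too high. Everything below is a simplified sketch of that idea, not any vendor’s actual scheme:

```python
import hashlib
import math

# Toy "green list" watermark detector. GAMMA is the fraction of the
# vocabulary that counts as green at each step (an assumption here).
GAMMA = 0.5

def is_green(prev_token: str, token: str) -> bool:
    # Hash (previous token, token); the token is "green" if the hash
    # falls in the lower GAMMA fraction of the hash space.
    h = hashlib.sha256((prev_token + "|" + token).encode()).digest()
    return h[0] < 256 * GAMMA

def green_z_score(tokens: list[str]) -> float:
    """z-score of the observed green-token count against the GAMMA baseline.

    Unwatermarked text should score near 0 on average; text generated
    with a green-list bias scores high, which is what a crawler-side
    filter could threshold on.
    """
    n = len(tokens) - 1
    greens = sum(is_green(a, b) for a, b in zip(tokens, tokens[1:]))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

sample = ["the", "cat", "sat", "on", "the", "mat"]
score = green_z_score(sample)  # near 0 on average for ordinary text
```

The catch mentioned above shows up directly: this only detects output from models that actually embed such a bias, and it requires knowing (or sharing) the hashing key, which is why unmarked AI content slips through.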

    • Flic@mstdn.social · 3 days ago

      @RejZoR @floofloof yeah AI will get worse and worse the more it trains on its own output. I can only see “walled-garden” AIs trained on specific datasets for specific industries being useful in future. These enormous “we can do everything (we can’t do anything)” LLMs will die a death.

  • Ilovethebomb@lemm.ee · 3 days ago

    Feeding AI crawlers the excrement of their forebears is a perfect way to deal with them.

  • lol_idk@lemmy.ml · 3 days ago

    Throwing more power and resources at an already resource-hungry process seems like a no-win.

  • JustARegularNerd@lemmy.dbzer0.com · 3 days ago

    I really want to see what the bullshit looks like. Shame the article doesn’t actually show a sample; I guess I’d have to make my browser look like an AI crawler.
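Presenting a crawler-style User-Agent is the obvious first thing to try, though it may well not be enough: Cloudflare could key the trap on behavior or IP reputation rather than the header alone. The string below follows the pattern OpenAI has published for GPTBot, but treat the exact token as an assumption and check their docs for the current value:

```python
import urllib.request

# Hypothetical experiment: fetch a page while claiming to be GPTBot and
# compare the response to a normal browser fetch. Whether this header
# alone triggers the AI Labyrinth is an open question.
BOT_UA = "Mozilla/5.0; compatible; GPTBot/1.0; +https://openai.com/gptbot"

def fetch_as_bot(url: str) -> bytes:
    req = urllib.request.Request(url, headers={"User-Agent": BOT_UA})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

A browser extension that overrides the User-Agent header would do the same job interactively, with the same caveat that the decoy content may only be served to clients that also behave like crawlers.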

  • latenightnoir@lemmy.blahaj.zone · 3 days ago

    Heh, sounds like what one of my exes used to do when she wanted some alone time: she’d throw me an informational rabbit hole and let me dive right into it for a couple of hours =)))