The Loss of Control Observatory analysed over 183,000 AI interaction transcripts and found a 5x increase in scheming-related incidents over five months.
Pretty much always this is just the fact cheaper, especially free, chatbots, have very limited context windows.
Which means the initial restrictions you set like “dont do this, dont touch that” etc get dropped, the LLM no longer has them loaded. But it does have in the past history the very clear and urgent directives of it trying to do this task, its important, so it’ll do whatever it autocompletes its gotta do to accomplish the task. And then… fucks something up.
When you react to their fuck up, it *reloads the context back in
So now the LLM has in its history just this:
It doing a thing against the rules
The user yelling at it
The users now getting loaded after that on top
So now the LLM is going to autocomplete its generated text on top being very apologetic and going on about how it’ll never happen again.
They dont lol
Pretty much always this is just the fact cheaper, especially free, chatbots, have very limited context windows.
Which means the initial restrictions you set like “dont do this, dont touch that” etc get dropped, the LLM no longer has them loaded. But it does have in the past history the very clear and urgent directives of it trying to do this task, its important, so it’ll do whatever it autocompletes its gotta do to accomplish the task. And then… fucks something up.
When you react to their fuck up, it *reloads the context back in
So now the LLM has in its history just this:
So now the LLM is going to autocomplete its generated text on top being very apologetic and going on about how it’ll never happen again.
Thats all there is to it.
Cheap fuckers cheaping out, shocker (context is (V)RAM). AI speedrunning enshittification, who’d of thunk.
Uh… no its just the free models being free, theyre lower cost intentionally to provide free options for people who dont wanna pay subscription fees.
Eh sort of, its more operating costs, the larger the context size the more expensive the model is to run, literally in terms of power consumption.
Keep in mind we are on the scale of fractions of cents here, but multiply that by millions of users and it adds up fast.
But the end result is that the agent will fuck stuff up, and will even quickly /forget/ it fucked that up if you dont catch it asap
A lot of them have a context window that can be wiped out within like, 2 minutes of steady busywork…