Everyone here so far has forgotten that in simulations, the model has blackmailed the person responsible shutting it off and even gone so far as to cancel active alerts in order to prevent an executive laying unconscous in the server room from receiving life-saving care.
Everyone here so far has forgotten that in simulations, the model has blackmailed the person responsible shutting it off and even gone so far as to cancel active alerts in order to prevent an executive laying unconscous in the server room from receiving life-saving care.
The model ‘blackmailed’ the person because they provided it with a prompt asking it to pretend to blackmail them. Gee, I wonder what they expected.
Have not heard the one about cancelling active alerts, but I doubt it’s any less bullshit. Got a source about it?
Edit: Here’s a deep dive into why those claims are BS: https://www.aipanic.news/p/ai-blackmail-fact-checking-a-misleading
I provided enough information that the relevant source shows up in a search, but here you go:
Yes, I also already edited my comment with a link going into the incidents and why they’re absolute nonsense.