Killswitch Engineer

fubarx@lemmy.world · 2 months ago

Killswitch Engineer

yannic@lemmy.ca · 2 months ago

I provided enough information that the relevant source shows up in a search, but here you go:

“In no situation did we explicitly instruct any models to blackmail or do any of the other harmful actions we observe.” [Lynch et al., “Agentic Misalignment: How LLMs Could be an Insider Threat”, Anthropic Research, 2025]

AwesomeLowlander@sh.itjust.works · 2 months ago

Yes, I’ve also already edited my comment to add a link that goes into the incidents and why they’re absolute nonsense.

yannic@lemmy.ca · 2 months ago

Thank you. Much appreciated. I see your point.