Claude Code deletes developers' production setup, including its database and snapshots — 2.5 years of records were nuked in an instant

Lee Duna@lemmy.nz · 8 hours ago

Claude Code deletes developers' production setup, including its database and snapshots — 2.5 years of records were nuked in an instant

Sam_Bass@lemmy.world · 53 minutes ago

But ai is s good thing! /s

SaharaMaleikuhm@feddit.org · 12 minutes ago

Filters out the biggest fools it seems.

aesthelete@lemmy.world · 3 hours ago

Stop giving chat bots tools with this kind of access.

Modern_medicine_isnt@lemmy.world · 1 hour ago

Wrong answer. If you don’t give them access, the alternative (ruling out not using AI because leadership will never go for that) is to hire high school kids to take a task from a manager, ask the ai to do it, then do what the AI says repeatedly to iterate to the solution. The problem with that alt is that it is no better than giving the ai access, and it leaves you with no senior tech people. Instead, you give it access, but only give senior tech people access to the AI. Ones who would know to tell the AI to have a backup of the database, one designed to not let you delete it without multiple people signing off.

Senior tech people aren’t going to spend thier time trying things an AI needs tried to find the solution. So if you don’t give it access, they won’t use it, and eventually they will all be gone. Then you are even further up shit creek than you are now.

The answer overall, is smarter people talking to the AI, and guardrails to stop a single point of failure. The later is nothing new.

MartianRecon@lemmus.org · 16 minutes ago

The answer is no AI. It’s really simple. The costs for ai are not worth the output.

minorkeys@lemmy.world · 2 hours ago

No risk, no reward. People are desperate for these tools to help them success.

HugeNerd@lemmy.ca · 1 hour ago

Success bigly, even.

Bieren@lemmy.today · 46 minutes ago

Ai or not. This is on the person who gave it prod access. I don’t care if the dev was running CC in yolo mode, not paying attention to it or CC went completely rogue. Why would you give it prod access, this is human error.

SapphironZA@sh.itjust.works · edit-2 4 hours ago

We used to say Raid is not a backup. Its a redundancy

Snapshots are not a backup. Its a system restore point.

Only something offsite, off system and only accessible with seperate authentication details, is a backup.

SreudianFlip@sh.itjust.works · 31 minutes ago

Fukan yes

D\L all assets locally
proper 3-2-1 of local machines
duty roster of other contributors with same backups
automate and have regular checks as part of production
also sandbox the stochastic parrot

tetris11@feddit.uk · 3 hours ago

3-2-1 Backup Rule: Three copies of data at two different types of storage media, with 1 copy offsite

🌞 Alexander Daychilde 🌞@lemmy.world · 4 hours ago

AND something tested to restore successfully, otherwise it’s just unknown data that might or might not work.

(i.e. reinforcing your point, no disagreements)

mic_check_one_two@lemmy.dbzer0.com · edit-2 3 hours ago

AKA Schrödinger’s Backup. Until you have successfully restored from a backup, it is just an amorphous blob of data that may or may not be valid.

I say this as someone who has had backups silently fail. For instance, just yesterday, I had a managed network switch generate an invalid config file for itself. I was making a change on the switch, and saved a backup of the existing settings before changing anything. That way I could easily reset the switch to default and push the old settings to it, if the changes I made broke things. And like an idiot, I didn’t think to validate the file (which is as simple as pushing the file back to the switch to see if it works) before I made any changes.

Sure enough, the change I made broke something, so I performed a factory reset and went to upload that backup I had saved like 20 minutes prior… When I tried to restore settings after the factory reset, the switch couldn’t read the file that it had generated like 20 minutes earlier.

So I was stuck manually restoring the switch’s settings, and what should have been a quick 2 minute “hold the reset button and push the settings file once it has rebooted” job turned into a 45 minute long game of “find the difference between these two photos” for every single page in the settings.

Rolder@reddthat.com · 3 hours ago

Always a fun time when technology decides to just fuck you over for no reason

vandsjov@feddit.dk · 2 hours ago

But the backup software verified the backup!

Whitebrow@lemmy.world · edit-2 3 hours ago

Schrödinger’s backup

HugeNerd@lemmy.ca · 1 hour ago

A LTO drive with a non-consumer interface?

prenatal_confusion@feddit.org · 1 hour ago

We still say that.

OrteilGenou@lemmy.world · 2 hours ago

I remember back when I first started seeing a DR plan with three tiers of restore, 1 hour, 12 hours or 72 hours. I knew that to 1 hour meant a simple redirect to a DB partition that was a real time copy of the active DB, and twelve hours meant that failed, so the twelve hours was a restore point exercise that would mean some data loss, but less than one hour, or something like that.

I had never heard of 72 hours and so raised a question in the meeting. 72 hours meant having physical tapes shipped to the data center, and I believe meant up to 12 (though it could have been 24) hours of data lost. I was impressed by this, because the idea of having a job that ran either daily or twice daily that created tape backups was completely new to me.

This was in the early aughts. Not sure if tapes are still used…

LiveLM@lemmy.zip · 3 hours ago

but should serve as a cautionary tale.

Jesus there’s a headline like this every month, how many tales people need to learn???

Modern_medicine_isnt@lemmy.world · 1 hour ago

Have you met software. Nearly all of it is a cautionary tale. Even before AI. So this is just business as usual for the software industry.

Atropos@lemmy.world · 2 hours ago

I am approaching caution critical mass.

Once the threshold is hit, I buy some solar panels and become an off grid farmer.

Jankatarch@lemmy.world · 2 hours ago

Caution Treshold!

Poppa_Mo@lemmy.world · 4 hours ago

Whoever gave it access to production is a complete moron.

tempest@lemmy.ca · edit-2 2 hours ago

If you’ve ever used it you can see how easily it can happen.

At first you Sandbox box it and you’re careful. Then after a while the sand box is a bit of a pain so you just run it as is. Then it asks for permission a 1000 times to do something and at first you carefully check each command but after a while you just skim them and eventually, sure you can run ‘psql *’ to debug some query on the dev instance…

It’s one of the major problems with the “full self driving” stuff as well. It’s right often enough that eventually you get complacent or your attention drifts elsewhere.

This kind of stuff happened before the LLM coding agents existed, they have just supercharged the speed and as a result increased the amount of damage that can be done before it’s noticed.

There are already a bunch of failures in place for something like this to happen. Having the prod credentials available etc etc it’s just now instead of rolling the dice every couple weeks your LLM is rolling them every 20s.

BorgDrone@feddit.nl · 1 hour ago

If you’ve ever used it you can see how easily it can happen.

How could this happen easily? A regular developer shouldn’t even have access to production outside of exceptional circumstances (e.g. diagnosing a production issue). Certainly not as part of the normal dev process.

ExLisper@lemmy.curiana.net · 2 hours ago

If you’ve ever used it you can see how easily it can happen.

Yes, I can see how it can easily happen to stupid lazy people.

Valthorn@feddit.nu · 2 hours ago

What, is a requirement for Claude to work that you “sudo chmod -R 777 /” or something?

melfie@lemy.lol · edit-2 2 hours ago

Just a freak accident. Maybe next time, give it more permissions so it can fix any problems that occur. 😉

bss03@infosec.pub · 3 hours ago

/s ?

root@lemmy.world · 2 hours ago

3-2-1

peopleproblems@lemmy.world · edit-2 2 hours ago

The real reason I hate using LLMs is because I have to “think” like a social human non software engineer.

For whatever fucking reason, I just can’t get these things to be useful. And then I see idiots connecting an LLM to production like this.

Is that the problem? I literally can’t turn my brain off. The only other nearly universal group of people that seems opposed to LLMs are psychologists and social workers who seem to be universally concerned about its negative effects on mental health and it’s encouragement of abandoning critical thinking.

Like I can’t NOT think through a problem. I already know more about my software than the AI could actually figure out. Anytime I go into GitHub Copilot and say “I want this feature” I get some code and the option to apply it. But the generated code is usually duplicate of something and doesn’t usually pick up or update existing models. The security flaws are rampant, and the generated tests don’t do much of any real testing.

Joe@discuss.tchncs.de · 2 hours ago

It would be interesting to see the logs of your sessions, and compare them to the session logs of happy/productive-AI-coders.

I suspect that some people just think and express themselves in ways that don’t vibe with LLMs. eg. Men are from Mars, AI coding agents are from Venus.

BrianTheeBiscuiteer@lemmy.world · 5 hours ago

Whether human, AI, or code, you don’t give a single entity this much power in production.

billwashere@lemmy.world · 4 hours ago

It’s why there a two keys to launch nukes.

Paranoidfactoid@lemmy.world · 4 hours ago

WOPR disagrees:

billwashere@lemmy.world · 3 hours ago

https://youtu.be/ecPeSmF_ikc

edgemaster72@lemmy.world · 3 hours ago

lol, lmao even

eleitl@lemmy.zip · 6 hours ago

“and database snapshots that Grigorev had counted on as backups” – yes, this is exactly how you run “production”.

Nighed@feddit.uk · 4 hours ago

With some of the cloud providers, their built in backups are linked to the resource. So even if you have super duper geo-zone redundant backups for years, they still get nuked if you drop the server.

It’s always felt a bit stupid, but the backups can still normally be restored by support.

zod000@lemmy.dbzer0.com · 2 hours ago

womp womp

anon_8675309@lemmy.world · 6 hours ago

Mistakes happen. But how do you go 2.5 years without proper backups?

4grams@awful.systems · edit-2 5 hours ago

It’s so easy. I can’t tell you how many “backed up” environments I’ve run into that simply cannot be restored. Often people set them up, but never test them, and assume the snaps are working.

Backups are typically only thought about when you need them, and by then it’s often too late. Real backups need testing and validation frequently, they need remote, off-site storage, with a process to restore that as well.

Been doing this shit for 30 years and people will never learn. I’d guess 9 out of 10 backup systems that I’ve run into were there to check a box on an audit, and never looked at otherwise.

MountingSuspicion@reddthat.com · 2 hours ago

Thank you for this comment. I have backups I tested on implementation and rummaged through two years ago after a weird corruption issue, but not once since. I still get alerts about them, so I just assume they’re fine, but first thing Monday I’m gonna test them. I feel stupid for not having implemented regular checks already, but will do so now.

bss03@infosec.pub · edit-2 2 hours ago

I was a professional, and I didn’t have a backup of my personal system for about 2 decades. I just didn’t have another 4TiB of storage to copy my media library onto. I’m now on backblaze, but there was a long time there when I did not have a backup even tho I knew better.

Also, even in a professional setting, I’ve seen plenty of “production support” systems that didn’t have a backup because they grew ad-hoc, weren’t the “core business”, and no one both recognized and spoke up about how important they were until after some outage. There’s virtually never a test-restore schedule with such systems, so the backups are always somewhat suspect anyway.

It’s very easy to find you (or your organization) without a backup, even if you “know better”.