It’s wild to me how some places I’ve worked are like locked down, all the infrastructure is in terraform or whatever and can be deployed immediately… and other places are like “ssh into prod with the credentials from confluence, edit the config in vim, and paste the new code into a new file”
I’m at one of the latter, so I feel this in my bones. I’ve watched what should have been an innocent config change snowball into a pair of VM clusters shitting back and forth for 2 hours. Implemented strict change control that day. Kind of a pain, but the team learned a lot that day!
A typo in software development or other shell based work could completely ass womp a system in ways that could lose a company lots of money.
Oopsies on prod systems, even with an outage window, can really fuck shit up. Seemingly small mistakes can quickly snowball into systemwide outages.
It’s wild to me how some places I’ve worked are like locked down, all the infrastructure is in terraform or whatever and can be deployed immediately… and other places are like “ssh into prod with the credentials from confluence, edit the config in vim, and paste the new code into a new file”
I’m at one of the latter, so I feel this in my bones. I’ve watched what should have been an innocent config change snowball into a pair of VM clusters shitting back and forth for 2 hours. Implemented strict change control that day. Kind of a pain, but the team learned a lot that day!
One of those is Amazon prior to chatgpt, the other is Amazon a few weeks back.