The new science of “emergent misalignment” explores how PG-13 training data — insecure code, superstitious numbers or even extreme-sports advice — can open the door to AI’s dark side.
It isn’t exactly what you’re looking for, but you may find this interesting, and it’s a bit of an insight into the relationship between pretraining and fine tuning: https://arxiv.org/pdf/2503.10965
It isn’t exactly what you’re looking for, but you may find this interesting, and it’s a bit of an insight into the relationship between pretraining and fine tuning: https://arxiv.org/pdf/2503.10965