A few years ago I designed a way to detect bit-flips in Firefox crash reports and last year we deployed an actual memory tester that runs on user machines after the browser crashes. Today I was looking at the data that comes out of these tests and now I'm 100% positive that the heuristic is sound and a lot of the crashes we see are from users with bad memory or similarly flaky hardware. Here's a few numbers to give you an idea of how large the problem is. 🧵 1/5
At what % does this affect the average consumer? And additionally, in a critical way? Can you cite literally one case where the presence of ECC would have been critical beyond an occasional annoyance?
Exactly, one of the ‘nerd edge cases’ (as the now removed comment mentioned) is that I use ZFS on my NAS.
There’s lots of checksumming and encryption. Errors in that process are not acceptable and could potentially cause data loss. Since one of the points of using ZFS is the enhanced data integrity, not using ECC means losing out on that guarantee.
Bit rot is real, I’ve seen it first hand in plenty of cases. While I tend to blame the storage device, for infrequently accessed files that have been copied multiple times across different drives, I can’t rule out RAM or some other source of the corruption.
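To make the checksum point concrete, here's a rough sketch of why a checksum alone can't cover for bad RAM. It assumes SHA-256 purely as a stand-in (not a claim about what any given pool is configured to use), and `flip_bit` and `checksum` are made-up helper names for illustration. A flip that happens after the checksum is computed gets caught; a flip that happens in memory before the checksum is computed gets checksummed along with everything else.

```python
import hashlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (hypothetical helper)."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def checksum(block: bytes) -> str:
    # SHA-256 is only a stand-in here, chosen because it ships with Python.
    return hashlib.sha256(block).hexdigest()


original = b"important family photo bytes"

# Case 1: the bit flips AFTER the checksum was computed (e.g. on the disk).
stored_sum = checksum(original)
rotted = flip_bit(original, 5)
detected_after = checksum(rotted) != stored_sum  # mismatch, corruption is caught

# Case 2: the bit flips in RAM BEFORE the checksum is computed.
corrupted_in_ram = flip_bit(original, 5)
sum_of_bad_data = checksum(corrupted_in_ram)
detected_before = checksum(corrupted_in_ram) != sum_of_bad_data  # match, nothing is caught

print(detected_after, detected_before)
```

In the second case the filesystem would be faithfully protecting data that was already wrong when it arrived, which is the gap ECC is meant to close.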
Improved overall system stability and data accuracy? With error correction, you can also push performance farther, since you can tolerate a certain amount of errors, instead of needing to aim for 0% error rate.
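For anyone wondering what "tolerating a certain amount of errors" looks like mechanically, here's a minimal sketch of the textbook Hamming(7,4) code, which can correct any single flipped bit in a 7-bit codeword. This is only an illustration of the idea: real ECC memory uses wider codes over much larger words, and the function names `encode` and `decode` are my own.

```python
def encode(data_bits):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def decode(codeword):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome is the 1-based position of the bad bit, or 0 if all checks pass.
    syndrome = s1 + 2 * s2 + 4 * s4
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
codeword = encode(word)

# Flip each of the 7 positions in turn; the decoder recovers the data every time.
recovered = []
for i in range(7):
    damaged = list(codeword)
    damaged[i] ^= 1
    recovered.append(decode(damaged))

print(all(r == word for r in recovered))
```

Note this small code only handles one flipped bit per codeword; two flips in the same codeword would be miscorrected.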
Removed by mod
Simple stuff like a calculator can be just as broken by a bitflip as more complex things. You wouldn’t want your calculator to say 1 + 1 = 2049.
If you want to rely on your computer, ECC RAM is required.
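To put the calculator example in code: a single flipped bit is just an XOR with a power of two, so the size of the error depends entirely on which bit gets hit. A quick sketch, with `flip_bit` as a made-up helper and bit 11 picked arbitrarily:

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with one bit inverted (hypothetical helper)."""
    return value ^ (1 << bit)


a, b = 1, 1

correct = a + b                    # both operands intact
bad_operand = flip_bit(a, 11)      # one operand hit before the addition
bad_sum = bad_operand + b          # the addition itself is done correctly
bad_result = flip_bit(a + b, 11)   # the stored result hit after the addition

print(correct, bad_operand, bad_sum, bad_result)
```

With bit 11 flipped, the operand 1 turns into 2049 and the sum comes out as 2050. Flip bit 0 instead and you're off by one, which is arguably worse because nobody would notice.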