We are experiencing crashes on our SSD VPSes, all running on KVM. The crashes occur for different reasons; in the hurry to restore service, my team usually reloads a previous snapshot of the machine and never saves the logs.
Still, across all the different crashes, one recurring fact is the corruption of in-memory data. Our VPS provider told us their hardware is running fine, and I don’t know how to read the poor log I was given.
What is involved when an “in-memory data corruption” is detected? Could it be caused by broken RAM, or are there other kinds of memory corruption?
Funny thing: a VPS provider using VMware never gave us any trouble, while the one using KVM is really driving us crazy with these crashes.
Edit 1: I by no means expect anyone to deduce the solution from this miserable log. I’m stuck with an issue for which no decent log is available:
memtest is useless since the hardware is emulated, and the VPS provider assured us that their hardware is fine and that no instances of KVM or QEMU crashed.
The message “corruption of in-memory data detected” is haunting me, and I can’t think of any productive way to investigate this issue further.
“Corruption of in-memory data detected” doesn’t necessarily mean that the hardware RAM is bad. It could also indicate that a block was read or written incorrectly, that the storage flipped a bit or is otherwise failing, that there is a filesystem bug, or a few other causes.
Reverting to a snapshot probably won’t resolve the problem if there is some latent filesystem corruption; it’ll just show up again later.
Instead, you should run xfs_repair on the filesystem, but since it’s the root filesystem you’ll need to boot from installation media or a rescue environment provided by your VPS provider.
If xfs_repair fails to repair the filesystem, you may run xfs_repair -L, which will clear the XFS log (which may itself be corrupt) and then try to repair the filesystem again.
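The repair steps above could be sketched roughly as follows, run from the rescue environment with the filesystem unmounted. This is only an illustration, not a definitive procedure: the device name /dev/vda1, the function names, and the RUN and ZERO_LOG variables are assumptions made for this example. The script only prints the commands unless RUN=1 is set, and it adds a read-only xfs_repair -n pass before the real repair.

```shell
#!/bin/sh
# Sketch of the xfs_repair sequence, meant for a rescue environment.
# Dry run by default: commands are printed, not executed (set RUN=1 to execute).

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

repair_root_fs() {
    dev="$1"   # hypothetical device, e.g. /dev/vda1; it must not be mounted

    # Read-only check: reports problems without modifying anything.
    run xfs_repair -n "$dev" || true

    # Normal repair attempt.
    if ! run xfs_repair "$dev"; then
        echo "xfs_repair failed on $dev" >&2
        # Last resort: -L zeroes the XFS log, so any metadata changes still
        # sitting in the log are lost. Only done when ZERO_LOG=1 is set.
        if [ "${ZERO_LOG:-0}" = "1" ]; then
            run xfs_repair -L "$dev"
        fi
    fi
}

# Assumed default device; pass the real one as the first argument.
repair_root_fs "${1:-/dev/vda1}"
```

Running the script without RUN=1 first shows exactly which commands would be executed. Keeping -L behind a separate ZERO_LOG=1 switch is a deliberate choice here, since clearing the log discards whatever it contained.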