Corruption of in-memory data detected: where does the issue lie?

elmazzun asked:

We are experiencing crashes on our SSD VPSes, all running on KVM. The crashes occur for different reasons, and in the rush to restore service my team usually reloads a previous snapshot of the machine and never saves the logs.

Anyway, across all the different crash circumstances, one recurring fact is the corruption of in-memory data. Our VPS provider told us their hardware is running fine, yet I don’t know how to read the poor log I was given.

[Image: screenshot of the log]

What is involved when an “in-memory data corruption” is detected? Could it be because of broken RAM, or are there other kinds of memory corruption?

Funny thing: a VPS provider using VMware never gave us any trouble, while the one using KVM is really driving us crazy because of these crashes.

Edit 1: I by no means expect you to deduce the solution from this miserable log. I’m stuck with this issue: no decent log is provided, memtest is useless since the hardware is emulated, and the VPS provider assured us that their hardware is fine and that no instances of KVM or QEMU crashed. “Corruption of in-memory data detected” is haunting me, and I can’t think of any productive approach to investigate this issue further.

My answer:

“Corruption of in-memory data detected” doesn’t necessarily mean that the hardware RAM is bad. It can also mean that a block was read or written incorrectly, that the storage flipped a bit or is otherwise failing, or that you have hit a filesystem bug, among a few other causes.

Reverting to a snapshot probably won’t resolve the problem if there is some latent filesystem corruption; it’ll just show up again later.

Instead, you should xfs_repair the filesystem, but since it’s the root filesystem you’ll need to boot from installation media or a rescue environment provided by your VPS provider.
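As a rough sketch of what that might look like from a rescue shell, assuming the root filesystem lives on /dev/vda1 (a hypothetical device name; yours will likely differ) and is not currently mounted. The run helper and the DRY_RUN variable are scaffolding invented for this example, so that the commands are only printed unless you explicitly set DRY_RUN=0.

```shell
#!/bin/sh
# Hypothetical device name; replace it with the one lsblk or blkid reports.
DEV="${DEV:-/dev/vda1}"

# Scaffolding for this sketch: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# List block devices with their filesystem types to locate the XFS volume.
run lsblk -f

# Check only: -n reports problems without modifying the filesystem.
run xfs_repair -n "$DEV"

# Actual repair. The filesystem must be unmounted for this to work.
run xfs_repair "$DEV"
```

Running the check with -n first gives you a chance to see how much damage there is before anything on disk is changed.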

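If xfs_repair fails to repair the filesystem, you may run xfs_repair -L, which will clear the XFS log (which may itself be corrupt) and then try to repair the filesystem again. Clearing the log discards any metadata updates still recorded in it, so treat this as a last resort.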
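One possible ordering of that fallback, sketched under the same kind of assumptions: /dev/vda1 and /mnt/rescue are hypothetical names, and the run helper with its DRY_RUN variable is scaffolding for this example that only prints the commands unless DRY_RUN=0 is set. When xfs_repair finds a dirty log it asks you to mount the filesystem first so the log can be replayed, so the sketch tries that before resorting to -L.

```shell
#!/bin/sh
DEV="${DEV:-/dev/vda1}"      # hypothetical device name
MNT="${MNT:-/mnt/rescue}"    # hypothetical scratch mount point

# Scaffolding for this sketch: print each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run mkdir -p "$MNT"

# Mounting replays the XFS log if it is intact; unmount again before repairing.
if run mount "$DEV" "$MNT"; then
    run umount "$MNT"
    run xfs_repair "$DEV"
else
    # Last resort: -L zeroes the log, discarding any metadata updates still in it.
    run xfs_repair -L "$DEV"
fi
```

In dry-run mode the mount step always "succeeds", so only the first branch is printed; on a real system the -L branch runs only when the mount fails.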

View the full question and any other answers on Server Fault.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.