How to cleanly kill a stuck docker service?

Nutle asked:

I'm working on CentOS 7 with a 3.10 kernel and Docker 19.03.12.

Eventually, one of the docker images grew until it filled the entire /var mount to 100%, crashing both the docker service and the running containers.

Now there are 2 zombie processes left that I can’t kill (with kill -9 or killall):

ps axjf | grep docker
    1 30215 30215 30215 ?           -1 Ds       0   0:00 [docker-entrypoi]
    1 32063 32063 32063 ?           -1 Zsl      0   0:00 [dockerd] <defunct>

Meanwhile, on /var/log/messages I’m getting:

kernel: XFS (dm-8): Failing async write on buffer block 0xb78170. Retrying async write.
kernel: XFS (dm-8): metadata I/O error: block 0xb78170 ("xfs_buf_iodone_callback_error") error 28 numblks 8

It seems that some I/O is still trying to write data. The messages repeat in an endless cycle, and I’m not sure how to stop them.

du -sh and ls -al hang quickly when inspecting the /var/lib/docker files.

Additionally, service docker stop/start also hangs; top reports very high load/wait times (around 23 for a 4 core machine).

My question: without rebooting the machine, what would be the best way to cleanly stop the XFS writes, kill the zombie processes and restart the services?

My answer:

Free up some disk space.

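The error number in the kernel messages you posted is 28 (ENOSPC), "No space left on device". As the log shows, XFS keeps retrying the failed metadata write, so the retries should stop once the write has room to succeed.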
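
To confirm that the filesystem really is full before deleting anything, something along these lines may help. It is only a sketch: used_pct is a made-up helper name, and /var is assumed to be the affected mount because that is the one named in the question.

```shell
#!/bin/sh
# Sketch: confirm that the filesystem behind a path is full.
# used_pct is a hypothetical helper name, not part of any tool.

# Print the percentage of blocks in use on the filesystem holding $1.
# df -P forces the POSIX one-line-per-filesystem layout, where
# field 5 is the capacity column (for example "100%").
used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# /var is the mount named in the question; pass another path to override.
target=${1:-/var}
echo "$target is $(used_pct "$target")% full"

# Inode usage is worth a look as well, since running out of inodes
# is also reported as "No space left on device".
df -i "$target"
```

If this reports 100%, remove or move files on that mount and run it again to check that usage has dropped.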

View the full question and any other answers on Server Fault.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.