Alaa Alomari asked:
I got a Nagios critical warning about a server, and when I checked ps -aux I found that all of the nginx (php-fpm) processes were in uninterruptible sleep:
www-data 1330 0.4 0.3 299992 108560 ? D 16:06 0:16 php-fpm: pool www
www-data 1338 0.4 0.2 254728 92728 ? D 16:06 0:16 php-fpm: pool www
www-data 1346 0.4 0.3 293544 100272 ? D 16:06 0:17 php-fpm: pool www
www-data 1356 0.7 0.3 302504 101532 ? D 16:06 0:29 php-fpm: pool www
www-data 1357 0.3 0.2 270672 85952 ? D 16:06 0:13 php-fpm: pool www
....
I was stuck with it and couldn’t even restart nginx. In the end I had to restart the server to fix the issue!
although I have this in /etc/php5/fpm/php.ini
emergency_restart_threshold=10
emergency_restart_interval=1m
process_control_timeout=10s
which means that php5-fpm is supposed to restart in such cases, but it didn’t!!
Any idea what might cause those processes to go into uninterruptible sleep, and how to avoid it in the future?
Thanks for your help
While D in top means uninterruptible sleep, I find it’s easier to just think of D for Disk. The process is waiting on the kernel to get back to it with something, and 95% of the time this is reading from a disk.
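The fact that it’s uninterruptible sleep is why php-fpm can’t restart itself. A process in that state does not act on signals, not even SIGKILL, until the kernel call it is blocked in returns, so neither php-fpm’s emergency restart nor a manual restart can get rid of the stuck workers.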
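To see which processes are stuck and what they are waiting on, something like the following may help. It is only a rough sketch, assuming the procps version of ps found on most Linux systems; filter_dstate is a made-up helper name, not a standard command.

```shell
# filter_dstate (hypothetical helper): keep the header line plus every row
# whose second column (STAT) starts with D, i.e. uninterruptible sleep.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# wchan is the kernel function the process is sleeping in; ":32" widens the
# column so the name is less likely to be truncated.
ps -eo pid,stat,wchan:32,args | filter_dstate
```

If the WCHAN column points at block I/O or filesystem functions for every stuck worker, that supports the disk theory.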
So in this case you will want to check your disks. First run fsck -f /dev/mapper/VG-LV in single user mode (if it’s a remote dedicated server or VPS then you’ll have to use a remote KVM console for this). Then read the SMART data with smartctl -a /dev/sd? (if they’re not in a hardware RAID array; if it’s hardware RAID, use the vendor-provided tool) to see if one of your disks may be going bad.
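For a quick first pass before reading the full smartctl -a output, a loop like this prints just the overall health verdict per disk. This is a sketch, assuming smartmontools is installed, that you run it as root, and that the disks show up as /dev/sd?; health_line is a made-up helper name.

```shell
# health_line (hypothetical helper): keep only the overall verdict line
# from `smartctl -H` output read on stdin (ATA and SCSI wording differ).
health_line() {
    grep -E 'overall-health|SMART Health Status'
}

for disk in /dev/sd?; do
    [ -b "$disk" ] || continue   # skip if the glob matched no block device
    printf '%s: ' "$disk"
    smartctl -H "$disk" | health_line || echo "no verdict (smartctl missing, not root, or SMART unsupported?)"
done
```

A passing verdict does not rule out a failing disk, so the detailed attributes from smartctl -a are still worth reading.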
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.