Why is bzip2 -d running in parallel not fully utilizing CPU resources?

John Jiang asked:

I have 1000 .bz2 files and a 48-core machine, so I decided to run bzip2 -d in parallel to speed up decompressing them. But CPU utilization as shown in top is far from optimal: most of the bzip2 processes are in state D. Why is that? If I run an awk command in parallel, utilization easily reaches 90%.

ls *.bz2  | xargs -P42 -n1 -I {} bash -c 'bzip2 -d {}'



top - 02:00:18 up 74 days, 18:44,  1 user,  load average: 43.48, 71.56, 98.47
Tasks: 518 total,   1 running, 517 sleeping,   0 stopped,   0 zombie
%Cpu(s): 16.1 us,  0.4 sy,  0.0 ni, 33.4 id, 50.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 26372707+total, 28146360 free,  4073868 used, 23150684+buff/cache
KiB Swap: 16777212 total,  9176276 free,  7600936 used. 25842281+avail Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 93025 xxx  20   0    7908   4208    524 D  22.5  0.0   0:48.88 bzip2
 93033 xxx  20   0    7908   4208    524 D  22.2  0.0   0:49.80 bzip2
 92999 xxx  20   0    7908   4208    524 D  21.9  0.0   0:48.03 bzip2
 93037 xxx  20   0    7908   4204    524 D  21.9  0.0   0:48.23 bzip2
 93028 xxx  20   0    7908   4208    524 D  21.5  0.0   0:50.01 bzip2
 93029 xxx  20   0    7908   4208    524 D  21.2  0.0   0:48.99 bzip2
 93010 xxx  20   0    7908   4208    524 D  20.9  0.0   0:48.67 bzip2
 93000 xxx  20   0    7908   4208    524 D  20.5  0.0   0:46.93 bzip2
 93017 xxx  20   0    7908   4208    524 D  20.5  0.0   0:49.94 bzip2
 93024 xxx  20   0    7908   4208    524 D  20.5  0.0   0:48.33 bzip2
 93012 xxx  20   0    7908   4204    524 D  20.2  0.0   0:50.15 bzip2
 93035 xxx  20   0    7908   4208    524 D  20.2  0.0   0:46.70 bzip2
 93030 xxx  20   0    7908   4208    524 D  19.9  0.0   0:49.31 bzip2
 93009 xxx  20   0    7908   4208    524 D  19.5  0.0   0:50.94 bzip2
 93036 xxx  20   0    7908   4208    524 D  19.5  0.0   0:49.81 bzip2
 93015 xxx  20   0    7908   4208    524 D  19.2  0.0   0:47.92 bzip2
 93018 xxx  20   0    7908   4208    524 D  19.2  0.0   0:46.80 bzip2
 93020 xxx  20   0    7908   4208    524 D  18.9  0.0   0:46.07 bzip2
 93022 xxx  20   0    7908   4208    524 D  18.9  0.0   0:48.09 bzip2
 93034 xxx  20   0    7908   4204    524 D  18.9  0.0   0:45.20 bzip2
 93001 xxx  20   0    7908   4204    524 D  18.5  0.0   0:46.34 bzip2
 93003 xxx  20   0    7908   4208    524 D  18.5  0.0   0:49.36 bzip2
 93007 xxx  20   0    7908   4208    524 D  18.5  0.0   0:40.25 bzip2
 93008 xxx  20   0    7908   4204    524 D  18.5  0.0   0:47.40 bzip2
 93016 xxx  20   0    7908   4208    524 D  18.5  0.0   0:49.45 bzip2
 93026 xxx  20   0    7908   4204    524 D  18.5  0.0   0:49.37 bzip2
 93011 xxx  20   0    7908   4208    524 D  18.2  0.0   0:46.10 bzip2
 93021 xxx  20   0    7908   4208    524 D  18.2  0.0   0:48.18 bzip2
 93031 xxx  20   0    7908   4208    524 D  17.9  0.0   0:47.79 bzip2
 93013 xxx  20   0    7908   4208    524 D  17.5  0.0   0:51.38 bzip2
 93014 xxx  20   0    7908   4208    524 D  17.2  0.0   0:48.84 bzip2
 93032 xxx  20   0    7908   4208    524 D  17.2  0.0   0:50.32 bzip2
 93038 xxx  20   0    7908   4208    524 D  17.2  0.0   0:49.15 bzip2
 93006 xxx  20   0    7908   4208    524 D  16.9  0.0   0:38.50 bzip2
 93027 xxx  20   0    7908   4208    524 D  16.9  0.0   0:44.81 bzip2
 93039 xxx  20   0    7908   4208    524 D  16.9  0.0   0:48.65 bzip2
 93004 xxx  20   0    7908   4208    524 D  16.6  0.0   0:39.21 bzip2
 93040 xxx  20   0    7908   4208    524 D  16.6  0.0   0:44.44 bzip2
 93002 xxx  20   0    7908   4208    524 D  16.2  0.0   0:38.31 bzip2
 93023 xxx  20   0    7908   4204    524 D  16.2  0.0   0:45.50 bzip2
 93019 xxx  20   0    7908   4208    524 D  15.6  0.0   0:46.01 bzip2
 93005 xxx  20   0    7908   4208    524 D  14.9  0.0   0:39.37 bzip2

My answer:


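D officially means “uninterruptible sleep”, but in practice it almost always means the process is waiting on disk I/O. The summary line in your top output agrees: 50.0 wa means the CPUs spent about half their time idle with I/O outstanding. You can’t parallelize this any further because your storage is too slow to handle the load you are putting on it.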
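
To confirm this on a live system, you can count how many processes are in state D at a given moment. The following is only a rough sketch: the helper name count_d_state is made up for illustration, and it assumes a Linux procps-style ps that accepts -eo state=,comm=.

```shell
# Hypothetical helper: reads "STATE COMMAND" lines on stdin and prints
# how many of them are in state D (uninterruptible sleep).
count_d_state() {
    awk '$1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Hypothetical wrapper for live use; assumes a procps-style ps.
# Defined here but not called.
d_state_now() {
    ps -eo state=,comm= | count_d_state
}
```

Calling d_state_now a few times while the job runs gives a rough picture: a count that stays close to the number of bzip2 workers suggests they are mostly blocked on I/O rather than computing.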

The CPU can certainly handle 42 bzip2 processes, but the disks cannot keep up.
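
If the disks are the limit, running fewer workers at once may finish just as quickly while putting less pressure on the storage. Here is a minimal sketch of a throttled run. The helper name run_throttled and the worker count of 4 are arbitrary assumptions, not measured values, and the -0, -r and -P options assume GNU xargs.

```shell
# Hypothetical helper: reads NUL-delimited arguments on stdin and runs
# the given command once per argument, with at most N running at a time.
# Usage: run_throttled N command [args...]
run_throttled() {
    njobs=$1
    shift
    xargs -0 -r -n1 -P"$njobs" "$@"
}

# Example invocation (left commented out so nothing is decompressed here):
# printf '%s\0' ./*.bz2 | run_throttled 4 bzip2 -d
```

Passing NUL-delimited names avoids parsing ls output and copes with spaces in file names. A sensible worker count depends on the storage, so it is worth trying a few values and comparing total run time.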


View the full question and any other answers on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.