I have 10 disks with 8 TB each in a hardware RAID6 (thus, 8 data disks + 2 parity). Following the answer of a very similar question, I hoped for an automatic detection of all necessary parameters. However, when creating the XFS file system at the end, I got
# mkfs.xfs /dev/vgdata/lvscratch
meta-data=/dev/vgdata/lvscratch  isize=256    agcount=40, agsize=268435455 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=10737418200, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
It looks as if striping has not been detected (sunit=0, swidth=0). Because different sites use different terms (strip size, stripe size, stripe chunk, …), I would like to ask whether I got the manual parameters right.
The RAID 6 has been set up with a strip size of 256KB:
# ./storcli64 /c0/v1 show all | grep Strip
Strip Size = 256 KB
Thus, the stripe size is 8*256KB = 2048KB = 2MB. Is this correct? According to this (and if I understand it correctly), pvcreate has to be given the strip (or chunk) size as the argument to --dataalignment:
# pvcreate --dataalignment 256K /dev/sdb
Physical volume "/dev/sdb" successfully created
Note that I used the whole RAID device without partitions. Now a
# vgcreate vgdata /dev/sdb
Volume group "vgdata" successfully created
with a default PE Size of 4MB should be fine because it is a multiple of the stripe size of 2MB. Correct?
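The arithmetic behind these alignment choices can be checked with a few lines of shell. This is only a sketch: the variable names are made up for illustration, and the values are the ones quoted above (256 KiB strip, 8 data disks, 4 MiB default PE size).

```shell
# Values taken from the post; variable names are illustrative only.
strip_kib=256        # per-disk strip size reported by storcli64
data_disks=8         # 10 disks in RAID 6 minus 2 parity
pe_kib=4096          # default LVM physical extent size (4 MiB)

# A full stripe spans one strip on every data disk.
full_stripe_kib=$((strip_kib * data_disks))
echo "full stripe: ${full_stripe_kib} KiB"    # prints: full stripe: 2048 KiB

# Extents stay stripe-aligned if the PE size is a whole multiple of the full stripe.
pe_remainder=$((pe_kib % full_stripe_kib))
if [ "$pe_remainder" -eq 0 ]; then
    echo "PE size is a multiple of the full stripe"
else
    echo "PE size is NOT a multiple of the full stripe"
fi
```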
Now, a part of the vgroup is assigned to a logical volume:
# lvcreate -L 40T vgdata -n lvscratch
Logical volume "lvscratch" created.
Finally, the file system is created but now with the correct arguments (stripe size of 2MB, stripe width of 8):
# mkfs.xfs -d su=2048k,sw=8 /dev/vgdata/lvscratch
meta-data=/dev/vgdata/lvscratch  isize=256    agcount=41, agsize=268434944 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=10737418240, imaxpct=5
         =                       sunit=512    swidth=4096 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal log           bsize=4096   blocks=521728, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Is this approach correct? Is there anything to keep in mind when extending the logical volume or the volume group? I suppose that if the volume group were extended with another RAID6 system, its strip size should be equal to that of the present RAID6.
EDIT: My confusion seems to stem mainly from the different uses of the terms connected to striping. The manufacturer of my RAID controller, LSI or Avago, defines the terms in the following way:
Stripe width is the number of drives involved in a drive
group where striping is implemented. For example, a four-disk drive
group with disk striping has a stripe width of four.
Stripe size is the length of the interleaved data segments that the
RAID controller writes across multiple drives, not including parity
drives. For example, consider a stripe that contains 64 KB of disk
space and has 16 KB of data residing on each disk in the stripe. In
this case, the stripe size is 64 KB, and the strip size is 16 KB.
The strip size is the portion of a stripe that resides on a single drive.
Wikipedia, on the other hand, describes the terms like this:
The segments of sequential data written to or read from a disk before
the operation continues on the next disk are usually called chunks,
strides or stripe units, while their logical groups forming single
striped operations are called strips or stripes. The amount of data in
one chunk (stripe unit), often denominated in bytes, is variously
referred to as the chunk size, stride size, stripe size, stripe depth
or stripe length. The number of data disks in the array is sometimes
called the stripe width, but it may also refer to the amount of data
within a stripe.
The amount of data in one stride multiplied by the number of data
disks in the array (i.e., stripe depth times stripe width, which in
the geometrical analogy would yield an area) is sometimes called the
stripe size or stripe width. Wide striping occurs when chunks of
data are spread across multiple arrays, possibly all the drives in the
system. Narrow striping occurs when the chunks of data are spread
across the drives in a single array.
Even in the Wikipedia text above, stripe size is used with two different meanings. However, I now suppose that, when creating the XFS file system, the size of a single chunk stored on a single drive has to be given as the argument to su. Thus, it should be
mkfs.xfs -d su=256k,sw=8 in the command above. Correct?
Rather than “strip size” and “stripe size”, the XFS man pages use the terms “stripe unit” and “stripe width” respectively.
This makes it possible to decode the otherwise confusing text in the
mkfs.xfs(8) man page:
sunit=value
    This is used to specify the stripe unit for a RAID device or a logical volume. The value has to be specified in 512-byte block units. Use the su suboption to specify the stripe unit size in bytes. This suboption ensures that data allocations will be stripe unit aligned when the current end of file is being extended and the file size is larger than 512KiB. Also inode allocations and the internal log will be stripe unit aligned.
su=value
    This is an alternative to using sunit. The su suboption is used to specify the stripe unit for a RAID device or a striped logical volume. The value has to be specified in bytes, (usually using the m or g suffixes). This value must be a multiple of the filesystem block size.
So, with your array reporting a strip size of 256KiB, you would specify either
su=256k or
sunit=512 (because 512 512-byte blocks equals 256KiB).
swidth=value
    This is used to specify the stripe width for a RAID device or a striped logical volume. The value has to be specified in 512-byte block units. Use the sw suboption to specify the stripe width size in bytes. This suboption is required if -d sunit has been specified and it has to be a multiple of the -d sunit suboption.
sw=value
    suboption is an alternative to using swidth. The sw suboption is used to specify the stripe width for a RAID device or striped logical volume. The value is expressed as a multiplier of the stripe unit, usually the same as the number of stripe members in the logical volume configuration, or data disks in a RAID device. When a filesystem is created on a logical volume device, mkfs.xfs will automatically query the logical volume for appropriate sunit and swidth values.
With 10 spindles (8 data, 2 parity) you would specify either
sw=8 (data spindles) or
swidth=4096 (the strip size multiplied by the number of data spindles, i.e. 2MiB, expressed in 512-byte units as the man page requires).
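As a rough check of those numbers, the conversion into 512-byte units can be done in the shell. The variable names below are illustrative, not anything mkfs.xfs defines; the inputs are the 256 KiB strip and 8 data spindles from the question. Note that 4096 units of 512 bytes add up to 2 MiB.

```shell
strip_bytes=$((256 * 1024))   # strip size from the controller: 256 KiB
data_disks=8                  # data spindles (parity excluded)
sector=512                    # unit that sunit= and swidth= expect

sunit=$((strip_bytes / sector))    # stripe unit in 512-byte units
swidth=$((sunit * data_disks))     # full stripe in 512-byte units

echo "sunit=${sunit} swidth=${swidth}"                     # prints: sunit=512 swidth=4096
echo "equivalent: su=$((strip_bytes / 1024))k sw=${data_disks}"   # prints: equivalent: su=256k sw=8
```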
Note that mkfs.xfs interprets sunit and swidth as being specified in units of 512B sectors; that’s unfortunately not the unit they’re reported in, however.
mkfs.xfs reports them in multiples of your basic block size (bsize) and not in 512B sectors.
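To illustrate the unit mismatch, the sketch below converts the 512-byte values into file system blocks, assuming bsize=4096 as shown in the mkfs.xfs output in the question. The variable names are illustrative.

```shell
bsize=4096             # file system block size from the mkfs.xfs output
sunit_sectors=512      # 256 KiB expressed in 512-byte units
swidth_sectors=4096    # 2 MiB expressed in 512-byte units

# Convert 512-byte units to file system blocks.
sunit_blks=$((sunit_sectors * 512 / bsize))
swidth_blks=$((swidth_sectors * 512 / bsize))

echo "sunit=${sunit_blks} swidth=${swidth_blks} blks"   # prints: sunit=64 swidth=512 blks
```

By the same arithmetic, the sunit=512 swidth=4096 blks reported in the question corresponds to su=2048k,sw=8, that is, a stripe unit of 2 MiB rather than 256 KiB.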
The easiest way to specify these is usually by strip size and data spindle count, thus
su= strip size and
sw= number of data spindles (parity excluded).
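Put together, a small sketch that builds the command from the controller values might look like this. It only prints the command instead of running it; the device path is the one from the question, and the variable names are illustrative.

```shell
strip_kib=256                  # strip size reported by the RAID controller
total_disks=10
parity_disks=2                 # RAID 6 uses two parity disks
device=/dev/vgdata/lvscratch   # logical volume from the question

# Only the data disks count towards sw.
data_disks=$((total_disks - parity_disks))
cmd="mkfs.xfs -d su=${strip_kib}k,sw=${data_disks} ${device}"

# Print rather than execute, so the line can be reviewed first.
echo "$cmd"    # prints: mkfs.xfs -d su=256k,sw=8 /dev/vgdata/lvscratch
```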