How to manage RAID 10 on copr-backend
There are currently six AWS EBS sc1 volumes used for hosting Copr Backend build
results. Four disks form one 24T raid10, and two more disks form a 16T raid1.
These two arrays are used as “physical volumes” for the copr-backend-data LVM
volume group, and we have a single logical volume on it with the same name,
copr-backend-data (ext4 formatted, mounted as /var/lib/copr/public_html).
Everything is configured so the machine starts on its own and mounts everything
correctly. We just need to take a look at /proc/mdstat from time to time.
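A quick, read-only way to inspect the whole stack (volumes -> raid -> LVM ->
ext4); md127 is the array name used elsewhere in this document:

    cat /proc/mdstat                      # state of the md arrays
    mdadm --detail /dev/md127             # details of one array
    pvs; vgs; lvs                         # LVM physical volumes, VG, LV
    lsblk -f                              # whole device tree incl. labels
    df -h /var/lib/copr/public_html       # the mounted ext4 filesystem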
Manually checking/stopping checks
Commands needed:

# stop a currently running check/resync
echo idle > /sys/block/md127/md/sync_action
# start a consistency check manually
echo check > /sys/block/md127/md/sync_action
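The progress of a running check can be followed with the standard md sysfs
files, e.g.:

    watch -n 5 cat /proc/mdstat               # overall progress
    cat /sys/block/md127/md/sync_completed    # sectors done / total
    cat /sys/block/md127/md/mismatch_cnt      # mismatches found by the last check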
Detaching volume
It’s not safe to just force-detach the volumes in AWS EC2; it could cause data corruption. Since there are several layers (volumes -> raid -> LVM -> ext4), we need to tear them down in reverse order while detaching:
1. stop apache, copr-backend, cron jobs, etc.

2. unmount the filesystem:

   umount /var/lib/copr/public_html

3. disable the volume group:

   vgchange -a n copr-backend-data

4. stop the raids:

   mdadm --stop /dev/md127

5. now you can detach the volumes from the instance in EC2 (see the sketch
   below)
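The EC2 part itself can be done in the web console or with the AWS CLI; the
volume ID below is just a placeholder:

    # list the Copr infrastructure volumes to find the IDs
    aws ec2 describe-volumes --filters Name=tag:CoprPurpose,Values=infrastructure
    # detach one of the EBS volumes (repeat for each volume of the arrays)
    aws ec2 detach-volume --volume-id vol-0123456789abcdef0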
Attaching volume
1. attach the volumes in AWS EC2

2. start the raid and the volume group:

   mdadm --assemble --scan

   In case the --assemble --scan doesn’t reconstruct the array, it is OK to add
   the volumes manually: mdadm /dev/md127 --add /dev/nvme2n1p1

3. mount the /dev/disk/by-label/copr-repo volume (see the sketch below)
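Concretely this might look like the following; the volume ID, instance ID and
device name are illustrative placeholders:

    # attach each EBS volume back to the backend instance
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id i-0123456789abcdef0 --device /dev/sdf
    # assemble the arrays and re-activate the volume group
    mdadm --assemble --scan
    vgchange -a y copr-backend-data
    # mount the filesystem by label
    mount /dev/disk/by-label/copr-repo /var/lib/copr/public_html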
There’s an Ansible configuration for this, and a list of volumes.
Adding more space
1. Create two gp3 volumes in EC2 of the same size and type, and tag them with
   FedoraGroup: copr, CoprInstance: production, CoprPurpose: infrastructure
   (see the sketch after this list). Attach them to a freshly started temporary
   instance (we don’t want to overload I/O with the initial RAID sync on the
   production backend). Make sure the instance type has enough EBS throughput
   to perform the initial sync quickly enough.

2. Always partition the disks with a single partition on them, otherwise the
   kernel might have trouble auto-assembling the disk arrays:

   cfdisk /dev/nvmeXn1
   cfdisk /dev/nvmeYn1

3. Create the raid1 array on both new partitions:

   $ mdadm --create --name=raid-be-03 --verbose /dev/mdXYZ --level=1 --raid-devices=2 /dev/nvmeXn1p1 /dev/nvmeYn1p1

4. Wait till the new empty array is synchronized (may take hours or days, note
   we sync 2x16T). Check the details with mdadm -Db /dev/md/raid-be-03. See
   the tips below on how to make the sync speed unlimited with sysctl.

   Note: in case the disk is marked “readonly”, you might need the
   mdadm --readwrite /dev/md/raid-be-03 command.

5. Place the new raid1 array into the volume group as a new physical volume
   (vgextend does pvcreate automatically):

   $ vgextend copr-backend-data /dev/md/raid-be-03

6. Extend the logical volume to span all the free space:

   $ lvextend -l +100%FREE /dev/copr-backend-data/copr-backend-data

7. Resize the underlying ext4 filesystem (takes 15 minutes and more!):

   $ resize2fs /dev/copr-backend-data/copr-backend-data

8. Switch the volume types from gp3 to sc1; we don’t need the power of gp3 for
   backend purposes.

9. Modify the https://github.com/fedora-copr/ansible-fedora-copr group vars
   referencing the set(s) of volume IDs.
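For step 1, a rough sketch of what the volume creation could look like with the
AWS CLI; the size, availability zone, and IDs are assumptions and must be
adjusted to the current setup:

    # create one of the two volumes (repeat for the second one); size is in GiB
    aws ec2 create-volume \
        --volume-type gp3 \
        --size 16384 \
        --availability-zone us-east-1a \
        --tag-specifications 'ResourceType=volume,Tags=[{Key=FedoraGroup,Value=copr},{Key=CoprInstance,Value=production},{Key=CoprPurpose,Value=infrastructure}]'
    # attach it to the temporary instance used for the initial sync
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id i-0123456789abcdef0 --device /dev/sdg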
Other tips
Note the sysctl dev.raid.speed_limit_max (in KB/s); this might limit the
initial sync speed, the periodic raid checks, and potentially a raid rebuild.
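For example, to effectively remove the limit during the initial sync (the exact
value is just an arbitrarily large illustrative number):

    # check the current limits (values are in KB/s)
    sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max
    # raise the ceiling so the sync is bounded only by the disk/EBS throughput
    sysctl -w dev.raid.speed_limit_max=2000000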
While trying to do a fast rsync, we experimented with a very large instance type
(c5d.18xlarge, 144GB RAM) and with vm.vfs_cache_pressure=2, to keep as many
inodes and dentries in kernel caches as possible (see slabtop; we eventually had
60M inodes cached, and 28M inodes / 15T synced in 6.5 hours). We had also
decreased the dirty_ratio and dirty_background_ratio to get more frequent
writeback to disk, considering the large RAM.
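The vfs_cache_pressure value above comes from that experiment; the dirty ratio
values below are only illustrative, since the exact numbers used are not
recorded here:

    # prefer keeping inodes/dentries cached over reclaiming them
    sysctl -w vm.vfs_cache_pressure=2
    # start background writeback earlier so dirty pages don't pile up in RAM
    # (illustrative values, not the exact ones used)
    sysctl -w vm.dirty_background_ratio=2
    sysctl -w vm.dirty_ratio=10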