
Replacing a disk while keeping the correct OSD numbering in Ceph

The point of this method is to preserve the order in which the disks appear in the output of the ceph osd tree command. When they are listed in order, the output is easier to read and to work with when needed.
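
For illustration, a hypothetical excerpt of such output with the OSD numbers running in order (hostnames and weights here are made up, and the exact columns vary between Ceph releases):

 ceph osd tree
 # ID WEIGHT  TYPE NAME       UP/DOWN REWEIGHT
 # -1 3.00000 root default
 # -2 3.00000     host node1
 #  0 1.00000         osd.0        up  1.00000
 #  1 1.00000         osd.1        up  1.00000
 #  2 1.00000         osd.2        up  1.00000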

A lyrical digression on the topic. The official method of replacing a disk in Ceph involves deleting all the logical entities associated with that disk from the cluster and then re-creating them. As a result, the newly created OSD can (under some circumstances) change its number (the number in the entity name, osd.NUMBER) and its location in the CRUSH map, and will therefore show up in a different place in the ceph osd tree output, changing its sequence number.

The idea of this method is that we do not touch any logical entities at all, but simply slip a new disk into the "old" place in the cluster. To do this, the correct data structures have to be (re)created on the new disk: the various ids, symlinks, and keys.

Label the new disk.

parted /dev/DATA_DISK mklabel gpt 

Create a new partition on the disk.

 parted /dev/sdaa mkpart primary ext2 0% 100%
 /sbin/sgdisk --change-name=1:'ceph data' -- /dev/sdaa 

We get the UUID of the dead osd

 ceph osd dump | grep 'osd.NUMBER' 

Set the PARTUUID on the data disk

 /sbin/sgdisk --typecode=1:99886b14-7904-4396-acef-c031095d4b62 -- /dev/DATA_DISK 

Find the partition with the journal

 ceph-disk list | grep for | sort 

Create a file system on the disk

 /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdaa1 

Mount FS

 mount -o rw,noatime,attr2,inode64,noquota /dev/DATA_PARTITION /var/lib/ceph/osd/ceph-NUMBER 

We copy the metadata from a neighboring OSD

This is in fact the nastiest part of the procedure; everything has to be done carefully.

When copying, you must skip the /var/lib/ceph/osd/ceph-NUMBER/current directory: that is the data directory. The symlink to the journal will be created later.

Copying

 for i in activate.monmap active ceph_fsid fsid journal_uuid keyring magic ready store_version superblock systemd type whoami; do cp /var/lib/ceph/osd/ceph-NEIGHBOR_NUMBER/${i} /var/lib/ceph/osd/ceph-NUMBER; done 
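
The same copy can be expressed with rsync, which makes the "skip the data directory" rule explicit. This is only a sketch of an equivalent command, not the original one-liner; the journal symlink is excluded as well because it is recreated later:

 # copy the small metadata files, skipping the data directory and the journal symlink
 rsync -a --exclude=current --exclude=journal /var/lib/ceph/osd/ceph-NEIGHBOR_NUMBER/ /var/lib/ceph/osd/ceph-NUMBER/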

Find the journal

 ceph-disk list | grep for | sort 

Find the matching partition in that output, then run

 ls -l /dev/disk/by-partuuid | grep PARTITION_NAME 

We make a symlink for this UUID

 ln -s /dev/disk/by-partuuid/UUID /var/lib/ceph/osd/ceph-NUMBER/journal 

We fill fsid with the correct value

This fsid is the unique id under which the OSD is registered in the cluster. It matters because if the id is wrong, the OSD will not see the cluster, and the feeling will be mutual: the cluster will not see the OSD either.

The value must be taken from the PARTUUID of the data partition on the data disk.

 echo -n UUID >/var/lib/ceph/osd/ceph-NUMBER/fsid 
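
The same write can be done in one step by reading the PARTUUID directly; a sketch, assuming the data partition created above is /dev/sdaa1:

 # read the partition's PARTUUID and write it straight into fsid
 echo -n "$(blkid -o value -s PARTUUID /dev/sdaa1)" > /var/lib/ceph/osd/ceph-NUMBER/fsid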

Fill in the keyring

This is what the OSD uses to authenticate itself to the cluster.

 ceph auth list | grep --after-context=1 'osd.NUMBER' 

It is written to the keyring file in the format

 [osd.NUMBER]
     key = KEY_STRING
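
As a sketch, this step can also be scripted, assuming the ceph CLI is available and NUMBER is the OSD number (ceph auth get-key prints just the key string):

 # fetch the key for this OSD and write the keyring file in the format above
 KEY=$(ceph auth get-key osd.NUMBER)
 printf '[osd.NUMBER]\n\tkey = %s\n' "$KEY" > /var/lib/ceph/osd/ceph-NUMBER/keyring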

Fill in whoami

Just write into this file the number of the OSD we want to revive.
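
For example, mirroring the fsid step above:

 echo NUMBER > /var/lib/ceph/osd/ceph-NUMBER/whoami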

Zero out the journal

 dd bs=32M oflag=direct if=/dev/zero of=/var/lib/ceph/osd/ceph-NUMBER/journal 

Create the OSD and journal metadata

 ceph-osd --mkfs -i NUMBER
 ceph-osd --mkjournal -i NUMBER 

Change data owner

 chown -R ceph:ceph /var/lib/ceph/osd/ceph-NUMBER 

We start ceph-osd

Attention: immediately after ceph-osd starts, a rebuild will begin if the ceph osd out NUMBER command had been issued before the disk left the cluster.

 systemctl start ceph-osd@NUMBER 
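
To watch the recovery and confirm that the OSD came back under its old number, the standard status commands are enough:

 ceph -s          # overall health and recovery/backfill progress
 ceph osd tree    # the OSD should be back in its old place under its old number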

Source: https://habr.com/ru/post/436494/