Note: this was done more as an experiment than as something I intended to use in production – so consider it more a compilation of notes than a fully fleshed-out procedure.
DRBD – Distributed Replicated Block Device – is a kernel-level storage system that replicates data across a network. It uses TCP, typically on ports starting at 7788. A typical setup pairs DRBD with Heartbeat/Corosync – so that if one node fails, the other can be promoted to primary (or a dual-primary setup is used) – and with a network filesystem, so that both nodes can access the data simultaneously.
The setup described below will only allow one node to access the data at any given time and requires a manual failover to promote the secondary node to primary.
For the following, I am using two up-to-date instances running Amazon’s Linux AMI 2011.09 (ami-31814f58), which is derived from CentOS/RHEL. Both are in the same security group, and they are the only two instances in that security group. Also, the hostnames of both instances are unchanged from their defaults – this is only relevant if you try to use the script included below; if you set up the configuration manually, the hostnames can be whatever you wish.
I have attached one EBS volume to each instance (in addition to the root volume), at /dev/sdf (which is actually /dev/xvdf on Linux).
Install DRBD
Note: all steps in this section are to be performed on both nodes
This AMI already includes the DRBD kernel module in its default kernel. You can verify this with the following:
modprobe -l | grep drbd
kernel/drivers/block/drbd/drbd.ko
Likewise, to find the version of the kernel module, you can use:
modinfo drbd | grep version
version: 8.3.8
It is typically preferable to have the version of the kernel module match the version of the userland binaries. DRBD is no longer included in the CentOS 6 repository, and is not in either the amzn or EPEL repositories. The remaining options are therefore to use another repository or to build from source – I’d favour the former.
ElRepo – which contains primarily hardware-related packages – maintains up-to-date binaries for CentOS and its derivatives. We can either install a specific RPM or simply use the latest copy from the repository.
rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org
From RPM (for 32-bit version):
rpm -Uvh http://elrepo.org/linux/elrepo/el5/i386/RPMS/drbd83-utils-8.3.8-3.el5.elrepo.i386.rpm
From Repository (current version 8.3.12 – doesn’t match installed kernel version 8.3.8):
rpm -Uvh http://elrepo.org/elrepo-release-6-4.el6.elrepo.noarch.rpm
yum install drbd83-utils
Load the kernel module with:
modprobe -v drbd
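To confirm the module is actually loaded, check the kernel’s module list:
lsmod | grep drbd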
Setup meta-data storage
Note: all steps in this section are to be performed on both nodes
DRBD can store meta-data internally or externally. Internal meta-data tends to be easier to recover, while external meta-data tends to offer better latency. Moreover, for EBS volumes with an existing XFS filesystem, external meta-data is required, since there is typically no space left on the disk to store it – XFS can’t be shrunk, and an EBS volume can’t be enlarged in place.
According to the DRBD User Guide, meta-data size, in sectors, can be calculated with:
echo $(((`blockdev --getsz /dev/xvdf`/32768)+72))
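As a worked example, for a hypothetical 1 GiB volume (2097152 512-byte sectors) the formula yields only a tiny amount of space:
# (2097152/32768) + 72 = 64 + 72 = 136 sectors, i.e. roughly 68 KiB
echo $(((2097152/32768)+72))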
However, for external meta data disks, it appears that you need 128MiB per index (disk). Creating a smaller disk will result in the error “Meta device too small”.
To create our meta-data storage (/var/drbd-meta – change as desired) – initially zeroed out – we will use dd, with /dev/zero as an input source, and then mount the file on a loopback device.
dd if=/dev/zero of=/var/drbd-meta bs=1M count=128
losetup /dev/loop0 /var/drbd-meta
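One caveat: a losetup mapping does not survive a reboot, so the loop device needs to be re-attached before DRBD is started again. A minimal sketch, assuming the /var/drbd-meta file above, which could go in /etc/rc.local:
# Re-attach the meta-data file if the loop device is not already configured
losetup /dev/loop0 >/dev/null 2>&1 || losetup /dev/loop0 /var/drbd-meta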
Configure DRBD
The default DRBD install creates /etc/drbd.conf, which includes /etc/drbd.d/global_common.conf and /etc/drbd.d/*.res. You will want to make some changes to global_common.conf – for performance and error handling – but for now I am just using the default.
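For reference, the sort of performance and error-handling changes one might make look roughly like the following – these are illustrative DRBD 8.3 options only, not values I have benchmarked on EC2:
common {
    disk {
        on-io-error detach;   # detach from the backing device on I/O errors
    }
    net {
        max-buffers 8000;     # larger receive buffers for higher throughput
        max-epoch-size 8000;
    }
    syncer {
        al-extents 3389;      # bigger activity log, fewer meta-data updates
    }
}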
You will need to know the hostname and IP address of both instances in your cluster to set up a resource file. It is important to note that DRBD uses the IP address of the local machine to determine which interface to bind to – therefore, you must use the private IP address for the local machine.
You can, of course, use an Elastic IP as the public IP address. The default port used by DRBD is 7788, and I have used the same below – you need to open this port (TCP) in your security group.
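Since both instances are in the same security group, the simplest approach is to open the port to the group itself. With the classic ec2-api-tools that looks roughly like the following – the group name and account ID are placeholders:
ec2-authorize my-drbd-group -P tcp -p 7788 -o my-drbd-group -u 123456789012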
Setup a resource file /etc/drbd.d/drbd_res0.res.tmpl (on both nodes):
resource drbd_res0 {
    syncer { rate 50M; }
    device /dev/drbd0;
    disk /dev/xvdf;
    meta-disk /dev/loop0[0];
    on @@LOCAL_HOSTNAME@@ {
        address @@LOCAL_IP@@:7788;
    }
    on @@REMOTE_HOSTNAME@@ {
        address @@REMOTE_IP@@:7788;
    }
}
The above ‘resource’ defines the basic information about the disk and the instances. Note: you should change the ‘disk’ to match the device name you attached your EBS volume as, and ‘meta-disk’ should correspond to the device set up above (or use internal).
If you manually replace the template placeholders above, you must use the private IP address for the LOCAL_IP; however, you can use either the public or private IP for the REMOTE_IP. The LOCAL_HOSTNAME and REMOTE_HOSTNAME values should match the output of the hostname command on each system. Keep in mind that if you use a public IP address, you may incur data transfer charges (also keep in mind that, from within EC2, an Elastic IP’s public DNS name resolves to the private IP address, which will save on data transfer charges). Also, the file extension should be .res (not .tmpl) if you make the replacement manually.
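For illustration, a manually filled-in /etc/drbd.d/drbd_res0.res might look like this – the hostnames and IP addresses below are made up and must be replaced with your own:
resource drbd_res0 {
    syncer { rate 50M; }
    device /dev/drbd0;
    disk /dev/xvdf;
    meta-disk /dev/loop0[0];
    on ip-10-1-2-3 {
        address 10.1.2.3:7788;    # local node: must be its private IP
    }
    on ip-10-4-5-6 {
        address 10.4.5.6:7788;    # remote node: private (or public) IP
    }
}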
A typical setup would have identical resource files on both the local and remote machines. If we wish to use the public IP addresses, this is not possible (since the public IP is not associated with an interface in EC2). Therefore, I used the following script to set up the correct values in the above file (note: you need to set up your private key and certificate in order to use the API tools):
#!/bin/sh
export EC2_PRIVATE_KEY=/path/to/pk-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.pem
export EC2_CERT=/path/to/cert-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.pem

REMOTE_INFO=$(ec2-describe-instances --filter instance-state-name=running --filter group-name=$(curl -s http://169.254.169.254/latest/meta-data/security-groups) | grep INSTANCE | grep -v $(curl -s http://169.254.169.254/latest/meta-data/instance-id) | awk '{sub(/\..*/, "", $5);print $5, $14}')
REMOTE_HOSTNAME=$(echo $REMOTE_INFO | cut -d ' ' -f1)
REMOTE_IP=$(echo $REMOTE_INFO | cut -d ' ' -f2)
LOCAL_HOSTNAME=$(hostname)
LOCAL_IP=$(ifconfig eth0 | grep "inet addr" | cut -d':' -f2 | cut -d' ' -f1)

sed -e "s/@@LOCAL_HOSTNAME@@/$LOCAL_HOSTNAME/g" \
    -e "s/@@LOCAL_IP@@/$LOCAL_IP/g" \
    -e "s/@@REMOTE_HOSTNAME@@/$REMOTE_HOSTNAME/g" \
    -e "s/@@REMOTE_IP@@/$REMOTE_IP/g" \
    /etc/drbd.d/drbd_res0.res.tmpl > /etc/drbd.d/drbd_res0.res
Of course, there are a few shortcomings to the above – it will only handle two instances (the local and one remote) in the group, and it expects the hostname to be unchanged (i.e. the value derived from ec2-describe-instances). The script uses the security group to determine the servers in the cluster; as such, it requires both instances to be in the same security group and will only work if that security group has exactly two instances in it. (It would be trivial to modify it to use something other than the security group – for instance a specific tag, as sketched below – but handling more than two instances matching the criteria would take a bit more effort.)
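For example, the ec2-describe-instances call in the script could filter on a tag instead of the security group – the tag name and value here are hypothetical, and the rest of the script stays the same:
REMOTE_INFO=$(ec2-describe-instances --filter instance-state-name=running --filter "tag:Role=drbd" | grep INSTANCE | grep -v $(curl -s http://169.254.169.254/latest/meta-data/instance-id) | awk '{sub(/\..*/, "", $5);print $5, $14}')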
At this point you should have an /etc/drbd.d/drbd_res0.res file on both nodes, with the appropriate information filled in (either manually or using the script) – it is worth mentioning that the filename doesn’t actually matter, as long as it ends in .res, which is what /etc/drbd.conf is set up to look for.
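Before starting DRBD, it is worth confirming that the configuration parses cleanly – drbdadm dump simply parses the resource file and prints the result (or reports syntax errors) without changing anything:
drbdadm dump drbd_res0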
Final steps
We are just about done at this point – everything is configured, and DRBD is set up on each instance. We now need to actually create the meta-data for our specific resource (run on both nodes):
drbdadm create-md drbd_res0
Finally, we start DRBD (on both nodes):
service drbd start
We can check the status of our nodes using service drbd status, drbd-overview, or cat /proc/drbd:
version: 8.3.8 (api:88/proto:86-94)
srcversion: 299AFE04D7AFD98B3CA0AF9
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:1048576
At this point, we have not actually defined which node is to be the primary node – both are therefore classed as secondary, something we will resolve momentarily.
Up until this point, all steps have been done on both instances. Without a dual-primary/network file system setup, the DRBD files will only be accessible to one instance at a time. The primary node will be able to read and write to the volume, but the secondary node will not. In a failover scenario, we would promote the secondary node to primary, and it will then have full access to the volume.
We must now promote one node to primary. It is important to note that you cannot promote a node to primary if the nodes are inconsistent (see the status above). The first time, therefore, you will need to use the --overwrite-data-of-peer option. Be careful, as this option will completely overwrite the data on the other node:
drbdadm -- --overwrite-data-of-peer primary drbd_res0
If the nodes are UpToDate, you can use:
drbdadm -- primary drbd_res0
Checking the status of our nodes will now reveal that one is primary and that, if necessary, a sync is in progress:
version: 8.3.8 (api:88/proto:86-94)
srcversion: 299AFE04D7AFD98B3CA0AF9
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:88968 nr:0 dw:0 dr:97432 al:0 bm:5 lo:5 pe:17 ua:248 ap:0 ep:1 wo:b oos:960128
    [>...................] sync'ed:  9.0% (960128/1048576)K delay_probe: 0
    finish: 0:00:32 speed: 29,480 (29,480) K/sec
Wait for the sync to finish before proceeding – at which point there should be 0 bytes out of sync (oos:0), and both nodes should be UpToDate:
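If you would rather not watch /proc/drbd by hand, a crude polling loop along these lines works:
# Poll until both sides report UpToDate
until grep -q 'ds:UpToDate/UpToDate' /proc/drbd; do sleep 10; done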
version: 8.3.8 (api:88/proto:86-94)
srcversion: 299AFE04D7AFD98B3CA0AF9
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:1048576 nr:0 dw:0 dr:1049240 al:0 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
Filesystem and Mounting
At this point, we are ready to use our DRBD device. We start by setting up a filesystem. My preference is XFS:
yum install xfsprogs
mkfs.xfs /dev/drbd0
(Note: both nodes should have xfsprogs installed if you use XFS as your filesystem – but you will only format the device on the primary node.)
We now create a mountpoint and mount the device (again, only on the primary node):
mkdir /data
mount /dev/drbd0 /data
Hopefully, at this point everything is set up and operational – any data we save to /data should now be replicated over the network to our secondary node.
A Quick Test
The most basic test involves the following – create a test file on the primary node, manually fail over, and check for the file on what was originally the secondary node:
On the primary node:
echo "This is a test" > /data/test.txt umount /data drbdadm secondary drbd_res0
On the secondary node:
drbdadm primary drbd_res0
mkdir /data
mount /dev/drbd0 /data
cat /data/test.txt
To be able to access the data on both nodes simultaneously, we would need to set up both nodes as primary and use a cluster filesystem – such as OCFS2 or GFS2 (instead of XFS) – in order to minimize the risk of inconsistencies. That, however, is an experiment for a future date. (Of course, there are other alternatives to DRBD – my personal preference being GlusterFS on EC2, which, while having a bit of additional overhead, is simpler to set up and has quite a few more features.)