I am currently working on setting up a small high-availability server cluster on Amazon’s EC2 cloud. Such a setup requires several underlying technologies to work together. Common among these are a distributed file system, a load balancer, and some form of monitoring and resource control. This article looks at one aspect of ‘monitoring’ – a messaging layer – and its basic setup.
Package | Description
Cluster Glue | Common dependency
Heartbeat | Messaging layer (older; no new features being added)
Corosync | Messaging layer (newer; preferred and under active development)
OpenAIS | Implements the AIS layer on top of Corosync (the two were formerly a single project; not always required)
Pacemaker | Resource manager (works with both messaging layers; formerly part of Heartbeat)
Resource Agents | Scripts for controlling some services/resources
Messaging Layers
The underlying basis of a monitoring setup entails requesting (or sending) data (a file, a packet, etc.) from each node in the cluster on a periodic basis. Ideally, the nodes all communicate with each other (either directly, or through a master node) and therefore each know the status of the other nodes. Typically, the role of this inter-node communication is performed by something akin to a ‘bus’: essentially, a general protocol that accepts messages to be transmitted to other nodes and also provides a ‘pulse’ to signify that the node is up. (If the pulse is not received when expected, the other nodes conclude that the node is down.)
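As a toy illustration of the ‘pulse’ idea (a sketch only; Heartbeat’s actual protocol is more involved, and the IP, port, and interval below are placeholder assumptions):

# Illustrative pulse sender, NOT Heartbeat's implementation: every 2 seconds,
# push a small UDP datagram to the peer using bash's /dev/udp redirection.
# The peer tracks arrival times and would declare this node dead if no pulse
# arrived within its configured 'deadtime' window.
while true; do
  echo "pulse $(date +%s) from $(uname -n)" > /dev/udp/10.0.0.2/9694
  sleep 2
done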
Two of the commonly used messaging layers are Heartbeat and Corosync.
Corosync is the more recent and arguably more flexible of the two, and is likely to eventually render Heartbeat obsolete. The project has some notable backers (e.g. Red Hat) and is under active development. Corosync is available in the amzn repository; however, that version (1.2.3 at this time) does not support unicast. Only Corosync 1.3.0+ (November 2010) supports UDPU (UDP unicast), although a patch exists for some earlier versions. Unicast merits specific mention here because Amazon’s network does not permit broadcast or multicast transmissions.
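For reference, once you are on 1.3.0+, unicast is declared in corosync.conf along these lines (a sketch only; the IPs are placeholders, and you should check the corosync.conf man page for your version):

# totem section of corosync.conf using the udpu (UDP unicast) transport;
# each cluster member must be listed explicitly, since there is no multicast.
totem {
    version: 2
    secauth: off
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 10.xxx.xxx.0
        mcastport: 5405
        member {
            memberaddr: 10.xxx.xxx.xxa
        }
        member {
            memberaddr: 10.xxx.xxx.xxb
        }
    }
}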
Despite having finally opted to use Corosync as my messaging layer, I initially experimented with Heartbeat. The following briefly outlines how to set up and test Heartbeat on EC2. (This doesn’t include the monitoring of specific resources or the setup of the haresources file – just the basic setup and testing.)
The RPMs generated here satisfy some of the dependencies of Pacemaker (in addition to Heartbeat and Cluster-Glue, Pacemaker also requires Corosync).
Pre-requisites
The setup below is specific to Amazon’s Linux distribution (RHEL/CentOS derived) – but should be applicable to other distributions with little modification.
Heartbeat is not available from the amzn repository, so I have decided to build RPMs from source (since you will need to install it on more than one server, the RPM approach saves a bit of time over re-compiling on each node).
To build RPMs, you will want to have the ‘Development Tools’ group installed (not recommended on a production machine).
yum groupinstall "Development Tools"
Heartbeat requires Cluster-Glue for compilation/installation. Resource Agents are also commonly installed.
You might want to create the user hacluster and the group haclient, as they are referenced by the install (the install will succeed, though, even without them).
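If you do want to create them, something along these lines should work (a sketch; -r creates system accounts, and the shell and comment are a matter of taste):

# Create the group and system user that Heartbeat's install references.
groupadd -r haclient
useradd -r -g haclient -s /sbin/nologin -c "heartbeat user" hacluster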
There are a number of dependencies you should install first. (flex and bison are installed with ‘Development Tools’, and will be ignored in the list below if you already have them.) (Note: mailx (and the mail command) is not included in the most recent version of Amazon’s Linux and has therefore been included as a dependency below.)
yum install -y flex bison net-snmp OpenIPMI glib2-devel libxml2-devel bzip2-devel libuuid-devel docbook-utils docbook-dtds libtool-ltdl libtool-ltdl-devel libxslt perl-TimeDate python-devel OpenIPMI-devel openssl-devel docbook-style-xsl help2man e2fsprogs-devel mailx
You also need Python.
Visit the Linux High Availability download page for the latest releases of the files below: http://www.linux-ha.org/wiki/Download
Cluster-Glue RPM
wget http://hg.linux-ha.org/glue/archive/glue-1.0.7.tar.bz2
tar -xjvf glue-*.tar.bz2
mv Reusable-Cluster-Components-glue--glue-* cluster-glue
tar -cjvf cluster-glue.tar.bz2 cluster-glue
mv cluster-glue.tar.bz2 /usr/src/rpm/SOURCES/
cd cluster-glue
rpmbuild --bb cluster-glue-fedora.spec
The above commands download, extract, rename, and repackage the files; copy the repackaged file to the SOURCES directory; and build the RPM using the Fedora spec file provided.
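Note that /usr/src/rpm is the RPM build root on Amazon’s Linux; if your distribution uses a different location (e.g. ~/rpmbuild on newer RHEL-derived systems), you can check where rpmbuild expects its SOURCES with:

rpm --eval '%{_topdir}'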
When done, the following RPMS can be found in: /usr/src/rpm/RPMS/x86_64/
- cluster-glue-1.0.7-1.amzn1.x86_64.rpm
- cluster-glue-debuginfo-1.0.7-1.amzn1.x86_64.rpm
- cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm
- cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm
Heartbeat RPM
Install cluster-glue-libs, cluster-glue, and cluster-glue-libs-devel (built above). [We need the --nogpgcheck parameter, since we have not signed the RPMs we created]:
yum install --nogpgcheck cluster-glue-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm
Following a procedure essentially the same as above, run the following to build the Heartbeat RPMs:
wget http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/STABLE-3.0.4.tar.bz2
tar -xjvf STABLE-*.tar.bz2
mv Heartbeat-3-0-STABLE-* heartbeat
tar -cjvf heartbeat.tar.bz2 heartbeat
mv heartbeat.tar.bz2 /usr/src/rpm/SOURCES/
cd heartbeat
rpmbuild --bb heartbeat-fedora.spec
When done, the following RPMS can be found in: /usr/src/rpm/RPMS/x86_64/
- heartbeat-3.0.4-1.amzn1.x86_64.rpm
- heartbeat-debuginfo-3.0.4-1.amzn1.x86_64.rpm
- heartbeat-devel-3.0.4-1.amzn1.x86_64.rpm
- heartbeat-libs-3.0.4-1.amzn1.x86_64.rpm
Resource Agents RPMs
This last step is optional, but commonly used with Heartbeat – these are scripts for managing resources. The preparation is quite similar to the builds above.
wget -O agents-1.0.4.tgz https://github.com/ClusterLabs/resource-agents/tarball/agents-1.0.4
tar -xzvf agents-*.tgz
mv ClusterLabs-resource-agents-* resource-agents
tar -cjvf resource-agents.tar.bz2 resource-agents
mv resource-agents.tar.bz2 /usr/src/rpm/SOURCES/
cd resource-agents
rpmbuild --bb resource-agents.spec
When done, the following RPMS can be found in: /usr/src/rpm/RPMS/x86_64/
- ldirectord-1.0.4-1.amzn1.x86_64.rpm
- resource-agents-1.0.4-1.amzn1.x86_64.rpm
- resource-agents-debuginfo-1.0.4-1.amzn1.x86_64.rpm
Install the RPMs
(Again, we need --nogpgcheck since we haven’t signed the RPMs.)
yum --nogpgcheck install heartbeat-3.0.4-1.amzn1.x86_64.rpm heartbeat-libs-3.0.4-1.amzn1.x86_64.rpm resource-agents-1.0.4-1.amzn1.x86_64.rpm
Basic Setup
The following outlines the minimum necessary to get Heartbeat sending a pulse between two nodes.
cd /usr/share/doc/heartbeat-3.0.4/
cp authkeys ha.cf haresources /etc/ha.d/
cd /etc/ha.d
Essentially, we are copying the sample files to the configuration directory (ha.d).
We will now generate authkeys. (The sed command removes the extra ‘(stdin)= ’ prefix that openssl adds.)
( echo -ne "auth 1\n1 sha1 "; dd if=/dev/urandom bs=512 count=1 | openssl sha1 | sed 's/.*= //' ) > /etc/ha.d/authkeys
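The resulting file should look something like this (the digest below is just an illustrative placeholder; yours will differ):

auth 1
1 sha1 f1d2d2f924e986ac86fdf7b36c94bcdf32beec15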
Set the permissions:
chmod 600 authkeys
Basic configuration
/etc/ha.d/ha.cf: (change the IPs)
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
udpport 694
ucast eth0 10.xxx.xxx.xxa
ucast eth0 10.xxx.xxx.xxb
auto_failback off
node server1.domain.com
node server2.domain.com
The above presumes that you have your node IPs set up in your hosts file (or some other DNS equivalent); if not, use IP addresses for the node values. The IPs on the ucast lines are the private IPs of the instances (one is the local instance, the second is the other node).
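If you go the hosts-file route, the entries might look like the following (illustrative; substitute your instances’ private IPs and hostnames):

# /etc/hosts
10.xxx.xxx.xxa   server1.domain.com server1
10.xxx.xxx.xxb   server2.domain.com server2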
For full details on the configuration file, see: http://www.linux-ha.org/wiki/Ha.cf
Remember to open UDP port 694 in the security group.
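With the EC2 API tools, that would be something along these lines (cluster-sg is a placeholder group name; you can also authorize the group to itself with -o/-u instead of a CIDR):

ec2-authorize cluster-sg -P udp -p 694 -s 10.xxx.xxx.xxb/32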
Copy authkeys, ha.cf, and haresources to the second machine and set the file permissions.
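For example (assuming root SSH access between the nodes; adjust the user and hostname as needed):

scp /etc/ha.d/authkeys /etc/ha.d/ha.cf /etc/ha.d/haresources root@server2.domain.com:/etc/ha.d/
ssh root@server2.domain.com "chmod 600 /etc/ha.d/authkeys"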
Start heartbeat on both machines.
service heartbeat start
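If you also want Heartbeat to start on boot, enable it in the usual way:

chkconfig heartbeat on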
Test and Diagnose
Verify ports are open (if nmap is not installed: yum install -y nmap):
nmap -p 694 -sU -P0 10.xxx.xxx.xxx
Watch the ‘heartbeat’ communications:
tcpdump port 694
Watch the log to see when a node comes up or goes down (i.e. when you start/stop heartbeat on the other server) [use ctrl+c to exit]:
tail -f /var/log/ha-log
You also need these deps before issuing rpmbuild for the first time:
yum install -y libtool autoconf automake
All of these are part of the “Development Tools” group in the amzn repository.
Running yum groupinfo "Development Tools" gives the following:
Group: Development Tools
Description: These tools include core development tools such as automake, gcc, perl, python, and debuggers.
Mandatory Packages: autoconf, automake, binutils, bison, flex, gcc, gcc-c++, gdb, gettext, libtool, make, pkgconfig, rpm-build, strace, system-rpm-config
Default Packages: automake14, automake15, automake16, automake17, byacc, cscope, ctags, cvs, dev86, diffstat, doxygen, elfutils, gcc-gfortran, indent, ltrace, oprofile, patchutils, pfmon, pstack, python-ldap, rcs, splint, subversion, swig, systemtap, texinfo, valgrind
Optional Packages: ElectricFence, dejagnu, expect, gcc-gnat, gcc-objc, gcc44, gcc44-c++, gcc44-gfortran, imake, java-1.6.0-openjdk, java-1.6.0-openjdk-devel, libgfortran44, memtest86+, nasm, pexpect, python24-docs, python26-docs, unifdef
Also, please add these deps, and see this error:
sudo yum install -y gettext
rpmbuild --bb heartbeat-fedora.spec
error: Failed build dependencies:
cluster-glue-libs-devel is needed by heartbeat-3.0.4-1.amzn1.x86_64
Again, gettext is part of “Development Tools” – I used it to easily address the build dependencies.
Also, cluster-glue-libs-devel is part of the first line of the ‘Heartbeat RPM’ section:
yum install --nogpgcheck cluster-glue-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm
rpm -i ~/rpmbuild/RPMS/x86_64/cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm
error: Failed dependencies:
cluster-glue-libs = 1.0.7-1.amzn1 is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
liblrm.so.2()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
libpils.so.2()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
libplumb.so.2()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
libplumbgpl.so.2()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
libstonith.so.1()(64bit) is needed by cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64
You need to install them all together – cluster-glue-libs-devel depends on cluster-glue-libs.
After you have built them, either install them all simultaneously with rpm -i, or use the method above to install them via yum (which will resolve any additional dependencies for you).
Again: yum install --nogpgcheck cluster-glue-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm
(there are two dashes in front of ‘nogpgcheck’)
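For instance, passing all three to rpm in a single transaction should also satisfy the inter-package dependencies:

rpm -ivh cluster-glue-1.0.7-1.amzn1.x86_64.rpm \
    cluster-glue-libs-1.0.7-1.amzn1.x86_64.rpm \
    cluster-glue-libs-devel-1.0.7-1.amzn1.x86_64.rpm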
In ‘Resource Agents RPMs’, please replace the line with:
s/tar -xzvf ClusterLabs-resource-agents-agents-*.tar.gz/tar -xzvf agents-1.0.4/
thanks
Thanks for catching that – I have made a very similar change which accomplishes the same thing.
The current AMI doesn’t have mailx by default. One should add it before this procedure to be able to build heartbeat (the second rpmbuild).
Thanks, the additional dependency has been noted in the article.
What is 10.xxx.xxx.xxx in:
ucast eth0 10.xxx.xxx.xxx
ucast eth0 10.xxx.xxx.xxx
Is it the local IP of server1.domain.com and server2.domain.com, or a virtual IP? I want to reach whichever instance is active from outside using a single IP (or hostname). How can I do that on EC2?
Don’t we need to specify
node01 172.16.4.82 httpd
in haresources?
The 10.xxx.xxx.xxx lines are the private (internal) IP addresses of the instances. They were used in order to keep transmissions secure between the two instances. If you want to use a public IP address, you can do so; however, you may need to make some changes to the security group settings. Also, you should probably use an elastic IP address on each instance – the advantage being that the elastic IP will map to an internal IP when the two instances are in the same zone, and otherwise to the public IP, usually providing the optimal (fastest and least expensive) route. The node directives define all the nodes (including the local node) in the cluster – each should match uname -n for the node (i.e. not an IP address). You can specify more than one node per line, though. The ucast directives describe where to send packets (and which interface to use) – IP addresses are good here.
The haresources line usually has the node name (i.e. uname -n), the configured IP, and the resource. In other words, they need to match what you put into the ha.cf file.
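For example, a minimal haresources line might look like this (where 10.xxx.xxx.xxc is a placeholder for the service/virtual IP that floats between the nodes):

server1.domain.com 10.xxx.xxx.xxc httpd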
Thank you for your reply.
I want to make an active/passive failover setup. I have two instances, for example:
instance1: hostname1 (reachable from public), localip1
instance2: hostname2 (reachable from public), localip2
So I need to do:
ucast eth0 localip1
ucast eth0 localip2
Now I want to know if I can use a third IP (maybe an elastic IP), so that if I open it in my browser I hit whichever of hostname1 or hostname2 is available. How can I do that? A reply would be appreciated. thanx
You can only map a single IP address – whether it is an elastic IP or the default dynamic IP address – to a given instance. The traditional AWS solution to your problem would be to use an Elastic Load Balancer, which will distribute the requests between the two instances. If that is not the avenue you wish to pursue, you can set up HAProxy (or even use nginx or Varnish) as your load balancer. Install it on an instance (it can be on the same instances you are load balancing, although that adds to the complexity). In the end though, it is fairly typical to have a single public-facing IP (unless it is a really large site), which corresponds to the load balancer, with all traffic then directed to private (i.e. non-web-accessible) backends.
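A minimal HAProxy configuration for that pattern might look roughly like this (a sketch only; the global/defaults sections are omitted, and the names, IPs, and health-check path are placeholders):

# Balance HTTP across two backends, removing one when its health check fails.
listen web
    bind *:80
    mode http
    balance roundrobin
    option httpchk GET /
    server web1 10.xxx.xxx.xxa:80 check
    server web2 10.xxx.xxx.xxb:80 check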
I might suggest you look into Corosync instead of Heartbeat – I found it to be a bit more friendly, and it is actively maintained (unlike Heartbeat, which it has mostly superseded). If you are interested, I can post my notes on setting up Corosync on EC2. (It has been a few months, but the way I approached this problem was to create two identical instances running HAProxy, OpenVPN, Pacemaker, Corosync, and Gluster. I seem to recall OpenVPN being required for some of the network communication between some of the components, as they used protocols not supported by the EC2 network – although it is possible that VPC may work instead; I couldn’t test it at the time.)
I might suggest the site ServerFault if you are not familiar with it – it is a Q&A site for systems admins. If you have specific problems, you should be able to get some help there (I answer questions there as well, under the same alias).
Problem compiling with:
rpmbuild --bb heartbeat-fedora.spec
[error output omitted]
please.
thanks.
Those files are provided by the package libtool-ltdl-devel, although I do not recall needing it to compile the packages. If you install the package, the files (lt__dirent.c, lt__strl.c, argz.c) are in /usr/share/libtool/libltdl/. I might suggest ensuring that you have the latest build tools first, and also the latest version of Heartbeat (which is currently 3.0.5 – the article references 3.0.4). Also, keep in mind that the article was written for Amazon’s Linux AMI – while it may work on other RHEL-derived systems, it wasn’t tested on them.
I am testing the installation on CentOS 6.2; the first compilation worked well, but this one has drawbacks:
[error output omitted]
The path you specified is fine. Is it looking in another directory?
/usr/share/libtool/libltdl/
If you watch the compile, you should see the path it is looking in: the replace/ directory of the source tree. Presumably, libtool should copy the needed files into that directory.
I spun up a virtual machine with a fresh install of CentOS 6.2 (minimal, i386) and gave it a try. Heartbeat built without any issues. Here is my command log (you’ll note a few variations from the article, but nothing major):
Update and install dependencies:
Setup build environment:
Get the packages, extract, rename, retar:
Build (change the architecture if needed):
I didn’t build Resource-agents, but the procedure should amount to:
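The command blocks from the original log did not survive here; a rough reconstruction, inferred from the article’s steps and adapted for CentOS (which builds under ~/rpmbuild rather than /usr/src/rpm; versions and paths are assumptions), might look like:

# Update and install dependencies:
yum update -y
yum groupinstall -y "Development Tools"
yum install -y net-snmp OpenIPMI glib2-devel libxml2-devel bzip2-devel \
    libuuid-devel docbook-utils docbook-dtds libtool-ltdl libtool-ltdl-devel \
    libxslt perl-TimeDate python-devel OpenIPMI-devel openssl-devel \
    docbook-style-xsl help2man e2fsprogs-devel mailx

# Setup build environment:
mkdir -p ~/rpmbuild/{BUILD,RPMS,SOURCES,SPECS,SRPMS}

# Get the packages, extract, rename, retar:
wget http://hg.linux-ha.org/glue/archive/glue-1.0.7.tar.bz2
tar -xjvf glue-1.0.7.tar.bz2
mv Reusable-Cluster-Components-glue--glue-* cluster-glue
tar -cjvf ~/rpmbuild/SOURCES/cluster-glue.tar.bz2 cluster-glue
wget http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/STABLE-3.0.4.tar.bz2
tar -xjvf STABLE-3.0.4.tar.bz2
mv Heartbeat-3-0-STABLE-* heartbeat
tar -cjvf ~/rpmbuild/SOURCES/heartbeat.tar.bz2 heartbeat

# Build (change the architecture if needed):
(cd cluster-glue && rpmbuild --bb cluster-glue-fedora.spec)
rpm -ivh ~/rpmbuild/RPMS/i686/cluster-glue-*.rpm
(cd heartbeat && rpmbuild --bb heartbeat-fedora.spec)

# I didn't build Resource-agents, but the procedure should amount to:
wget -O agents-1.0.4.tgz https://github.com/ClusterLabs/resource-agents/tarball/agents-1.0.4
tar -xzvf agents-1.0.4.tgz
mv ClusterLabs-resource-agents-* resource-agents
tar -cjvf ~/rpmbuild/SOURCES/resource-agents.tar.bz2 resource-agents
(cd resource-agents && rpmbuild --bb resource-agents.spec)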