Now that each image is set up with its software and ready to be deployed, we have to think about how high availability will be handled.
For this we reuse the VPN connection we have already established. Each server is started with an index, passed in through the user data that Amazon Web Services lets you supply when an instance is launched. That index determines the order in which servers take on the different roles, using the following IDs:
Frontend1 = OpenVPN server
Frontend n (n > 1) = OpenVPN client, monitors Frontend n-1
Backend1 = Active MySQL server, OpenVPN client
Backend n (n > 1) = Passive MySQL server, OpenVPN client
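To make the table concrete, here is a small sketch of how a boot script might turn a role and an index into duties. It assumes the role is known from the image and the index is whatever the user data provided (for example the value saved to /tmp/id, which the monitoring script reads). `role_duties` and its output strings are hypothetical, not part of the original setup. The backend monitor entry follows the rule that every server except the first of each role watches its predecessor.

```shell
# Hypothetical helper: print the duties of server <role><index> according to
# the ID table (frontend1 runs the OpenVPN server, backend1 the active MySQL).
role_duties() {
    local role=$1 index=$2
    # Accept only positive integers without leading zeros as the index.
    case $index in
        ''|*[!0-9]*|0*) echo "invalid index: $index" >&2; return 1 ;;
    esac
    case $role in
        frontend)
            if [ "$index" -eq 1 ]; then
                echo "openvpn-server"
            else
                echo "openvpn-client monitor=frontend$((index - 1))"
            fi ;;
        backend)
            if [ "$index" -eq 1 ]; then
                echo "mysql-active openvpn-client"
            else
                echo "mysql-passive openvpn-client monitor=backend$((index - 1))"
            fi ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
}
```

For example, `role_duties frontend 3` prints `openvpn-client monitor=frontend2`.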
Regarding high availability, then, each server has an ID as well as a VPN IP address, which the first frontend assigns according to the certificate the client connects with (each possible role has its own certificate, issued by the frontend). Since this is not going to be a large deployment, we can simply write out all the possible values:
/etc/hosts
127.0.0.1 localhost localhost.localdomain
70.xxx.xxx.xxx frontend1
192.168.21.1 frontend1.internal
192.168.21.11 frontend2.internal
192.168.21.12 frontend3.internal
192.168.21.13 frontend4.internal
192.168.21.14 frontend5.internal
192.168.21.100 backend1.internal
192.168.21.101 backend2.internal
192.168.21.102 backend3.internal
192.168.21.103 backend4.internal
192.168.21.104 backend5.internal
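The internal addresses above follow a simple pattern: frontend1 (the VPN server) is .1, the other frontends start at .11 and the backends start at .100. The sketch below regenerates the `.internal` entries from that pattern; the pattern is inferred from the listed values, and `vpn_ip`, `internal_hosts` and `VPN_PREFIX` are hypothetical names. The public `frontend1` line is left out on purpose.

```shell
# Hypothetical helpers following the pattern of the /etc/hosts entries:
# frontend1 = .1, frontendN = .(9+N) for N > 1, backendN = .(99+N).
VPN_PREFIX=192.168.21

# Print the VPN address of <role><index>.
vpn_ip() {
    local role=$1 index=$2
    case $role in
        frontend)
            if [ "$index" -eq 1 ]; then
                echo "$VPN_PREFIX.1"
            else
                echo "$VPN_PREFIX.$((9 + index))"
            fi ;;
        backend) echo "$VPN_PREFIX.$((99 + index))" ;;
        *) echo "unknown role: $role" >&2; return 1 ;;
    esac
}

# Print the *.internal lines for servers 1..max of both roles, frontends first.
internal_hosts() {
    local max=$1 role i
    for role in frontend backend; do
        i=1
        while [ "$i" -le "$max" ]; do
            echo "$(vpn_ip "$role" "$i") $role$i.internal"
            i=$((i + 1))
        done
    done
}
```

`internal_hosts 5` prints the ten `.internal` lines listed above, in the same order. With this pattern there is room for 90 frontends; frontend91 would collide with backend1 at .100.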
Monitoring
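Every time an instance starts with its ID, it connects to the VPN, and the VPN server assigns it the IP that matches its role. This is done with OpenVPN's client-connect directive, which runs the script below (running external scripts also requires script-security 2 or higher in the server configuration). The script looks up the certificate's common name (for example, frontend2) as <name>.internal in /etc/hosts and pushes the matching address back to the client: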
/etc/openvpn/pool
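#!/bin/bash
# OpenVPN client-connect script: pushes a fixed VPN address to each client,
# looked up in /etc/hosts as <certificate common name>.internal.
# OpenVPN passes the path of a temporary file as the argument (directives
# written there are applied to this client) and the certificate CN in
# $common_name. A non-zero exit status makes OpenVPN reject the client.
FILE=$1
# Match the hostname field exactly, and stop at the first match.
ip=$(awk -v host="${common_name}.internal" '$2 == host { print $1; exit }' /etc/hosts)
if [ -z "$ip" ]; then
    echo "Unable to find IP of $common_name" >&2
    exit 1
fi
echo "ifconfig-push $ip 255.255.255.0" > "$FILE"
exit 0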
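Because the pool script trusts /etc/hosts completely, the file is worth checking before clients connect: a missing entry makes the script exit with an error, so OpenVPN rejects that client, and a duplicated name or shared address would push a broken or conflicting configuration. The helper below is a hypothetical sanity check; it takes the hosts file as an argument so it can also be pointed at a copy.

```shell
# Hypothetical sanity check for the *.internal entries used by the pool script:
# reports hostnames that appear more than once and addresses shared by several
# hostnames. Prints nothing and returns 0 when the file is consistent.
check_internal_hosts() {
    local hosts_file=${1:-/etc/hosts}
    awk '
        # Skip blank lines and comment lines.
        /^[ \t]*(#|$)/ { next }
        {
            # Only look at *.internal names, in any hostname column.
            for (i = 2; i <= NF; i++) {
                if ($i !~ /\.internal$/) continue
                if (seen_name[$i]++) { print "duplicate name: " $i; bad = 1 }
                if (($1 in owner) && owner[$1] != $i) {
                    print "address " $1 " shared by " owner[$1] " and " $i
                    bad = 1
                }
                owner[$1] = $i
            }
        }
        END { exit bad }
    ' "$hosts_file"
}
```

Running `check_internal_hosts /etc/hosts` on the VPN server prints nothing for the file shown earlier.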
The pool script therefore forces the VPN to assign the right IP address. Once a server has its IP, it starts monitoring the server in the same role that was started immediately before it (frontend3 monitors frontend2, and so on). This applies to every server except the first instance of each role.
When a server detects that the server it monitors has not responded for some time, it assumes that server has failed and will i) completely stop that instance and ii) take over its role. The monitoring script for the frontends is the following:
#!/bin/bash
# Failover monitor for the frontends: frontend n (n > 1) watches frontend n-1
# over the VPN and takes over its ID and role when it stops answering.
ID=$(cat /tmp/id)           # this server's index, as provided at startup
MONITOR_ID=$((ID-1))        # index of the server being watched
ROLE=frontend
REQUIRED_FAILS=5            # consecutive counted misses before failing over
SLEEP_TIME=5                # seconds between checks
TIMEOUT_IN_SECONDS=4        # how long each ping waits for a reply
missed_pings=0

while true; do
    if [ "$ID" -gt 1 ]; then
        # A reply resets the counter. A missed reply only counts if frontend1
        # (the VPN server) still answers, so a broken VPN link on our side does
        # not trigger a takeover. When we are watching frontend1 itself there is
        # no other reference, so every miss counts.
        if ping -c1 -W "$TIMEOUT_IN_SECONDS" "$ROLE$MONITOR_ID.internal" >/dev/null 2>&1; then
            missed_pings=0
        elif [ "$MONITOR_ID" -eq 1 ] || ping -c1 -W "$TIMEOUT_IN_SECONDS" frontend1.internal >/dev/null 2>&1; then
            missed_pings=$((missed_pings+1))
        fi

        if [ "$missed_pings" -ge "$REQUIRED_FAILS" ]; then
            echo "Monitoring as $ID, detected that $ROLE$MONITOR_ID missed $REQUIRED_FAILS consecutive checks, doing failover"
            missed_pings=0
            # Destroy instance (not implemented yet)
            echo "Would destroy instance $ROLE$MONITOR_ID"
            # Take over the failed server's ID and watch the next one down
            ID=$MONITOR_ID
            echo "$ID" > /tmp/id
            MONITOR_ID=$((ID-1))
            if [ "$ID" -ne 1 ]; then
                # Reconnect to the VPN under the new ID. OpenVPN seems to have an
                # issue when a client changes its credentials too quickly, so
                # retry until authentication succeeds.
                while true; do
                    /etc/init.d/openvpn stop
                    /etc/init.d/openvpn start-client
                    if grep -qs AUTH_FAILED /var/log/openvpn.log; then
                        rm -f /var/log/openvpn.log
                        echo "Authentication failed, retrying"
                        sleep 2
                    else
                        echo "Client authenticated"
                        break
                    fi
                done
            else
                # We are now frontend1: run the OpenVPN server; nobody is left to monitor
                /etc/init.d/openvpn stop
                /etc/init.d/openvpn start-server
                echo "Monitor became $ROLE$ID, stopping monitor role"
                exit 0
            fi
        fi
    fi
    sleep "$SLEEP_TIME"
done
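The core of the monitor is its counting rule: a reply resets the counter, and a missed reply counts only when the VPN itself looks healthy. The hypothetical function below replays a list of observations through that rule, which makes it easier to reason about when a takeover happens. It triggers as soon as the count reaches the threshold passed in; the observation names are made up for this sketch.

```shell
# Hypothetical model of the monitor's counting rule. Each argument after the
# threshold is one check: "up" (monitored server answered), "down" (it did not,
# but frontend1 did) or "vpn-down" (neither answered). Prints the number of the
# check at which failover would trigger and returns 0, or returns 1 if it never does.
checks_until_failover() {
    local required=$1 missed=0 n=0 obs
    shift
    for obs in "$@"; do
        n=$((n + 1))
        case $obs in
            up) missed=0 ;;                 # any reply resets the counter
            down) missed=$((missed + 1)) ;; # a real miss
            vpn-down) ;;                    # our own link is suspect: not counted
            *) echo "unknown observation: $obs" >&2; return 2 ;;
        esac
        if [ "$missed" -ge "$required" ]; then
            echo "$n"
            return 0
        fi
    done
    return 1
}
```

For example, `checks_until_failover 3 down down up down vpn-down down down` prints `7`: the reply at check 3 resets the count and check 5 is not counted. In terms of time, each counted miss can take up to TIMEOUT_IN_SECONDS + SLEEP_TIME = 4 + 5 = 9 seconds (the ping timeout plus the pause), so five counted misses can take around 45 seconds, noticeably longer than SLEEP_TIME × REQUIRED_FAILS = 25 seconds.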
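Some features are still missing from the monitoring script, but it does most of what is described above. Most notably, the instance that stops responding is never actually stopped (the script only prints "Would destroy instance"), so it stays alive. That could cause problems, especially for the first server of each role: for example, two OpenVPN servers, or two active MySQL servers, could end up running at the same time.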
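One possible way to fill the "Would destroy instance" placeholder is the AWS CLI. This is only a sketch under several assumptions: the AWS CLI is installed with credentials allowed to describe and stop EC2 instances, and every instance carries a tag named `HARole` (a hypothetical convention) holding its current name, such as `frontend2`, which each server would have to update whenever it takes over a new ID. `stop_ha_instance` and the `AWS_CLI` override are also hypothetical.

```shell
# Sketch: stop the EC2 instance currently acting as <name> (e.g. frontend2).
# Assumptions (hypothetical, not from the original setup): instances are tagged
# HARole=<name>, and the AWS CLI is configured with permission to describe and
# stop instances. AWS_CLI can be overridden, e.g. with a stub for a dry run.
stop_ha_instance() {
    local name=$1 ids
    local aws=${AWS_CLI:-aws}
    ids=$("$aws" ec2 describe-instances \
        --filters "Name=tag:HARole,Values=$name" "Name=instance-state-name,Values=pending,running" \
        --query 'Reservations[].Instances[].InstanceId' --output text) || return 1
    if [ -z "$ids" ] || [ "$ids" = "None" ]; then
        echo "No running instance tagged HARole=$name" >&2
        return 1
    fi
    # Word splitting is intended: --output text separates several IDs with tabs.
    # shellcheck disable=SC2086
    "$aws" ec2 stop-instances --instance-ids $ids >/dev/null || return 1
    # Block until EC2 reports the instances as stopped before taking over the role.
    # shellcheck disable=SC2086
    "$aws" ec2 wait instance-stopped --instance-ids $ids
}
```

In the monitoring script this would replace the placeholder as `stop_ha_instance "$ROLE$MONITOR_ID"`. To destroy the instance rather than stop it, `aws ec2 terminate-instances` takes the same `--instance-ids` argument.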
Next: Database high availability