Friday, September 23, 2011

Building A Highly Available Linux Cluster Using Wackamole

Wackamole is an application that manages a pool of IP addresses that should be reachable from outside at all times. Given a set of machines and a set of IPs, Wackamole ensures that if any machine goes down, another machine takes over its IPs almost instantly, so the outside world sees no impact. It also tries to balance the IPs evenly across the available machines. Wackamole is built on the Spread network messaging system.
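
To make the balancing idea concrete, here is a toy shell sketch that deals a pool of IPs round-robin across whichever machines are alive. It only illustrates the behaviour described above and is not Wackamole's actual algorithm. The function name assign_ips is made up for this example; the host names and addresses are the ones used later in this post.

```shell
# Toy illustration only: deal pool IPs round-robin across live hosts.
# assign_ips is a hypothetical helper, not part of Wackamole.
assign_ips() {
    hosts="$1"   # space-separated list of live hosts
    ips="$2"     # space-separated pool of virtual IPs
    set -- $hosts
    [ "$#" -gt 0 ] || return 1   # nothing to do without a live host
    for ip in $ips; do
        host="$1"
        shift
        set -- "$@" "$host"      # rotate: move the host just used to the back
        echo "$ip -> $host"
    done
}

echo "Both machines up:"
assign_ips "centos1 centos2" "172.16.31.20 172.16.31.21"
echo "centos1 down:"
assign_ips "centos2" "172.16.31.20 172.16.31.21"
```

With both machines up, each gets one pool IP; with only centos2 alive, it ends up holding both.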

Let us start configuring it. I am using two virtual machines running CentOS 5. The concept can be extended to as many machines as you want.

Step 1: Install Wackamole on both machines. It will pull in Spread as a dependency. Sadly, it is not yet packaged for Enterprise Linux or Fedora (I'll upload the package I built and share the link), so CentOS, Red Hat and Fedora users need to install Wackamole and Spread manually. Alternatively, you can try it out on Debian, where Wackamole is available in the repo.

Step 2: Each machine has one IP on eth0 (or whatever its default interface is). Let us call these the primary IPs. We also need a pool of IPs that can be distributed between the machines; we'll configure them on virtual interfaces. For this tutorial, I am using 172.16.31.10 and 172.16.31.11 as the primary IPs on eth0 of the two machines, and 172.16.31.20 and 172.16.31.21 as the virtual pool. You can use as many IPs as you want in the virtual pool, but make sure that they belong to you.

Now pick your favorite of the two machines and do the following steps on that machine only.


Step 3: Configure the entire pool as virtual interfaces. That means eth0:1 will be mapped to 172.16.31.20 and eth0:2 will be mapped to 172.16.31.21. This can be done by creating files similar to /etc/sysconfig/network-scripts/ifcfg-eth0, one per virtual interface (or you can wait for the next blog post).
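
As a rough sketch of what such a file could contain, the helper below prints an alias config for one pool address. I am assuming the usual Red Hat-style ifcfg keys and a 255.255.255.0 netmask (to match the /24 used in the Wackamole config further down); make_alias_cfg is a made-up name for this example, so adjust everything to your own network.

```shell
# Hypothetical helper: print an ifcfg alias file for one pool address.
# Assumes eth0 as the parent interface and a /24 network.
make_alias_cfg() {
    idx="$1"   # alias number, e.g. 1 for eth0:1
    ip="$2"    # pool address to put on that alias
    cat <<EOF
DEVICE=eth0:${idx}
BOOTPROTO=static
IPADDR=${ip}
NETMASK=255.255.255.0
ONBOOT=yes
EOF
}

make_alias_cfg 1 172.16.31.20
echo
make_alias_cfg 2 172.16.31.21
```

To write a real file you would redirect the output, for example make_alias_cfg 1 172.16.31.20 > /etc/sysconfig/network-scripts/ifcfg-eth0:1, and then bring the alias up.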

Step 4: Let us configure spread. Check out the contents of my /etc/spread.conf file below first.

Spread_Segment  172.16.31.255:4805 {
        centos1 172.16.31.10
        centos2 172.16.31.11
}
EventLogFile = /var/log/spread.log


Here the first line defines the Spread_Segment, which is the broadcast address of the network along with the port. Don't forget to whitelist that port in your firewall. The next two lines give the hostname and primary IP address of each machine, and the last line gives the location of the log file. There is a lot more you can configure; check out the documentation for that.
Start Spread by running spread -n centos1 -c /etc/spread.conf, where the name passed to -n is this machine's own entry in the segment. If you use service spread restart and it fails, you might need to tweak your init script at /etc/init.d/spread.
Copy this config file to the other machine as well and start Spread there.
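
If you are unsure which address belongs in the Spread_Segment line, the broadcast address can be derived from a primary IP and its prefix length. The sketch below does that with plain shell arithmetic. The /24 prefix is my assumption for this network, and broadcast_of is a name invented for the example.

```shell
# Hypothetical helper: compute the IPv4 broadcast address for ip/prefix.
broadcast_of() {
    ip="$1"
    prefix="$2"
    old_ifs="$IFS"
    IFS=.
    set -- $ip              # split the dotted quad into $1..$4
    IFS="$old_ifs"
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    hostmask=$(( (1 << (32 - prefix)) - 1 ))   # all host bits set
    bc=$(( addr | hostmask ))
    echo "$(( (bc >> 24) & 255 )).$(( (bc >> 16) & 255 )).$(( (bc >> 8) & 255 )).$(( bc & 255 ))"
}

broadcast_of 172.16.31.10 24
```

For the addresses in this tutorial it prints 172.16.31.255, the value used in the config above.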

Step 5: Now configure Wackamole. See the config file below:

Spread = 4805 #Spread port
SpreadRetryInterval = 5s #How often to try to connect to spread, if it fails
Group = wack1 #Cluster group
Control = /var/run/wack.it #Name of the socket
Prefer None #You treat all the IPs as equal

VirtualInterfaces {
#IPs from the virtual pool. Can be as many as you want.
        { eth0:172.16.31.20/24 }
        { eth0:172.16.31.21/24 }
}
Arp-Cache = 20s

Notify {
# Notify this address, but not more than 8 times.
        eth0:172.16.31.1/32 throttle 8
        arp-cache
}


balance {
        AcquisitionsPerRound = all
        interval = 4s
}
mature = 5s


I have commented the config file to make it easier to follow. It is straightforward, so it should not raise many questions.
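
One easy mistake is ending up with a different port in the two files. The sketch below pulls the port out of each config and compares them. It assumes the files look like the two listings in this post (a Spread_Segment line of the form address:port, and a Spread = port line); the function names are invented for this example.

```shell
# Hypothetical helpers: compare the Spread port used by both config files.
spread_conf_port() {
    # Port from a line like "Spread_Segment  172.16.31.255:4805 {"
    sed -n 's/^Spread_Segment[[:space:]]*[0-9.]*:\([0-9]*\).*/\1/p' "$1" | head -n 1
}

wackamole_conf_port() {
    # Port from a line like "Spread = 4805 #Spread port"
    sed -n 's/^Spread[[:space:]]*=[[:space:]]*\([0-9]*\).*/\1/p' "$1" | head -n 1
}

check_ports() {
    sp="$(spread_conf_port "$1")"
    wp="$(wackamole_conf_port "$2")"
    if [ -n "$sp" ] && [ "$sp" = "$wp" ]; then
        echo "ok: both files use port $sp"
    else
        echo "mismatch: spread.conf='$sp' wackamole.conf='$wp'"
        return 1
    fi
}

# Usage on a cluster node:
#   check_ports /etc/spread.conf /etc/wackamole.conf
```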

Start Wackamole by running wackamole -d -c /etc/wackamole.conf. Again, you might have to tweak the init script if you use service wackamole start.

Copy this config file to the other machine as well and start Wackamole there.


That is it. Wackamole is configured. You'll see that one of your virtual interfaces has moved to the other machine. That is Wackamole balancing your cluster. You can add more machines and IPs to this config.
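
To see which pool IPs a given machine is holding right now, you can filter the output of ip addr. The helper below is a small sketch that reads ip -o -4 addr show output on standard input and prints the pool addresses it finds. held_pool_ips is a name made up for this example, and the pool list is the one from this tutorial.

```shell
# Hypothetical helper: list which pool IPs appear in `ip -o -4 addr show`
# output supplied on stdin.
held_pool_ips() {
    pool="$1"            # space-separated pool of virtual IPs
    input="$(cat)"       # captured output of `ip -o -4 addr show`
    for ip in $pool; do
        # -F: match the address literally; the trailing "/" stops
        # 172.16.31.2 from matching 172.16.31.20.
        if printf '%s\n' "$input" | grep -qF "inet ${ip}/"; then
            echo "$ip"
        fi
    done
}

# Usage on a cluster node:
#   ip -o -4 addr show dev eth0 | held_pool_ips "172.16.31.20 172.16.31.21"
```

Run it on both machines, then stop Wackamole on one of them and run it again on the other: the surviving machine should now list the whole pool.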

Since I built the RPMs first and then did all the configuration, my config files were in /etc/. If you installed from source, your config files will be in a different location. Just use the correct config path when running the commands or tweaking the init scripts.