Tuesday, June 18, 2013

Deploying Big Using BitTorrent [Sharing Files Using BitTorrent]

If you just want to share some files and are not concerned about privacy, please check out this short tutorial on bittorrent.com. This article covers a bit of BitTorrent's basic internals and its use for large code/application deploys.

Scenario: I have to deploy some application(s) across many co-located data centers. The total size of the deploy is on the order of tens of GB.

Conventional methods like scp, rsync and http fail:
  • scp cannot resume an interrupted transfer. Every time the connection breaks, I have to start over.
  • rsync works well with text files, but not so well with binaries (it works nonetheless). The amount of CPU it eats is unacceptable, though.
  • http can resume most of the time, but as more servers try to download the application, the origin server's bandwidth becomes the bottleneck and slows down the entire process.
Enter BitTorrent! 
  • Resumes the download every time. No problem if the connection breaks.
  • Does not eat my CPU.
  • As more servers download, they act as seeders too and actually increase the collective bandwidth.
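A rough way to see why seeders help is the textbook fluid model of file distribution, which gives lower bounds on the time to deliver a file of size F to N servers. With plain client-server delivery (scp, rsync, http), the origin must upload all N copies itself; in a swarm, every server that already holds pieces adds its own upload capacity. This sketch assumes idealized peers with no protocol overhead, and all numbers in it (a 10 GB deploy, a 1 Gbit/s origin, 500 Mbit/s peer uplinks) are made up for illustration.

```python
# Idealized lower bounds on the time to push one file of size F to N servers.
# Sizes are in megabits and rates in megabits per second.


def client_server_time(F, N, u_s, d_min):
    """The origin alone uploads N copies; the slowest server downloads one copy."""
    return max(N * F / u_s, F / d_min)


def p2p_time(F, N, u_s, d_min, peer_upload):
    """Peers re-upload pieces, so total upload capacity grows with N."""
    total_upload = u_s + N * peer_upload
    return max(F / u_s, F / d_min, N * F / total_upload)


if __name__ == "__main__":
    F = 10 * 8 * 1000  # assumed 10 GB deploy, in megabits (decimal units)
    u_s = 1000         # assumed origin uplink: 1 Gbit/s
    d_min = 1000       # assumed slowest server downlink: 1 Gbit/s
    u_peer = 500       # assumed upload capacity of each server: 500 Mbit/s
    for N in (10, 100, 1000):
        cs = client_server_time(F, N, u_s, d_min)
        p2p = p2p_time(F, N, u_s, d_min, u_peer)
        print(f"N={N:5d}  client-server >= {cs:9.0f} s  p2p >= {p2p:6.0f} s")
```

With these made-up numbers the client-server bound grows linearly with N (800 s, 8,000 s, 80,000 s), while the swarm bound stays between roughly 130 and 160 seconds because upload capacity grows with the number of servers.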
Now for the technical details. For BitTorrent to work, you need a torrent file (also known as a metafile). You also need a tracker. The tracker keeps track of all the leechers and seeders (collectively known as peers) and helps coordinate them: each client periodically announces itself to the tracker and gets back a list of available peers. Finally, you need a torrent client that seeds the files you are going to share.
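To make the tracker's role concrete: a client talks to the tracker with a plain HTTP GET to the announce URL, passing its state as query parameters (the parameter names below are the ones defined in the BitTorrent protocol specification, BEP 3). This sketch only builds the URL and never contacts a tracker; the tracker URL is the example one used later in this post, and the info hash and peer id are placeholders.

```python
import hashlib
from urllib.parse import urlencode


def announce_url(tracker, info_hash, peer_id, port, uploaded, downloaded, left, event=None):
    """Build a tracker announce URL; bytes values are percent-encoded by urlencode."""
    params = {
        "info_hash": info_hash,    # 20-byte SHA-1 of the torrent's info dictionary
        "peer_id": peer_id,        # 20-byte id chosen by the client
        "port": port,              # port this client accepts peer connections on
        "uploaded": uploaded,      # bytes uploaded so far
        "downloaded": downloaded,  # bytes downloaded so far
        "left": left,              # bytes still missing; 0 means this client is a seed
    }
    if event:
        params["event"] = event    # "started", "completed" or "stopped"
    return tracker + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    info_hash = hashlib.sha1(b"placeholder info dict").digest()
    print(announce_url("http://tracker.example.com:8080/announce",
                       info_hash, b"-XX0001-abcdefghijkl", 6881,
                       uploaded=0, downloaded=0, left=10 * 2**30,
                       event="started"))
```

The tracker answers with a bencoded dictionary that includes a re-announce interval and a list of peers, which is how the periodic coordination described above works.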
The catch is that BitTorrent is no longer open source. So you either have to get a license from BitTorrent, Inc., which may be costly (I am not sure), or use the older code, which was once open source and still works like a charm.

For CentOS/Red Hat/Scientific Linux, try the NauLinux School repo. Create /etc/yum.repos.d/naulinux-school.repo with the following contents:
# vim /etc/yum.repos.d/naulinux-school.repo
[naulinux-school]
name=NauLinux School
baseurl=http://downloads.naulinux.ru/pub/NauLinux/6.2/$basearch/sites/School/RPMS/
enabled=0
gpgcheck=1
gpgkey=http://downloads.naulinux.ru/pub/NauLinux/RPM-GPG-KEY-linux-ink

Install the bittorrent RPM package (the repo is disabled by default, so enable it just for this install):
# yum --enablerepo=naulinux-school install bittorrent

For Fedora, download the RPM from Fedora's build system, Koji, and install it manually:
# yum localinstall ./bittorrent-4.4.0-16.fc15.noarch.rpm

Also install mktorrent, which we will use to create the torrent metafiles:
# yum install mktorrent

Creating a torrent tracker
As mentioned before, the tracker is a critical piece of the BitTorrent setup. It coordinates the peers and maintains a list of them. For each torrent, identified by its checksum (the info hash), it also keeps a record of the seeds. Without a tracker, this whole setup fails.
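The bookkeeping a tracker does can be sketched as a dictionary keyed by each torrent's info hash, holding the peers that announced it and how many bytes each still needs (a peer with nothing left is a seed). This is a hypothetical toy model for illustration, not how bittorrent-tracker/bttrack is implemented; real trackers also expire peers that stop announcing and usually hand out a random subset of peers.

```python
class Tracker:
    """Toy in-memory tracker state: info_hash -> {(ip, port): bytes left}."""

    def __init__(self):
        self.swarms = {}

    def announce(self, info_hash, ip, port, left, event=None, numwant=50):
        """Record the announcing peer and return up to numwant other peers."""
        peers = self.swarms.setdefault(info_hash, {})
        addr = (ip, port)
        if event == "stopped":
            peers.pop(addr, None)  # the peer is leaving the swarm
            return []
        peers[addr] = left
        # Never return the requester itself; this sketch simply takes the first numwant.
        return [p for p in peers if p != addr][:numwant]

    def counts(self, info_hash):
        """Return (seeders, leechers) for a torrent."""
        lefts = list(self.swarms.get(info_hash, {}).values())
        seeders = sum(1 for left in lefts if left == 0)
        return seeders, len(lefts) - seeders


if __name__ == "__main__":
    tracker = Tracker()
    h = b"\x01" * 20  # placeholder info hash
    tracker.announce(h, "10.0.0.1", 6881, left=0)                # the original seeder
    print(tracker.announce(h, "10.0.0.2", 6881, left=10**9))     # a new server gets the seeder
    print(tracker.counts(h))
```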
You can easily set up a tracker yourself. On CentOS, just run:
$ bittorrent-tracker --port 8080 --dfile dstate --logfile tracker.log

For Fedora, use the bttrack command after installing the bittorrent package:
$ bttrack --port 8080 --dfile dstate --logfile tracker.log

Alternatively, you can use one of the public trackers, such as OpenBitTorrent. This may save you some time.

Creating a torrent metafile
Once the tracker is up, we need to create the actual torrent file to distribute. A torrent file contains bencoded data about the files, the announce URL of the tracker, and other information such as the piece length and a SHA-1 hash of every piece.
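Bencoding itself is small: integers become i<n>e, byte strings become <length>:<bytes>, lists are l...e, and dictionaries are d...e with their keys in sorted order. A minimal encoder, plus a toy single-file metainfo dictionary, might look like the sketch below. The info hash (the SHA-1 of the bencoded info dictionary) is what identifies the torrent to the tracker. The file name, size and contents are placeholders, and real tools such as mktorrent add further fields.

```python
import hashlib


def bencode(value):
    """Encode ints, str/bytes, lists and dicts using BitTorrent's bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted (raw byte) order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


if __name__ == "__main__":
    content = b"\0" * 1024  # placeholder file contents
    info = {
        "name": "app",                             # placeholder name
        "length": len(content),                    # single-file torrent
        "piece length": 2 ** 18,                   # same as mktorrent -l 18
        "pieces": hashlib.sha1(content).digest(),  # one piece, so one 20-byte hash
    }
    metainfo = {"announce": "http://tracker.example.com:8080/announce", "info": info}
    torrent_bytes = bencode(metainfo)  # what would be written to the .torrent file
    info_hash = hashlib.sha1(bencode(info)).hexdigest()
    print(len(torrent_bytes), "bytes, info hash", info_hash)
```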
Creating a torrent with mktorrent is easy, but if you prefer a GUI, you can use Transmission or any other BitTorrent client.
$ mktorrent -a http://tracker.example.com:8080/announce -l 18 -v /path/to/the/app

Here, -a specifies the announce URL of the tracker we created before. -l sets the piece length as a power of two (-l 18 means 2^18 bytes, i.e. 256 KiB); the files are split into pieces of this size, and peers exchange and verify data piece by piece. -v turns on verbose output.
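What -l controls can be sketched in a few lines: the data is cut into 2^l-byte pieces, and each piece's SHA-1 digest goes into the metafile's "pieces" field. Those same hashes are what a client checks against when it resumes, to find out which pieces it already has. This covers only the single-file case (for a directory, the files are concatenated in order before being cut into pieces) and is an illustration, not mktorrent's code.

```python
import hashlib
import io


def piece_hashes(stream, exponent=18):
    """Return the concatenated 20-byte SHA-1 digests of 2**exponent-byte pieces."""
    piece_length = 2 ** exponent  # -l 18 -> 262144 bytes (256 KiB)
    digests = []
    while True:
        piece = stream.read(piece_length)  # the last piece may be shorter
        if not piece:
            break
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


if __name__ == "__main__":
    data = io.BytesIO(b"x" * (2**18 + 10))  # stand-in for a real file
    pieces = piece_hashes(data, exponent=18)
    print(len(pieces) // 20, "pieces")  # one full piece plus one 10-byte piece
```

For a real file you would pass an open binary handle, e.g. open("/path/to/the/app", "rb").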

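Once the torrent metafile is created, you need to seed the torrent so that other peers can download it. I like to use rtorrent for this. Run it from the directory that contains the shared data (/path/to/the in the example above), since rtorrent looks for the files in its download directory, which defaults to the current directory: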
# yum install rtorrent
$ rtorrent <path to the torrent metafile>


If you want to dig deeper into rtorrent, here is an easy-to-follow tutorial.

Tips for a peaceful life
There are a few parameters you can tweak for better performance. When making the torrent, try raising the -l value if you have really good bandwidth: larger pieces mean fewer pieces to hash and track. Since my deployments were to a bunch of data centers with really good bandwidth, I usually set it to 20 (1 MiB pieces).
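The arithmetic behind this tip: each piece adds a 20-byte SHA-1 hash to the metafile, so raising -l from 18 to 20 cuts the number of pieces, and the size of the hash list, by a factor of four. The trade-off is that a peer has to receive a whole piece before it can verify and re-share it. The 10 GiB total below is an assumed example size.

```python
def piece_stats(total_bytes, exponent):
    """Return (piece length, number of pieces, bytes of SHA-1 hashes) for -l <exponent>."""
    piece_length = 2 ** exponent
    num_pieces = -(-total_bytes // piece_length)  # ceiling division; the last piece may be short
    return piece_length, num_pieces, num_pieces * 20


if __name__ == "__main__":
    total = 10 * 2 ** 30  # assumed 10 GiB deploy
    for exponent in (18, 20):
        length, count, hash_bytes = piece_stats(total, exponent)
        print(f"-l {exponent}: {length // 1024} KiB pieces, {count} pieces, "
              f"{hash_bytes // 1024} KiB of hashes")
```

For 10 GiB this gives 40960 pieces (800 KiB of hashes) at -l 18 and 10240 pieces (200 KiB of hashes) at -l 20.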

If you deploy without taking the machines out of production, you can limit the torrent client's bandwidth usage. This comes in really handy and helps avoid clogging the network pipes. Check the tutorials and docs of your torrent client for these controls.
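Client-side bandwidth limits are commonly implemented with something like a token bucket: tokens accrue at the allowed rate, and a chunk of data may only be sent when enough tokens have built up. The class below is a generic sketch of that idea, not how rtorrent or any particular client implements its throttle; the rate and capacity values are arbitrary, and in practice you would just use your client's own settings.

```python
import time


class TokenBucket:
    """Allow `rate` bytes per second on average, with bursts of up to `capacity` bytes."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes):
        """Return True if nbytes may be sent now, False if the caller should wait."""
        self._refill()
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=1_000_000, capacity=256 * 1024)  # ~1 MB/s, arbitrary values
    chunk = 16 * 1024
    sent = 0
    start = time.monotonic()
    while sent < 2_000_000:
        if bucket.try_consume(chunk):
            sent += chunk  # a real sender would write the chunk to a socket here
        else:
            time.sleep(0.01)
    print(f"sent {sent} bytes in {time.monotonic() - start:.1f} s")
```

A sender calls try_consume(len(chunk)) before writing each chunk and backs off briefly when it returns False, so the long-run rate stays near the configured limit.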

Before initiating the transfer, always make sure you inform the relevant data center technicians and network operations folks. I did not the first time, and because of the huge spike in network traffic, one of the data center ops teams thought we were under some sort of DoS attack and cut off connectivity to all our servers, resulting in a minor service disruption.

Happy deploying!

Discuss this post on Hacker News.

2 comments:

  1. There should be a better way of running the tracker than using obsolete/unmaintained software.

  2. I think Facebook started doing this to solve the problem of deploying their single binary generated by HipHop on thousands of servers.
