Monday, November 5, 2012

Testing Network And TCP Optimizations

This post is more like a "note to self" for certain TCP parameters which I usually modify (or plan to modify) on production servers.

Some good-to-know terms:
  • Round Trip Time (RTT): The time taken by a packet to travel from the source machine to the destination and back. You can use ICMP ping to measure the RTT.
  • Latency: The time from the source sending a packet to the destination receiving it. This is often confused with RTT, so clarify which one you are talking about before interpreting any numbers.
  • Bandwidth Delay Product (BDP): The amount of data that can be in transit in the network at any moment; simply the product of the link bandwidth and the RTT.
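    For example, on a 100 Mbit/s link with a 40 ms RTT the BDP is (100,000,000 / 8) bytes/s × 0.040 s = 500,000 bytes, so roughly 500 KB can be "in flight" at any time, which is also how much buffer the connection needs to keep the link full.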
Say you want to test your app or benchmark hardware you just bought; the first thing you need to do is add it to a network, even a local network will do. Avoid wireless networks, because RTT varies a lot on wireless links and it becomes difficult to tell whether the hardware or the wireless is at fault.

Adding Latency Or RTT Delay To The Network
If you are serious about testing hardware then you may need to test at various RTT/latency values to evaluate the experience of your customers from various locations across the world. To introduce this RTT delay you can use the network emulator, or simply netem, and fire the following command:

tc qdisc add dev eth0 root netem delay 100ms

The command above will introduce an RTT delay of 100 ms on the eth0 interface. Now you can play around with it and check various RTT delay values. When you are done, remove the delay by deleting the rule.

tc qdisc del dev eth0 root
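
To confirm that the rule is in place and that the delay is actually taking effect, you can list the qdiscs on the interface and ping another machine on the network (eth0 and the host below are placeholders for your own setup):

tc qdisc show dev eth0
ping <some_host_on_your_network>

You can also change an existing rule in place instead of deleting and re-adding it:

tc qdisc change dev eth0 root netem delay 200ms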

An excellent netem tutorial can be found at LinuxFoundation.org. The netem mailing list archives might also help with debugging in several cases.

Server Setup For Testing
If you plan to test the hardware then I suggest running a simple, no-frills HTTP server on it, like the Python single-threaded server. Using scp for testing is not a good idea since OpenSSH has its own application-level flow-control mechanisms. To run the Python single-threaded server, fire the following commands on your terminal:

cd <doc_root_of_http_server>
python -m SimpleHTTPServer
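
If your system's python points to Python 3, the equivalent module is http.server:

python3 -m http.server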


Make sure a large file is present in the document root of the server and that curl or wget is present on the client. Do not use a browser or a download manager to download from the server.
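
If you do not have a large file handy you can create one, for example with dd (the file name and size here are arbitrary):

dd if=/dev/zero of=bigfile bs=1M count=500
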
Of course, if you are testing your app then the above might not apply to you. In that case, set up the server and client according to your app.

Testing and Recording The Defaults
Recording the defaults is important in case you need to revert anything. A full backup can be obtained easily with the sysctl command:
sysctl -A > sysctl.bak
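
If you ever need to revert, you should be able to load most of the saved values back from that file (read-only keys will just print errors that you can ignore):

sysctl -p sysctl.bak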

Now download the file from the server without introducing any latency. This is the baseline performance at 0 ms of added latency. Now let us start the serious testing and introduce latency: add an RTT delay of 100 ms, download the file using curl or wget, and note the speed.
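
For example, assuming the server is at 192.168.1.10 (SimpleHTTPServer listens on port 8000 by default) and the file is called bigfile, curl can report the average download speed in bytes per second without saving the file anywhere:

curl -s -o /dev/null -w '%{speed_download}\n' http://192.168.1.10:8000/bigfile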

Various TCP Optimizations and Parameters To Check
I found out during this experiment that newer kernels ship with pretty good TCP defaults; still, cross-checking won't hurt.

First and foremost, get acquainted with /proc on your machine, specifically the /proc/sys/net/ directory. I would also encourage you to go through the tcp man page (man 7 tcp) and understand the parameters.

The changes I am going to suggest depend heavily on the kernel version and the distribution. If not done correctly, these changes can degrade networking performance or harm your machine in other ways. You have been warned. You are on your own.
  • First of all, we'll check whether TCP selective acknowledgement (SACK) is turned on, and turn it on if it is off. It is a boolean, so just set it to 1 and you are good to go:
    sysctl -w net.ipv4.tcp_sack=1
  • We need to make sure that the TCP window can scale to utilize the maximum buffer possible:
    sysctl -w net.ipv4.tcp_window_scaling=1
  • Set the TCP read and write buffers to optimum values. Each is an array of 3 values defining the minimum, default and maximum amount of memory that can be used per socket. Also note that, for TCP sockets, these take precedence over the generic (non-TCP) values defined in the following files:

    /proc/sys/net/core/rmem_max
    /proc/sys/net/core/wmem_max
    /proc/sys/net/core/rmem_default
    /proc/sys/net/core/wmem_default


    Setting these is usually a heuristic exercise and depends largely on your network. Also, with auto-tuning on, the buffer can scale up to the maximum value defined. Set them up using the following commands:
    sysctl -w net.ipv4.tcp_rmem='4096 87380 4194304'
    sysctl -w net.ipv4.tcp_wmem='4096 16384 4194304'

    Here the default memory allocated to the receive buffer of each TCP connection would be 87380 bytes, and it can scale up to 4194304 bytes depending upon the connection. I suggest that you experiment with the values a bit to find the optimum combination.
    If you are doing non-TCP optimizations too, then set net.core.rmem_max, net.core.wmem_max, net.core.rmem_default and net.core.wmem_default to similar values as well.
  • Enable TCP TIME_WAIT reuse. This allows sockets in the TIME_WAIT state to be reused for new outgoing connections, which generally improves performance if your machine makes a lot of short-lived connections.
    sysctl -w net.ipv4.tcp_tw_reuse=1
  • The maximum rate at which new connections can be opened can sometimes play a role on servers handling high traffic. It can be estimated by dividing the width of the range in /proc/sys/net/ipv4/ip_local_port_range by the value in /proc/sys/net/ipv4/tcp_fin_timeout (both files are easy to inspect, see below). For my system it is (61000 - 32768) / 60, which comes out to about 470 connections per second. You can widen the port range or reduce tcp_fin_timeout, but experiment first before deploying in production.
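    To check the two values used in this calculation on your own machine:

    cat /proc/sys/net/ipv4/ip_local_port_range
    cat /proc/sys/net/ipv4/tcp_fin_timeout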
There are a lot of other parameters that can be tweaked for higher performance. You can try all of them out but do not march straight into production servers with these tweaks. Experiment in your staging boxes first.
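
One last note: values set with sysctl -w are lost on reboot. Once you have settled on a combination, persist it by putting the keys into /etc/sysctl.conf (or a file under /etc/sysctl.d/ on distributions that support it) and reloading, for example:

net.ipv4.tcp_sack = 1
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_rmem = 4096 87380 4194304
net.ipv4.tcp_wmem = 4096 16384 4194304
net.ipv4.tcp_tw_reuse = 1

sysctl -p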