This article is meant to provide an overview of TCP tuning.
It is important to understand that there is no single set of optimal TCP parameters. The optimal tuning will depend on your specific operating system, application(s), network setup, and traffic patterns.
The content presented here is a guide to common parameters that can be tuned, and how to check common TCP problems.
It is recommended that you consult the documentation for your specific operating system and applications for guidance on recommended TCP settings. It is also highly recommended that you test any changes thoroughly before implementing them on a production system.
TCP auto tuning
Depending on your specific operating system/version and configuration, your network settings may be autotuned.
To check whether autotuning is enabled on many Linux-based systems:
cat /proc/sys/net/ipv4/tcp_moderate_rcvbuf
or
sysctl -a | grep tcp_moderate_rcvbuf
If tcp_moderate_rcvbuf is set to 1, autotuning is active and buffer size is adjusted dynamically.
While TCP autotuning provides adequate performance for some applications, there are cases where manual tuning will yield a performance increase.
Common TCP Parameters
This table shows some commonly tuned Linux TCP parameters and what they are for. You can look up the equivalent parameter names for other operating systems.
net.core.rmem_default | Default size of receive (RX) buffers used by sockets for all protocols. Value is in bytes. |
net.core.rmem_max | Maximum size of receive (RX) buffers used by sockets for all protocols. Value is in bytes. |
net.core.wmem_default | Default size of transmit (TX) buffers used by sockets. Value is in bytes. |
net.core.wmem_max | Maximum size of transmit (TX) buffers used by sockets. Value is in bytes. |
net.ipv4.tcp_rmem | TCP-specific setting for receive buffer sizes. A vector of three integers: [min, default, max]. The max value can't be larger than net.core.rmem_max. Values are in bytes. |
net.ipv4.tcp_wmem | TCP-specific setting for transmit buffer sizes. A vector of three integers: [min, default, max]. The max value can't be larger than net.core.wmem_max. Values are in bytes. |
net.core.netdev_max_backlog | Number of incoming packets queued when the interface receives packets faster than the kernel can process them. Once this number is exceeded, the kernel will start to drop packets. |
File limits | While not directly a TCP parameter, this is important for TCP to function correctly. ulimit on Linux shows the limits imposed on the current user and system. You must have large enough hard and soft limits for the number of TCP sockets your system will open. These can be set in a file under /etc/security/limits.d/: soft nofile XXXXX, hard nofile XXXXX |
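As a quick check, the current limits can be inspected from a shell (a sketch; exact values vary by system):

```shell
# Each TCP socket consumes a file descriptor, so the open-file limits
# cap how many concurrent connections a process can hold.
ulimit -Sn   # soft limit for open files
ulimit -Hn   # hard limit for open files
```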
The running value of these parameters can be checked on most Linux-based operating systems using sysctl.
To see all of your currently configured parameters, use:
sysctl -a
If you want to search for a specific parameter or set of parameters, you can use grep. Example:
sysctl -a | grep rmem
The values you set for these depend on your specific usage and traffic patterns. Larger buffers don’t necessarily equate to more speed. If the buffers are too small, you’ll likely see overflow as applications can’t service received data quickly enough. If buffers are too large, you’re placing an unnecessary burden on the kernel to find and allocate memory which can lead to packet loss.
Key factors that will impact your buffer needs are the speed of your network (100 Mbit, 1 Gbit, 10 Gbit) and your round-trip time (RTT).
RTT is the measure of time it takes a packet to travel from the host, to a destination, and back to the host again. A common tool to measure RTT is ping.
It is important to note that just because a server has a 10 Gbit network interface, that does not mean it will receive 10 Gbit/s of traffic. The entire infrastructure determines the maximum bandwidth of your network.
A common way to calculate your buffer needs is as follows:
Bandwidth (bits per second) * round-trip latency (seconds) = TCP window size (bits)
TCP window size (bits) / 8 = TCP window size (bytes)
Example, using 50 ms as our RTT:
NIC speed is 1000 Mbit (1 Gbit), which equals 1,000,000,000 bits per second.
RTT is 50 ms, which equals 0.05 seconds.
Bandwidth-delay product (BDP) in bits: 1,000,000,000 * 0.05 = 50,000,000
Convert BDP to bytes: 50,000,000 / 8 = 6,250,000 bytes, or 6.25 MB.
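The arithmetic above can be scripted. A minimal sketch using shell integer arithmetic, with the bandwidth and RTT values from this example:

```shell
# Bandwidth-delay product: bandwidth (bits/s) * RTT (s), then / 8 for bytes.
BANDWIDTH_BITS=1000000000   # 1 Gbit/s NIC
RTT_MS=50                   # round-trip time, e.g. as measured with ping

BDP_BITS=$(( BANDWIDTH_BITS * RTT_MS / 1000 ))   # divide by 1000: ms -> s
BDP_BYTES=$(( BDP_BITS / 8 ))
echo "BDP: ${BDP_BITS} bits = ${BDP_BYTES} bytes"
# prints: BDP: 50000000 bits = 6250000 bytes
```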
Many products/network appliances recommend doubling, or even tripling, your BDP value to determine your maximum buffer size.
Table with sample buffer sizes based on NIC speed:
NIC Speed (Mbit) | RTT (ms) | NIC bits | BDP (bytes) | BDP (MB) | net.core.rmem_max | net.ipv4.tcp_rmem |
100 | 100 | 100000000 | 1250000 | 1.25 | 2097152 | 4096 65536 2097152 |
1000 | 100 | 1000000000 | 12500000 | 12.5 | 16777216 | 4096 1048576 16777216 |
10000 | 100 | 10000000000 | 125000000 | 125 | 134217728 | 4096 1048576 33554432 |
Notice that for the 10 Gbit NIC, the net.core.rmem_max value is greater than the net.ipv4.tcp_rmem max value. This is an example of splitting the size across multiple data streams. Depending on what your server is used for, you may have several streams running at one time. For example, a multistream FTP client can establish several streams for a single file transfer.
Note that for net.ipv4.tcp_{r,w}mem, the max value can’t be larger than the equivalent net.core.{r,w}mem_max.
net.core.netdev_max_backlog should be set based on your system load and traffic patterns. Some common values used are 32768 or 65536.
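To make such settings persist across reboots, they can be placed in a sysctl configuration file. A sketch using the 1 Gbit / 100 ms values from the table above; the file name is an assumption (any file under /etc/sysctl.d/ works on most modern distributions), and the transmit-side values simply mirror the receive side here:

```shell
# /etc/sysctl.d/99-tcp-tuning.conf  (hypothetical file name)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 1048576 16777216
net.ipv4.tcp_wmem = 4096 1048576 16777216
net.core.netdev_max_backlog = 32768
```

The file can be applied without rebooting using sysctl -p /etc/sysctl.d/99-tcp-tuning.conf (or sysctl --system to reload all configuration files).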
Jumbo frames
For Ethernet networks, enabling jumbo frames (a Maximum Transmission Unit (MTU) larger than the default 1500 bytes) on all systems (hosts and switches) can provide a significant performance improvement, especially when the application uses large payload sizes. Enabling jumbo frames on some hosts in the configuration and not others can cause bottlenecks; enable jumbo frames on all hosts in the configuration or on none of them.
The default 802.3 Ethernet frame size is 1518 bytes. The Ethernet header consumes 18 bytes of this, leaving an effective maximum payload of 1500 bytes. Jumbo frames increase the payload from 1500 to 9000 bytes. Ethernet frames use a fixed-size header that contains no user data and is pure overhead, so transmitting a larger frame is more efficient because the overhead-to-data ratio improves.
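A sketch of checking and raising the MTU on Linux. The interface name eth0 and the remote host are placeholders; changing the MTU requires root, and the 9000-byte MTU must also be configured on switches and peers:

```shell
# Show the current MTU of the interface:
ip link show eth0

# Raise the MTU to 9000 bytes (takes effect immediately, not persistent):
ip link set dev eth0 mtu 9000

# Verify end to end: 9000 bytes minus 20 (IP header) and 8 (ICMP header)
# leaves an 8972-byte ping payload that must fit without fragmenting (-M do):
ping -M do -s 8972 <remote_host>
```

If the non-fragmenting ping fails while a default-size ping succeeds, some device in the path is not passing jumbo frames.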
Setting TCP parameters
The following is a list of methods for setting TCP parameters on various operating systems. This is not an all-inclusive list; consult your operating system documentation for more details.
If you make changes to any kernel parameters, it is strongly recommended that you test them before applying them in a production environment.
It is also suggested that you consult product documentation for recommended settings for specific products. Many products will provide minimum required settings and tuning guidance to achieve optimal performance for their product.
Windows
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
MaxUserPort = dword:0000fffe
Solaris
ndd -set /dev/tcp tcp_max_buf 4194304
AIX
/usr/sbin/no -o tcp_sendspace=4194304
Linux
sysctl -w net.ipv4.tcp_rmem="4096 87380 8388608"
HP-UX
ndd -set /dev/tcp tcp_ip_abort_cinterval 20000
Common TCP parameters by operating system
The following is a list of commonly tuned parameters for various operating systems. Consult the documentation for your specific operating system and/or product for more details on what parameters are available, their recommended settings, and how to change their values.
Solaris
- tcp_time_wait_interval
- tcp_keepalive_interval
- tcp_fin_wait_2_flush_interval
- tcp_conn_req_max_q
- tcp_conn_req_max_q0
- tcp_xmit_hiwat
- tcp_recv_hiwat
- tcp_cwnd_max
- tcp_ip_abort_interval
- tcp_rexmit_interval_initial
- tcp_rexmit_interval_max
- tcp_rexmit_interval_min
- tcp_max_buf
AIX
- tcp_sendspace
- tcp_recvspace
- udp_sendspace
- udp_recvspace
- somaxconn
- tcp_nodelayack
- tcp_keepinit
- tcp_keepintvl
Linux
- net.ipv4.tcp_timestamps
- net.ipv4.tcp_tw_reuse
- net.ipv4.tcp_tw_recycle
- net.ipv4.tcp_fin_timeout
- net.ipv4.tcp_keepalive_time
- net.ipv4.tcp_rmem
- net.ipv4.tcp_wmem
- net.ipv4.tcp_max_syn_backlog
- net.core.rmem_default
- net.core.rmem_max
- net.core.wmem_default
- net.core.wmem_max
- net.core.netdev_max_backlog
HP-UX
- tcp_conn_req_max
- tcp_xmit_hiwater_def
- tcp_ip_abort_interval
- tcp_rexmit_interval_initial
- tcp_keepalive_interval
- tcp_recv_hiwater_def
- tcp_recv_hiwater_max
- tcp_xmit_hiwater_max
Checking TCP performance
The following are some useful commands and statistics you can examine to help determine the performance of TCP on your system.
ifconfig
ifconfig -a, or ifconfig <specific_interface>
Sample output:
eth1 Link encap:Ethernet HWaddr 00:00:27:6F:64:F2
inet addr:192.168.56.102 Bcast:192.168.56.255 Mask:255.255.255.0
inet6 addr: fe80::a00:27ff:fe64:6af9/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5334443 errors:35566 dropped:0 overruns:0 frame:0
TX packets:23434553 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:15158 (14.8 KiB) TX bytes:5214 (5.0 KiB)
Examine the RX and TX packets lines of the output.
errors | Packet errors. Can be caused by numerous issues, such as transmission aborts, carrier errors, and window errors. |
dropped | How many packets were dropped and not processed. Possibly because of low memory. |
overruns | Overruns often occur when data comes in faster than the kernel can process it. |
frame | Frame errors, often caused by bad cable, bad hardware. |
collisions | Usually caused by network congestion. |
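The same counters ifconfig reports can also be read directly from sysfs on Linux. A sketch; the loopback interface (lo) is used here only so the example runs anywhere, so substitute your real interface (e.g. eth1):

```shell
# Per-interface counters live in /sys/class/net/<iface>/statistics/.
IFACE=lo
for stat in rx_errors rx_dropped rx_over_errors rx_frame_errors collisions; do
    printf '%s: %s\n' "$stat" "$(cat /sys/class/net/$IFACE/statistics/$stat)"
done
```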
netstat -s
netstat -s displays statistics for various protocols.
Output will vary by operating system. In general, you are looking for anything related to packets being “dropped”, “pruned”, and “overrun”.
Below is sample TCPExt output.
Depending on your specific system, these values will only be displayed if they are non-zero.
XXXXXX packets pruned from receive queue because of socket buffer overrun | Receive buffer possibly too small |
XXXXXX packets collapsed in receive queue due to low socket buffer | Receive buffer possibly too small |
XXXXXX packets directly received from backlog | Packets being placed in the backlog because they could not be processed fast enough. Check if you are dropping packets. Just because the backlog is being used does not necessarily mean something bad is happening. It depends on the volume of packets in the backlog, and whether or not they are being dropped. |
Further reading
The following additional reading provides the RFC for TCP extensions, as well as recommended tuning for various applications.
RFC 1323
RFC 1323 defines TCP Extensions for High Performance (since obsoleted by RFC 7323).
https://www.ietf.org/rfc/rfc1323.txt
Oracle Database 12c
https://docs.oracle.com/database/121/LTDQI/toc.htm#BHCCADGD
Oracle Coherence 12.1.2
https://docs.oracle.com/middleware/1212/coherence/COHAG/tune_perftune.htm#COHAG219
JBoss 5 clustering
Websphere on System z
Tuning for Web Serving on the Red Hat Enterprise Linux 6.4 KVM Hypervisor
ftp://public.dhe.ibm.com/linux/pdfs/Tuning_for_Web_Serving_on_RHEL_64_KVM.pdf
Oracle Glassfish server 3.1.2
https://docs.oracle.com/cd/E26576_01/doc.312/e24936/tuning-os.htm#GSPTG00007
Solaris 11 tunable parameters
https://docs.oracle.com/cd/E26502_01/html/E29022/appendixa-28.html
AIX 7 TCP tuning
http://www.ibm.com/developerworks/aix/library/au-aix7networkoptimize3/
Red Hat Enterprise Linux 6 Tuning
All site content is the property of Oracle Corp. Redistribution not allowed without written permission