TCP KeepAlive basics
Points to note about TCP keepalive and the basic Linux system tools used to configure it at the kernel level.
TCP is a connection-oriented protocol, i.e. there is a connection establishment phase, a data transfer phase and finally a connection termination phase.
Connection establishment uses the three-way handshake (SYN, SYN/ACK, ACK).
Depending on the connection timeouts configured on each system, the connection can be closed from either side. When this happens, the side that timed out sends a FIN to close the connection, and the peer responds with an ACK (often combined with its own FIN, i.e. FIN/ACK).
There are two cases where, because the peer or the network path in between has gone down, no reply comes back:
1) Dead peer: the peer is down and does not reply to requests. In this case the sender only learns that the peer is down when it sends a request and gets no reply, which results in a connection timeout. If the peer has restarted and lost the state of the connection it had earlier, a request sent to it is answered with a reset (RST).
2) Network partition: when connecting to systems through a NAT or a firewall, the intermediate device keeps the connections it tracks in memory. That memory is limited, so the device evicts connections based on an LRU rule. Hence long-lived connections that stay idle may be removed by the NAT / proxy in between and would have to be re-established.
Keepalive is the concept of configuring TCP to send an extra probe that carries no data and only expects an ACK from the receiver. This probe checks whether the peer is still alive, which avoids wastefully sending messages to a dead peer. The periodic traffic also keeps the connection from looking idle to a NAT or firewall in between.
Configuring this at the kernel level on Linux can be done with either of the following two tools:
1) sysctl - command-line tool to get/set kernel variables
2) procfs - seen as the /proc directory. This directory exposes network configuration and kernel variables as files in the subtree under /proc (the keepalive variables are under /proc/sys/net/ipv4/). Writing to these files is equivalent to setting the kernel variables.
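As a rough illustration of the procfs route, the sketch below reads the three keepalive files from /proc/sys/net/ipv4. The function name read_keepalive_settings and its base_dir parameter are made up for this example; on a system without these files it simply returns an empty dict.

```python
from pathlib import Path

# Default procfs location of the IPv4 TCP settings on Linux.
PROC_IPV4 = Path("/proc/sys/net/ipv4")

KEEPALIVE_VARS = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")


def read_keepalive_settings(base_dir=PROC_IPV4):
    """Return the keepalive variables found under base_dir as a dict of ints.

    Hypothetical helper for illustration. Variables whose file is missing
    (for example on a non-Linux system) are skipped.
    """
    settings = {}
    for name in KEEPALIVE_VARS:
        path = Path(base_dir) / name
        if path.is_file():
            settings[name] = int(path.read_text().strip())
    return settings


if __name__ == "__main__":
    for name, value in read_keepalive_settings().items():
        print(f"{name} = {value}")
```

Changing a value works the same way in reverse: writing a number to one of these files as root, or running something like sysctl -w net.ipv4.tcp_keepalive_time=600, sets the kernel variable until the next reboot.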
Key variables that play a role in TCP keepalive are:
1) tcp_keepalive_time - idle time, in seconds, since the last data packet after which the first keepalive probe is sent. This should ideally be less than the idle timeout that would otherwise close or evict the connection.
2) tcp_keepalive_intvl - interval, in seconds, between successive keepalive probes while the previous probe is still unacknowledged.
3) tcp_keepalive_probes - number of unacknowledged probes to send before the connection to the remote peer is considered dead.
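To see how the three variables combine, here is a small worked calculation of roughly how long an idle connection to a dead peer survives before the kernel gives up on it. The function name is invented for this sketch, and the numbers used are the usual Linux defaults (7200, 75, 9); check your own system for the actual values.

```python
def dead_peer_detection_seconds(keepalive_time, keepalive_intvl, keepalive_probes):
    """Approximate seconds of idleness before a dead peer is detected.

    The first probe goes out after keepalive_time seconds of idleness, and
    each unanswered probe is given keepalive_intvl seconds before the next
    one (or before the connection is declared dead).
    """
    return keepalive_time + keepalive_intvl * keepalive_probes


if __name__ == "__main__":
    # Usual Linux defaults: 7200 s idle, 75 s between probes, 9 probes.
    total = dead_peer_detection_seconds(7200, 75, 9)
    print(total)  # 7200 + 75 * 9 = 7875 seconds, a bit over 2 hours
```

With the defaults, a dead peer on an idle connection goes unnoticed for more than two hours, which is why these values are often lowered when connections pass through a NAT or firewall.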
Note: Even with these configured in the kernel, keepalive is enabled or disabled at the application level, per socket (the SO_KEEPALIVE socket option). Enabling it applies these settings.
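A minimal sketch of what enabling keepalive at the app level can look like, using Python's standard socket module. The helper name enable_keepalive and its parameters are hypothetical. The optional per-socket overrides (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT) are available on Linux but not on every platform, so the code only applies them if the socket module exposes them.

```python
import socket


def enable_keepalive(sock, idle=None, interval=None, count=None):
    """Turn on TCP keepalive for sock (hypothetical helper).

    With no extra arguments the kernel-wide tcp_keepalive_* values apply.
    idle / interval / count optionally override them for this socket only,
    where the platform supports it.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    overrides = (
        ("TCP_KEEPIDLE", idle),       # per-socket tcp_keepalive_time
        ("TCP_KEEPINTVL", interval),  # per-socket tcp_keepalive_intvl
        ("TCP_KEEPCNT", count),       # per-socket tcp_keepalive_probes
    )
    for opt_name, value in overrides:
        if value is not None and hasattr(socket, opt_name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, opt_name), value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

A socket that never sets SO_KEEPALIVE sends no probes at all, regardless of the kernel values.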













