
Troubleshooting redis connections that stay ESTABLISHED

nosa.me

Yesterday I wanted to decommission a machine and found that it still had redis connections:

# netstat -nat |grep ESTABLISHED |grep 6379
tcp 0 0 10.0.27.92:6379 10.0.27.157:24044 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.96.27:28975 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.69.47:58511 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.16.29.9:44571 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.29.49:48137 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.69.46:8854 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.70.67:42271 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.70.67:42269 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.24.30:17776 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.22.91:17823 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.23.79:59200 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.24.30:46296 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.23.98:31277 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.22.118:40458 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.16.29.9:44548 ESTABLISHED
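
A listing like this gets hard to scan as it grows, so it can help to count connections per client IP. The following is a minimal sketch, assuming the six-column netstat layout shown above; `count_peers` and `SAMPLE` are hypothetical names used only for illustration.

```python
from collections import Counter


def count_peers(netstat_output: str, port: int = 6379) -> Counter:
    """Count ESTABLISHED connections per remote IP for one local port."""
    peers = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: proto recv-q send-q local remote state
        if len(fields) < 6 or fields[5] != "ESTABLISHED":
            continue
        local, remote = fields[3], fields[4]
        # Keep only connections whose local side is the redis port.
        if local.rpartition(":")[2] != str(port):
            continue
        peers[remote.rpartition(":")[0]] += 1
    return peers


# Three lines copied from the listing above.
SAMPLE = """\
tcp 0 0 10.0.27.92:6379 10.0.70.67:42271 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.70.67:42269 ESTABLISHED
tcp 0 0 10.0.27.92:6379 10.0.24.30:17776 ESTABLISHED
"""

if __name__ == "__main__":
    for ip, n in count_peers(SAMPLE).most_common():
        print(ip, n)
```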

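These connections never go away. Stranger still, 10.0.70.67 no longer answers ping, yet its connections are not torn down. Normally TCP keepalive probes the peer at intervals and, once the probes go unanswered, resets the connection.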
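
For reference, this is roughly how a program opts in to TCP keepalive on its own socket. It is a sketch only: the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` options are Linux-specific (hence the `hasattr` guard), and the helper name `enable_keepalive` and the timing values are assumptions chosen for illustration.

```python
import socket


def enable_keepalive(sock, idle=60, interval=20, count=3):
    """Turn on TCP keepalive for `sock`.

    idle:     seconds of silence before the first probe
    interval: seconds between unanswered probes
    count:    unanswered probes before the kernel drops the connection
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket timing knobs below exist on Linux; on other
    # systems the kernel-wide defaults apply.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # Non-zero means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Without `SO_KEEPALIVE`, the kernel sends nothing on an idle connection, so a peer that vanishes silently is never noticed.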

To see whether TCP keepalive ACK packets were actually being sent, I captured the traffic with tcpdump -i em2 port 6379. Here is the capture from a full night: redis.cpap

Analyzing it in wireshark shows that almost all of the packets are keepalive ACKs. Taking 10.0.96.27 as an example, here is a screenshot:

[Screenshot: wireshark view of the keepalive exchange between 10.0.96.27 and redis]

As the capture shows, 10.0.96.27 takes the initiative and sends an ACK (SEQ: M, ACK: N) to redis, redis replies with an ACK (SEQ: N, ACK: M+1), and Len is 0 in both.
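
The M / M+1 pattern comes from the way a keepalive probe is built: it reuses a sequence number one below the sender's next unsent byte, so the receiver answers with an ACK for M+1. A small sketch of that relationship follows; the `Segment` type and both helper functions are hypothetical, and the probe check is only a heuristic in the spirit of what packet analyzers do, not wireshark's actual code.

```python
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


@dataclass
class Segment:
    seq: int
    ack: int
    length: int


def looks_like_keepalive_probe(seg: Segment, sender_next_seq: int) -> bool:
    """Heuristic: 0 or 1 payload bytes, SEQ one below the sender's next byte."""
    return seg.length in (0, 1) and seg.seq == (sender_next_seq - 1) % MOD


def expected_reply(probe: Segment) -> Segment:
    """The receiver's answer: SEQ = probe's ACK (N), ACK = probe's SEQ + 1 (M+1)."""
    return Segment(seq=probe.ack, ack=(probe.seq + 1) % MOD, length=0)


if __name__ == "__main__":
    probe = Segment(seq=1000, ack=5000, length=0)   # M = 1000, N = 5000
    print(looks_like_keepalive_probe(probe, sender_next_seq=1001))
    print(expected_reply(probe))
```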

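This explains why most of the IPs stay ESTABLISHED: their keepalive exchanges keep succeeding. It does not explain 10.0.70.67, though, and the capture contains no packets from 10.0.70.67 at all. Only one explanation fits: redis does not send keepalive probes on its own.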

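I checked the documentation, and redis does indeed turn TCP keepalive off by default. For an established connection it therefore never sends keepalive ACKs to confirm that the peer is still alive. If the peer crashes or loses power, and so never closes the connection itself, redis keeps believing the peer is alive and never closes the connection.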

redis offers a configuration option to change this default:

# TCP keepalive.
#
# If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence
# of communication. This is useful for two reasons:
#
# 1) Detect dead peers.
# 2) Take the connection alive from the point of view of network
# equipment in the middle.
#
# On Linux, the specified value (in seconds) is the period used to send ACKs.
# Note that to close the connection the double of the time is needed.
# On other kernels the period depends on the kernel configuration.
#
# A reasonable value for this option is 60 seconds.
tcp-keepalive 0
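
The "double of the time" remark in the comment can be made concrete. The sketch below assumes, based on my reading of how redis applies this setting on Linux, that the first probe goes out after the configured idle period, followed by up to 3 probes spaced one third of that period apart. Treat that split as an assumption rather than a documented guarantee; `keepalive_plan` is a hypothetical helper.

```python
def keepalive_plan(tcp_keepalive: int):
    """Estimate probe timing for a given tcp-keepalive value (seconds).

    Returns None when keepalive is disabled (value 0, as in the
    configuration shown above).
    """
    if tcp_keepalive <= 0:
        return None
    idle = tcp_keepalive                    # silence before the first probe
    interval = max(1, tcp_keepalive // 3)   # assumed gap between probes
    count = 3                               # assumed number of probes
    return {
        "idle": idle,
        "interval": interval,
        "count": count,
        # Time from the last traffic until a dead peer is dropped.
        "worst_case": idle + interval * count,
    }


if __name__ == "__main__":
    print(keepalive_plan(0))
    print(keepalive_plan(60))
```

Under these assumptions, a value of 60 means a dead peer such as 10.0.70.67 would be dropped after about 120 seconds, which matches the comment's note that twice the configured time is needed.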

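In fact, if redis limited how long a connection may stay idle, I would not have run into this either. But by default redis places no time limit on idle connections; it only provides the timeout parameter to change that.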
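
The idea behind an idle timeout can be sketched as follows. This illustrates the concept only and is not redis's implementation; `idle_clients` and the timestamps are made up for the example.

```python
import time


def idle_clients(last_seen: dict, timeout: int, now: float = None) -> list:
    """Return the clients idle for more than `timeout` seconds.

    last_seen maps a client id to the time of its last command.
    A timeout of 0 means no limit, mirroring the default described above.
    """
    if timeout <= 0:
        return []
    now = time.monotonic() if now is None else now
    return [client for client, t in last_seen.items() if now - t > timeout]


if __name__ == "__main__":
    last_seen = {
        "10.0.70.67:42271": 0.0,     # silent since t=0 (the dead peer)
        "10.0.96.27:28975": 290.0,   # sent a command recently
    }
    print(idle_clients(last_seen, timeout=300, now=400.0))
    print(idle_clients(last_seen, timeout=0, now=400.0))
```

Keepalive probes are handled by the kernel and never reach the application, so an idle timeout of this kind would also close healthy clients that simply send no commands for a while. TCP keepalive only drops peers that are actually unreachable.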

Author: nosa.me
Original post: Troubleshooting redis connections that stay ESTABLISHED. Thanks to the original author for sharing.
