Details
-
Bug
-
Resolution: Fixed
-
Critical
-
7.6.0
-
Triaged
-
0
-
Yes
Description
Starting within the last few days, Analytics is seeing test failures due to memcached getting stuck on graceful shutdown. The test expects that memcached will close the client connections on shutdown which it did previously, but now it does not. Even once the connection is detected as dead within Analytics and closed, memcached continues to log periodically it is waiting for the connections to be closed, and does not exit. I was forced to kill -9 memcached to get it to exit.
2024-02-14T11:41:40.487333-05:00 INFO Gracefully shutting down due to shutdown message or stdin closure
|
2024-02-14T11:41:40.550639-05:00 INFO Initiating graceful shutdown. sigint:false sigterm:false
|
2024-02-14T11:41:40.550643-05:00 INFO Stop all buckets
|
2024-02-14T11:41:40.550645-05:00 INFO Waiting for delete of gbook_users to complete
|
2024-02-14T11:41:40.563166-05:00 INFO <none>: Delete bucket [gbook_users]. Notifying engine
|
2024-02-14T11:41:40.563182-05:00 INFO (gbook_users) Shutting down all DCP connections in preparation for bucket deletion.
|
2024-02-14T11:41:40.563185-05:00 INFO (gbook_users) Shutting down dcp connections!
|
...
|
2024-02-14T11:41:40.563389-05:00 INFO 69: (No Engine) DCP (Producer) eq_dcpq:cbas:Local:gbook_users:88cd2fc68bb3fc3f7571f7c0f9f87bf4:1 - Removing connection [ {"ip":"192.168.38.72","port":59802} - {"ip":"192.168.38.72","port":11998} (System, @cbas) ]
|
2024-02-14T11:41:40.563547-05:00 INFO 68: (No Engine) DCP (Producer) eq_dcpq:cbas:Local:gbook_users:88cd2fc68bb3fc3f7571f7c0f9f87bf4:0 - Removing connection [ {"ip":"192.168.38.72","port":59801} - {"ip":"192.168.38.72","port":11998} (System, @cbas) ]
|
2024-02-14T11:41:40.563569-05:00 INFO <none>: Delete bucket [gbook_users]. Engine ready for shutdown
|
2024-02-14T11:41:40.563572-05:00 INFO <none>: Delete bucket [gbook_users]. Wait for 2 clients to disconnect
|
after several minutes, the connections are still active:
$ date; netstat -n | grep 11998
|
Wed Feb 14 11:48:24 EST 2024
|
tcp4 0 0 192.168.38.72.11998 192.168.38.72.59802 ESTABLISHED
|
tcp4 0 0 192.168.38.72.11998 192.168.38.72.59801 ESTABLISHED
|
Even once our DeadConnectionDetection closes the connections, memcached doesn't exit and continues to list the connections as open:
2024-02-14T11:49:35.875-05:00 INFO CBAS.events.DeadConnectionDetection [FixerThread:cbas:Local:gbook_users:88cd2fc68bb3fc3f7571f7c0f9f87bf4:1] Running dead connection detection
|
2024-02-14T11:49:35.875-05:00 INFO CBAS.conductor.DcpChannel [FixerThread:cbas:Local:gbook_users:88cd2fc68bb3fc3f7571f7c0f9f87bf4:1] Detected dead connection on {"class" : "DcpChannel", "inetAddress" : "/192.168.38.72:11998", "bucket" : "gbook_users", "state" : "CONNECTED"}
|
2024-02-14T11:49:35.875-05:00 INFO CBAS.conductor.DcpChannel [FixerThread:cbas:Local:gbook_users:88cd2fc68bb3fc3f7571f7c0f9f87bf4:1] {"class" : "DcpChannel", "inetAddress" : "/192.168.38.72:11998", "bucket" : "gbook_users", "state" : "CONNECTED"} is disconnecting
|
2024-02-14T11:49:35.876-05:00 INFO CBAS.events.DeadConnectionDetection [FixerThread:cbas:Local:gbook_users:88cd2fc68bb3fc3f7571f7c0f9f87bf4:0] Running dead connection detection
|
2024-02-14T11:49:35.876-05:00 INFO CBAS.conductor.DcpChannel [FixerThread:cbas:Local:gbook_users:88cd2fc68bb3fc3f7571f7c0f9f87bf4:0] Detected dead connection on {"class" : "DcpChannel", "inetAddress" : "/192.168.38.72:11998", "bucket" : "gbook_users", "state" : "CONNECTED"}
|
2024-02-14T11:49:35.876-05:00 INFO CBAS.conductor.DcpChannel [FixerThread:cbas:Local:gbook_users:88cd2fc68bb3fc3f7571f7c0f9f87bf4:0] {"class" : "DcpChannel", "inetAddress" : "/192.168.38.72:11998", "bucket" : "gbook_users", "state" : "CONNECTED"} is disconnecting
|
2024-02-14T11:49:35.878-05:00 DEBU CBAS.conductor.DcpChannelCloseListener [nioEventLoopGroup-3-3] DCP Connection dropped
|
2024-02-14T11:49:35.879-05:00 DEBU CBAS.conductor.DcpChannelCloseListener [nioEventLoopGroup-3-4] DCP Connection dropped
|
2024-02-14T11:54:40.616786-05:00 INFO <none>: Delete bucket [gbook_users].Still waiting: 2 clients connected: {"68":{"agent_name":"cbas:Local:gbook_users:88cd2fc68","bucket_index":1,"clustermap":{"epoch":-1,"revno":0},"connection":"0x0000000109f3c000","connection_id":"192.168.38.72:59801","cookies":[],"datatype":"json","dcp":true,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["xerror","tcp nodelay","collections"],"flow_ctrl_buffer_size":5242880,"internal":true,"last_used":"12 ms","max_reqs_per_event":5,"max_sched_time":"2000","min_sched_time":"0","nevents":5,"peername":{"ip":"192.168.38.72","port":59801},"priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":false,"sendqueue":{"actual":0,"last":3781449419508583,"size":0,"term":false},"socket":68,"socket_options":{"so_keepalive":8,"so_linger":"off","so_rcvbuf":261312,"so_sndbuf":7456540,"tcp_keepcnt":3,"tcp_keepidle":360,"tcp_keepintvl":10},"sockname":{"ip":"192.168.38.72","port":11998},"ssl":true,"state":"pending close","thread":3,"total_cpu_time":"233033625","total_recv":5538,"total_send":18031174,"type":"producer","user":{"domain":"local","name":"@cbas"},"yields":0},"69":{"agent_name":"cbas:Local:gbook_users:88cd2fc68","bucket_index":1,"clustermap":{"epoch":-1,"revno":0},"connection":"0x0000000109fcc000","connection_id":"192.168.38.72:59802","cookies":[],"datatype":"json","dcp":true,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["xerror","tcp nodelay","collections"],"flow_ctrl_buffer_size":5242880,"internal":true,"last_used":"12 ms","max_reqs_per_event":5,"max_sched_time":"459","min_sched_time":"0","nevents":5,"peername":{"ip":"192.168.38.72","port":59802},"priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":false,"sendqueue":{"actual":0,"last":3781449459330708,"size":0,"term":false},"socket":69,"socket_options":{"so_keepalive":8,"so_linger":"off","so_rcvbuf":261312,"so_sndbuf":7456540,"tcp_keepcnt":3,"tcp_keepidle":360,"tcp_keepintvl":10},"sockname":{"ip":"192.168.38.72","port":11998},"ssl":true,"state":"pending close","thread":4,"total_cpu_time":"199525350","total_recv":5538,"total_send":17986113,"type":"producer","user":{"domain":"local","name":"@cbas"},"yields":0}}
|
I have attached cbcollect_infos.