Couchbase Server / MB-63258

Memcached ignores "shutdown"

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version: Morpheus
    • Affects Version: Morpheus
    • Component: memcached
    • Labels: None
    • Triage: Untriaged
    • 0
    • Unknown

    Description

      After I added the --stdin argument for memcached (and the writing of bootstrap DEKs to it), I noticed that our cluster tests started to hang intermittently.

      Example:
      https://cv.jenkins.couchbase.com/job/ns-server-cluster-tests/9554/console

      In that run, node n_12 leaves the cluster at 04:30:15:

      [cluster:debug,2024-08-21T04:30:15.147Z,n_12@127.0.0.1:ns_cluster<0.277.0>:ns_cluster:leave_body:475]Leaving cluster
      

      One of the steps of that process is a memcached restart:

      [ns_server:debug,2024-08-21T04:30:21.935Z,babysitter_of_n_12@cb.local:ns_ports_manager<0.141.0>:ns_ports_manager:handle_call:68]Restart of port memcached is requested
      

      Right after the restart, memcached_config_mgr writes the bootstrap DEKs to memcached's stdin.
      The string looks like this:

      BOOTSTRAP_DEK=<BootstrapKeysJson>\nDONE\n
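
      As a rough illustration of the framing above, here is a minimal Python sketch that builds such a payload and splits it back into its parts. The function names and the `{"keys": []}` JSON shape are hypothetical; this is not the ns_server or memcached code, only the line-oriented format as described in this ticket.

```python
import json


def build_bootstrap_payload(keys: dict) -> bytes:
    """Frame bootstrap keys as 'BOOTSTRAP_DEK=<json>\\nDONE\\n' (format from this ticket)."""
    # Compact single-line JSON, so the newline can act as the terminator.
    keys_json = json.dumps(keys, separators=(",", ":"))
    return f"BOOTSTRAP_DEK={keys_json}\nDONE\n".encode("utf-8")


def parse_bootstrap_payload(data: bytes):
    """Return (keys, rest): the decoded keys and any bytes that follow 'DONE\\n'."""
    head, sep, rest = data.partition(b"\nDONE\n")
    if not sep or not head.startswith(b"BOOTSTRAP_DEK="):
        raise ValueError("payload is not framed as BOOTSTRAP_DEK=...\\nDONE\\n")
    keys = json.loads(head[len(b"BOOTSTRAP_DEK="):])
    return keys, rest


# Hypothetical example: an empty key list (encryption off), followed by a
# shutdown command written to the same stream right afterwards.
payload = build_bootstrap_payload({"keys": []}) + b"shutdown\n"
keys, rest = parse_bootstrap_payload(payload)
print(keys, rest)
```

      In a framing like this, anything written after DONE (for example a shutdown command sent milliseconds later) arrives on the same stream, so the sketch returns it as `rest` instead of discarding it.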
      

      That happens at 04:30:22.177:

      [ns_server:debug,2024-08-21T04:30:22.177Z,n_12@127.0.0.1:memcached_config_mgr<0.30212.0>:memcached_config_mgr:write_bootstrap_keys:140]0 bootstrap config keys were written to memcached's stdin (encryption is off)
      

      Since that was the last test for this cluster, shutdown is initiated for this node immediately:

      [ns_server:info,2024-08-21T04:30:22.185Z,babysitter_of_n_12@cb.local:<0.9.0>:ns_babysitter_bootstrap:stop:34]50068: got shutdown request. Terminating.
      

      That triggers the shutdown of memcached:

      [ns_server:debug,2024-08-21T04:30:22.190Z,babysitter_of_n_12@cb.local:<0.1436.0>:ns_port_server:terminate:207]Shutting down port memcached
      [ns_server:debug,2024-08-21T04:30:22.190Z,babysitter_of_n_12@cb.local:<0.1436.0>:ns_port_server:port_shutdown:357]Shutdown command: "shutdown"
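
      For reproducing this kind of hang outside the cluster tests, a bounded wait makes an ignored command show up as a failure rather than a multi-hour stall. The following is a minimal sketch under stated assumptions: the command is assumed to be sent as a `shutdown` line on the child's stdin, and `TOY_CHILD` and `request_shutdown` are hypothetical names. The child here is a toy Python process, not memcached.

```python
import subprocess
import sys

# Toy stand-in for the child process (NOT memcached): it reads stdin line by
# line and exits with status 0 when it sees "shutdown".
TOY_CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    if line.strip() == 'shutdown':\n"
    "        sys.exit(0)\n"
)


def request_shutdown(proc: subprocess.Popen, timeout_s: float):
    """Write 'shutdown\\n' to the child's stdin; return its exit status, or None on timeout."""
    proc.stdin.write(b"shutdown\n")
    proc.stdin.flush()
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None  # the child did not react within the allowed time


proc = subprocess.Popen([sys.executable, "-c", TOY_CHILD], stdin=subprocess.PIPE)
# Payload first, then the shutdown request, mirroring the order seen in the logs.
proc.stdin.write(b"BOOTSTRAP_DEK={}\nDONE\n")
status = request_shutdown(proc, timeout_s=10.0)
if status is None:
    proc.kill()  # clean up a child that ignored the command
    proc.wait()
print("exit status:", status)
```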
      

      After that, ns_server waits more than 4 hours for memcached to stop (from 04:30:22 to 08:45:54), until memcached finally exits because the Jenkins job terminates everything on timeout:

      [ns_server:info,2024-08-21T08:45:54.138Z,babysitter_of_n_12@cb.local:<0.1436.0>:ns_port_server:handle_info:152]Got {exit_status,0} from port memcached. Exiting normally
      

      So it looks like the shutdown command was ignored by memcached.
      At first I thought ns_server was sending shutdown before writing BootstrapKeys (which can indeed happen in the current implementation), but according to the logs that is not the case here: the keys were written at 04:30:22.177 and the shutdown command was sent at 04:30:22.190.
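
      The timeline can be checked mechanically from the log prefixes quoted in this ticket. This is only a small sketch; the dictionary keys and the helper name `log_timestamp` are made up for illustration.

```python
import re
from datetime import datetime

# Log prefixes copied from this ticket; the message text is omitted for brevity.
LOG_LINES = {
    "keys_written": "[ns_server:debug,2024-08-21T04:30:22.177Z,n_12@127.0.0.1:memcached_config_mgr<0.30212.0>:memcached_config_mgr:write_bootstrap_keys:140]",
    "shutdown_sent": "[ns_server:debug,2024-08-21T04:30:22.190Z,babysitter_of_n_12@cb.local:<0.1436.0>:ns_port_server:port_shutdown:357]",
    "memcached_exit": "[ns_server:info,2024-08-21T08:45:54.138Z,babysitter_of_n_12@cb.local:<0.1436.0>:ns_port_server:handle_info:152]",
}

# The timestamp is the second comma-separated field inside the leading bracket.
TS_RE = re.compile(r"^\[[^,]+,([^,]+),")


def log_timestamp(line: str) -> datetime:
    """Extract the timestamp field of a log prefix as a datetime."""
    match = TS_RE.match(line)
    if match is None:
        raise ValueError("unrecognized log line")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")


times = {name: log_timestamp(line) for name, line in LOG_LINES.items()}

# The keys were written before the shutdown command was sent.
written_first = times["keys_written"] < times["shutdown_sent"]
gap = times["shutdown_sent"] - times["keys_written"]
hang = times["memcached_exit"] - times["shutdown_sent"]

print(written_first)
print(gap)
print(hang)  # 4:15:31.948000
```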

      Note: the changes that add --stdin and bootstrap DEKs are not merged to master yet. Here is the Gerrit change where the problem happened: https://review.couchbase.org/c/ns_server/+/214623

          People

            Assignee: Timofey Barmin
            Reporter: Timofey Barmin