  1. Couchbase Server
  2. MB-53339

Failure to restart memcached

Details

    • Task
    • Resolution: Duplicate
    • Major
    • None
    • 6.6.1
    • ns_server
    • None
    • 1

    Description

      https://issues.couchbase.com/browse/CBSE-12381 had an issue where memcached died and was not restarted for 12 hours.

       
      In summary, restarting memcached depends on the ns_ports_setup and memcached_config_mgr processes, both of which sit under the ns_server_sup hierarchy.
      The ns_server_sup tree was itself restarting periodically because the supervisor's max restart intensity was being reached (triggered by another issue in memcached). Because of this dependency, those restarts prevented memcached from being started.
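
For readers unfamiliar with restart intensity, the following is a minimal Python sketch of the general idea: a supervisor gives up (and is itself restarted by its parent) once more than a fixed number of child restarts happen inside a time window. The class name, the limit of 3 restarts, and the 10 second window are illustrative assumptions, not the values ns_server_sup uses.

```python
from collections import deque


class RestartIntensity:
    """Simplified model of a supervisor restart limit (hypothetical helper)."""

    def __init__(self, max_restarts, period_seconds):
        self.max_restarts = max_restarts
        self.period_seconds = period_seconds
        self.restarts = deque()  # timestamps of recent child restarts

    def record_restart(self, now):
        """Record a restart at time `now`; return True if the limit is exceeded."""
        self.restarts.append(now)
        # Forget restarts that are older than the window.
        while self.restarts and now - self.restarts[0] > self.period_seconds:
            self.restarts.popleft()
        # Exceeding the limit means the supervisor terminates and its whole
        # subtree is restarted by the parent supervisor.
        return len(self.restarts) > self.max_restarts


# Assumed limit: more than 3 restarts within 10 seconds trips the supervisor.
limit = RestartIntensity(max_restarts=3, period_seconds=10)
tripped = [limit.record_restart(t) for t in (0, 1, 2, 3)]
```

In this model a child that keeps crashing quickly takes the whole subtree down with it each time the limit trips, which matches the periodic restarts described above.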

       
      1. http://src.couchbase.org/source/xref/6.6.1/ns_server/deps/ns_babysitter/src/ns_port_server.erl#102 Most other processes are started by the port_open() call in init() of ns_port_server. ns_port_server sits under the babysitter->ns_child_ports_sup hierarchy, so it is unaffected by any ns_server_sup chaos.
      2. Memcached is configured with the "port_server_dont_start" option, so unlike the other processes it is not started in the init() of ns_port_server.
      3. Instead, memcached is started here: http://src.couchbase.org/source/xref/6.6.1/ns_server/deps/ns_babysitter/src/ns_port_server.erl#165 when the ns_port_server process receives an "activate".
      4. The "activate" is sent from init() of memcached_config_mgr: http://src.couchbase.org/source/xref/6.6.1/ns_server/src/memcached_config_mgr.erl#85
      5. In addition, init() does not reach that point until the earlier ns_ports_setup:sync() call completes: http://src.couchbase.org/source/xref/6.6.1/ns_server/src/memcached_config_mgr.erl#50

      Steps 4 and 5 are therefore dependencies that live under the ns_server_sup tree, so chaos or process restarts there will prevent memcached from being restarted.
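
The chain in steps 1 to 5 can be modelled with a small Python sketch. This is an illustration of the ordering only, not the Erlang implementation: every class and function below is a hypothetical stand-in named after the process it imitates.

```python
class PortServer:
    """Stand-in for the ns_port_server that manages memcached (under the babysitter)."""

    def __init__(self, dont_start):
        # With "port_server_dont_start" the process is not launched in init (step 2).
        self.running = not dont_start

    def activate(self):
        # Step 3: the process is only launched when "activate" arrives.
        self.running = True


class PortsSetupCrashed(Exception):
    """Raised by the stand-in sync when ns_ports_setup is crashing."""


def ports_setup_sync(ports_setup_healthy):
    """Stand-in for ns_ports_setup:sync(); fails while ns_ports_setup is crashing."""
    if not ports_setup_healthy:
        raise PortsSetupCrashed("ns_ports_setup crashed before sync completed")


def memcached_config_mgr_init(port_server, ports_setup_healthy):
    """Stand-in for init() of memcached_config_mgr: sync first, then activate."""
    ports_setup_sync(ports_setup_healthy)  # step 5: must complete first
    port_server.activate()                 # step 4: only then is "activate" sent


def try_restart(port_server, ports_setup_healthy):
    """One attempt by the ns_server_sup side to bring memcached back."""
    try:
        memcached_config_mgr_init(port_server, ports_setup_healthy)
    except PortsSetupCrashed:
        pass  # the ns_server_sup tree restarts; memcached stays down
    return port_server.running


memcached = PortServer(dont_start=True)
stuck = try_restart(memcached, ports_setup_healthy=False)
recovered = try_restart(memcached, ports_setup_healthy=True)
```

The port server itself never crashes in this model; memcached stays down purely because the "activate" never arrives while the sync step keeps failing.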

      The log shows this as well. The following is from a period when memcached had crashed some time earlier and still had not been restarted:

      [ns_server:debug,2022-08-02T13:24:02.733-04:00,ns_1@gpdclxcbsmi115.gp.ocean.com:memcached_config_mgr<0.13214.108>:memcached_config_mgr:init:49]waiting for completion of initial ns_ports_setup round
      [error_logger:error,2022-08-02T13:24:02.734-04:00,ns_1@gpdclxcbsmi115.gp.ocean.com:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]
      =========================CRASH REPORT=========================
        crasher:
          initial call: ns_ports_setup:setup_body_tramp/0
          pid: <0.17872.109>
          registered_name: ns_ports_setup
      

      We never see the “ns_ports_setup seems to be ready” message because ns_ports_setup itself is crashing, so in that case ns_port_server never receives the “activate” needed to restart memcached.

      In the case when it does work and memcached is restarted:

      [ns_server:debug,2022-08-02T13:54:30.459-04:00,ns_1@gpdclxcbsmi115.gp.ocean.com:memcached_config_mgr<0.699.0>:memcached_config_mgr:init:49]waiting for completion of initial ns_ports_setup round
      [ns_server:debug,2022-08-02T13:54:30.485-04:00,ns_1@gpdclxcbsmi115.gp.ocean.com:memcached_config_mgr<0.699.0>:memcached_config_mgr:init:51]ns_ports_setup seems to be ready
      

      And that lines up with when memcached came back.
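
As a rough aid for checking other logs for the same pattern, here is a minimal Python sketch that reports memcached_config_mgr processes that logged the "waiting" message but never the "ready" message. It assumes the line format shown in the excerpts above; the function name is hypothetical, and because pids can be reused after restarts the result is only a hint.

```python
import re

WAITING = "waiting for completion of initial ns_ports_setup round"
READY = "ns_ports_setup seems to be ready"
# Matches the logging process, e.g. memcached_config_mgr<0.699.0>
PID_RE = re.compile(r"memcached_config_mgr(<[\d.]+>)")


def stalled_inits(log_lines):
    """Return pids that started waiting for ns_ports_setup and never saw it ready."""
    waiting = []
    for line in log_lines:
        match = PID_RE.search(line)
        if not match:
            continue
        pid = match.group(1)
        if WAITING in line and pid not in waiting:
            waiting.append(pid)
        elif READY in line and pid in waiting:
            waiting.remove(pid)
    return waiting
```

Feeding it the two excerpts from this ticket would flag only the process from the first excerpt, since the second one goes on to log the "ready" message.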

       

      The ns_server_sup->(ns_ports_setup/memcached_config_mgr) dependencies that prevented the restart of memcached in this case need more analysis.

       

            People

              Abhijeeth.Nuthan Abhijeeth Nuthan
              navdeep.boparai Navdeep Boparai