Details
- Task
- Resolution: Duplicate
- Major
- None
- 6.6.1
- None
- 1
Description
In https://issues.couchbase.com/browse/CBSE-12381, memcached died and was not restarted for 12 hours.
In summary, restarting memcached depends on the ns_ports_setup and memcached_config_mgr processes, which sit under the ns_server_sup hierarchy.
Periodic restarts of the ns_server_sup tree, caused by the max restart intensity being reached (triggered by another issue in memcached), prevented memcached from being started because of this dependency.
1. http://src.couchbase.org/source/xref/6.6.1/ns_server/deps/ns_babysitter/src/ns_port_server.erl#102 Generally, other processes start via the port_open() call in init() of ns_port_server. ns_port_server sits under the babysitter->ns_child_ports_sup hierarchy, so it is unaffected by any ns_server_sup chaos.
2. Memcached is configured with the "port_server_dont_start" option, so it is not started in the init() of ns_port_server the way other processes are.
3. Instead, memcached is started here: http://src.couchbase.org/source/xref/6.6.1/ns_server/deps/ns_babysitter/src/ns_port_server.erl#165 when the ns_port_server process gets an "activate".
4. The activate call is made from init() of memcached_config_mgr: http://src.couchbase.org/source/xref/6.6.1/ns_server/src/memcached_config_mgr.erl#85
5. In addition, it won't even get to that point until ns_ports_setup:sync() completes earlier in that init(): http://src.couchbase.org/source/xref/6.6.1/ns_server/src/memcached_config_mgr.erl#50
Steps 4 and 5 are therefore dependencies under the ns_server_sup tree, so chaos or process restarts there will prevent memcached from being restarted.
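The chain in steps 1-5 can be illustrated with a small toy model. This is only a sketch in Python, not ns_server code: the names PortServer, config_mgr_init, healthy_sync, crashing_sync and SupervisorRestart are all hypothetical and merely stand in for the roles described above.

```python
# Toy model of steps 1-5. Every name here is hypothetical and only mirrors
# the roles described in the ticket; none of it is ns_server code.

class SupervisorRestart(Exception):
    """Stands in for the ns_server_sup tree being torn down during init."""


class PortServer:
    """Plays the role of ns_port_server, which lives under the babysitter."""

    def __init__(self, name, dont_start=False):
        self.name = name
        self.dont_start = dont_start
        # Steps 1-2: ordinary ports start in init(); a port configured with
        # the dont-start option stays down until it is activated.
        self.running = not dont_start

    def activate(self):
        # Step 3: the port is only opened when activate arrives.
        self.running = True

    def crash(self):
        # After a crash the port server is back in its init() state.
        self.running = not self.dont_start


def config_mgr_init(port, ports_setup_sync):
    """Plays the role of init() in memcached_config_mgr."""
    ports_setup_sync()  # step 5: must complete first
    port.activate()     # step 4: only reached if the sync returned


def healthy_sync():
    return None


def crashing_sync():
    raise SupervisorRestart("ns_ports_setup crashed before becoming ready")


memcached = PortServer("memcached", dont_start=True)
memcached.crash()
try:
    config_mgr_init(memcached, crashing_sync)
except SupervisorRestart:
    pass
print(memcached.running)  # False: activate was never sent

config_mgr_init(memcached, healthy_sync)
print(memcached.running)  # True: sync completed, activate went through
```

In this model the port server itself is healthy the whole time; memcached stays down only because the code path that sends the activate keeps failing before it gets that far.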
The log shows this as well. The following is from a period when memcached had not been restarted despite having been down for a while:
[ns_server:debug,2022-08-02T13:24:02.733-04:00,ns_1@gpdclxcbsmi115.gp.ocean.com:memcached_config_mgr<0.13214.108>:memcached_config_mgr:init:49]waiting for completion of initial ns_ports_setup round
[error_logger:error,2022-08-02T13:24:02.734-04:00,ns_1@gpdclxcbsmi115.gp.ocean.com:error_logger<0.32.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
  crasher:
    initial call: ns_ports_setup:setup_body_tramp/0
    pid: <0.17872.109>
    registered_name: ns_ports_setup
We never see the “ns_ports_setup seems to be ready” message because ns_ports_setup itself is crashing, so ns_port_server never gets an “activate” to restart memcached in that case.
In the case when it does work and memcached is restarted:
[ns_server:debug,2022-08-02T13:54:30.459-04:00,ns_1@gpdclxcbsmi115.gp.ocean.com:memcached_config_mgr<0.699.0>:memcached_config_mgr:init:49]waiting for completion of initial ns_ports_setup round
[ns_server:debug,2022-08-02T13:54:30.485-04:00,ns_1@gpdclxcbsmi115.gp.ocean.com:memcached_config_mgr<0.699.0>:memcached_config_mgr:init:51]ns_ports_setup seems to be ready
And that lines up with when memcached came back.
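The same check can be done mechanically when triaging logs. Below is a minimal sketch, assuming the debug log format shown in the excerpts above; the function name find_stuck_waits and the regular expression are illustrative choices, not part of any Couchbase tooling. It reports memcached_config_mgr pids that logged the "waiting" message without a later "ready" message.

```python
import re

# Message texts are taken verbatim from the log excerpts in this ticket.
WAITING = "waiting for completion of initial ns_ports_setup round"
READY = "ns_ports_setup seems to be ready"

# Assumed line shape: ...:memcached_config_mgr<PID>:memcached_config_mgr:init:NN]message
LINE_RE = re.compile(
    r"memcached_config_mgr(<[\d.]+>):memcached_config_mgr:init:\d+\](.*)"
)


def find_stuck_waits(lines):
    """Return pids that started waiting and never logged the ready message."""
    waiting = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a memcached_config_mgr init line
        pid, message = match.group(1), match.group(2).strip()
        if message == WAITING and pid not in waiting:
            waiting.append(pid)
        elif message == READY and pid in waiting:
            waiting.remove(pid)
    return waiting
```

Run over the first excerpt, this would flag <0.13214.108>; run over the second, it would flag nothing. The sketch keys on the pid alone, so on a long log where pids are reused after restarts it would need a timestamp or ordering check as well.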
The ns_server_sup->(ns_ports_setup/memcached_config_mgr) dependencies that prevented the restart of memcached in this case need more analysis.
Issue Links
- duplicates: MB-47298 ns_server alert subsystem resets hostname resolution/lookup options (Closed)