Couchbase Server / MB-44951

[System Test] Service 'saslauthd_port' exited with status 143


    Details

      Description

      Build : 7.0.0-4678
      Test : -test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing_cbas.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
      Scale : 2
      Iteration : 2nd

      On 172.23.106.100, at 2021-03-14T16:46:05, saw that "Service 'saslauthd_port' exited with status 143".

      Seeing the following in the error.log on 172.23.106.100 at the same time -

      [root@localhost logs]# zgrep 2021-03-14T16:46:05 error.log
      [ns_server:error,2021-03-14T16:46:05.113-07:00,ns_1@172.23.106.100:capi_doc_replicator-bucket8<0.25679.694>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20623.11>}
      [ns_server:error,2021-03-14T16:46:05.113-07:00,ns_1@172.23.106.100:capi_doc_replicator-default<0.14442.695>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20625.11>}
      [ns_server:error,2021-03-14T16:46:05.114-07:00,ns_1@172.23.106.100:capi_doc_replicator-bucket7<0.11385.694>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20624.11>}
      [ns_server:error,2021-03-14T16:46:05.114-07:00,ns_1@172.23.106.100:<0.2282.690>:ns_port_server:handle_info:157]Got unexpected exit signal from port: {'EXIT',<0.3368.690>,normal}. Exiting.
      [ns_server:error,2021-03-14T16:46:05.118-07:00,ns_1@172.23.106.100:capi_doc_replicator-bucket6<0.4923.694>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20633.11>}
      [ns_server:error,2021-03-14T16:46:05.134-07:00,ns_1@172.23.106.100:capi_doc_replicator-bucket5<0.19454.693>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20658.11>}
      [ns_server:error,2021-03-14T16:46:05.137-07:00,ns_1@172.23.106.100:capi_doc_replicator-NEW_ORDER<0.28144.692>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20660.11>}
      [ns_server:error,2021-03-14T16:46:05.137-07:00,ns_1@172.23.106.100:capi_doc_replicator-ITEM<0.14746.692>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20662.11>}
      [ns_server:error,2021-03-14T16:46:05.151-07:00,ns_1@172.23.106.100:capi_doc_replicator-bucket4<0.15539.693>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20674.11>}
      [ns_server:error,2021-03-14T16:46:05.152-07:00,ns_1@172.23.106.100:capi_doc_replicator-WAREHOUSE<0.2536.693>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20682.11>}
      [ns_server:error,2021-03-14T16:46:05.127-07:00,ns_1@172.23.106.100:prometheus_cfg<0.30701.689>:prometheus_cfg:handle_info:466]Received exit from Prometheus port server - <0.2282.690>: normal. Restarting Prometheus...
      [ns_server:error,2021-03-14T16:46:05.198-07:00,ns_1@172.23.106.100:capi_doc_replicator-bucket9<0.4796.695>:doc_replicator:loop:118]Got unexpected message: {replicated_storege_pid,<14833.20620.11>}
      [ns_server:error,2021-03-14T16:46:05.385-07:00,ns_1@172.23.106.100:ns_config<0.311.0>:ns_config:handle_info:845]Saving ns_config failed. Trying to ignore: {{badmatch,{error,enoent}},
      [ns_server:error,2021-03-14T16:46:05.535-07:00,ns_1@172.23.106.100:<0.28549.705>:prometheus:post_async:194]Prometheus http request failed:
      [ns_server:error,2021-03-14T16:46:05.572-07:00,ns_1@172.23.106.100:<0.26387.705>:prometheus:post_async:194]Prometheus http request failed:
      [ns_server:error,2021-03-14T16:46:05.585-07:00,ns_1@172.23.106.100:<0.29166.705>:prometheus:post_async:194]Prometheus http request failed:
      [ns_server:error,2021-03-14T16:46:05.711-07:00,ns_1@172.23.106.100:ns_config<0.311.0>:ns_config:handle_info:845]Saving ns_config failed. Trying to ignore: {{badmatch,{error,enoent}},
      [ns_server:error,2021-03-14T16:46:05.738-07:00,ns_1@172.23.106.100:<0.22842.705>:menelaus_util:reply_server_error:208]Server error during processing: ["web request failed",
      [ns_server:error,2021-03-14T16:46:05.738-07:00,ns_1@172.23.106.100:<0.13111.692>:menelaus_util:reply_server_error:208]Server error during processing: ["web request failed",
      [ns_server:error,2021-03-14T16:46:05.737-07:00,ns_1@172.23.106.100:<0.25960.705>:menelaus_util:reply_server_error:208]Server error during processing: ["web request failed",
      [ns_server:error,2021-03-14T16:46:05.769-07:00,ns_1@172.23.106.100:<0.19939.694>:dcp_proxy:handle_info:117]Socket #Port<0.334986> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.820-07:00,ns_1@172.23.106.100:<0.18473.694>:dcp_proxy:handle_info:117]Socket #Port<0.334967> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.881-07:00,ns_1@172.23.106.100:<0.20412.694>:dcp_proxy:handle_info:117]Socket #Port<0.335029> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.884-07:00,ns_1@172.23.106.100:<0.14910.694>:dcp_proxy:handle_info:117]Socket #Port<0.334768> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.891-07:00,ns_1@172.23.106.100:<0.17263.694>:dcp_proxy:handle_info:117]Socket #Port<0.334970> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.899-07:00,ns_1@172.23.106.100:<0.22883.694>:dcp_proxy:handle_info:117]Socket #Port<0.335060> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.901-07:00,ns_1@172.23.106.100:<0.19449.694>:dcp_proxy:handle_info:117]Socket #Port<0.334972> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.907-07:00,ns_1@172.23.106.100:<0.16530.694>:dcp_proxy:handle_info:117]Socket #Port<0.334982> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.913-07:00,ns_1@172.23.106.100:<0.7790.695>:dcp_proxy:handle_info:117]Socket #Port<0.335848> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.936-07:00,ns_1@172.23.106.100:<0.8417.695>:dcp_proxy:handle_info:117]Socket #Port<0.335837> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.937-07:00,ns_1@172.23.106.100:<0.13193.692>:dcp_proxy:handle_info:117]Socket #Port<0.327204> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.940-07:00,ns_1@172.23.106.100:<0.18394.692>:dcp_proxy:handle_info:117]Socket #Port<0.327157> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.941-07:00,ns_1@172.23.106.100:<0.6234.695>:dcp_proxy:handle_info:117]Socket #Port<0.335834> was closed. Closing myself. State = {state,
      [ns_server:error,2021-03-14T16:46:05.942-07:00,ns_1@172.23.106.100:<0.9547.695>:dcp_proxy:handle_info:117]Socket #Port<0.335845> was closed. Closing myself. State = {state,


          Activity

          dfinlay Dave Finlay added a comment -

          Mihir Kamdar: we're going to need the logs from .100 for this one.

          dfinlay Dave Finlay added a comment -

          I should add that an exit status of 143 means the saslauthd_port process received a SIGTERM. (143 - 128 = 15, which is the signal number of SIGTERM.) To my knowledge, nothing in Couchbase Server sends SIGTERMs to arbitrary processes, so this was almost certainly done by some external piece of code. I'm going to set the component to 'test-execution' for now, but will happily look at the logs for .100 once they are attached to this ticket.
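
          For reference, the 128 + N convention can be checked mechanically. A minimal Python sketch (illustrative only, not part of any Couchbase tooling):

          import signal

          def describe_exit_status(status: int) -> str:
              """Decode a shell-style exit status, where 128 + N means 'killed by signal N'."""
              if status > 128:
                  signum = status - 128
                  return f"terminated by signal {signum} ({signal.Signals(signum).name})"
              return f"exited normally with code {status}"

          print(describe_exit_status(143))  # terminated by signal 15 (SIGTERM)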

          mihir.kamdar Mihir Kamdar added a comment -

          Hi Dave Finlay, here is a fresher set of logs that also includes the ones from 172.23.106.100.

          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.104.137.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.104.155.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.104.157.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.104.5.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.104.69.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.104.70.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.105.107.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.105.111.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.106.100.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.106.188.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.108.103.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.120.245.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.121.117.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.121.3.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.123.27.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.123.28.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.96.148.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.96.251.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.96.252.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.96.253.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.97.119.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.97.121.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.97.122.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.97.239.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.97.242.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.98.135.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.99.11.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.99.20.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.99.21.zip
          https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1615783401/collectinfo-2021-03-15T044325-ns_1%40172.23.99.25.zip

          I looked up the test to see what it was doing at this time. The test had issued a "systemctl stop couchbase-server.service" command to 172.23.106.100, which should have been a graceful stop. This step isn't new to the test and has been there since older releases, but I don't think we have seen this error before.

          [2021-03-14T16:45:36-07:00, sequoiatools/couchbase-cli:7.0:dbfef4] setting-autofailover -c 172.23.108.103:8091 -u Administrator -p password --enable-auto-failover=1 --auto-failover-timeout=5 --max-failovers=1
          [2021-03-14T16:45:42-07:00, sequoiatools/cmd:c23358] 10
          [2021-03-14T16:45:58-07:00, sequoiatools/cbutil:f285ea] /cbinit.py 172.23.106.100 root couchbase stop
          [2021-03-14T16:46:20-07:00, sequoiatools/cmd:c08634] 10
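
          To double-check on the node itself whether the stop was graceful, one option is to stop the unit the same way and then read the journal for how systemd terminated the unit's processes. A rough sketch, assuming key-based SSH root access to the node (the helper below is hypothetical, not part of the sequoia harness):

          import subprocess

          NODE = "172.23.106.100"            # the node from this ticket
          UNIT = "couchbase-server.service"

          def run_remote(cmd: str) -> subprocess.CompletedProcess:
              # Run a command on the node over SSH; assumes key-based root access.
              return subprocess.run(["ssh", f"root@{NODE}", cmd],
                                    capture_output=True, text=True, check=False)

          # Stop the service the way the test does, then inspect the journal to see
          # whether systemd delivered SIGTERM (graceful) or escalated to SIGKILL.
          run_remote(f"systemctl stop {UNIT}")
          journal = run_remote(f"journalctl -u {UNIT} --since '10 minutes ago' --no-pager")
          print(journal.stdout)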
          

          mihir.kamdar Mihir Kamdar added a comment -

          I'll keep the component as test-execution until you analyze the logs for 106.100.

          dfinlay Dave Finlay added a comment -

          Thanks Mihir - will take a look.

          dfinlay Dave Finlay added a comment -

          What's happening is that something is sending a SIGTERM to all of the Couchbase Server processes.

          The shutdown begins normally:

          [ns_server:info,2021-03-14T16:46:03.349-07:00,babysitter_of_ns_1@cb.local:<0.15645.54>:ns_babysitter_bootstrap:stop:36]19005: got shutdown request. Terminating.
          

          This seems likely to be the result of the systemctl command to shut down the server. The babysitter begins to shut down all of its sub-processes, including saslauthd_port:

          [ns_server:debug,2021-03-14T16:46:05.115-07:00,babysitter_of_ns_1@cb.local:<0.29180.51>:ns_port_server:terminate:196]Shutting down port saslauthd_port
          [ns_server:debug,2021-03-14T16:46:05.116-07:00,babysitter_of_ns_1@cb.local:<0.29180.51>:ns_port_server:port_shutdown:297]Shutdown command: "shutdown"
          

          Then, for good measure, someone sends a SIGTERM to the babysitter:

          [error_logger:info,2021-03-14T16:46:05.117-07:00,babysitter_of_ns_1@cb.local:erl_signal_server<0.72.0>:ale_error_logger_handler:do_log:107]
          =========================NOTICE REPORT=========================
          SIGTERM received - shutting down
          

          And saslauthd_port reports that it too received a SIGTERM:

          [user:info,2021-03-14T16:46:05.162-07:00,ns_1@172.23.106.100:<0.2807.690>:ns_log:crash_consumption_loop:69]Service 'saslauthd_port' exited with status 143. Restarting. 
          

          If this is happening in docker containers, I think it's the case that containers send SIGTERMs to the processes running inside them when they're shut down, but I'm not sure about this. So I'm not sure precisely what's happening here or who is sending the SIGTERM, but it all seems to be external to Couchbase Server.
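
          The 143-from-SIGTERM convention itself is easy to reproduce locally; a quick Python sketch (illustrative, assumes a Unix-like host with a sleep binary):

          import signal
          import subprocess
          import time

          # Start a long-running child, deliver SIGTERM, and look at how the exit is
          # reported: Python's returncode is -15, while shells (and the port status
          # quoted above) report the same death as 128 + 15 = 143.
          child = subprocess.Popen(["sleep", "60"])
          time.sleep(0.5)
          child.send_signal(signal.SIGTERM)
          child.wait()
          print("returncode:", child.returncode)                 # -15
          print("shell-style status:", 128 - child.returncode)   # 143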

          mihir.kamdar Mihir Kamdar added a comment -

          Thanks Dave for the analysis. I'll close this out for now since this is not a product issue. On the test side, as I mentioned earlier, we are not doing anything special or new; this has been an existing step in the longevity tests.


            People

            Assignee:
            mihir.kamdar Mihir Kamdar
            Reporter:
            mihir.kamdar Mihir Kamdar
             Votes:
             0
             Watchers:
             4
