Couchbase Server / MB-60817

Service 'ns_server' exited with status 137 | random failover of all nodes of cluster | 7.6.0 2135

Details

    • Untriaged
    • 0
    • No

    Description

      The following test was performed on Capella.
      AWS cluster with AMI: couchbase-cloud-server-7-6-0-2135-v1-0-28
      CSP: GCP
      All of a sudden, all 3 nodes of the cluster were failed over at random and then added back to the cluster shortly afterwards.
      The following errors were observed multiple times in the logs:
      ----------------------------------------------------------------------------------------------------------------------------------------------------------

      Service 'ns_server' exited with status 137. Restarting. Messages: working as port 1946: Booted. Waiting for shutdown request 1946: Booted. Waiting for shutdown request working as port [os_mon] cpu supervisor port (cpu_sup): Erlang has closed [os_mon] memory supervisor port (memsup): Erlang has closed

      ----------------------------------------------------------------------------------------------------------------------------------------------------------

      Compactor for database `sift_bucket` (pid [{type,database}, {important,true}, {name,<<"sift_bucket">>}, {fa, {#Fun<compaction_daemon.4.76759806>, [<<"sift_bucket">>, {config,{30,undefined}, {30,undefined}, undefined,false,false, {daemon_config,30,131072, 20971520}}, false, {[{type,bucket}]}]}}]) terminated unexpectedly: {compromised_reply, {error, timeout, [{ns_memcached, worker_loop, 3, [{file, "src/ns_memcached.erl"}, {line, 253}]}, {proc_lib, init_p_do_apply, 3, [{file, "proc_lib.erl"}, {line, 240}]}]}, {gen_server, call, [{'ns_memcached-sift_bucket', 'ns_1@svc-dqisea-node-002.x3w1mh9c1wvwukx.sandbox.nonprod-project-avengers.com'}, {raw_stats, <<"diskinfo">>, undefined, #Fun<compaction_daemon.18.76759806>, {<<"0">>, <<"0">>}}, 300000]}}
      ----------------------------------------------------------------------------------------------------------------------------------------------------------
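      The two numbers in the errors above can be decoded mechanically. The following is a minimal Python sketch, assuming the ns_server exit status follows the usual Unix convention of reporting 128 + N for a process terminated by signal N (so 137 = 128 + 9, SIGKILL), and reading 300000 as the millisecond Timeout argument of the gen_server:call that timed out (5 minutes). The helper names decode_exit_status and ms_to_minutes are hypothetical.

```python
import signal

# Assumption: a process killed by signal N is reported with status 128 + N,
# the common Unix shell / port-program convention.
SIGNAL_EXIT_BASE = 128


def decode_exit_status(status: int) -> str:
    """Describe a child-process exit status (hypothetical helper)."""
    if status > SIGNAL_EXIT_BASE:
        signum = status - SIGNAL_EXIT_BASE
        try:
            # signal.Signals maps a signal number to its symbolic name.
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited normally with code {status}"


def ms_to_minutes(ms: int) -> float:
    """Convert a gen_server:call timeout in milliseconds to minutes."""
    return ms / 60_000


# Status from the "Service 'ns_server' exited with status 137" message.
print(decode_exit_status(137))  # on Linux: killed by SIGKILL (signal 9)
# Timeout element of the {gen_server, call, [...]} term in the compactor error.
print(ms_to_minutes(300000))    # 5.0
```

      SIGKILL cannot be caught or handled, so ns_server had no chance to shut down cleanly. On Linux the kernel OOM killer is one common sender of SIGKILL, but these messages alone do not establish which process or mechanism sent it.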

      Server logs:
      https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-16T054818-ns_1%40svc-dqisea-node-001.x3w1mh9c1wvwukx.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-16T054818-ns_1%40svc-dqisea-node-002.x3w1mh9c1wvwukx.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-16T054818-ns_1%40svc-dqisea-node-003.x3w1mh9c1wvwukx.sandbox.nonprod-project-avengers.com.zip

      This sudden failover was observed around 8:19 PM PST.

      Attachments

        1. screenshot-5.png
          39 kB
        2. screenshot-4.png
          34 kB
        3. screenshot-3.png
          64 kB
        4. screenshot-2.png
          63 kB
        5. screenshot-1.png
          45 kB
        6. MB-60817-nutshell.txt
          51 kB
        7. MB-60817-memcached-analysis.txt
          7 kB
        8. image-2024-02-16-16-06-03-677.png
          34 kB
        9. image-2024-02-16-15-23-28-235.png
          63 kB
        10. 003-KV-ops.png
          282 kB
        11. 003-CPU-memory.png
          307 kB
        12. 002-KV-ops.png
          286 kB
        13. 002-CPU-memory.png
          292 kB
        14. 001-sys-mem-limit-and-used.png
          111 kB
        15. 001-KV-ops.png
          255 kB
        16. 001-CPU-memory.png
          289 kB

          People

            steve.watanabe Steve Watanabe
            aman.srivastava Aman Srivastava
            Votes: 0
            Watchers: 11