Couchbase Server / MB-56761

[CDC] ns_server exited with status 3 (failed_to_start_child,ns_server_cluster_sup)

Details

    Description

      Steps To Recreate:

      1. Create a 4 node cluster
      2. Create a Magma bucket (bucket_history_retention_seconds=600, bucket_history_retention_bytes=6000000000, replicas=1)
      3. Create 30000000 items (doc size = 256)
      4. Start new doc ops (Create:Expiry)
      5. While the doc ops are running, keep restarting Couchbase Server (service couchbase-server restart). Between two restarts the test waits for warmup to finish and then waits a further 30 to 60 seconds before the next restart, so the total time between two restarts is warmup_time + 30 to 60 seconds.
      6. Observe that ns_server exited with status 3 (failed_to_start_child,ns_server_cluster_sup)
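
      The restart cycle in step 5 can be sketched roughly as follows. This is not the actual test code: restart_loop and its restart and warmup_done callables are hypothetical names, and the 5 second poll interval is an assumption. Only the 30 to 60 second pause after warmup comes from the steps above.

```python
import random
import time
from typing import Callable, List


def restart_loop(restart: Callable[[], None],
                 warmup_done: Callable[[], bool],
                 iterations: int,
                 sleep: Callable[[float], None] = time.sleep,
                 poll_interval: float = 5.0,   # assumed, not from the report
                 min_pause: float = 30.0,
                 max_pause: float = 60.0) -> List[float]:
    """Restart the server repeatedly, waiting for warmup plus a random pause."""
    pauses = []
    for _ in range(iterations):
        restart()                      # e.g. `service couchbase-server restart`
        while not warmup_done():       # block until bucket warmup has finished
            sleep(poll_interval)
        pause = random.uniform(min_pause, max_pause)
        sleep(pause)                   # extra 30 to 60 s before the next restart
        pauses.append(pause)
    return pauses
```

      In a real run, restart could wrap subprocess.run(["service", "couchbase-server", "restart"], check=True), and warmup_done would need to query the node for warmup completion; how the test does that is not stated in this report.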

      Note
      The last iteration of the couchbase-server restart was at 2023-05-04 22:54:36, and the ns_server crash was observed at 2023-05-04 22:54:11.
      The crash was found on nodes 172.23.121.81 and 172.23.121.129.

      Also, memcached was OOM-killed at Thu May 4 21:22:15 on 172.23.121.81 and at Thu May 4 21:32:31 2023 on 172.23.121.129. The test in which the crash below was seen started at 2023-05-04 21:53:44, which means memcached was OOM-killed during the previous test.
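
      The ordering of these events can be checked with a few lines of Python. This is only a sketch: it assumes all timestamps are in the same timezone and that the first OOM timestamp, which is reported without a year, is also from 2023.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the note above (year 2023 assumed where it is missing).
test_start = datetime.strptime("2023-05-04 21:53:44", FMT)
crash_seen = datetime.strptime("2023-05-04 22:54:11", FMT)
last_restart = datetime.strptime("2023-05-04 22:54:36", FMT)
oom_kills = {
    "172.23.121.81": datetime.strptime("2023-05-04 21:22:15", FMT),
    "172.23.121.129": datetime.strptime("2023-05-04 21:32:31", FMT),
}

# Seconds between the observed crash and the last recorded restart.
crash_before_restart = (last_restart - crash_seen).total_seconds()

# Seconds between each OOM kill and the start of this test.
oom_before_test = {
    node: (test_start - killed_at).total_seconds()
    for node, killed_at in oom_kills.items()
}

print(crash_before_restart)  # 25.0
print(oom_before_test)       # {'172.23.121.81': 1889.0, '172.23.121.129': 1273.0}
```

      Under those assumptions, the crash was logged 25 seconds before the last recorded restart, and both OOM kills happened more than 20 minutes before this test started.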

      The cluster is still in the same state: http://172.23.121.81:8091/ui/index.html#/overview/stats?commonBucket=default&scenarioZoom=minute&scenario=avd36v9h9&statsHostname=all

      Logs:

      Service 'ns_server' exited with status 3. Restarting. Messages:
      [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
      Crap error:{badmatch,
      {error,
      {{shutdown,
      {failed_to_start_child,ns_server_cluster_sup,
      {shutdown,
      {failed_to_start_child,ns_config_sup,
      {shutdown,
      {failed_to_start_child,ns_config,
      {noproc,
      {gen_server,call,
      [{encryption_service,'babysitter_of_ns_1@cb.local'},
      {decrypt,
      <<0,211,106,255,101,162,219,188,96,221,169,238,87,117,
      90,35,26,61,59,208,112,142,55,10,255,242,207,239,
      182,52,13>>},
      infinity]}}}}}}}},
      {ns_server,start,[normal,[]]}}}}
      [{ns_bootstrap,start,0,[{file,"src/ns_bootstrap.erl"},{line,31}]},
      {child_erlang,do_child_start,1,[{file,"src/child_erlang.erl"},{line,111}]},
      {child_erlang,child_start,1,[{file,"src/child_erlang.erl"},{line,89}]},
      {init,start_em,1,[]},
      {init,do_boot,3,[]}]
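
      To read the nested error term, it helps to pull out the chain of supervisor children that failed to start. The helper below is a hypothetical illustration (failed_children is not part of ns_server or any Couchbase tool); it uses a plain regular expression on an excerpt of the log above rather than a real Erlang term parser.

```python
import re
from typing import List

# Excerpt of the error term from the log above, whitespace condensed.
SAMPLE = """
{{shutdown,
{failed_to_start_child,ns_server_cluster_sup,
{shutdown,
{failed_to_start_child,ns_config_sup,
{shutdown,
{failed_to_start_child,ns_config,
{noproc,
{gen_server,call,
[{encryption_service,'babysitter_of_ns_1@cb.local'},
"""


def failed_children(log_text: str) -> List[str]:
    """Return child names from failed_to_start_child tuples, outermost first."""
    return re.findall(r"failed_to_start_child,\s*([a-z0-9_]+)", log_text)


print(failed_children(SAMPLE))
# ['ns_server_cluster_sup', 'ns_config_sup', 'ns_config']
```

      The innermost entry is ns_config, whose gen_server call to encryption_service on 'babysitter_of_ns_1@cb.local' exited with noproc. In Erlang, noproc from gen_server:call means the target process was not running or not registered when the call was made. This shows where startup failed, not why the encryption service was unavailable.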
      

      QE-TEST:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.85831.ini bucket_storage=magma,rerun=false,GROUP=P1;graceful_replica,randomize_value=true,doc_size=256,bucket_eviction_policy=fullEviction,nodes_init=4,enable_dp=false,collect_pcaps=True,get-cbcollect-info=True,autoCompactionDefined=true,bucket_history_retention_seconds=600,bucket_history_retention_bytes=6000000000,upgrade_version=7.2.0-5322 -t storage.magma.magma_crash_recovery.MagmaCrashTests.test_crash_during_ops,num_items=5000000,doc_size=1024,sdk_timeout=60,graceful=True,doc_ops=create:expiry,replicas=1,GROUP=P1;graceful_replica,multiplier=20'
      

      Job: http://qe-jenkins1.sc.couchbase.com/job/test_suite_executor-TAF/25356/consoleFull (Test/job passed on 7.2.0-5321)

            People

              ankush.sharma Ankush Sharma