Details
- Bug
- Resolution: Duplicate
- Critical
- 7.2.0
- 7.2.0-5322
- Untriaged
- 0
- Unknown
Description
Steps To Recreate:
- Create a 4 node cluster
- Create a magma bucket with (bucket_history_retention_seconds=600,bucket_history_retention_bytes=6000000000, replicas=1)
- Create 30000000 items(doc size = 256)
- Start new doc ops(Create:Expiry)
- While doc ops are running, keep restarting Couchbase Server (service couchbase-server restart). After each restart the test waits for warmup to finish, then waits a further 30 to 60 seconds before the next restart, so the total time between two restarts is warmup_time + 30 to 60 seconds.
- Observed: ns_server exited with status 3 (failed_to_start_child,ns_server_cluster_sup)
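The restart step above can be sketched as a small loop. This is only an illustration of the timing described in the steps, not the actual TAF test code: `restart_loop`, `restart_couchbase`, and the injected `wait_for_warmup` callable are hypothetical names, and how warmup completion is detected is left to the caller.

```python
import random
import subprocess
import time
from typing import Callable, List, Optional


def restart_couchbase() -> None:
    # The command quoted in the steps; requires root on the target node.
    subprocess.run(["service", "couchbase-server", "restart"], check=True)


def restart_loop(
    iterations: int,
    restart: Callable[[], None],
    wait_for_warmup: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Restart, wait for warmup, then pause 30 to 60 s; repeat.

    Returns the pause (in seconds) used after each warmup, so the gap
    between two restarts is warmup_time + that pause.
    """
    rng = rng or random.Random()
    pauses = []
    for _ in range(iterations):
        restart()
        wait_for_warmup()  # hypothetical: blocks until bucket warmup is done
        pause = rng.randint(30, 60)  # inclusive on both ends
        sleep(pause)
        pauses.append(pause)
    return pauses
```

Passing `restart`, `wait_for_warmup`, and `sleep` as callables keeps the loop testable without touching a real node; on a cluster, `restart_couchbase` would be passed as `restart`.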
Note
The last couchbase-server restart was at 2023-05-04 22:54:36, and the ns_server crash was observed at 2023-05-04 22:54:11.
The crash was seen on nodes 172.23.121.81 and 172.23.121.129.
Also, memcached was OOM-killed at Thu May 4 21:22:15 on 172.23.121.81 and at Thu May 4 21:32:31 2023 on 172.23.121.129. The test that hit the crash below started at 2023-05-04 21:53:44, so memcached was OOM-killed during the previous test.
The cluster is still in the same state: http://172.23.121.81:8091/ui/index.html#/overview/stats?commonBucket=default&scenarioZoom=minute&scenario=avd36v9h9&statsHostname=all
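The ordering of the timestamps in this note can be checked with a few lines of Python. This is a sketch under two assumptions that the ticket does not state: all timestamps come from the same clock and timezone, and the first OOM-kill timestamp (quoted without a year) is also from 2023.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps quoted in the note above.
test_start = datetime.strptime("2023-05-04 21:53:44", FMT)
crash_seen = datetime.strptime("2023-05-04 22:54:11", FMT)
last_restart = datetime.strptime("2023-05-04 22:54:36", FMT)

# OOM kills per node; the year for the first entry is an assumption.
oom_kills = {
    "172.23.121.81": datetime(2023, 5, 4, 21, 22, 15),
    "172.23.121.129": datetime(2023, 5, 4, 21, 32, 31),
}


def seconds_between(earlier: datetime, later: datetime) -> int:
    """Whole seconds from `earlier` to `later` (negative if reversed)."""
    return int((later - earlier).total_seconds())


# Gap between the observed crash and the last restart.
crash_to_restart = seconds_between(crash_seen, last_restart)

# True if every OOM kill happened before this test started.
oom_before_test = all(t < test_start for t in oom_kills.values())
```

With the values above, the crash was observed 25 seconds before the last restart, and both OOM kills precede the start of this test.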
Logs:
Service 'ns_server' exited with status 3. Restarting. Messages:
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
Crap error:{badmatch,
{error,
{{shutdown,
{failed_to_start_child,ns_server_cluster_sup,
{shutdown,
{failed_to_start_child,ns_config_sup,
{shutdown,
{failed_to_start_child,ns_config,
{noproc,
{gen_server,call,
[{encryption_service,'babysitter_of_ns_1@cb.local'},
{decrypt,
<<0,211,106,255,101,162,219,188,96,221,169,238,87,117,
90,35,26,61,59,208,112,142,55,10,255,242,207,239,
182,52,13>>},
infinity]}}}}}}}},
{ns_server,start,[normal,[]]}}}}
[{ns_bootstrap,start,0,[{file,"src/ns_bootstrap.erl"},{line,31}]},
{child_erlang,do_child_start,1,[{file,"src/child_erlang.erl"},{line,111}]},
{child_erlang,child_start,1,[{file,"src/child_erlang.erl"},{line,89}]},
{init,start_em,1,[]},
{init,do_boot,3,[]}]
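The nested error term above reads from the outside in: each failed_to_start_child names the supervisor child that could not start, and the innermost noproc names the process that was not running when gen_server:call was made. A minimal sketch for pulling that chain out of the log text follows; the helper names `failed_children` and `noproc_target` are hypothetical, and the regular expressions only target the shape of this particular message, not Erlang terms in general.

```python
import re
from typing import List, Optional, Tuple

# Excerpt of the crash message quoted above.
CRASH_EXCERPT = """
Crap error:{badmatch,
{error,
{{shutdown,
{failed_to_start_child,ns_server_cluster_sup,
{shutdown,
{failed_to_start_child,ns_config_sup,
{shutdown,
{failed_to_start_child,ns_config,
{noproc,
{gen_server,call,
[{encryption_service,'babysitter_of_ns_1@cb.local'},
"""


def failed_children(text: str) -> List[str]:
    """Child names in the order they appear, outermost first."""
    return re.findall(r"failed_to_start_child,\s*(\w+)", text)


def noproc_target(text: str) -> Optional[Tuple[str, str]]:
    """(registered name, node) of the process the failed call targeted."""
    m = re.search(
        r"\{noproc,\s*\{gen_server,call,\s*\[\{(\w+),'([^']+)'\}", text
    )
    return (m.group(1), m.group(2)) if m else None


chain = failed_children(CRASH_EXCERPT)
target = noproc_target(CRASH_EXCERPT)
```

For this log the chain is ns_server_cluster_sup, ns_config_sup, ns_config, and the call target is encryption_service on babysitter_of_ns_1@cb.local, which matches the subject of the linked MB-56471.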
QE-TEST:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.85831.ini bucket_storage=magma,rerun=false,GROUP=P1;graceful_replica,randomize_value=true,doc_size=256,bucket_eviction_policy=fullEviction,nodes_init=4,enable_dp=false,collect_pcaps=True,get-cbcollect-info=True,autoCompactionDefined=true,bucket_history_retention_seconds=600,bucket_history_retention_bytes=6000000000,upgrade_version=7.2.0-5322 -t storage.magma.magma_crash_recovery.MagmaCrashTests.test_crash_during_ops,num_items=5000000,doc_size=1024,sdk_timeout=60,graceful=True,doc_ops=create:expiry,replicas=1,GROUP=P1;graceful_replica,multiplier=20'
Job: http://qe-jenkins1.sc.couchbase.com/job/test_suite_executor-TAF/25356/consoleFull (Test/job passed on 7.2.0-5321)
Issue Links
- duplicates
- MB-56471 investigate the late start of encryption_service
- Resolved