Details
-
Bug
-
Resolution: Not a Bug
-
Critical
-
7.2.4
-
Operating System : Debian GNU/Linux 11 (bullseye)
Couchbase Enterprise Edition build 7.2.4-7059
-
Untriaged
-
Linux x86_64
-
-
0
-
Unknown
Description
Steps to reproduce
- Created a 4 node cluster with the following services
- 172.23.216.174 - kv
- 172.23.217.66 - kv, index, n1ql
- 172.23.217.73 - kv
- 172.23.214.74 - kv
- Created 2 ephemeral buckets named 'default' and 'standard_bucket' and loaded some docs onto it
- Added 2 nodes 172.23.216.204, 172.23.216.80 and removed node 172.23.216.174 and started a rebalance
Rebalance fails
2023-12-11T09:45:25.161-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.217.66) - Rebalance exited with reason {mover_crashed, {important_nodes_went_down, {ns_node_disco_events, ['ns_1@172.23.216.174','ns_1@172.23.216.204', 'ns_1@172.23.216.80','ns_1@172.23.217.66', 'ns_1@172.23.217.73','ns_1@172.23.217.74'], ['ns_1@172.23.216.174','ns_1@172.23.216.204', 'ns_1@172.23.217.66','ns_1@172.23.217.73', 'ns_1@172.23.217.74']}}}.Rebalance Operation Id = 0780830a206cd0bfa0328f47263df70e |
Observing a crash in babysitter logs for 172.23.216.80
[ns_server:info,2023-12-11T09:45:02.777-08:00,babysitter_of_ns_1@cb.local:<0.6324.0>:ns_port_server:log:226]memcached<0.6324.0>: WARNING: Logging before InitGoogleLogging() is written to STDERRmemcached<0.6324.0>: W1211 09:45:02.575379 2339608 HazptrDomain.h:670] Using the default inline executor for asynchronous reclamation may be susceptible to deadlock if the current thread happens to hold a resource needed by the deleter of a reclaimable object |
[error_logger:info,2023-12-11T09:45:25.209-08:00,babysitter_of_ns_1@cb.local:net_kernel<0.72.0>:ale_error_logger_handler:do_log:101]=========================NOTICE REPORT========================={net_kernel,{'EXIT',#Port<0.15>,normal}}[error_logger:info,2023-12-11T09:45:25.209-08:00,babysitter_of_ns_1@cb.local:net_kernel<0.72.0>:ale_error_logger_handler:do_log:101]=========================NOTICE REPORT========================={net_kernel,{'EXIT',<0.374.0>,connection_closed}}[error_logger:info,2023-12-11T09:45:25.209-08:00,babysitter_of_ns_1@cb.local:net_kernel<0.72.0>:ale_error_logger_handler:do_log:101]=========================NOTICE REPORT========================={net_kernel,{auto_connect,'ns_1@172.23.216.80', {8077698,#Ref<0.1728375175.3453878273.146421>}}}[ns_server:debug,2023-12-11T09:45:25.209-08:00,babysitter_of_ns_1@cb.local:cb_dist<0.69.0>:cb_dist:info_msg:872]cb_dist: Connection down: {con,#Ref<0.1728375175.3453747201.146418>, inet_tcp_dist,<0.374.0>, #Ref<0.1728375175.3453747201.146419>}[ns_server:info,2023-12-11T09:45:25.212-08:00,babysitter_of_ns_1@cb.local:<0.134.0>:ns_port_server:log:226]ns_server<0.134.0>: Monotonic time stepped backwards!ns_server<0.134.0>: Previous time: 8286430844667ns_server<0.134.0>: Current time: 8286430844655ns_server<0.134.0>: [os_mon] cpu supervisor port (cpu_sup): Erlang has closedns_server<0.134.0>: [os_mon] memory supervisor port (memsup): Erlang has closed |
[error_logger:error,2023-12-11T09:45:25.217-08:00,babysitter_of_ns_1@cb.local:<0.134.0>:ale_error_logger_handler:do_log:101]=========================ERROR REPORT=========================** Generic server <0.134.0> terminating ** Last message in was {#Port<0.9>,{exit_status,134}}** When Server state == {state,#Port<0.9>,2325437, {ns_server,"/opt/couchbase/lib/erlang/bin/erl", ["+A","16","-smp","enable","+sbt","u","+P", "327680","+K","true","+swt","low","+sbwt", "none","+MMmcs","30","+e102400","-setcookie", "nocookie","-kernel","logger", "[{handler, default, undefined}]","-kernel", "prevent_overlapping_partitions","false", "-user","user_io","-run","child_erlang", "child_start","ns_bootstrap","--","-smp", "enable","-kernel","logger", "[{handler, default, undefined}]","inetrc", "\"/opt/couchbase/etc/couchbase/hosts.cfg\"", "dist_config_file", "\"/opt/couchbase/var/lib/couchbase/config/dist_cfg\"", [...]|...], [{env, [{"NS_SERVER_BABYSITTER_PID","2325312"}, {"CHILD_ERLANG_ENV_ARGS", "[{ns_babysitter,\n [{pidfile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.pid\"},\n {nodefile,\n \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.node\"},\n {cookiefile,\n \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.cookie\"}]},\n {ns_server,\n [{path_config_secdir,\"/opt/couchbase/etc/security\"},\n {loglevel_error_logger,debug},\n {path_config_tmpdir,\"/opt/couchbase/var/lib/couchbase/tmp\"},\n {config_path,\"/opt/couchbase/etc/couchbase/static_config\"},\n {path_config_datadir,\"/opt/couchbase/var/lib/couchbase\"},\n {disk_sink_opts_json_rpc,\n [{rotation,\n [{compress,true},\n {size,41943040},\n {num_files,2},\n {buffer_size_max,52428800}]}]},\n {max_t,10},\n {loglevel_ns_doctor,debug},\n {nodefile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.node\"},\n {loglevel_default,debug},\n {loglevel_cluster,debug},\n {loglevel_rebalance,debug},\n {path_config_bindir,\"/opt/couchbase/bin\"},\n {loglevel_ns_server,debug},\n {loglevel_cbas,debug},\n {disk_sink_opts,\n [{rotation,\n [{compress,true},\n {size,41943040},\n {num_files,10},\n {buffer_size_max,52428800}]}]},\n {max_r,20},\n {net_kernel_verbosity,10},\n {loglevel_mapreduce_errors,debug},\n {loglevel_xdcr,debug},\n {loglevel_user,debug},\n {loglevel_couchdb,info},\n {loglevel_access,info},\n {loglevel_menelaus,debug},\n {path_config_libdir,\"/opt/couchbase/lib\"},\n {loglevel_views,debug},\n {loglevel_stats,debug},\n {path_config_etcdir,\"/opt/couchbase/etc/couchbase\"},\n {error_logger_mf_dir,\"/opt/couchbase/var/lib/couchbase/logs\"}]}]"}, {"ERL_CRASH_DUMP", "erl_crash.dump.1703714386.2325312.ns_server"}]}, exit_status,use_stdio,stream,eof]}, {ringbuffer,670,1024, {[{<<"[os_mon] memory supervisor port (memsup): Erlang has closed">>, 59}, {<<"[os_mon] cpu supervisor port (cpu_sup): Erlang has closed">>, 57}, {<<"Current time: 8286430844655">>,28}, {<<"Previous time: 8286430844667">>,28}, {<<"Monotonic time stepped backwards!">>,33}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"working as port">>,15}, {<<"2325437: Booted. Wai"...>>,45}], [{<<"working as port">>,15}]}}, undefined,#Ref<0.1728375175.3453747201.173278>, [<<"[os_mon] memory supervisor port (memsup): Erlang has closed">>, <<"[os_mon] cpu supervisor port (cpu_sup): Erlang has closed">>, <<"Current time: 8286430844655">>, <<"Previous time: 8286430844667">>, <<"Monotonic time stepped backwards!">>], 0}** Reason for termination ==** {abnormal,134} |
[error_logger:error,2023-12-11T09:45:25.219-08:00,babysitter_of_ns_1@cb.local:<0.134.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: ns_port_server:init/1 pid: <0.134.0> registered_name: [] exception exit: {abnormal,134} in function gen_server:handle_common_reply/8 (gen_server.erl, line 1241) ancestors: [<0.133.0>,<0.132.0>,child_ns_server_sup,ns_babysitter_sup, <0.116.0>] message_queue_len: 1 messages: [{#Port<0.9>,eof}] links: [<0.133.0>,#Port<0.9>] dictionary: [] trap_exit: true status: running heap_size: 17731 stack_size: 28 reductions: 142590 neighbours: |
[ns_server:info,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:<0.133.0>:supervisor_cushion:handle_info:54]Cushion managed supervisor for ns_server failed: {abnormal,134}[error_logger:error,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:<0.133.0>:ale_error_logger_handler:do_log:101]=========================ERROR REPORT=========================** Generic server <0.133.0> terminating ** Last message in was {die,{abnormal,134}}** When Server state == {state,ns_server,5000,-576460751061013937,undefined, infinity}** Reason for termination ==** {abnormal,134} |
[error_logger:error,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:<0.133.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: supervisor_cushion:init/1 pid: <0.133.0> registered_name: [] exception exit: {abnormal,134} in function gen_server:handle_common_reply/8 (gen_server.erl, line 1241) ancestors: [<0.132.0>,child_ns_server_sup,ns_babysitter_sup,<0.116.0>] message_queue_len: 0 messages: [] links: [<0.132.0>] dictionary: [] trap_exit: true status: running heap_size: 2586 stack_size: 28 reductions: 12864 neighbours: |
[error_logger:error,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:<0.132.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT========================= crasher: initial call: erlang:apply/2 pid: <0.132.0> registered_name: [] exception exit: {abnormal,134} in function restartable:loop/4 (src/restartable.erl, line 63) ancestors: [child_ns_server_sup,ns_babysitter_sup,<0.116.0>] message_queue_len: 0 messages: [] links: [<0.131.0>] dictionary: [] trap_exit: true status: running heap_size: 10958 stack_size: 28 reductions: 5194 neighbours: |
[error_logger:error,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:child_ns_server_sup<0.131.0>:ale_error_logger_handler:do_log:101]=========================SUPERVISOR REPORT========================= supervisor: {local,child_ns_server_sup} errorContext: child_terminated reason: {abnormal,134} offender: [{pid,<0.132.0>}, {id,{ns_server,"/opt/couchbase/lib/erlang/bin/erl", ["+A","16","-smp","enable","+sbt","u","+P","327680", "+K","true","+swt","low","+sbwt","none","+MMmcs","30", "+e102400","-setcookie","nocookie","-kernel","logger", "[{handler, default, undefined}]","-kernel", "prevent_overlapping_partitions","false","-user", "user_io","-run","child_erlang","child_start", "ns_bootstrap","--","-smp","enable","-kernel", "logger","[{handler, default, undefined}]","inetrc", "\"/opt/couchbase/etc/couchbase/hosts.cfg\"", "dist_config_file", "\"/opt/couchbase/var/lib/couchbase/config/dist_cfg\"", [...]|...], [{env, [{"NS_SERVER_BABYSITTER_PID","2325312"}, {"CHILD_ERLANG_ENV_ARGS", "[{ns_babysitter,\n [{pidfile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.pid\"},\n {nodefile,\n \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.node\"},\n {cookiefile,\n \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.cookie\"}]},\n {ns_server,\n [{path_config_secdir,\"/opt/couchbase/etc/security\"},\n {loglevel_error_logger,debug},\n {path_config_tmpdir,\"/opt/couchbase/var/lib/couchbase/tmp\"},\n {config_path,\"/opt/couchbase/etc/couchbase/static_config\"},\n {path_config_datadir,\"/opt/couchbase/var/lib/couchbase\"},\n {disk_sink_opts_json_rpc,\n [{rotation,\n [{compress,true},\n {size,41943040},\n {num_files,2},\n {buffer_size_max,52428800}]}]},\n {max_t,10},\n {loglevel_ns_doctor,debug},\n {nodefile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.node\"},\n {loglevel_default,debug},\n {loglevel_cluster,debug},\n {loglevel_rebalance,debug},\n {path_config_bindir,\"/opt/couchbase/bin\"},\n {loglevel_ns_server,debug},\n {loglevel_cbas,debug},\n {disk_sink_opts,\n [{rotation,\n [{compress,true},\n {size,41943040},\n {num_files,10},\n {buffer_size_max,52428800}]}]},\n {max_r,20},\n {net_kernel_verbosity,10},\n {loglevel_mapreduce_errors,debug},\n {loglevel_xdcr,debug},\n {loglevel_user,debug},\n {loglevel_couchdb,info},\n {loglevel_access,info},\n {loglevel_menelaus,debug},\n {path_config_libdir,\"/opt/couchbase/lib\"},\n {loglevel_views,debug},\n {loglevel_stats,debug},\n {path_config_etcdir,\"/opt/couchbase/etc/couchbase\"},\n {error_logger_mf_dir,\"/opt/couchbase/var/lib/couchbase/logs\"}]}]"}, {"ERL_CRASH_DUMP", "erl_crash.dump.1703714386.2325312.ns_server"}]}, exit_status,use_stdio,stream,eof]}}, {mfargs, {restartable,start_link, [{supervisor_cushion,start_link, [ns_server,5000,infinity,ns_port_server, start_link, [#Fun<ns_child_ports_sup.4.11883217>]]}, 86400000]}}, {restart_type,permanent}, {significant,false}, {shutdown,infinity}, {child_type,worker}] |
[ns_server:debug,2023-12-11T09:45:25.221-08:00,babysitter_of_ns_1@cb.local:<0.7116.0>:supervisor_cushion:init:33]starting ns_port_server with delay of 5000[ns_server:debug,2023-12-11T09:45:25.221-08:00,babysitter_of_ns_1@cb.local:<0.7115.0>:restartable:start_child:92]Started child process <0.7116.0> MFA: {supervisor_cushion,start_link, [ns_server,5000,infinity,ns_port_server,start_link, [#Fun<ns_child_ports_sup.4.11883217>]]} |
Testrunner test to reproduce
./testrunner -i /data/workspace/debian-p0-nserv-vset00-00-rebalanceinout_P1_Set2-ephemeral/testexec.49992.ini -p get-cbcollect-info=False,bucket_type=ephemeral,GROUP=P1_Set2,get-cbcollect-info=True,sirius_url=http://172.23.120.103:4000 -t rebalance.rebalanceinout.RebalanceInOutTests.test_rebalance_in_out_at_once,default_bucket=True,standard_buckets=1,sasl_buckets=1,replicas=1,nodes_init=4,nodes_in=2,nodes_out=1,items=1000000,max_verify=50000,GROUP=IN_OUT;P1;PERFORMANCE;P1_Set2 |
Job name : debian-nserv-rebalanceinout_P1_Set2-ephemeral