Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-60120

[Rebalance] : Rebalance exited with reason {mover_crashed,{important_nodes_went_down,{ns_node_disco_events

    XMLWordPrintable

Details

    Description

      Steps to reproduce

      1. Created a 4 node cluster with the following services
        1. 172.23.216.174 - kv
        2. 172.23.217.66  - kv, index, n1ql
        3. 172.23.217.73 - kv
        4. 172.23.214.74 - kv
      2. Created 2 ephemeral buckets named 'default' and 'standard_bucket' and loaded some docs onto it
      3. Added 2 nodes 172.23.216.204, 172.23.216.80 and removed node 172.23.216.174 and started a rebalance

      Rebalance fails 

      2023-12-11T09:45:25.161-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.217.66) - Rebalance exited with reason {mover_crashed,                              {important_nodes_went_down,                               {ns_node_disco_events,                                ['ns_1@172.23.216.174','ns_1@172.23.216.204',                                 'ns_1@172.23.216.80','ns_1@172.23.217.66',                                 'ns_1@172.23.217.73','ns_1@172.23.217.74'],                                ['ns_1@172.23.216.174','ns_1@172.23.216.204',                                 'ns_1@172.23.217.66','ns_1@172.23.217.73',                                 'ns_1@172.23.217.74']}}}.Rebalance Operation Id = 0780830a206cd0bfa0328f47263df70e 

      Observing a crash in babysitter logs for 172.23.216.80

      [ns_server:info,2023-12-11T09:45:02.777-08:00,babysitter_of_ns_1@cb.local:<0.6324.0>:ns_port_server:log:226]memcached<0.6324.0>: WARNING: Logging before InitGoogleLogging() is written to STDERRmemcached<0.6324.0>: W1211 09:45:02.575379 2339608 HazptrDomain.h:670] Using the default inline executor for asynchronous reclamation may be susceptible to deadlock if the current thread happens to hold a resource needed by the deleter of a reclaimable object
      [error_logger:info,2023-12-11T09:45:25.209-08:00,babysitter_of_ns_1@cb.local:net_kernel<0.72.0>:ale_error_logger_handler:do_log:101]=========================NOTICE REPORT========================={net_kernel,{'EXIT',#Port<0.15>,normal}}[error_logger:info,2023-12-11T09:45:25.209-08:00,babysitter_of_ns_1@cb.local:net_kernel<0.72.0>:ale_error_logger_handler:do_log:101]=========================NOTICE REPORT========================={net_kernel,{'EXIT',<0.374.0>,connection_closed}}[error_logger:info,2023-12-11T09:45:25.209-08:00,babysitter_of_ns_1@cb.local:net_kernel<0.72.0>:ale_error_logger_handler:do_log:101]=========================NOTICE REPORT========================={net_kernel,{auto_connect,'ns_1@172.23.216.80',                          {8077698,#Ref<0.1728375175.3453878273.146421>}}}[ns_server:debug,2023-12-11T09:45:25.209-08:00,babysitter_of_ns_1@cb.local:cb_dist<0.69.0>:cb_dist:info_msg:872]cb_dist: Connection down: {con,#Ref<0.1728375175.3453747201.146418>,                               inet_tcp_dist,<0.374.0>,                               #Ref<0.1728375175.3453747201.146419>}[ns_server:info,2023-12-11T09:45:25.212-08:00,babysitter_of_ns_1@cb.local:<0.134.0>:ns_port_server:log:226]ns_server<0.134.0>: Monotonic time stepped backwards!ns_server<0.134.0>: Previous time: 8286430844667ns_server<0.134.0>: Current time:  8286430844655ns_server<0.134.0>: [os_mon] cpu supervisor port (cpu_sup): Erlang has closedns_server<0.134.0>: [os_mon] memory supervisor port (memsup): Erlang has closed
      [error_logger:error,2023-12-11T09:45:25.217-08:00,babysitter_of_ns_1@cb.local:<0.134.0>:ale_error_logger_handler:do_log:101]=========================ERROR REPORT=========================** Generic server <0.134.0> terminating ** Last message in was {#Port<0.9>,{exit_status,134}}** When Server state == {state,#Port<0.9>,2325437,                            {ns_server,"/opt/couchbase/lib/erlang/bin/erl",                                ["+A","16","-smp","enable","+sbt","u","+P",                                 "327680","+K","true","+swt","low","+sbwt",                                 "none","+MMmcs","30","+e102400","-setcookie",                                 "nocookie","-kernel","logger",                                 "[{handler, default, undefined}]","-kernel",                                 "prevent_overlapping_partitions","false",                                 "-user","user_io","-run","child_erlang",                                 "child_start","ns_bootstrap","--","-smp",                                 "enable","-kernel","logger",                                 "[{handler, default, undefined}]","inetrc",                                 "\"/opt/couchbase/etc/couchbase/hosts.cfg\"",                                 "dist_config_file",                                 "\"/opt/couchbase/var/lib/couchbase/config/dist_cfg\"",                                 [...]|...],                                [{env,                                     [{"NS_SERVER_BABYSITTER_PID","2325312"},                                      {"CHILD_ERLANG_ENV_ARGS",                                       "[{ns_babysitter,\n     [{pidfile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.pid\"},\n      {nodefile,\n          \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.node\"},\n      {cookiefile,\n          \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.cookie\"}]},\n {ns_server,\n     [{path_config_secdir,\"/opt/couchbase/etc/security\"},\n      {loglevel_error_logger,debug},\n      {path_config_tmpdir,\"/opt/couchbase/var/lib/couchbase/tmp\"},\n      {config_path,\"/opt/couchbase/etc/couchbase/static_config\"},\n      {path_config_datadir,\"/opt/couchbase/var/lib/couchbase\"},\n      {disk_sink_opts_json_rpc,\n          [{rotation,\n               [{compress,true},\n                {size,41943040},\n                {num_files,2},\n                {buffer_size_max,52428800}]}]},\n      {max_t,10},\n      {loglevel_ns_doctor,debug},\n      {nodefile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.node\"},\n      {loglevel_default,debug},\n      {loglevel_cluster,debug},\n      {loglevel_rebalance,debug},\n      {path_config_bindir,\"/opt/couchbase/bin\"},\n      {loglevel_ns_server,debug},\n      {loglevel_cbas,debug},\n      {disk_sink_opts,\n          [{rotation,\n               [{compress,true},\n                {size,41943040},\n                {num_files,10},\n                {buffer_size_max,52428800}]}]},\n      {max_r,20},\n      {net_kernel_verbosity,10},\n      {loglevel_mapreduce_errors,debug},\n      {loglevel_xdcr,debug},\n      {loglevel_user,debug},\n      {loglevel_couchdb,info},\n      {loglevel_access,info},\n      {loglevel_menelaus,debug},\n      {path_config_libdir,\"/opt/couchbase/lib\"},\n      {loglevel_views,debug},\n      {loglevel_stats,debug},\n      {path_config_etcdir,\"/opt/couchbase/etc/couchbase\"},\n      {error_logger_mf_dir,\"/opt/couchbase/var/lib/couchbase/logs\"}]}]"},                                      {"ERL_CRASH_DUMP",                                       "erl_crash.dump.1703714386.2325312.ns_server"}]},                                 exit_status,use_stdio,stream,eof]},                            {ringbuffer,670,1024,                                {[{<<"[os_mon] memory supervisor port (memsup): Erlang has closed">>,                                   59},                                  {<<"[os_mon] cpu supervisor port (cpu_sup): Erlang has closed">>,                                   57},                                  {<<"Current time:  8286430844655">>,28},                                  {<<"Previous time: 8286430844667">>,28},                                  {<<"Monotonic time stepped backwards!">>,33},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"working as port">>,15},                                  {<<"2325437: Booted. Wai"...>>,45}],                                 [{<<"working as port">>,15}]}},                            undefined,#Ref<0.1728375175.3453747201.173278>,                            [<<"[os_mon] memory supervisor port (memsup): Erlang has closed">>,                             <<"[os_mon] cpu supervisor port (cpu_sup): Erlang has closed">>,                             <<"Current time:  8286430844655">>,                             <<"Previous time: 8286430844667">>,                             <<"Monotonic time stepped backwards!">>],                            0}** Reason for termination ==** {abnormal,134}
      [error_logger:error,2023-12-11T09:45:25.219-08:00,babysitter_of_ns_1@cb.local:<0.134.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT=========================  crasher:    initial call: ns_port_server:init/1    pid: <0.134.0>    registered_name: []    exception exit: {abnormal,134}      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1241)    ancestors: [<0.133.0>,<0.132.0>,child_ns_server_sup,ns_babysitter_sup,                  <0.116.0>]    message_queue_len: 1    messages: [{#Port<0.9>,eof}]    links: [<0.133.0>,#Port<0.9>]    dictionary: []    trap_exit: true    status: running    heap_size: 17731    stack_size: 28    reductions: 142590  neighbours:
      [ns_server:info,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:<0.133.0>:supervisor_cushion:handle_info:54]Cushion managed supervisor for ns_server failed:  {abnormal,134}[error_logger:error,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:<0.133.0>:ale_error_logger_handler:do_log:101]=========================ERROR REPORT=========================** Generic server <0.133.0> terminating ** Last message in was {die,{abnormal,134}}** When Server state == {state,ns_server,5000,-576460751061013937,undefined,                               infinity}** Reason for termination ==** {abnormal,134}
      [error_logger:error,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:<0.133.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT=========================  crasher:    initial call: supervisor_cushion:init/1    pid: <0.133.0>    registered_name: []    exception exit: {abnormal,134}      in function  gen_server:handle_common_reply/8 (gen_server.erl, line 1241)    ancestors: [<0.132.0>,child_ns_server_sup,ns_babysitter_sup,<0.116.0>]    message_queue_len: 0    messages: []    links: [<0.132.0>]    dictionary: []    trap_exit: true    status: running    heap_size: 2586    stack_size: 28    reductions: 12864  neighbours:
      [error_logger:error,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:<0.132.0>:ale_error_logger_handler:do_log:101]=========================CRASH REPORT=========================  crasher:    initial call: erlang:apply/2    pid: <0.132.0>    registered_name: []    exception exit: {abnormal,134}      in function  restartable:loop/4 (src/restartable.erl, line 63)    ancestors: [child_ns_server_sup,ns_babysitter_sup,<0.116.0>]    message_queue_len: 0    messages: []    links: [<0.131.0>]    dictionary: []    trap_exit: true    status: running    heap_size: 10958    stack_size: 28    reductions: 5194  neighbours:
      [error_logger:error,2023-12-11T09:45:25.220-08:00,babysitter_of_ns_1@cb.local:child_ns_server_sup<0.131.0>:ale_error_logger_handler:do_log:101]=========================SUPERVISOR REPORT=========================    supervisor: {local,child_ns_server_sup}    errorContext: child_terminated    reason: {abnormal,134}    offender: [{pid,<0.132.0>},               {id,{ns_server,"/opt/couchbase/lib/erlang/bin/erl",                       ["+A","16","-smp","enable","+sbt","u","+P","327680",                        "+K","true","+swt","low","+sbwt","none","+MMmcs","30",                        "+e102400","-setcookie","nocookie","-kernel","logger",                        "[{handler, default, undefined}]","-kernel",                        "prevent_overlapping_partitions","false","-user",                        "user_io","-run","child_erlang","child_start",                        "ns_bootstrap","--","-smp","enable","-kernel",                        "logger","[{handler, default, undefined}]","inetrc",                        "\"/opt/couchbase/etc/couchbase/hosts.cfg\"",                        "dist_config_file",                        "\"/opt/couchbase/var/lib/couchbase/config/dist_cfg\"",                        [...]|...],                       [{env,                            [{"NS_SERVER_BABYSITTER_PID","2325312"},                             {"CHILD_ERLANG_ENV_ARGS",                              "[{ns_babysitter,\n     [{pidfile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.pid\"},\n      {nodefile,\n          \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.node\"},\n      {cookiefile,\n          \"/opt/couchbase/var/lib/couchbase/couchbase-server.babysitter.cookie\"}]},\n {ns_server,\n     [{path_config_secdir,\"/opt/couchbase/etc/security\"},\n      {loglevel_error_logger,debug},\n      {path_config_tmpdir,\"/opt/couchbase/var/lib/couchbase/tmp\"},\n      {config_path,\"/opt/couchbase/etc/couchbase/static_config\"},\n      {path_config_datadir,\"/opt/couchbase/var/lib/couchbase\"},\n      {disk_sink_opts_json_rpc,\n          [{rotation,\n               [{compress,true},\n                {size,41943040},\n                {num_files,2},\n                {buffer_size_max,52428800}]}]},\n      {max_t,10},\n      {loglevel_ns_doctor,debug},\n      {nodefile,\"/opt/couchbase/var/lib/couchbase/couchbase-server.node\"},\n      {loglevel_default,debug},\n      {loglevel_cluster,debug},\n      {loglevel_rebalance,debug},\n      {path_config_bindir,\"/opt/couchbase/bin\"},\n      {loglevel_ns_server,debug},\n      {loglevel_cbas,debug},\n      {disk_sink_opts,\n          [{rotation,\n               [{compress,true},\n                {size,41943040},\n                {num_files,10},\n                {buffer_size_max,52428800}]}]},\n      {max_r,20},\n      {net_kernel_verbosity,10},\n      {loglevel_mapreduce_errors,debug},\n      {loglevel_xdcr,debug},\n      {loglevel_user,debug},\n      {loglevel_couchdb,info},\n      {loglevel_access,info},\n      {loglevel_menelaus,debug},\n      {path_config_libdir,\"/opt/couchbase/lib\"},\n      {loglevel_views,debug},\n      {loglevel_stats,debug},\n      {path_config_etcdir,\"/opt/couchbase/etc/couchbase\"},\n      {error_logger_mf_dir,\"/opt/couchbase/var/lib/couchbase/logs\"}]}]"},                             {"ERL_CRASH_DUMP",                              "erl_crash.dump.1703714386.2325312.ns_server"}]},                        exit_status,use_stdio,stream,eof]}},               {mfargs,                   {restartable,start_link,                       [{supervisor_cushion,start_link,                            [ns_server,5000,infinity,ns_port_server,                             start_link,                             [#Fun<ns_child_ports_sup.4.11883217>]]},                        86400000]}},               {restart_type,permanent},               {significant,false},               {shutdown,infinity},               {child_type,worker}]
      [ns_server:debug,2023-12-11T09:45:25.221-08:00,babysitter_of_ns_1@cb.local:<0.7116.0>:supervisor_cushion:init:33]starting ns_port_server with delay of 5000[ns_server:debug,2023-12-11T09:45:25.221-08:00,babysitter_of_ns_1@cb.local:<0.7115.0>:restartable:start_child:92]Started child process <0.7116.0>  MFA: {supervisor_cushion,start_link,                           [ns_server,5000,infinity,ns_port_server,start_link,                            [#Fun<ns_child_ports_sup.4.11883217>]]} 

       


       

      Testrunner test to reproduce

      ./testrunner -i /data/workspace/debian-p0-nserv-vset00-00-rebalanceinout_P1_Set2-ephemeral/testexec.49992.ini -p get-cbcollect-info=False,bucket_type=ephemeral,GROUP=P1_Set2,get-cbcollect-info=True,sirius_url=http://172.23.120.103:4000 -t rebalance.rebalanceinout.RebalanceInOutTests.test_rebalance_in_out_at_once,default_bucket=True,standard_buckets=1,sasl_buckets=1,replicas=1,nodes_init=4,nodes_in=2,nodes_out=1,items=1000000,max_verify=50000,GROUP=IN_OUT;P1;PERFORMANCE;P1_Set2

      Job name : debian-nserv-rebalanceinout_P1_Set2-ephemeral

      Job ref : http://cb-logs-qe.s3-website-us-west-2.amazonaws.com/7.2.4-7059/jenkins_logs/test_suite_executor/652721/

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            raghav.sk Raghav S K
            raghav.sk Raghav S K
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty