  1. Couchbase Server
  2. MB-59828

[Rebalance] : Rebalance failed with reason {pre_rebalance_janitor_run_failed,"bucket6",{error,wait_for_memcached_failed


Details

    Description

      Steps to repro

      1. Created a 4-node KV cluster.
      2. Created 10 buckets with different configurations.
      3. Created 5 scopes per bucket and 20 collections per scope.
      4. Loaded around 4000 docs into each collection.
      5. Added another KV node and started a rebalance.
      6. Stopped the rebalance.
      7. Started the rebalance again. The rebalance fails at this point.
      8. Retried the rebalance. The rebalance succeeds.
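
Steps 5 to 8 can be sketched as an ordered list of REST calls against the cluster manager. This is only a sketch under assumptions: the endpoint paths and form fields (`/controller/addNode`, `/controller/rebalance`, `/controller/stopRebalance`, `knownNodes`, `ejectedNodes`) are taken from the public Couchbase REST API and should be checked against the docs for the build under test; the `ns_1@` node-name prefix mirrors the names in the logs of this ticket; the function name, host addresses, and credentials are hypothetical. The function only builds the requests and does not send anything.

```python
from urllib.parse import urlencode


def otp(host):
    # Node names in this ticket's logs have the form ns_1@<host>.
    return f"ns_1@{host}"


def repro_plan(existing_hosts, new_host, user, password):
    """Return steps 5-8 as ordered (method, path, form_body) tuples.

    Hypothetical helper: it builds the requests but sends nothing.
    """
    known = ",".join(otp(h) for h in [*existing_hosts, new_host])
    add_node = (
        "POST",
        "/controller/addNode",
        urlencode({"hostname": new_host, "user": user,
                   "password": password, "services": "kv"}),
    )
    rebalance = (
        "POST",
        "/controller/rebalance",
        urlencode({"knownNodes": known, "ejectedNodes": ""}),
    )
    stop = ("POST", "/controller/stopRebalance", "")
    # Step 5: add node + rebalance, step 6: stop,
    # step 7: rebalance again (fails here), step 8: retry.
    return [add_node, rebalance, stop, rebalance, rebalance]
```

Each tuple can be handed to any HTTP client pointed at the cluster manager port of an existing node. The timing between the stop in step 6 and the restart in step 7 is not stated in this report, so the sketch leaves it to the caller.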

      Rebalance failed with this reason:

      2023-11-26T00:10:10.905-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.121.71) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket6",
          {error,wait_for_memcached_failed,
              ['ns_1@172.23.96.168',
               'ns_1@172.23.96.196',
               'ns_1@172.23.96.220']}}.
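
When triaging several such failures, it can help to pull the bucket and the node list out of the message mechanically. The following is a minimal sketch that handles only the message shape shown above (`{Reason,"Bucket",{error,Error,[Nodes]}}`); the function name and the returned dictionary keys are hypothetical, and other failure reasons will not match.

```python
import re

# Matches: Rebalance exited with reason {Reason,"Bucket",{error,Error,[Nodes]}}
# Whitespace between tokens is ignored because the log wraps the term.
REASON_RE = re.compile(
    r"Rebalance exited with reason\s*\{\s*(?P<reason>\w+)\s*,"
    r"\s*\"(?P<bucket>[^\"]+)\"\s*,"
    r"\s*\{\s*error\s*,\s*(?P<error>\w+)\s*,"
    r"\s*\[(?P<nodes>[^\]]*)\]"
)


def parse_rebalance_failure(message):
    """Return reason, bucket, error and node names, or None if no match."""
    m = REASON_RE.search(message)
    if m is None:
        return None
    return {
        "reason": m.group("reason"),
        "bucket": m.group("bucket"),
        "error": m.group("error"),
        # Node names are single-quoted Erlang atoms such as 'ns_1@host'.
        "nodes": re.findall(r"'([^']+)'", m.group("nodes")),
    }


SAMPLE = """Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket6",
    {error,wait_for_memcached_failed,
        ['ns_1@172.23.96.168',
         'ns_1@172.23.96.196',
         'ns_1@172.23.96.220']}}."""

info = parse_rebalance_failure(SAMPLE)
```

For the message in this ticket, `info["bucket"]` is `bucket6` and `info["nodes"]` holds the three nodes on which the wait for memcached failed.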

      The following crash report appears in ns_server.debug.log:

      [ns_server:info,2023-11-26T00:10:10.904-08:00,ns_1@172.23.121.71:rebalance_agent<0.13933.0>:rebalance_agent:handle_down:290]Rebalancer process <0.17257.166> died (reason {pre_rebalance_janitor_run_failed,
          "bucket6",
          {error,
           wait_for_memcached_failed,
           ['ns_1@172.23.96.168',
            'ns_1@172.23.96.196',
            'ns_1@172.23.96.220']}}).

      [ns_server:debug,2023-11-26T00:10:10.905-08:00,ns_1@172.23.121.71:leader_activities<0.13866.0>:leader_activities:handle_activity_down:457]Activity terminated with reason {shutdown,
          {async_died,
           {raised,
            {exit,
             {pre_rebalance_janitor_run_failed,
              "bucket6",
              {error,wait_for_memcached_failed,
               ['ns_1@172.23.96.168',
                'ns_1@172.23.96.196',
                'ns_1@172.23.96.220']}},
             [{ns_rebalancer,
               run_janitor_pre_rebalance,1,
               [{file,"src/ns_rebalancer.erl"},
                {line,699}]},
              {lists,foreach_1,2,
               [{file,"lists.erl"},{line,1442}]},
              {ns_rebalancer,rebalance_body,7,
               [{file,"src/ns_rebalancer.erl"},
                {line,482}]},
              {async,'-async_init/4-fun-1-',3,
               [{file,"src/async.erl"},
                {line,199}]}]}}}}. Activity:
      {activity,<0.17171.166>,#Ref<0.850806963.699138052.15902>,default,
          <<"dad76b2b4817f7b78cb2685e3aa20d76">>,
          [rebalance],
          majority,[]}

      [error_logger:error,2023-11-26T00:10:10.905-08:00,ns_1@172.23.121.71:<0.17080.166>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: erlang:apply/2
          pid: <0.17080.166>
          registered_name: []
          exception exit: {pre_rebalance_janitor_run_failed,"bucket6",
                              {error,wait_for_memcached_failed,
                                  ['ns_1@172.23.96.168','ns_1@172.23.96.196',
                                   'ns_1@172.23.96.220']}}
            in function  ns_rebalancer:run_janitor_pre_rebalance/1 (src/ns_rebalancer.erl, line 699)
            in call from lists:foreach_1/2 (lists.erl, line 1442)
            in call from ns_rebalancer:rebalance_body/7 (src/ns_rebalancer.erl, line 482)
            in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 199)
          ancestors: [<0.13893.0>,ns_orchestrator_child_sup,ns_orchestrator_sup,
                        mb_master_sup,mb_master,leader_registry_sup,
                        leader_services_sup,<0.13863.0>,ns_server_sup,
                        ns_server_nodes_sup,<0.10331.0>,ns_server_cluster_sup,
                        root_sup,<0.155.0>]
          message_queue_len: 0
          messages: []
          links: [<0.13893.0>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 196650
          stack_size: 28
          reductions: 11050
        neighbours:

      [user:error,2023-11-26T00:10:10.905-08:00,ns_1@172.23.121.71:<0.13893.0>:ns_orchestrator:log_rebalance_completion:1660]Rebalance exited with reason {pre_rebalance_janitor_run_failed,"bucket6",
          {error,wait_for_memcached_failed,
              ['ns_1@172.23.96.168',
               'ns_1@172.23.96.196',
               'ns_1@172.23.96.220']}}.
      Rebalance Operation Id = e76b052c6ae2e4f24108ad3c9aa363a4

      The rebalance succeeded when it was retried.
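
For test automation that hits this failure, the retry behaviour described here could be captured with a small bounded-retry helper that records which attempt succeeded, so that a first-attempt failure like this one stays visible instead of being silently absorbed. This is a hypothetical sketch, not part of any Couchbase tooling: `start_rebalance` stands for any caller-supplied function that returns `True` when the rebalance completes.

```python
import time


def rebalance_with_retry(start_rebalance, attempts=2, delay_s=0.0, sleep=time.sleep):
    """Call start_rebalance() until it returns True or attempts run out.

    Returns the 1-based number of the attempt that succeeded, or None
    if every attempt failed. `sleep` is injectable so tests need not wait.
    """
    for attempt in range(1, attempts + 1):
        if start_rebalance():
            return attempt
        if attempt < attempts:
            sleep(delay_s)
    return None
```

A return value greater than 1 corresponds to the pattern in this ticket: the first rebalance after a stop fails and the retry goes through.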


            People

              raghav.sk Raghav S K
              raghav.sk Raghav S K

                Gerrit Reviews

                  There is 1 open Gerrit change
