  1. Couchbase Server
  2. MB-59722

[Rebalance] : Rebalance fails with reason {pre_rebalance_janitor_run_failed,"default",{error,wait_for_memcached_failed


Details

    Description

      Steps to reproduce

      1. Created a 7-node KV cluster.
      2. Created a Magma bucket named "default" with 2 replicas.
      3. Loaded 100000 items into the bucket.
      4. Rebalanced out 5 nodes.
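
For anyone reproducing this by hand rather than through TAF, steps 2 and 4 map roughly onto the cluster REST endpoints `POST /pools/default/buckets` and `POST /controller/rebalance`. The following is a minimal sketch of the request bodies only. The node names and the RAM quota are hypothetical placeholders, not values from this run, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode


def magma_bucket_params(name, replicas, ram_quota_mb=1024):
    """Form fields for POST /pools/default/buckets.

    ram_quota_mb is an assumed value; the ticket does not state the quota.
    """
    return {
        "name": name,
        "bucketType": "couchbase",
        "storageBackend": "magma",
        "replicaNumber": replicas,
        "ramQuota": ram_quota_mb,
    }


def rebalance_out_body(all_nodes, nodes_out):
    """Form body for POST /controller/rebalance that ejects nodes_out."""
    unknown = [n for n in nodes_out if n not in all_nodes]
    if unknown:
        raise ValueError(f"nodes not in cluster: {unknown}")
    return urlencode({
        "knownNodes": ",".join(all_nodes),
        "ejectedNodes": ",".join(nodes_out),
    })


if __name__ == "__main__":
    # Hypothetical otpNode names for a 7-node cluster.
    nodes = [f"ns_1@node{i}.example" for i in range(1, 8)]
    print(urlencode(magma_bucket_params("default", replicas=2)))
    print(rebalance_out_body(nodes, nodes[2:]))  # eject 5 of the 7 nodes
```

The bodies would be sent as `application/x-www-form-urlencoded` with administrator credentials to the cluster manager port of any node that stays in the cluster.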

       

      The rebalance fails with the following error:

      2023-11-19T20:25:03.690-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.123.44) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"default",
                                       {error,wait_for_memcached_failed,
                                           ['ns_1@172.23.107.26']}}.
      Rebalance Operation Id = 58aed04cccddc2abfde88ea0fabf15ac

       

      The following messages appear in ns_server.debug.log:

      [ns_server:info,2023-11-19T20:25:03.689-08:00,ns_1@172.23.123.44:rebalance_agent<0.11108.5>:rebalance_agent:handle_down:290]Rebalancer process <0.26893.5> died (reason {pre_rebalance_janitor_run_failed,
                                                   "default",
                                                   {error,
                                                    wait_for_memcached_failed,
                                                    ['ns_1@172.23.107.26']}}).

      [ns_server:debug,2023-11-19T20:25:03.689-08:00,ns_1@172.23.123.44:leader_activities<0.10959.5>:leader_activities:handle_activity_down:457]Activity terminated with reason {shutdown,
                                       {async_died,
                                        {raised,
                                         {exit,
                                          {pre_rebalance_janitor_run_failed,
                                           "default",
                                           {error,wait_for_memcached_failed,
                                            ['ns_1@172.23.107.26']}},
                                          [{ns_rebalancer,
                                            run_janitor_pre_rebalance,1,
                                            [{file,"src/ns_rebalancer.erl"},
                                             {line,700}]},
                                           {lists,foreach_1,2,
                                            [{file,"lists.erl"},{line,1442}]},
                                           {ns_rebalancer,rebalance_body,7,
                                            [{file,"src/ns_rebalancer.erl"},
                                             {line,483}]},
                                           {async,'-async_init/4-fun-1-',3,
                                            [{file,"src/async.erl"},
                                             {line,199}]}]}}}}.
      Activity:
      {activity,<0.26892.5>,#Ref<0.1637097028.2274623492.193245>,default,
                <<"6e25e56994664fe45c0efcad77caed30">>,
                [rebalance],
                majority,[]}

      [error_logger:error,2023-11-19T20:25:03.690-08:00,ns_1@172.23.123.44:<0.26889.5>:ale_error_logger_handler:do_log:101]
      =========================CRASH REPORT=========================
        crasher:
          initial call: erlang:apply/2
          pid: <0.26889.5>
          registered_name: []
          exception exit: {pre_rebalance_janitor_run_failed,"default",
                              {error,wait_for_memcached_failed,
                                  ['ns_1@172.23.107.26']}}
            in function  ns_rebalancer:run_janitor_pre_rebalance/1 (src/ns_rebalancer.erl, line 700)
            in call from lists:foreach_1/2 (lists.erl, line 1442)
            in call from ns_rebalancer:rebalance_body/7 (src/ns_rebalancer.erl, line 483)
            in call from async:'-async_init/4-fun-1-'/3 (src/async.erl, line 199)
          ancestors: [<0.11062.5>,ns_orchestrator_child_sup,ns_orchestrator_sup,
                        mb_master_sup,mb_master,leader_registry_sup,
                        leader_services_sup,<0.10956.5>,ns_server_sup,
                        ns_server_nodes_sup,<0.10486.5>,ns_server_cluster_sup,
                        root_sup,<0.155.0>]
          message_queue_len: 0
          messages: []
          links: [<0.11062.5>]
          dictionary: []
          trap_exit: false
          status: running
          heap_size: 28690
          stack_size: 28
          reductions: 3104
        neighbours:

      [user:error,2023-11-19T20:25:03.690-08:00,ns_1@172.23.123.44:<0.11062.5>:ns_orchestrator:log_rebalance_completion:1660]Rebalance exited with reason {pre_rebalance_janitor_run_failed,"default",
                                       {error,wait_for_memcached_failed,
                                           ['ns_1@172.23.107.26']}}.
      Rebalance Operation Id = 58aed04cccddc2abfde88ea0fabf15ac
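
When triaging several runs, it can help to pull the bucket, error atom, and failing node(s) out of the "Rebalance exited with reason" message automatically. A minimal sketch, assuming the failure term keeps the shape shown in the log above; the function name and regular expression are illustrative and not part of ns_server or TAF.

```python
import re

# Matches {pre_rebalance_janitor_run_failed,"<bucket>",{error,<atom>,[<nodes>]}}
# and tolerates the whitespace/newlines Erlang inserts when pretty-printing.
FAILURE_RE = re.compile(
    r'\{pre_rebalance_janitor_run_failed,\s*"(?P<bucket>[^"]+)",\s*'
    r"\{error,\s*(?P<error>\w+),\s*\[(?P<nodes>[^\]]*)\]\}\}"
)


def parse_janitor_failure(text):
    """Return bucket, error atom and node list, or None if no match."""
    match = FAILURE_RE.search(text)
    if match is None:
        return None
    # Node names are single-quoted Erlang atoms, e.g. 'ns_1@172.23.107.26'.
    nodes = re.findall(r"'([^']+)'", match.group("nodes"))
    return {
        "bucket": match.group("bucket"),
        "error": match.group("error"),
        "nodes": nodes,
    }


if __name__ == "__main__":
    line = (
        'Rebalance exited with reason {pre_rebalance_janitor_run_failed,"default",\n'
        "    {error,wait_for_memcached_failed,\n"
        "        ['ns_1@172.23.107.26']}}."
    )
    print(parse_janitor_failure(line))
```

For this ticket the extracted node is ns_1@172.23.107.26, which is the node whose memcached the pre-rebalance janitor run was waiting on.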


      TAF Script to reproduce

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /data/workspace/debian-p0-durability-vset00-00-rebalance_out_persist_majority_6.5_P1/testexec.55849.ini num_items=100000,GROUP=P1;durability,durability=PERSIST_TO_MAJORITY,upgrade_version=7.6.0-1813,sirius_url=http://172.23.120.103:4000 -t rebalance_new.rebalance_out.RebalanceOutTests.rebalance_out_with_warming_up,max_verify=100000,value_size=1024,get-cbcollect-info=True,replicas=2,durability=PERSIST_TO_MAJORITY,log_level=info,upgrade_version=7.6.0-1813,GROUP=P1;durability,nodes_init=7,nodes_out=5,num_items=100000,sirius_url=http://172.23.120.103:4000,infra_log_level=info'

      Job name : debian-durability_rebalance_out_persist_majority_6.5_P1

      Job ref link : http://cb-logs-qe.s3-website-us-west-2.amazonaws.com/7.6.0-1813/jenkins_logs/test_suite_executor-TAF/287297/

       


          People

            raghav.sk Raghav S K