Couchbase Server / MB-37198

failover + recovery + rebalance with views fails

Details

    Description

      Script to Repro

      ./testrunner -i /tmp/testexec.12412.ini -p get-cbcollect-info=False,get-cbcollect-info=True -t rebalance.rebalance_progress.RebalanceProgressTests.test_progress_add_back_after_failover,nodes_init=4,nodes_out=1,GROUP=P1,blob_generator=false
      

      Steps to Repro
      1) Create a 4-node cluster

      [2019-12-05 16:12:12,135] - [rest_client:1503] INFO - rebalance params : {'password': 'password', 'ejectedNodes': '', 'user': 'Administrator', 'knownNodes': u'ns_1@172.23.105.237,ns_1@172.23.105.236,ns_1@172.23.97.66,ns_1@172.23.97.65'}
      

      2) Load data
      3) Create views

      [2019-12-05 16:12:39,668] - [rest_client:552] INFO - index query url: http://172.23.105.236:8092/default/_design/default_view/_view/default_view0?stale=ok
      [2019-12-05 16:12:39,783] - [task:2344] INFO - view : default_view0 was created successfully in ddoc: default_view
      [2019-12-05 16:12:39,790] - [rest_client:552] INFO - index query url: http://172.23.105.236:8092/default/_design/default_view/_view/default_view1?stale=ok
      [2019-12-05 16:12:39,799] - [task:2344] INFO - view : default_view1 was created successfully in ddoc: default_view
      [2019-12-05 16:12:39,806] - [rest_client:552] INFO - index query url: http://172.23.105.236:8092/default/_design/default_view/_view/default_view2?stale=ok
      [2019-12-05 16:12:39,815] - [task:2344] INFO - view : default_view2 was created successfully in ddoc: default_view
      

      4) Start failover

      [2019-12-05 16:12:44,664] - [rest_client:1448] INFO - fail_over node ns_1@172.23.97.66 successful
      [2019-12-05 16:12:44,664] - [task:3508] INFO - 0 seconds sleep after failover, for nodes to go pending....
      

      5) Do recovery (add the failed-over node back)

      [2019-12-05 16:12:44,695] - [rest_client:1481] INFO - add_back_node ns_1@172.23.97.66 successful
      

      6) Start rebalance

      [2019-12-05 16:12:45,704] - [rest_client:1503] INFO - rebalance params : {'password': 'password', 'ejectedNodes': '', 'user': 'Administrator', 'knownNodes': u'ns_1@172.23.105.237,ns_1@172.23.105.236,ns_1@172.23.97.66,ns_1@172.23.97.65'}
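
For anyone reproducing steps 4-6 by hand rather than through testrunner, the sequence can be sketched as three REST calls. This is only a sketch under assumptions: the endpoint paths (`/controller/failOver`, `/controller/reAddNode`, `/controller/rebalance`) are the ones the Couchbase cluster REST API is expected to expose for these operations, a hard failover is assumed (the report does not say hard or graceful), and `build_repro_requests` is a hypothetical helper, not part of testrunner. It only builds the request descriptions and does not send anything.

```python
def build_repro_requests(failed_node, known_nodes):
    """Describe the REST calls for steps 4-6 without sending them.

    Endpoint paths are assumptions based on the Couchbase cluster REST API;
    this helper is illustrative and not taken from testrunner.
    """
    return [
        # Step 4: fail the node over (hard failover assumed).
        ("POST", "/controller/failOver", {"otpNode": failed_node}),
        # Step 5: add the failed-over node back (recovery).
        ("POST", "/controller/reAddNode", {"otpNode": failed_node}),
        # Step 6: rebalance with all nodes known and none ejected,
        # mirroring the "rebalance params" log lines in this report.
        ("POST", "/controller/rebalance",
         {"knownNodes": ",".join(known_nodes), "ejectedNodes": ""}),
    ]


# Node names copied from the log lines in this report.
NODES = [
    "ns_1@172.23.105.237",
    "ns_1@172.23.105.236",
    "ns_1@172.23.97.66",
    "ns_1@172.23.97.65",
]

requests_to_send = build_repro_requests("ns_1@172.23.97.66", NODES)
for method, path, form in requests_to_send:
    print(method, path, form)
```

Each tuple would be sent as a form-encoded POST with administrator credentials to the cluster's admin port on any active node. The views from step 3 must already exist for the failure to be relevant.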
      

      While rebalance progress is being monitored, the rebalance fails with the error shown below.

      [2019-12-05 16:12:55,786] - [rest_client:3330] INFO - Latest logs from UI on 172.23.105.236:
      [2019-12-05 16:12:55,786] - [rest_client:3331] ERROR - {u'node': u'ns_1@172.23.105.236', u'code': 0, u'text': u'Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {\'EXIT\',<0.6111.53>,\n                                {{error,\n                                  {badrpc,\n                                   {\'EXIT\',\n                                    {{{{badmatch,{error,dcp_conn_closed}},\n                                       [{couch_set_view_group,\n                                         process_monitor_partition_update,4,\n                                         [{file,\n                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/couch_set_view/src/couch_set_view_group.erl"},\n                                          {line,3725}]},\n                                        {couch_set_view_group,handle_call,3,\n                                         [{file,\n                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/couch_set_view/src/couch_set_view_group.erl"},\n                                          {line,934}]},\n                                        {gen_server,try_handle_call,4,\n                                         [{file,"gen_server.erl"},{line,636}]},\n                                        {gen_server,handle_msg,6,\n                                         [{file,"gen_server.erl"},{line,665}]},\n                                        {proc_lib,init_p_do_apply,3,\n                                         [{file,"proc_lib.erl"},{line,247}]}]},\n                                      {gen_server,call,\n                                       [<12906.523.0>,\n                                        {monitor_partition_update,1022,\n                                         #Ref<12906.2127266913.807927810.106982>,\n                                         <12906.570.0>},\n                                        infinity]}},\n                                     {gen_server,call,\n                                      [\'capi_set_view_manager-default\',\n                                       {wait_index_updated,1020},\n                                       infinity]}}}}},\n                                 {gen_server,call,\n                                  [{\'janitor_agent-default\',\n                                    \'ns_1@172.23.97.66\'},\n                                   {if_rebalance,<0.5992.53>,\n                                    {wait_index_updated,1022}},\n                                   infinity]}}}}}.\nRebalance Operation Id = 81d9bb9bcc412a00769c9b7caf5f3683', u'shortText': u'message', u'serverTime': u'2019-12-05T16:12:50.209Z', u'module': u'ns_orchestrator', u'tstamp': 1575591170209, u'type': u'critical'}
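
The nested Erlang term above is hard to read at a glance, so here is a rough triage sketch that pulls out the parts that matter: the top-level exit reason, the underlying `badmatch` error, the node the `janitor_agent` call targeted, and the numbers passed to `wait_index_updated` (presumably partition/vBucket ids). `summarize_rebalance_error` is a hypothetical helper written for this report; the regular expressions are fitted to this one message and are not a general Erlang term parser.

```python
import re


def _first(pattern, text, group=1):
    """Return the requested group of the first match, or None."""
    match = re.search(pattern, text)
    return match.group(group) if match else None


def summarize_rebalance_error(text):
    """Extract the key fields from a 'Rebalance exited with reason' message.

    Accepts either the decoded message or the escaped form printed in the
    test log (literal backslash-n and backslash-quote sequences).
    """
    text = text.replace("\\n", "\n").replace("\\'", "'")
    janitor = r"\{'janitor_agent-([^']+)',\s*'([^']+)'\}"
    return {
        "reason": _first(r"Rebalance exited with reason \{(\w+)", text),
        "cause": _first(r"\{badmatch,\s*\{error,\s*(\w+)\}\}", text),
        "bucket": _first(janitor, text, 1),
        "node": _first(janitor, text, 2),
        # Assumed to be partition (vBucket) ids.
        "partitions": sorted(
            {int(v) for v in re.findall(r"wait_index_updated,\s*(\d+)", text)}
        ),
    }


# A shortened stand-in for the message above, keeping only the matched parts.
sample = (
    "Rebalance exited with reason {mover_crashed,\n"
    " {{badmatch,{error,dcp_conn_closed}},\n"
    "  ['capi_set_view_manager-default', {wait_index_updated,1020}],\n"
    "  [{'janitor_agent-default',\n"
    "    'ns_1@172.23.97.66'},\n"
    "   {if_rebalance, {wait_index_updated,1022}}]}}"
)
summary = summarize_rebalance_error(sample)
print(summary)
```

Read this way, the message says the mover crashed because the view engine (`couch_set_view_group`) hit `dcp_conn_closed` while waiting for the index to be updated on the node that was just added back, `ns_1@172.23.97.66`.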
      

      cbcollect_info attached.

        Activity

          Ankit Prabhu (ankit.prabhu) added a comment: Closing this as a duplicate of https://issues.couchbase.com/browse/MB-37070

          People

            dfinlay Dave Finlay
            Balakumaran.Gopal Balakumaran Gopal
            Votes: 0
            Watchers: 3
