Couchbase Server / MB-23507

Rebalance fails when adding back a node that was auto-failed over due to a machine restart


Details

    Description

      1. Create a cluster with 3 or more nodes and a bucket.
      2. Enable auto-failover and set the timeout to 5 seconds.
      3. Restart one of the machines in the cluster.
      4. Wait for the auto-failover to occur and for the machine to finish restarting.
      5. Add the node back to the cluster using either delta or full recovery.
      6. Rebalance the cluster.
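      For scripted reproduction, steps 2, 5 and 6 can be expressed as REST calls. The sketch below only builds the requests (a dry run) and assumes the standard Couchbase Server REST endpoints /settings/autoFailover, /controller/setRecoveryType and /controller/rebalance with their usual form parameters. The helper build_repro_requests and the node names are hypothetical. Steps 1, 3 and 4 (cluster setup, machine restart, waiting) stay manual.

```python
# Dry-run sketch: build (but do not send) Couchbase REST requests for
# steps 2, 5 and 6 of the reproduction.
# Assumption: endpoint paths and form parameters follow the standard
# Couchbase Server REST API; build_repro_requests and the demo node
# names are hypothetical.
from urllib.parse import urlencode


def build_repro_requests(known_nodes, failed_node, recovery_type="delta",
                         autofailover_timeout=5):
    """Return a list of (method, path, form_body) tuples."""
    if recovery_type not in ("delta", "full"):
        raise ValueError("recovery_type must be 'delta' or 'full'")
    if failed_node not in known_nodes:
        raise ValueError("failed_node must be one of known_nodes")
    return [
        # Step 2: enable auto-failover with the given timeout (seconds).
        ("POST", "/settings/autoFailover",
         urlencode({"enabled": "true", "timeout": autofailover_timeout})),
        # Step 5: add the auto-failed-over node back with delta or full recovery.
        ("POST", "/controller/setRecoveryType",
         urlencode({"otpNode": failed_node, "recoveryType": recovery_type})),
        # Step 6: rebalance, keeping every node (nothing ejected).
        ("POST", "/controller/rebalance",
         urlencode({"knownNodes": ",".join(known_nodes), "ejectedNodes": ""})),
    ]


if __name__ == "__main__":
    # Hypothetical node names; replace with the cluster's otpNode names.
    nodes = ["ns_1@node1.example", "ns_1@node2.example", "ns_1@node3.example"]
    for method, path, body in build_repro_requests(nodes, "ns_1@node2.example"):
        print(method, path, body)
```

      Each tuple can then be sent as a form-encoded POST to the cluster's admin port (8091 by default) with administrator credentials, using any HTTP client.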

      The rebalance fails with the following error:

      Rebalance exited with reason
        {unexpected_exit,
         {'EXIT',<0.5036.1>,
          {bulk_set_vbucket_state_failed,
           [{'ns_1@172.23.98.81',
             {'EXIT',
              {{{{{case_clause,
                   {error,
                    {{{badmatch,
                       {error,
                        {{badmatch,{error,ehostunreach}},
                         [{mc_replication,connect,1,
                           [{file,
                             "src/mc_replication.erl"},
                            {line,30}]},
                          {mc_replication,connect,1,
                           [{file,
                             "src/mc_replication.erl"},
                            {line,49}]},
                          {dcp_proxy,connect,5,
                           [{file,"src/dcp_proxy.erl"},
                            {line,218}]},
                          {dcp_proxy,maybe_connect,2,
                           [{file,"src/dcp_proxy.erl"},
                            {line,201}]},
                          {dcp_producer_conn,init,2,
                           [{file,
                             "src/dcp_producer_conn.erl"},
                            {line,31}]},
                          {dcp_proxy,init,1,
                           [{file,"src/dcp_proxy.erl"},
                            {line,50}]},
                          {gen_server,init_it,6,
                           [{file,"gen_server.erl"},
                            {line,304}]},
                          {proc_lib,init_p_do_apply,3,
                           [{file,"proc_lib.erl"},
                            {line,239}]}]}}},
                      [{dcp_replicator,init,1,
                        [{file,"src/dcp_replicator.erl"},
                         {line,50}]},
                       {gen_server,init_it,6,
                        [{file,"gen_server.erl"},
                         {line,304}]},
                       {proc_lib,init_p_do_apply,3,
                        [{file,"proc_lib.erl"},
                         {line,239}]}]},
                     {child,undefined,
                      {'ns_1@172.23.98.79',true},
                      {dcp_replicator,start_link,
                       ['ns_1@172.23.98.79',"default",
                        true]},
                      temporary,60000,worker,
                      [dcp_replicator]}}}},
                  [{dcp_sup,start_replicator,2,
                    [{file,"src/dcp_sup.erl"},{line,54}]},
                   {dcp_sup,
                    '-manage_replicators/3-lc$^3/1-3-',2,
                    [{file,"src/dcp_sup.erl"},{line,81}]},
                   {dcp_replication_manager,handle_call,
                    3,
                    [{file,
                      "src/dcp_replication_manager.erl"},
                     {line,87}]},
                   {gen_server,handle_msg,5,
                    [{file,"gen_server.erl"},{line,585}]},
                   {proc_lib,init_p_do_apply,3,
                    [{file,"proc_lib.erl"},{line,239}]}]},
                 {gen_server,call,
                  ['dcp_replication_manager-default',
                   {manage_replicators,
                    ['ns_1@172.23.98.79',
                     'ns_1@172.23.98.80'],
                    true},
                   infinity]}},
                {gen_server,call,
                 ['replication_manager-default',
                  {change_vbucket_replication,341,
                   'ns_1@172.23.98.79'},
                  infinity]}},
               {gen_server,call,
                [{'janitor_agent-default',
                  'ns_1@172.23.98.81'},
                 {if_rebalance,<0.4886.1>,
                  {update_vbucket_state,341,replica,
                   undefined,'ns_1@172.23.98.79'}},
                 infinity]}}}}]}}}
      shortText: message, serverTime: 2017-03-23T07:38:21.581Z, module: ns_orchestrator, tstamp: 1490279901581, type: critical
      [2017-03-23 07:38:27,498] - [rest_client:2800] ERROR - node: ns_1@172.23.98.80, code: 0, text:
      <0.4912.1> exited with
        {unexpected_exit,
         {'EXIT',<0.5036.1>,
          {bulk_set_vbucket_state_failed,
           [{'ns_1@172.23.98.81',
             {'EXIT',
              {{{{{case_clause,
                   {error,
                    {{{badmatch,
                       {error,
                        {{badmatch,{error,ehostunreach}},
                         [{mc_replication,connect,1,
                           [{file,"src/mc_replication.erl"},
                            {line,30}]},
                          {mc_replication,connect,1,
                           [{file,"src/mc_replication.erl"},
                            {line,49}]},
                          {dcp_proxy,connect,5,
                           [{file,"src/dcp_proxy.erl"},
                            {line,218}]},
                          {dcp_proxy,maybe_connect,2,
                           [{file,"src/dcp_proxy.erl"},
                            {line,201}]},
                          {dcp_producer_conn,init,2,
                           [{file,"src/dcp_producer_conn.erl"},
                            {line,31}]},
                          {dcp_proxy,init,1,
                           [{file,"src/dcp_proxy.erl"},
                            {line,50}]},
                          {gen_server,init_it,6,
                           [{file,"gen_server.erl"},
                            {line,304}]},
                          {proc_lib,init_p_do_apply,3,
                           [{file,"proc_lib.erl"},
                            {line,239}]}]}}},
                      [{dcp_replicator,init,1,
                        [{file,"src/dcp_replicator.erl"},
                         {line,50}]},
                       {gen_server,init_it,6,
                        [{file,"gen_server.erl"},{line,304}]},
                       {proc_lib,init_p_do_apply,3,
                        [{file,"proc_lib.erl"},{line,239}]}]},
                     {child,undefined,
                      {'ns_1@172.23.98.79',true},
                      {dcp_replicator,start_link,
                       ['ns_1@172.23.98.79',"default",true]},
                      temporary,60000,worker,
                      [dcp_replicator]}}}},
                  [{dcp_sup,start_replicator,2,
                    [{file,"src/dcp_sup.erl"},{line,54}]},
                   {dcp_sup,
                    '-manage_replicators/3-lc$^3/1-3-',2,
                    [{file,"src/dcp_sup.erl"},{line,81}]},
                   {dcp_replication_manager,handle_call,3,
                    [{file,"src/dcp_replication_manager.erl"},
                     {line,87}]},
                   {gen_server,handle_msg,5,
                    [{file,"gen_server.erl"},{line,585}]},
                   {proc_lib,init_p_do_apply,3,
                    [{file,"proc_lib.erl"},{line,239}]}]},
                 {gen_server,call,
                  ['dcp_replication_manager-default',
                   {manage_replicators,
                    ['ns_1@172.23.98.79','ns_1@172.23.98.80'],
                    true},
                   infinity]}},
                {gen_server,call,
                 ['replication_manager-default',
                  {change_vbucket_replication,341,
                   'ns_1@172.23.98.79'},
                  infinity]}},
               {gen_server,call,
                [{'janitor_agent-default','ns_1@172.23.98.81'},
                 {if_rebalance,<0.4886.1>,
                  {update_vbucket_state,341,replica,undefined,
                   'ns_1@172.23.98.79'}},
                 infinity]}}}}]}}}
      

      The test run can be found here: http://qa.sc.couchbase.com/job/cen006-nserv-autofailover-machine-restart/12/consoleFull (tests 17, 18 and 19 in the suite fail because of this issue).

          People

            dfinlay Dave Finlay
            bharath.gp Bharath G P
            Votes: 0
            Watchers: 3
