Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-27491

[functional] : Eventing Rebalance hangs when rebalance is tried after eventing rebalance failure after memcache/reboot/eventing-consumer crash

    XMLWordPrintable

Details

    Description

      Script to Repro the test:

      def test_memcache_crash_on_kv_and_eventing_node_during_eventing_rebalance(self):
              gen_load_del = copy.deepcopy(self.gens_load)
              kv_node = self.get_nodes_from_services_map(service_type="kv", get_all_nodes=False)
              eventing_node = self.get_nodes_from_services_map(service_type="eventing", get_all_nodes=False)
              body = self.create_save_function_body(self.function_name, self.handler_code)
              self.deploy_function(body)
              try:
                  # load some data
                  task = self.cluster.async_load_gen_docs(self.master, self.src_bucket_name, self.gens_load,
                                                          self.buckets[0].kvs[1], 'create')
                  # rebalance in a eventing node when eventing is processing mutations
                  services_in = ["eventing"]
                  rebalance = self.cluster.async_rebalance(self.servers[:self.nodes_init], [self.servers[self.nodes_init]], [],
                                                           services=services_in)
                  self.sleep(15)
                  reached = RestHelper(self.rest).rebalance_reached(percentage=30)
                  # kill memcached on kv and eventing when eventing is processing mutations
                  for node in [kv_node, eventing_node]:
                      self.kill_memcached_service(node)
                  self.assertTrue(reached, "rebalance failed, stuck or did not complete")
                  rebalance.result()
                  task.result()
              except Exception, ex:
                  log.info("rebalance failed as expected after memcached got killed: {0}".format(str(ex)))
              # delete json documents
              self.load(gen_load_del, buckets=self.src_bucket, flag=self.item_flag, verify_data=False,
                        batch_size=self.batch_size, op_type='delete')
              # Wait for eventing to catch up with all the delete mutations and verify results
              # This is required to ensure eventing works after rebalance goes through successfully
              self.verify_eventing_results(self.function_name, 0, skip_stats_validation=True)
              self.undeploy_and_delete_function(body)
      

      Will checkin once the review is done.

      ./testrunner -i b/temp_all.ini -t eventing.eventing_rebalance.EventingRebalance.test_memcache_crash_on_kv_and_eventing_node_during_eventing_rebalance,doc-per-day=10,dataset=default,nodes_init=5,services_init=kv-kv-eventing-eventing-index:n1ql,groups=simple,reset_services=True,skip_cleanup=True,GROUP=bucket_op
      

      Logs attached.

      2018-01-09 12:20:22 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.111.170.102
      2018-01-09 12:20:23 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.111.170.102: kill -9 $(ps aux | pgrep 'memcached')
      2018-01-09 12:20:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
      2018-01-09 12:20:24 | INFO | MainProcess | test_thread | [remote_util.__init__] connecting to 10.111.170.104 with username:root password:couchbase ssh_key:
      2018-01-09 12:20:24 | ERROR | MainProcess | load_gen_task | [data_helper._parse_not_my_vbucket_error] Connection reset with error: [Errno 54] Connection reset by peer while _send_op, server is alive?
      2018-01-09 12:20:24 | INFO | MainProcess | test_thread | [remote_util.__init__] Connected to 10.111.170.104
      2018-01-09 12:20:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] running command.raw on 10.111.170.104: kill -9 $(ps aux | pgrep 'memcached')
      2018-01-09 12:20:24 | INFO | MainProcess | test_thread | [remote_util.execute_command_raw] command executed successfully
      2018-01-09 12:20:26 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : {'password': 'password', 'ejectedNodes': '', 'user': 'Administrator', 'knownNodes': u'ns_1@10.111.170.101,ns_1@10.111.170.103,ns_1@10.112.170.102,ns_1@10.112.170.101,ns_1@10.111.170.104,ns_1@10.111.170.102'}
      2018-01-09 12:20:26 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance operation started
      2018-01-09 12:20:26 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 0.00 %
      2018-01-09 12:20:26 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 0
      2018-01-09 12:20:26 | INFO | MainProcess | load_gen_task | [data_helper.direct_client] creating direct client 10.111.170.101:11210 src_bucket
      2018-01-09 12:20:27 | INFO | MainProcess | load_gen_task | [data_helper.direct_client] creating direct client 10.111.170.102:11210 src_bucket
      [('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py', 783, '__bootstrap', 'self.__bootstrap_inner()'), ('/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py', 810, '__bootstrap_inner', 'self.run()'), ('lib/tasks/task.py', 1047, 'run', 'self.run_normal_throughput_mode()'), ('lib/tasks/task.py', 1061, 'run_normal_throughput_mode', 'self.next()'), ('lib/tasks/task.py', 967, 'next', 'self._unlocked_create(partition, key, value, is_base64_value=is_base64_value)'), ('lib/tasks/task.py', 727, '_unlocked_create', 'self.set_exception(error)'), ('lib/tasks/future.py', 264, 'set_exception', 'print traceback.extract_stack()')]
      Tue Jan  9 12:20:28 2018
      2018-01-09 12:20:37 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 11.11 %
      2018-01-09 12:20:37 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 11.1111111111
      2018-01-09 12:20:48 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 22.22 %
      2018-01-09 12:20:48 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 22.2222222222
      2018-01-09 12:20:59 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 33.00 %
      2018-01-09 12:20:59 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 33
      2018-01-09 12:21:09 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 44.74 %
      2018-01-09 12:21:09 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 44.7352854185
      2018-01-09 12:21:19 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 53.34 %
      2018-01-09 12:21:19 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 53.3422064478
      2018-01-09 12:21:29 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 65.59 %
      2018-01-09 12:21:29 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 65.587104407
      2018-01-09 12:21:40 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 72.77 %
      2018-01-09 12:21:40 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 72.7743271222
      2018-01-09 12:21:50 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 80.45 %
      2018-01-09 12:21:50 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 80.4495711328
      2018-01-09 12:22:01 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:22:01 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:22:11 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:22:11 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:22:21 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:22:21 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:22:32 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:22:32 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:22:45 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:22:45 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:22:57 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:22:57 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:23:08 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:23:08 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:23:18 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:23:18 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:23:29 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:23:29 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:23:40 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:23:40 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:23:51 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:23:51 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:24:01 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:24:01 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:24:12 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:24:12 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:24:22 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:24:22 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:24:33 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:24:33 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:24:43 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:24:43 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:24:54 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:24:54 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:25:04 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:25:04 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:25:14 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:25:14 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:25:25 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:25:25 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:25:35 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:25:35 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:25:45 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:25:45 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:25:55 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:25:55 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:26:06 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:26:06 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:26:16 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      2018-01-09 12:26:16 | INFO | MainProcess | Cluster_Thread | [task.check] Rebalance - status: running, progress: 83.288967761
      2018-01-09 12:26:27 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_status_and_progress] rebalance percentage : 83.29 %
      

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            asingh Abhishek Singh (Inactive)
            Balakumaran.Gopal Balakumaran Gopal
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty