Couchbase Server / MB-55930

CDC: Rebalance failed with reason 'dcp_wait_for_data_move_failed::ns_single_vbucket_mover'

Details

    Description

      Build: 7.2.0-5242

      Steps:

      • Cluster setup

        +----------------+-----------------+-----------+-----------+---------------------+
        | Node           | CPU_utilization | Mem_total | Mem_free  | Swap_mem_used       |
        +----------------+-----------------+-----------+-----------+---------------------+
        | 172.23.105.190 | 0.30401924313   | 11.74 GiB | 11.08 GiB | 0.0 Byte / 4.10 GiB |
        | 172.23.105.62  | 0.330383461797  | 11.74 GiB | 11.10 GiB | 0.0 Byte / 0.0 Byte |
        | 172.23.105.217 | 1.13476637941   | 11.74 GiB | 11.10 GiB | 0.0 Byte / 4.10 GiB |
        | 172.23.100.43  | 1.31964474002   | 11.74 GiB | 11.08 GiB | 0.0 Byte / 4.10 GiB |
        +----------------+-----------------+-----------+-----------+---------------------+

        +---------+-----------+-----------------+----------+-----------+
        | Bucket  | Type      | Storage Backend | Replicas | RAM Quota |
        +---------+-----------+-----------------+----------+-----------+
        | bucket1 | couchbase | couchstore      | 1        | 0.0 Byte  |
        | bucket2 | couchbase | magma           | 1        | 3.91 GiB  |
        | default | couchbase | magma           | 1        | 0.0 Byte  |
        +---------+-----------+-----------------+----------+-----------+

      • Load the initial data set plus historical data (regular updates on the initial load)
      • Start dedupe data loading on a few collections, and
      • Rebalance-in 2 nodes while that load is running

        +----------------+---------------+--------------+-----------------------+
        | Nodes          | CPU           | Status       | Membership / Recovery |
        +----------------+---------------+--------------+-----------------------+
        | 172.23.105.190 | 75.1820097046 | Cluster node | active / none         |
        | 172.23.105.62  | 79.7942954469 | Cluster node | active / none         |
        | 172.23.105.217 | 72.8192887838 | Cluster node | active / none         |
        | 172.23.100.43  | 64.2178897844 | Cluster node | active / none         |
        | 172.23.105.254 |               | <--- IN ---  |                       |
        | 172.23.105.47  |               | <--- IN ---  |                       |
        +----------------+---------------+--------------+-----------------------+

        Operation id: 328e71a3f744ab082224fe7a6e339a39

      Observation:

      Rebalance failed with the following reason:

      {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False,
       u'statusId': u'32a933abd577817e8d080eb83565e5a1', u'subtype': u'rebalance', u'statusIsStale': False,
       u'lastReportURI': u'/logs/rebalanceReport?reportID=8dfc6fa06b9407ea33b5d3a13dff4b59',
       u'status': u'notRunning'} - rebalance failed
      Latest logs from UI on 172.23.100.43:
      {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.100.43', u'tstamp': 1678550947612L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:09:07.612Z', u'text': u'Rebalance exited with reason
          {mover_crashed,{unexpected_exit,{\'EXIT\',<0.30436.6>,
              {{dcp_wait_for_data_move_failed,"default",485,\'ns_1@172.23.105.190\',[\'ns_1@172.23.105.254\'],{error,no_stats_for_this_vbucket}},
              [{ns_single_vbucket_mover,\'-wait_dcp_data_move/5-fun-0-\',5,[{file,"src/ns_single_vbucket_mover.erl"},{line,451}]},
               {proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,211}]}]}}}}.
         Operation Id = 328e71a3f744ab082224fe7a6e339a39'}
      {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.100.43', u'tstamp': 1678550947567L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:09:07.567Z',
       u'text': u'Worker <0.30413.6> (for action {move,{485,[\'ns_1@172.23.105.190\',\'ns_1@172.23.105.62\'],[\'ns_1@172.23.105.190\',\'ns_1@172.23.105.254\'],[]}})
                  exited with reason {unexpected_exit,{\'EXIT\',<0.30436.6>,{
                      {dcp_wait_for_data_move_failed,"default",485,\'ns_1@172.23.105.190\',[\'ns_1@172.23.105.254\'],{error,no_stats_for_this_vbucket}},
                      [{ns_single_vbucket_mover,\'-wait_dcp_data_move/5-fun-0-\',5,[{file,"src/ns_single_vbucket_mover.erl"},{line,451}]},
                       {proc_lib,init_p,3,[{file,"proc_lib.erl"},{line,211}]}]}}}'}
      {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'info', u'node': u'ns_1@172.23.100.43', u'tstamp': 1678550897575L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:08:17.575Z', u'text': u'Bucket "default" rebalance does not seem to be swap rebalance'}
      {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.105.47', u'tstamp': 1678550894877L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:08:14.877Z', u'text': u'Bucket "default" loaded on node \'ns_1@172.23.105.47\' in 0 seconds.'}
      {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.105.254', u'tstamp': 1678550894874L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:08:14.874Z', u'text': u'Bucket "default" loaded on node \'ns_1@172.23.105.254\' in 0 seconds.'}
      {u'code': 0, u'module': u'ns_rebalancer', u'type': u'info', u'node': u'ns_1@172.23.100.43', u'tstamp': 1678550894774L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:08:14.774Z', u'text': u'Started rebalancing bucket default'}
      {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'info', u'node': u'ns_1@172.23.100.43', u'tstamp': 1678550879550L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:07:59.550Z', u'text': u'Bucket "bucket2" rebalance does not seem to be swap rebalance'}
      {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.105.47', u'tstamp': 1678550877146L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:07:57.146Z', u'text': u'Bucket "bucket2" loaded on node \'ns_1@172.23.105.47\' in 0 seconds.'}
      {u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.105.254', u'tstamp': 1678550877075L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:07:57.075Z', u'text': u'Bucket "bucket2" loaded on node \'ns_1@172.23.105.254\' in 0 seconds.'}
      {u'code': 0, u'module': u'ns_rebalancer', u'type': u'info', u'node': u'ns_1@172.23.100.43', u'tstamp': 1678550876966L, u'shortText': u'message',
        u'serverTime': u'2023-03-11T08:07:56.966Z', u'text': u'Started rebalancing bucket bucket2'}
      Rebalance Failed: {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'32a933abd577817e8d080eb83565e5a1', u'subtype': u'rebalance', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=8dfc6fa06b9407ea33b5d3a13dff4b59', u'status': u'notRunning'} - rebalance failed
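The move action in the ns_vbucket_mover log above carries the old and new replication chains for vBucket 485. A minimal sketch of how to read it, assuming the usual convention that the first node in a chain is the active and the rest are replicas (the function name `describe_move` is hypothetical and is not part of ns_server or TAF):

```python
def describe_move(vbucket, old_chain, new_chain):
    """Summarise how a vBucket's chain changes in a {move, ...} action.

    Assumption: chain[0] is the active node, chain[1:] are replicas.
    """
    return {
        "vbucket": vbucket,
        "active": new_chain[0],
        "active_changed": old_chain[0] != new_chain[0],
        # Nodes that appear only in the new chain, so they need a copy of the data.
        "nodes_added": [n for n in new_chain if n not in old_chain],
        # Nodes that appear only in the old chain, so they drop out after the move.
        "nodes_removed": [n for n in old_chain if n not in new_chain],
    }


# Values copied from the logged action for vBucket 485.
move = describe_move(
    485,
    ["ns_1@172.23.105.190", "ns_1@172.23.105.62"],
    ["ns_1@172.23.105.190", "ns_1@172.23.105.254"],
)
print(move)
```

Read this way, the active copy stays on 172.23.105.190 and the replica moves from 172.23.105.62 to the incoming node 172.23.105.254. That is consistent with the dcp_wait_for_data_move_failed tuple, which names 172.23.105.190 and ['ns_1@172.23.105.254'].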
      

      TAF test:

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.123746.ini GROUP=rebalance_crud_on_collections,rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,autoCompactionDefined=true,dedupe_update_itrs=10000,upgrade_version=7.2.0-5242 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in,nodes_init=4,nodes_in=2,bucket_spec=magma_dgm.10_percent_dgm.4_node_1_replica_magma_512,doc_size=512,randomize_value=True,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,skip_validations=False,default_history_retention_for_collections=false,bucket_history_retention_seconds=60,bucket_history_retention_bytes=100000000000,GROUP=rebalance_in;rebalance_crud_on_collections'
      

            People

              ashwin.govindarajulu Ashwin Govindarajulu