Couchbase Server / MB-44420

[Collections] - Collection CRUD + DGM + Swap rebalance fails with dcp_wait_for_data_move_failed


Details

    • Triaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops111111111111124244224.ini rerun=False,quota_percent=95,crash_warning=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in_out,nodes_init=4,nodes_in=2,nodes_out=1,bucket_spec=dgm.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,dgm=55,GROUP=rebalance_with_collection_crud_dgm'
      

      Note to self: Even though the test above is rebalance_in_out, it ran as a swap rebalance because the .ini file listed only 5 nodes.
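      The note above comes down to simple arithmetic on the test parameters. The Python sketch below illustrates it, assuming (hypothetically) that the harness can only add nodes listed in the .ini file, so nodes_in is capped by the number of spare nodes. INI_NODES = 5 comes from the note. The parse_params helper and the capping rule are illustrative assumptions, not testrunner code.

```python
# Test parameters from the -t argument of the repro command above.
ARGS = ("nodes_init=4,nodes_in=2,nodes_out=1,"
        "bucket_spec=dgm.buckets_for_rebalance_tests_more_collections,"
        "data_load_spec=volume_test_load_with_CRUD_on_collections,"
        "data_load_stage=during,dgm=55,GROUP=rebalance_with_collection_crud_dgm")

INI_NODES = 5  # from the note: the .ini file listed only 5 nodes


def parse_params(arg_string):
    """Split a comma-separated key=value string into a dict (illustrative helper)."""
    params = {}
    for pair in arg_string.split(","):
        key, _, value = pair.partition("=")
        params[key] = value
    return params


params = parse_params(ARGS)
# Nodes in the .ini that the initial cluster does not use.
spare_nodes = INI_NODES - int(params["nodes_init"])

# Hypothetical rule: nodes can only be added from the .ini, so nodes_in is capped.
effective_in = min(int(params["nodes_in"]), spare_nodes)
effective_out = int(params["nodes_out"])

print(effective_in, effective_out)  # 1 1 -> one node in, one node out: a swap rebalance
```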

      Steps to Repro
      1) Create a 4-node cluster

      Nodes           Services  Version                CPU            Status
      172.23.106.209  kv        7.0.0-4486-enterprise  1.16468378209  Cluster node
      172.23.106.225  None      -                      -              <--- IN
      172.23.106.232  None      -                      -              <--- IN
      172.23.106.239  None      -                      -              <--- IN

      2) Create the bucket, scopes, and collections, and load data
      2021-02-18 00:56:15,555 | test | INFO | MainThread | [table_view:display:72] Bucket statistics

      Bucket   Type       Replicas  Durability  TTL  Items  RAM Quota   RAM Used   Disk Used
      default  couchbase  2         none        0    0      3355443200  231591872  0

      3) Push the bucket to DGM

      2021-02-18 01:00:20,690 | test  | INFO    | pool-5-thread-11 | [task:_load_bucket_into_dgm:2073] Active_resident_items_ratio for default is 100
      2021-02-18 01:00:20,690 | test  | INFO    | pool-5-thread-11 | [task:_load_bucket_into_dgm:2075] Replica_resident_items_ratio for default is 56.9882555556
      2021-02-18 01:00:22,506 | test  | INFO    | pool-5-thread-11 | [task:_load_bucket_into_dgm:2079] Active DGM 100% Replica DGM 45.4014552547% achieved for 'default'. Loaded docs: 4240000
      

      4) Add a node (172.23.106.246), remove a node (172.23.106.239), and start a swap rebalance.

      Swap rebalance fails with the following error.

      2021-02-18 01:00:43,309 | test  | INFO    | pool-5-thread-2 | [rest_client:print_UI_logs:2599] Latest logs from UI on 172.23.106.209:
      2021-02-18 01:00:43,309 | test  | ERROR   | pool-5-thread-2 | [rest_client:print_UI_logs:2601] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.106.209', u'tstamp': 1613638839746L, u'shortText': u'message', u'serverTime': u'2021-02-18T01:00:39.746Z', u'text': u'
      Rebalance exited with reason {mover_crashed,
                                    {unexpected_exit,
                                     {'EXIT',<0.24171.1>,
                                      {{dcp_wait_for_data_move_failed,"default",
                                        481,'ns_1@172.23.106.225',
                                        ['ns_1@172.23.106.246',
                                         'ns_1@172.23.106.232'],
                                        {error,no_stats_for_this_vbucket}},
                                       [{ns_single_vbucket_mover,
                                         '-wait_dcp_data_move/5-fun-0-',5,
                                         [{file,"src/ns_single_vbucket_mover.erl"},
                                          {line,465}]},
                                        {proc_lib,init_p,3,
                                         [{file,"proc_lib.erl"},{line,234}]}]}}}}.
      Rebalance Operation Id = 375f8fa5c330e3fc0660b209a82b1784'}
      

      2021-02-18 01:00:43,311 | test  | ERROR   | pool-5-thread-2 | [rest_client:print_UI_logs:2601] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.106.209', u'tstamp': 1613638839653L, u'shortText': u'message', u'serverTime': u'2021-02-18T01:00:39.653Z', u'text': u'
      Worker <0.24154.1> (for action {move,{481,
                                            ['ns_1@172.23.106.225',
                                             'ns_1@172.23.106.239',
                                             'ns_1@172.23.106.232'],
                                            ['ns_1@172.23.106.225',
                                             'ns_1@172.23.106.246',
                                             'ns_1@172.23.106.232'],
                                            []}}) exited with reason {unexpected_exit,
                                                                      {'EXIT',
                                                                       <0.24171.1>,
                                                                       {{dcp_wait_for_data_move_failed,
                                                                         "default",
                                                                         481,
                                                                         'ns_1@172.23.106.225',
                                                                         ['ns_1@172.23.106.246',
                                                                          'ns_1@172.23.106.232'],
                                                                         {error,
                                                                          no_stats_for_this_vbucket}},
                                                                        [{ns_single_vbucket_mover,
                                                                          '-wait_dcp_data_move/5-fun-0-',
                                                                          5,
                                                                          [{file,
                                                                            "src/ns_single_vbucket_mover.erl"},
                                                                           {line,
                                                                            465}]},
                                                                         {proc_lib,
                                                                          init_p,3,
                                                                          [{file,
                                                                            "proc_lib.erl"},
                                                                           {line,
                                                                            234}]}]}}}'}
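      The ns_vbucket_mover entry above lists the old and new replication chains for vBucket 481. The Python sketch below compares them. It assumes the usual vBucket map convention, where the first node in a chain holds the active copy and the rest hold replicas. Under that assumption, the move replaces only one replica slot (172.23.106.239 with 172.23.106.246), and the dcp_wait_for_data_move_failed tuple names the new chain's first node followed by its remaining nodes. The chain_changes helper is illustrative only and is not part of ns_server.

```python
# Replication chains for vBucket 481, copied from the ns_vbucket_mover log entry.
# Assumption: index 0 is the active copy, later indexes are replicas.
OLD_CHAIN = ["ns_1@172.23.106.225", "ns_1@172.23.106.239", "ns_1@172.23.106.232"]
NEW_CHAIN = ["ns_1@172.23.106.225", "ns_1@172.23.106.246", "ns_1@172.23.106.232"]

# Node fields from the dcp_wait_for_data_move_failed tuple in the same entry.
FAILED_SOURCE = "ns_1@172.23.106.225"
FAILED_WAITED_ON = ["ns_1@172.23.106.246", "ns_1@172.23.106.232"]


def chain_changes(old, new):
    """Return (position, old_node, new_node) for each chain slot that differs."""
    return [(pos, o, n) for pos, (o, n) in enumerate(zip(old, new)) if o != n]


changes = chain_changes(OLD_CHAIN, NEW_CHAIN)
print(changes)  # [(1, 'ns_1@172.23.106.239', 'ns_1@172.23.106.246')]

# The failing wait came from the new chain's first node, for the remaining nodes.
print(FAILED_SOURCE == NEW_CHAIN[0] and FAILED_WAITED_ON == NEW_CHAIN[1:])  # True
```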
      

      cbcollect_info is attached. This test passed on build 7.0.0-4454.
       

            People

              Balakumaran.Gopal Balakumaran Gopal
              Balakumaran.Gopal Balakumaran Gopal