Couchbase Server / MB-44681

[Collections] - Hard failover + delta recovery + rebalance fails

Details

    • Triaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Script to repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,get-cbcollect-info=True,quota_percent=95,crash_warning=True,rebalance_moves_per_node=64 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=5,nodes_failover=2,recovery_type=delta,step_count=1,bucket_spec=multi_bucket.buckets_for_rebalance_tests,data_load_stage=before,GROUP=P0_step_wise_failover_and_recovery'
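
The `-t` argument above packs the test path and its parameters into one comma-separated string. As a reading aid, here is a minimal sketch that splits it into its parts. `parse_test_spec` is a hypothetical helper written for this ticket, not part of testrunner, and testrunner's own parsing may differ.

```python
def parse_test_spec(spec):
    """Split 'module.Class.test_name,key=value,...' into (test_path, params).

    Hypothetical helper for reading the repro command; values stay as strings.
    """
    test_path, *pairs = spec.split(",")
    # Split each pair on the first '=' only, so values may contain '='.
    params = dict(pair.split("=", 1) for pair in pairs)
    return test_path, params


if __name__ == "__main__":
    spec = (
        "bucket_collections.collections_rebalance.CollectionsRebalance."
        "test_data_load_collections_with_hard_failover_recovery,"
        "nodes_init=5,nodes_failover=2,recovery_type=delta,step_count=1,"
        "bucket_spec=multi_bucket.buckets_for_rebalance_tests,"
        "data_load_stage=before,GROUP=P0_step_wise_failover_and_recovery"
    )
    path, params = parse_test_spec(spec)
    print(path)
    for key, value in params.items():
        print(f"  {key} = {value}")
```

Read this way, the repro uses a 5-node cluster (`nodes_init=5`), delta recovery (`recovery_type=delta`), and starts the data load before the operation (`data_load_stage=before`).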
      

      Steps to Repro
      1) Create a 5 node cluster
      2021-03-01 17:02:46,198 | test | INFO | pool-57-thread-6 | [table_view:display:72] Rebalance Overview
      ---------------------------------------------------------------------------
      Nodes           Services  Version                CPU           Status
      ---------------------------------------------------------------------------
      172.23.98.196   kv        7.0.0-4565-enterprise  12.572436382  Cluster node
      172.23.98.195   None                                           <--- IN ---
      172.23.121.10   None                                           <--- IN ---
      172.23.104.186  None                                           <--- IN ---
      172.23.120.206  None                                           <--- IN ---
      ---------------------------------------------------------------------------

      2) Create buckets/scopes/collections/data
      2021-03-01 17:06:37,105 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      ---------------------------------------------------------------------------------------
      Bucket   Type       Replicas  Durability  TTL  Items   RAM Quota   RAM Used   Disk Used
      ---------------------------------------------------------------------------------------
      bucket1  couchbase  3         none        0    30000   524288000   152500024  144286641
      bucket2  ephemeral  3         none        0    30000   524288000   102068024  170
      default  couchbase  3         none        0    500000  7864320000  631963592  566049046
      ---------------------------------------------------------------------------------------

      3) Change rebalance settings

      2021-03-01 17:06:46,917 | test  | INFO    | MainThread | [cluster_ready_functions:set_rebalance_moves_per_nodes:119] Changed Rebalance settings: {u'rebalanceMovesPerNode': 64}
      

      4) Do a hard failover, start the data load, and do delta recovery (172.23.104.186)

      2021-03-01 17:06:46,986 | test  | INFO    | MainThread | [collections_rebalance:rebalance_operation:672] failing over nodes [ip:172.23.104.186 port:8091 ssh_username:root]
      2021-03-01 17:09:07,267 | test  | INFO    | MainThread | [bucket_ready_functions:perform_tasks_from_spec:4645] Performing scope/collection specific operations
      2021-03-01 17:09:28,279 | test  | WARNING | MainThread | [rest_client:get_nodes:1718] 172.23.104.186 - Node not part of cluster inactiveFailed
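
For anyone reproducing steps 3 to 5 by hand rather than through testrunner, the sequence can be sketched as a list of REST calls against the orchestrator node. This is a sketch under assumptions: the endpoint paths and form fields are the Couchbase Server REST API as I understand it, the logs above do not show which endpoints the test itself uses, and `delta_recovery_requests` and `otp_name` are hypothetical helpers. The code only builds the requests; it sends nothing.

```python
from urllib.parse import urlencode


def otp_name(ip):
    """Node name in the form seen in the logs above (ns_1@<ip>)."""
    return "ns_1@" + ip


def delta_recovery_requests(all_ips, failed_ip, moves_per_node=64):
    """Return ordered (method, path, form_body) tuples for steps 3 to 5.

    Assumed endpoints; each would be POSTed to http://<orchestrator>:8091
    with admin credentials. Nothing is sent by this function.
    """
    failed = otp_name(failed_ip)
    known = ",".join(otp_name(ip) for ip in all_ips)
    return [
        # Step 3: change the rebalance setting.
        ("POST", "/settings/rebalance",
         urlencode({"rebalanceMovesPerNode": moves_per_node})),
        # Step 4: hard failover of one node, then mark it for delta recovery.
        ("POST", "/controller/failOver", urlencode({"otpNode": failed})),
        ("POST", "/controller/setRecoveryType",
         urlencode({"otpNode": failed, "recoveryType": "delta"})),
        # Step 5: rebalance with every node known and none ejected.
        ("POST", "/controller/rebalance",
         urlencode({"knownNodes": known, "ejectedNodes": ""})),
    ]


if __name__ == "__main__":
    ips = ["172.23.98.196", "172.23.98.195", "172.23.121.10",
           "172.23.104.186", "172.23.120.206"]
    for method, path, body in delta_recovery_requests(ips, "172.23.104.186"):
        print(method, path, body)
```

The delta recovery itself is applied during the rebalance, which is why the failure in this ticket surfaces only at step 5.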
      

      5) Start rebalance. It fails as shown below.

      2021-03-01 17:10:34,750 | test  | ERROR   | pool-57-thread-22 | [rest_client:_rebalance_status_and_progress:1506] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'status': u'none'} - rebalance failed
      2021-03-01 17:10:34,839 | test  | INFO    | pool-57-thread-22 | [rest_client:print_UI_logs:2611] Latest logs from UI on 172.23.98.196:
      2021-03-01 17:10:34,841 | test  | ERROR   | pool-57-thread-22 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.98.196', u'tstamp': 1614647431240L, u'shortText': u'message', u'serverTime': u'2021-03-01T17:10:31.240Z', u'text': u'Rebalance exited with reason {{badmatch,\n                                  {error,marking_as_warmed_failed,\n                                      [\'ns_1@172.23.104.186\']}},\n                              [{ns_rebalancer,\n                                   \'-apply_delta_recovery_buckets/3-fun-0-\',2,\n                                   [{file,"src/ns_rebalancer.erl"},\n                                    {line,1089}]},\n                               {lists,foreach,2,\n                                   [{file,"lists.erl"},{line,1338}]},\n                               {ns_rebalancer,apply_delta_recovery_buckets,3,\n                                   [{file,"src/ns_rebalancer.erl"},\n                                    {line,1086}]},\n                               {ns_rebalancer,rebalance_body,5,\n                                   [{file,"src/ns_rebalancer.erl"},\n                                    {line,524}]},\n                               {async,\'-async_init/4-fun-1-\',3,\n                                   [{file,"src/async.erl"},{line,197}]}]}.\nRebalance Operation Id = 3249c2b20a0702b7af38f8db578af122'}
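
The failure message above is hard to read in its escaped form. The sketch below pulls out the three pieces that matter for triage: the error atom, the affected node list, and the rebalance operation id. `parse_rebalance_failure` is a hypothetical helper, and `SAMPLE` is an abbreviated copy of the message text with the stack trace shortened.

```python
import re

# Abbreviated copy of the 'text' field from the UI log entry (stack trace shortened).
SAMPLE = (
    "Rebalance exited with reason {{badmatch,\n"
    "    {error,marking_as_warmed_failed,\n"
    "        ['ns_1@172.23.104.186']}},\n"
    "    [{ns_rebalancer,apply_delta_recovery_buckets,3,[]}]}.\n"
    "Rebalance Operation Id = 3249c2b20a0702b7af38f8db578af122"
)


def parse_rebalance_failure(text):
    """Return the error atom, node list, and operation id, or None if no error tuple is found."""
    # Matches {error,<atom>,[<nodes>]} with optional whitespace or newlines between tokens.
    err = re.search(r"\{error,\s*([a-z_]+),\s*\[([^\]]*)\]", text)
    if err is None:
        return None
    op = re.search(r"Rebalance Operation Id = ([0-9a-f]+)", text)
    return {
        "reason": err.group(1),
        "nodes": re.findall(r"'([^']+)'", err.group(2)),
        "operation_id": op.group(1) if op else None,
    }


if __name__ == "__main__":
    print(parse_rebalance_failure(SAMPLE))
```

For this ticket the extracted reason is `marking_as_warmed_failed` on `ns_1@172.23.104.186`, the node that was failed over and marked for delta recovery in step 4.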
      

      cbcollect_info attached. This test passed on 7.0.0-4502.

          People

            Balakumaran.Gopal Balakumaran Gopal