Couchbase Server / MB-51550

[Windows] Rebalancing in a CBAS node fails with error java.lang.Exception: replica 6@172.23.136.161:9120 failed


Details

    • Untriaged
    • Windows 64-bit
    • 1
    • Yes
    • CX Sprint 285

    Description

      This test last passed in build 7.1.0-2475.
      Steps to reproduce:
      1. Set up a 5-node cluster with 1 KV node and 4 CBAS nodes.
      2. Create a bucket, scopes and collections on KV and load data into them.
      3. Create dataverses, datasets and indexes on CBAS.
      4. Set the number of CBAS replicas to 3 and rebalance so that the replica setting takes effect.
      5. Add one more CBAS node and rebalance.
      6. The rebalance fails.
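Steps 4 and 5 map onto a handful of cluster-manager REST calls. The Python sketch below only builds that request sequence and sends nothing. It assumes the Couchbase Server 7.x endpoints /settings/analytics (numReplicas), /controller/addNode and /controller/rebalance (knownNodes, ejectedNodes), node names of the form ns_1@<ip> as seen in the logs, and placeholder credentials. The helper names are hypothetical, and this is not the TAF test's own code.

```python
from urllib.parse import urlencode

# Sketch only: builds the REST calls for reproduction steps 4-5 without sending them.
# Assumptions: Couchbase Server 7.x cluster-manager endpoints on port 8091, node
# names of the form "ns_1@<ip>" (as in the logs), and placeholder credentials.


def otp_name(host):
    """Node name used by the cluster manager, e.g. 'ns_1@172.23.136.160'."""
    return f"ns_1@{host}"


def rebalance_body(hosts):
    """Form body for POST /controller/rebalance; knownNodes must list every node."""
    return urlencode({
        "knownNodes": ",".join(otp_name(h) for h in hosts),
        "ejectedNodes": "",  # nothing is removed in this test
    })


def reproduction_plan(existing_hosts, new_cbas_host, num_replicas=3,
                      user="Administrator", password="password"):
    """Return (method, path, body) tuples for steps 4 and 5, in order."""
    return [
        # Step 4: set the Analytics replica count, then rebalance so it takes effect.
        ("POST", "/settings/analytics", urlencode({"numReplicas": num_replicas})),
        ("POST", "/controller/rebalance", rebalance_body(existing_hosts)),
        # Step 5: add one more node running only the Analytics (cbas) service ...
        ("POST", "/controller/addNode", urlencode({
            "hostname": new_cbas_host,
            "user": user,
            "password": password,
            "services": "cbas",
        })),
        # ... and rebalance it in; this is the rebalance that fails on Windows.
        ("POST", "/controller/rebalance",
         rebalance_body([*existing_hosts, new_cbas_host])),
    ]


if __name__ == "__main__":
    # Hosts taken from the 7.1.0-2530 node table in the comments below.
    existing = ["172.23.136.106", "172.23.136.107", "172.23.136.109",
                "172.23.136.115", "172.23.136.113"]
    for method, path, body in reproduction_plan(existing, "172.23.136.111"):
        print(method, path, body)
```

A real run would POST each body to the orchestrator node on port 8091 and wait for each rebalance to finish before issuing the next call.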

      2022-03-23 00:22:00,713 | test  | ERROR   | pool-3-thread-8 | [rest_client:_rebalance_status_and_progress:1541] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'162b9d7f810e5663b4ec3c930cfe4406', u'subtype': u'rebalance', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=46d10028050de2693d07720d448c86c1', u'status': u'notRunning'} - rebalance failed
      2022-03-23 00:22:01,009 | test  | INFO    | pool-3-thread-8 | [rest_client:print_UI_logs:2706] Latest logs from UI on 172.23.136.160:
      2022-03-23 00:22:01,009 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.136.160', u'tstamp': 1648020114468L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:54.468Z', u'text': u'Rebalance exited with reason {service_rebalance_failed,cbas,\n                              {worker_died,\n                               {\'EXIT\',<0.13004.0>,\n                                {rebalance_failed,\n                                 {service_error,\n                                  <<"Rebalance 01b142cc9284e45d0b9837d138c44468 failed: java.lang.Exception: replica 6@172.23.136.161:9120 failed">>}}}}}.\nRebalance Operation Id = cb5b43bcdb429ad048c19a57c1af5703'}
      2022-03-23 00:22:01,009 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'warning', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020113771L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:53.771Z', u'text': u"Analytics Service unable to successfully rebalance 01b142cc9284e45d0b9837d138c44468 due to 'java.lang.Exception: replica 6@172.23.136.161:9120 failed'; see analytics_info.log for details"}
      2022-03-23 00:22:01,010 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'info', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020107065L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:47.065Z', u'text': u'Analytics collection wZGmSDHQTbUtuCLeVJ7Z0O.yjPl9I1mlsPEAurWshziaULJedlT rebalanced. Rebalance progress now is 0.99. Remaining analytics collections: 0/15'}
      2022-03-23 00:22:01,010 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'info', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020106545L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:46.545Z', u'text': u'Analytics collection wZGmSDHQTbUtuCLeVJ7Z0O.g7mrPc2trqmmR2YcGSISLKmnj rebalanced. Rebalance progress now is 0.9071196105150946. Remaining analytics collections: 1/15'}
      2022-03-23 00:22:01,012 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'info', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020106014L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:46.014Z', u'text': u'Analytics collection wZGmSDHQTbUtuCLeVJ7Z0O.R6eb7KvKgTNrk2d9k59lzKM32HGGKi rebalanced. Rebalance progress now is 0.8493139085001821. Remaining analytics collections: 2/15'}
      2022-03-23 00:22:01,013 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'info', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020105656L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:45.656Z', u'text': u'Analytics collection wZGmSDHQTbUtuCLeVJ7Z0O.K28ljoZ rebalanced. Rebalance progress now is 0.7788225558835925. Remaining analytics collections: 3/15'}
      2022-03-23 00:22:01,013 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'info', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020105110L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:45.110Z', u'text': u'Analytics collection wZGmSDHQTbUtuCLeVJ7Z0O.CL6Ddt8puyZL rebalanced. Rebalance progress now is 0.7210219770511136. Remaining analytics collections: 4/15'}
      2022-03-23 00:22:01,013 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'info', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020104180L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:44.180Z', u'text': u'Analytics collection JsHnB0Qsp9qvZsZOig6p.zjXxUzTUYogzecQ rebalanced. Rebalance progress now is 0.6057582565373447. Remaining analytics collections: 5/15'}
      2022-03-23 00:22:01,015 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'info', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020103646L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:43.646Z', u'text': u'Analytics collection JsHnB0Qsp9qvZsZOig6p.zPXuXAW2i9y rebalanced. Rebalance progress now is 0.5479568378388932. Remaining analytics collections: 6/15'}
      2022-03-23 00:22:01,015 | test  | ERROR   | pool-3-thread-8 | [rest_client:print_UI_logs:2708] {u'code': 0, u'module': u'analytics', u'type': u'info', u'node': u'ns_1@172.23.136.161', u'tstamp': 1648020103280L, u'shortText': u'message', u'serverTime': u'2022-03-23T00:21:43.280Z', u'text': u'Analytics collection JsHnB0Qsp9qvZsZOig6p.p3xRp0zoV0niIyzrsNcR9 rebalanced. Rebalance progress now is 0.47746817804257863. Remaining analytics collections: 7/15'}
      2022-03-23 00:22:01,016 | test  | ERROR   | pool-3-thread-8 | [task:call:326] Rebalance Failed: {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'162b9d7f810e5663b4ec3c930cfe4406', u'subtype': u'rebalance', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=46d10028050de2693d07720d448c86c1', u'status': u'notRunning'} - rebalance failed
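When triaging runs like this one, it helps to pull the failing replica and the last progress report out of the log text. The sketch below uses regular expressions copied from the messages above; treating those strings as a stable format is an assumption, and the function names are hypothetical.

```python
import re

# Patterns copied from the messages in the log above; they are assumptions about
# the message format in this build, not a documented contract.
REPLICA_FAILED = re.compile(r"replica (\d+)@([\d.]+):(\d+) failed")
PROGRESS = re.compile(
    r"Rebalance progress now is ([\d.]+)\. "
    r"Remaining analytics collections: (\d+)/(\d+)"
)


def failed_replica(text):
    """Return (replica_id, host, port) from 'replica N@host:port failed', else None."""
    m = REPLICA_FAILED.search(text)
    if m is None:
        return None
    return int(m.group(1)), m.group(2), int(m.group(3))


def rebalance_progress(text):
    """Return (progress, remaining, total) from an analytics progress message, else None."""
    m = PROGRESS.search(text)
    if m is None:
        return None
    return float(m.group(1)), int(m.group(2)), int(m.group(3))


def rebalance_failed(task):
    """True if a rebalance task dict, as printed by the test, shows a finished rebalance with an error."""
    return task.get("status") == "notRunning" and "errorMessage" in task


if __name__ == "__main__":
    print(failed_replica(
        "Rebalance 01b142cc9284e45d0b9837d138c44468 failed: "
        "java.lang.Exception: replica 6@172.23.136.161:9120 failed"))
    # → (6, '172.23.136.161', 9120)
```

Applied to the messages above, the last progress report (00:21:47) shows 0.99 with 0/15 collections remaining, and the replica failure is reported roughly 6.7 seconds later (00:21:53). So in this run the failure came after every collection had been reported as rebalanced.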
      

      Test link: http://qa.sc.couchbase.com/job/test_suite_executor-TAF/175788/console

      The same test passes on CentOS.


          Activity

            umang.agrawal Umang added a comment - The logs below are from a successful run on build 7.1.0-2506:
            https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-51550/collectinfo-2022-03-25T054756-ns_1%40172.23.136.196.zip
            https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-51550/collectinfo-2022-03-25T054756-ns_1%40172.23.136.197.zip
            https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-51550/collectinfo-2022-03-25T054756-ns_1%40172.23.136.198.zip
            https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-51550/collectinfo-2022-03-25T054756-ns_1%40172.23.136.199.zip
            https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-51550/collectinfo-2022-03-25T054756-ns_1%40172.23.136.200.zip
            https://cb-jira.s3.us-east-2.amazonaws.com/logs/MB-51550/collectinfo-2022-03-25T054756-ns_1%40172.23.136.201.zip
            umang.agrawal Umang added a comment -

            Attaching logs for the failure on build 7.1.0-2530. It looks more like an environment issue.

            Node            Services         Version                CPU             Status        Membership / Recovery
            172.23.136.106  index, kv, n1ql  7.1.0-2530-enterprise  1.53666666667   Cluster node  active / none
            172.23.136.107  cbas             7.1.0-2530-enterprise  0.806666666667  Cluster node  active / none
            172.23.136.109  cbas             7.1.0-2530-enterprise  0.676666666667  Cluster node  active / none
            172.23.136.115  cbas             7.1.0-2530-enterprise  1.485           Cluster node  active / none
            172.23.136.113  cbas             7.1.0-2530-enterprise  1.275           Cluster node  active / none
            172.23.136.111  cbas             (being rebalanced IN)

             
            murtadha.hubail Murtadha Hubail added a comment -

            Umang, I found a potential timing issue that could explain what is happening. I merged a patch to fix it, and it should be available in the next build (7.1.0-2532). Please rerun the failing test with that build and attach the logs if the issue still happens.
            umang.agrawal Umang added a comment -

            Verified with build 7.1.0-2534

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.2.0-1094 contains cbas-core commit 4442c2f with commit message:
            MB-51550: Make replication ack timeout configurable
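The fix attributes the failure to a timing issue and makes the replication ack timeout configurable. The sketch below illustrates only the general idea: waiting for a replica's acknowledgment, bounded by a timeout the caller supplies instead of a hard-coded constant. It is hypothetical Python, not the cbas-core (Java) implementation; the class and method names are invented, and the issue does not state the real default value or configuration key.

```python
import threading

# Hypothetical illustration: cbas-core is Java, and its actual classes and
# configuration key are not shown in this issue.


class ReplicaAckWaiter:
    """Waits for a replica to acknowledge an operation within a configurable timeout."""

    def __init__(self, ack_timeout_seconds):
        self.ack_timeout_seconds = ack_timeout_seconds  # supplied by config, not hard-coded
        self._acked = threading.Event()

    def on_ack(self):
        """Called when the replica's acknowledgment arrives."""
        self._acked.set()

    def await_ack(self, replica):
        """Block until the ack arrives; raise TimeoutError if the timeout elapses first."""
        if not self._acked.wait(self.ack_timeout_seconds):
            raise TimeoutError(
                f"no ack from replica {replica} within {self.ack_timeout_seconds}s")


if __name__ == "__main__":
    waiter = ReplicaAckWaiter(ack_timeout_seconds=1.0)
    threading.Timer(0.1, waiter.on_ack).start()  # simulated late-but-in-time ack
    waiter.await_ack("6@172.23.136.161:9120")
    print("ack received")
```

Making the timeout a constructor argument lets a slower environment be given more headroom without a code change.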

            People

              umang.agrawal Umang
              umang.agrawal Umang