Couchbase Server / MB-63218

[System Test] Rebalance Failure

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • Columnar 1.0.1
    • Columnar 1.0.1
    • analytics, columnar
    • None
    • Sandbox
      Build- 1.0.1-2313
    • 0
    • Yes
    • Analytics Sprint 48

    Description

      A rebalance failure was observed in the logs during system tests.

      Rebalance exited with reason {{badmatch,failed}, [{ns_rebalancer,rebalance_body,7, [{file,"src/ns_rebalancer.erl"}, {line,500}]}, {async,'-async_init/4-fun-1-',3, [{file,"src/async.erl"},{line,199}]}]}. Rebalance Operation Id = eaf14e837e09b9fa8cbfa48e1123885b
      

      Analytics Service unable to successfully rebalance 966618e9ff7a1c2ebd53bda934575b9a due to 'java.lang.IllegalStateException: timed out waiting for keep nodes to join & have partitions fully active (missing nodes: [svc-da-node-002.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com:8091 (00e8c0a07f2a167fd9505b0a135d806c), svc-da-node-004.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com:8091 (6cfc42f243b1fd766abff7b13ab6e752)]), metadata node active: true'; see analytics_info.log for details
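For triage, the nodes named in the "missing nodes" part of the message above can be pulled out programmatically. The following is a minimal sketch, not part of any Couchbase tooling: the function name `missing_nodes` and the regular expression are hypothetical, and the pattern assumes entries always look like `host:port (32-hex-character id)`, as they do in this one sample.

```python
import re

# Assumed entry shape, inferred from the single message in this report:
#   host:port (32 lowercase hex characters)
NODE_ENTRY = re.compile(
    r"(?P<host>[\w.-]+):(?P<port>\d+) \((?P<node_id>[0-9a-f]{32})\)"
)


def missing_nodes(message):
    """Return (host, port, node_id) tuples listed after 'missing nodes: ['."""
    marker = "missing nodes: ["
    start = message.find(marker)
    if start == -1:
        return []  # message does not contain a missing-nodes list
    end = message.find("]", start)
    if end == -1:
        end = len(message)  # tolerate a truncated message
    segment = message[start + len(marker):end]
    return [
        (m["host"], int(m["port"]), m["node_id"])
        for m in NODE_ENTRY.finditer(segment)
    ]
```

Passing the full error string from this report should yield two entries, one for svc-da-node-002 and one for svc-da-node-004, which can then be compared against the "keep nodes" list in analytics_info.log.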
      

      The system was trying to fail over a node:

      2024-08-19T22:52:03.416+00:00 INFO CBAS.rebalance.Rebalance [HttpExecutor(port:9111)-3] keep nodes: [svc-da-node-001.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com:8091 (a01a5c04a60206bfdffe9e7cf4b3a43f), svc-da-node-002.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com:8091 (00e8c0a07f2a167fd9505b0a135d806c), svc-da-node-004.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com:8091 (6cfc42f243b1fd766abff7b13ab6e752)], pending ejects: [], failedOver: [null (f066f1c77e314045e23015c5831a0263)]
      2024-08-19T22:52:03.419+00:00 INFO CBAS.rebalance.Rebalance [Rebalancer (3ca6a1a696212ac32be2bd76d4706cb7)] Failing over the following nodes: [null (f066f1c77e314045e23015c5831a0263)]
      2024-08-19T22:52:03.419+00:00 INFO CBAS.cluster.NodeManager [Rebalancer (3ca6a1a696212ac32be2bd76d4706cb7)] f066f1c77e314045e23015c5831a0263 considered dead
      2024-08-19T22:52:03.434+00:00 ERRO CBAS.executor.JobExecutor [Rebalancer (3ca6a1a696212ac32be2bd76d4706cb7)] Unexpected failure. Aborting job JID:0.1926
      org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node f066f1c77e314045e23015c5831a0263 does not exist
      	at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:58) ~[hyracks-api.jar:1.0.1-2313]
      	at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc.jar:1.0.1-2313]
      	at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc.jar:1.0.1-2313]
      	at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc.jar:1.0.1-2313]
      	at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc.jar:1.0.1-2313]
      	at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:733) ~[hyracks-control-cc.jar:1.0.1-2313]
      	at org.apache.hyracks.control.cc.cluster.NodeManager.failNode(NodeManager.java:204) ~[hyracks-control-cc.jar:1.0.1-2313]
      	at com.couchbase.analytics.control.rebalance.Rebalance.beforeLock(Rebalance.java:196) ~[columnar-server.jar:1.0.1-2313]
      	at com.couchbase.analytics.control.rebalance.Rebalance.lambda$start$11(Rebalance.java:541) ~[columnar-server.jar:1.0.1-2313]
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
      	at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
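In the excerpt above, the failed-over node is printed as `null (f066f1c77e314045e23015c5831a0263)`, and the HYR0010 error names that same id. A small parser for the "keep nodes / pending ejects / failedOver" line makes such entries easy to spot across many log files. This is a sketch under stated assumptions: the helper names are hypothetical, the line format is inferred only from the sample above, and reading `null` as "no hostname known for this node id" is an interpretation that this report does not confirm.

```python
import re

# One "[...]" list per label, as seen in the CBAS.rebalance.Rebalance line above.
SECTION = re.compile(r"(keep nodes|pending ejects|failedOver): \[([^\]]*)\]")

# A list entry is either "host:port (id)" or "null (id)"; id is 32 hex characters.
ENTRY = re.compile(r"(?P<name>null|[\w.-]+:\d+) \((?P<node_id>[0-9a-f]{32})\)")


def parse_rebalance_line(line):
    """Map each label to a list of (name, node_id) tuples found in its list."""
    result = {}
    for label, body in SECTION.findall(line):
        result[label] = [(m["name"], m["node_id"]) for m in ENTRY.finditer(body)]
    return result


def unresolved_failovers(line):
    """Return ids of failed-over nodes whose name is printed as 'null'."""
    failed = parse_rebalance_line(line).get("failedOver", [])
    return [node_id for name, node_id in failed if name == "null"]
```

Running `unresolved_failovers` on the 22:52:03.416 line gives the single id that the later `HYR0010: Node ... does not exist` exception refers to, which helps correlate the two messages without reading the stack trace by hand.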
      

      Logs:
      https://cb-engineering.s3.amazonaws.com/SysTestCapella/collectinfo-2024-08-20T004620-ns_1%40svc-da-node-001.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestCapella/collectinfo-2024-08-20T004620-ns_1%40svc-da-node-002.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestCapella/collectinfo-2024-08-20T004620-ns_1%40svc-da-node-003.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestCapella/collectinfo-2024-08-20T004620-ns_1%40svc-da-node-004.er65w5qpwe3wffcy.sandbox.nonprod-project-avengers.com.zip

            People

              abhay.aggrawal Abhay Aggrawal