  1. Couchbase Server
  2. MB-62741

[System Test] Cluster unusable after rebalance-in operation; data flow controller interrupted exceptions seen

Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • Columnar 1.0.0
    • Columnar 1.0.0
    • analytics
    • 1.0.0-2209
    • Untriaged
    • 0
    • Unknown
    • Analytics Sprint 46

    Description

      After a scale-up operation (from 16 to 32 nodes), the cluster became unusable. The sequence of events:

      The rebalance from 16 to 32 nodes is triggered at:

      2024-07-15T16:45:11.318
      

      It completes at:

      2024-07-15T17:01:24.681Z
      

      Some exceptions are seen around 17:35 (it is unclear whether they are relevant):

      2024-07-15T17:35:53.442+00:00 WARN CBAS.dataflow.FeedRecordDataFlowController [SAO:JID:0.4459:TAID:TID:ANID:ODID:170:0:384:0:(linkZcZKRcJl/default1)[384]:BO] data flow controller interrupted
      java.lang.InterruptedException: null
      	at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1640) ~[?:?]
      	at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435) ~[?:?]
      	at com.couchbase.analytics.adapter.CouchbaseConnector.pollNextMessage(CouchbaseConnector.java:844) ~[columnar-connector.jar:1.0.0-2209]
      	at com.couchbase.analytics.adapter.CouchbaseConnector.next(CouchbaseConnector.java:810) ~[columnar-connector.jar:1.0.0-2209]
      	at org.apache.asterix.external.dataflow.FeedRecordDataFlowController.next(FeedRecordDataFlowController.java:139) ~[asterix-external-data.jar:1.0.0-2209]
      	at org.apache.asterix.external.dataflow.FeedRecordDataFlowController.start(FeedRecordDataFlowController.java:88) ~[asterix-external-data.jar:1.0.0-2209]
      	at org.apache.asterix.external.dataset.adapter.FeedAdapter.start(FeedAdapter.java:41) ~[asterix-external-data.jar:1.0.0-2209]
      	at org.apache.asterix.common.external.IDataSourceAdapter.start(IDataSourceAdapter.java:75) ~[asterix-common.jar:1.0.0-2209]
      	at com.couchbase.analytics.runtime.BucketOperatorNodePushable.start(BucketOperatorNodePushable.java:50) ~[columnar-connector.jar:1.0.0-2209]
      	at org.apache.asterix.active.ActiveSourceOperatorNodePushable.initialize(ActiveSourceOperatorNodePushable.java:101) ~[asterix-active.jar:1.0.0-2209]
      	at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$0(SuperActivityOperatorNodePushable.java:233) ~[hyracks-api.jar:1.0.0-2209]
      	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
      

      Shortly afterwards, "cluster is UNUSABLE" messages start to appear:

      2024-07-15T17:37:24.406+00:00 WARN CBAS.server.QueryServiceServlet [HttpExecutor(port:18095)-10] handleException: ASX0032: Cannot execute request, cluster is UNUSABLE: uuid=null, clientContextID=96f8c7c8-a113-40f1-bce9-ca5867be508f
      2024-07-15T17:37:56.723+00:00 INFO CBAS.server.QueryServiceServlet [HttpExecutor(port:18095)-11] handleRequest: uuid=9cdc2d98-dae2-440f-b7ae-4cd9010d8fbd, clientContextID=null,
       
       
      2024-07-15T17:42:54.438+00:00 INFO CBAS.messaging.NCMessageBroker [Worker:9d1bf4c6302db62e3f570c2df2678cd9] Received message: ExecuteStatementResponseMessage(id=397, uuid=f20ff46c-1bb1-4eac-8925-8b86e382afb9, clientContextId=null): 0 characters
      2024-07-15T17:42:54.439+00:00 WARN CBAS.server.QueryServiceServlet [HttpExecutor(port:18095)-8] handleException: ASX0032: Cannot execute request, cluster is UNUSABLE: uuid=null, clientContextID=f20ff46c-1bb1-4eac-8925-8b86e382afb9
      2024-07-15T17:43:20.183+00:00 INFO CBAS.cbas updating 
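
To make the timeline easier to read, the timestamps quoted in this report can be turned into offsets from the end of the rebalance. The following is a minimal sketch, not part of any Couchbase tooling: the names `EVENTS`, `parse_utc`, and `timeline_offsets` are hypothetical, and the rebalance start time, which is logged without a zone suffix, is assumed to be UTC.

```python
from datetime import datetime, timezone

# Timestamps copied from this report. The rebalance start has no zone
# suffix in the report; treating it as UTC is an assumption.
EVENTS = {
    "rebalance_start": "2024-07-15T16:45:11.318",
    "rebalance_end": "2024-07-15T17:01:24.681Z",
    "interrupted_exception": "2024-07-15T17:35:53.442+00:00",
    "first_unusable": "2024-07-15T17:37:24.406+00:00",
}


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO-8601 timestamp, assuming UTC when no zone is given."""
    parsed = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def timeline_offsets(events: dict[str, str]) -> dict[str, float]:
    """Return each event's offset in seconds from the rebalance end."""
    end = parse_utc(events["rebalance_end"])
    return {
        name: (parse_utc(stamp) - end).total_seconds()
        for name, stamp in events.items()
    }


offsets = timeline_offsets(EVENTS)
for name, seconds in offsets.items():
    print(f"{name}: {seconds:+.3f} s relative to rebalance end")
```

With these inputs, the interrupted exception and the first UNUSABLE message both fall a little over half an hour after the rebalance finished, roughly a minute and a half apart.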
      

      This looks different from https://issues.couchbase.com/browse/MB-62680 because this was a rebalance-in. If the root cause is the same, please close this as a duplicate.

      cbcollect logs:

      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-001.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-002.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-003.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-004.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-005.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-006.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-007.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-008.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-009.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-010.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-011.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-012.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-013.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-014.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-015.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-016.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-017.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-018.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-019.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-020.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-021.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-022.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-023.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-024.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-025.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-026.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-027.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-028.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-029.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-030.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-031.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip
      https://cb-engineering.s3.amazonaws.com/SysTestColumnar15July/collectinfo-2024-07-15T182549-ns_1%40svc-da-node-032.twi3gef5x8hk6evi.sandbox.nonprod-project-avengers.com.zip

            People

              pavan.pb Pavan PB