Couchbase Server / MB-30766

[System Test] Rebalance operation for any service fails because of analytics nodes rebalance error - Datasets in different partitions have different DCP states

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 6.0.0
    • 6.0.0
    • analytics
    • centos cluster (longevity)

    Description

      Build : 6.0.0-1432
      Test : -test tests/integration/test_allFeatures_alice.yml -scope tests/integration/scope_Xattrs_Alice.yml
      Scale : 3

      This issue has occurred several times during system test runs on the latest build. The most recent occurrence is Rebalance ID d958ccd453cca075ef38f72b4c8915ea.

      The Analytics rebalance failure causes rebalance operations involving other services to fail as well. Like GSI, the Analytics service should refrain from rebalancing Analytics nodes when the rebalance operation is initiated for nodes of other services.
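
To illustrate the requested behavior, here is a minimal sketch of a guard that skips the Analytics rebalance when the set of Analytics nodes is unchanged by the topology change. The class `AnalyticsRebalanceGuard`, the method `needsAnalyticsRebalance`, and the node names are hypothetical and are not taken from the CBAS code base; how GSI actually makes this decision is not described in this report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper, for illustration only; not part of cbas-server. */
final class AnalyticsRebalanceGuard {
    private AnalyticsRebalanceGuard() {}

    /**
     * Returns true only when the Analytics node set differs before and after
     * the topology change. When it returns false, the rebalance was started
     * for nodes of other services and Analytics data need not move.
     */
    static boolean needsAnalyticsRebalance(Set<String> before, Set<String> after) {
        return !before.equals(after);
    }

    public static void main(String[] args) {
        // Placeholder node names (assumption), not hosts from the test cluster.
        Set<String> before = new HashSet<>(Arrays.asList("cbas-1", "cbas-2"));
        Set<String> unchanged = new HashSet<>(before);
        Set<String> grown = new HashSet<>(Arrays.asList("cbas-1", "cbas-2", "cbas-3"));

        System.out.println(needsAnalyticsRebalance(before, unchanged));
        System.out.println(needsAnalyticsRebalance(before, grown));
    }
}
```

With a guard like this, a rebalance that only adds or removes nodes of other services would never reach the DCP state check that fails in the log below.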

      Also, when disconnecting the link, it would be good to ensure that the DCP states on all partitions are balanced, even if that delays the disconnect operation, so that issues like this one can be avoided.
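
The check suggested above could look roughly like the following sketch. It uses a deliberately simplified model in which each partition reports a single last-ingested sequence number for a dataset. That model, the class `DcpStateCheck`, and both method names are assumptions made for illustration; the real per-partition DCP state kept by CBAS, and the way it computes "Mutations needed to catch up", are not shown in this report.

```java
import java.util.Arrays;

/** Hypothetical model, for illustration only; not the CBAS implementation. */
final class DcpStateCheck {
    private DcpStateCheck() {}

    /** True when every partition has ingested up to the same sequence number. */
    static boolean isBalanced(long[] partitionSeqnos) {
        for (long seqno : partitionSeqnos) {
            if (seqno != partitionSeqnos[0]) {
                return false;
            }
        }
        return true;
    }

    /** Total number of mutations the lagging partitions are behind the leader. */
    static long mutationsToCatchUp(long[] partitionSeqnos) {
        long max = Arrays.stream(partitionSeqnos).max().orElse(0L);
        long lag = 0;
        for (long seqno : partitionSeqnos) {
            lag += max - seqno;
        }
        return lag;
    }

    public static void main(String[] args) {
        long[] seqnos = {100L, 100L, 40L}; // made-up values
        // A disconnect could wait (or keep ingesting) until this reports true.
        System.out.println(isBalanced(seqnos));
        System.out.println(mutationsToCatchUp(seqnos));
    }
}
```

Under this model, a disconnect would complete only once `isBalanced` holds, trading a slower disconnect for a rebalance that cannot hit the CBAS0001 error.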

      The following appears in the analytics_error.log file on 172.23.96.145:

      2018-08-05T12:45:56.027-07:00 ERRO CBAS.metadata.BucketEventsListener [Executor-571:ClusterController] Failed to connect bucket Default.Local.CUSTOMER(CouchbaseMetadataExtension)
      java.lang.NullPointerException: null
      2018-08-05T12:46:24.561-07:00 ERRO CBAS.metadata.BucketEventsListener [Executor-657:ClusterController] Failed to connect bucket Default.Local.CUSTOMER(CouchbaseMetadataExtension)
      java.lang.NullPointerException: null
      2018-08-05T12:47:09.721-07:00 ERRO CBAS.rebalance.Rebalance [Executor-586:ClusterController] rebalance failed
      com.couchbase.analytics.common.exceptions.AnalyticsHyracksException: CBAS0001: Datasets in different partitions have different DCP states. Mutations needed to catch up = 234581. User action: Connect the bucket: { "class" : "Bucket", "dataverse" : "Default", "link" : "Local", "bucket" : "default", "uuid" : "0e91fbf6d20c5b4a6456222cc2c45ab4", "running" : false } or drop the dataset: Default.ds1
              at com.couchbase.analytics.control.rebalance.ShadowStateWriteCallback.beforeRebalance(ShadowStateWriteCallback.java:89) ~[cbas-server.jar:6.0.0-1435]
              at org.apache.asterix.utils.RebalanceUtil.rebalance(RebalanceUtil.java:220) ~[asterix-app.jar:6.0.0-1435]
              at org.apache.asterix.utils.RebalanceUtil.rebalance(RebalanceUtil.java:131) ~[asterix-app.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.rebalanceDataset(Rebalance.java:403) ~[cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.rebalanceDatasets(Rebalance.java:237) ~[cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.lambda$doRebalance$3(Rebalance.java:170) ~[cbas-server.jar:6.0.0-1435]
              at org.apache.hyracks.api.util.InvokeUtil.tryWithCleanups(InvokeUtil.java:191) ~[hyracks-api.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:166) ~[cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:130) [cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:70) [cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:21) [cbas-connector.jar:6.0.0-1435]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
      2018-08-05T12:47:10.426-07:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-2] Rebalance d958ccd453cca075ef38f72b4c8915ea failed
      com.couchbase.analytics.common.exceptions.AnalyticsHyracksException: CBAS0001: Datasets in different partitions have different DCP states. Mutations needed to catch up = 234581. User action: Connect the bucket: { "class" : "Bucket", "dataverse" : "Default", "link" : "Local", "bucket" : "default", "uuid" : "0e91fbf6d20c5b4a6456222cc2c45ab4", "running" : false } or drop the dataset: Default.ds1
              at com.couchbase.analytics.control.rebalance.ShadowStateWriteCallback.beforeRebalance(ShadowStateWriteCallback.java:89) ~[cbas-server.jar:6.0.0-1435]
              at org.apache.asterix.utils.RebalanceUtil.rebalance(RebalanceUtil.java:220) ~[asterix-app.jar:6.0.0-1435]
              at org.apache.asterix.utils.RebalanceUtil.rebalance(RebalanceUtil.java:131) ~[asterix-app.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.rebalanceDataset(Rebalance.java:403) ~[cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.rebalanceDatasets(Rebalance.java:237) ~[cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.lambda$doRebalance$3(Rebalance.java:170) ~[cbas-server.jar:6.0.0-1435]
              at org.apache.hyracks.api.util.InvokeUtil.tryWithCleanups(InvokeUtil.java:191) ~[hyracks-api.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:166) ~[cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:130) ~[cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:70) ~[cbas-server.jar:6.0.0-1435]
              at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:21) ~[cbas-connector.jar:6.0.0-1435]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
              at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
      2018-08-05T12:47:28.610-07:00 ERRO CBAS.metadata.BucketEventsListener [Executor-345:ClusterController] Failed to connect bucket Default.Local.CUSTOMER(CouchbaseMetadataExtension)
      java.lang.NullPointerException: null
      2018-08-05T12:48:18.334-07:00 ERRO CBAS.metadata.BucketEventsListener [Executor-659:ClusterController] Failed to connect bucket Default.Local.CUSTOMER(CouchbaseMetadataExtension)
      java.lang.NullPointerException: null
      
            People

              michael.blow Michael Blow
              mihir.kamdar Mihir Kamdar (Inactive)