Couchbase Server / MB-30857

[System Test]: Datasets in different partitions have different DCP states

Details

    • Bug
    • Resolution: Duplicate
    • Test Blocker
    • 6.0.0
    • 6.0.0
    • analytics
    • centos cluster

    Description

      Build: 6.0.0-1480

      Rebalance failed when removing an analytics node.

      [2018-08-10T00:41:36-07:00, sequoiatools/couchbase-cli:facbde] rebalance -c 172.23.104.16:8091 --server-remove 172.23.104.23 -u Administrator -p password

      Debug log

      [user:error,2018-08-10T00:41:52.263-07:00,ns_1@172.23.104.16:<0.6509.0>:ns_orchestrator:do_log_rebalance_completion:1117]Rebalance exited with reason {service_rebalance_failed,cbas,
      {rebalance_failed,
      {service_error,
      <<"Rebalance 8e784bec9eff91cbfc5a5e51a50a734c failed: CBAS0001: Datasets in different partitions have different DCP states. Mutations needed to catch up = 35639. User action: Try again later">>}}}

      Error logs

      2018-08-10T00:41:48.140-07:00 ERRO CBAS.executor.JobExecutor [Worker:ClusterController] Unexpected failure. Aborting job JID:0.7203
      org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 3223dd365e3fc1d01376ed0269e54dc4 does not exist
      at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:56) ~[hyracks-api.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732) [hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.cluster.NodeManager.failNode(NodeManager.java:197) [hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.cluster.NodeManager.addNode(NodeManager.java:110) [hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.work.RegisterNodeWork.doRun(RegisterNodeWork.java:58) [hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.common.work.SynchronizableWork.run(SynchronizableWork.java:43) [hyracks-control-common.jar:6.0.0-1480]
      at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [hyracks-control-common.jar:6.0.0-1480]
      2018-08-10T00:41:48.141-07:00 ERRO CBAS.executor.JobExecutor [Worker:ClusterController] Unexpected failure. Aborting job JID:0.7204
      org.apache.hyracks.api.exceptions.HyracksException: HYR0010: Node 3223dd365e3fc1d01376ed0269e54dc4 does not exist
      at org.apache.hyracks.api.exceptions.HyracksException.create(HyracksException.java:56) ~[hyracks-api.jar:6.0.0-1480]
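      at org.apache.hyracks.control.cc.executor.JobExecutor.assignLocation(JobExecutor.java:473) ~[hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.executor.JobExecutor.assignTaskLocations(JobExecutor.java:365) ~[hyracks-control-cc.jar:6.0.0-1480]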
      at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableTaskClusters(JobExecutor.java:245) ~[hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.executor.JobExecutor.startRunnableActivityClusters(JobExecutor.java:209) ~[hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.executor.JobExecutor.notifyNodeFailures(JobExecutor.java:732) [hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.cluster.NodeManager.failNode(NodeManager.java:197) [hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.cluster.NodeManager.addNode(NodeManager.java:110) [hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.cc.work.RegisterNodeWork.doRun(RegisterNodeWork.java:58) [hyracks-control-cc.jar:6.0.0-1480]
      at org.apache.hyracks.control.common.work.SynchronizableWork.run(SynchronizableWork.java:43) [hyracks-control-common.jar:6.0.0-1480]
      at org.apache.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:127) [hyracks-control-common.jar:6.0.0-1480]
      2018-08-10T00:41:50.200-07:00 ERRO CBAS.active.ActiveEntityEventsListener [ActiveNotificationHandler] Active Job JID:0.7074 failed
      org.apache.hyracks.api.exceptions.HyracksDataException: HYR0115: Local network error
      at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:60) ~[hyracks-api.jar:6.0.0-1480]
      at org.apache.hyracks.dataflow.std.collectors.NonDeterministicChannelReader.findNextSender(NonDeterministicChannelReader.java:115) ~[hyracks-dataflow-std.jar:6.0.0-1480]
      at org.apache.hyracks.dataflow.std.collectors.NonDeterministicFrameReader.nextFrame(NonDeterministicFrameReader.java:43) ~[hyracks-dataflow-std.jar:6.0.0-1480]
      at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:391) ~[hyracks-control-nc.jar:6.0.0-1480]
      at org.apache.hyracks.control.nc.Task.run(Task.java:330) ~[hyracks-control-nc.jar:6.0.0-1480]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
      2018-08-10T00:41:51.924-07:00 ERRO CBAS.rebalance.Rebalance [Executor-99:ClusterController] rebalance failed
      com.couchbase.analytics.common.exceptions.AnalyticsHyracksException: CBAS0001: Datasets in different partitions have different DCP states. Mutations needed to catch up = 35639. User action: Try again later
      at com.couchbase.analytics.control.rebalance.ShadowStateWriteCallback.beforeRebalance(ShadowStateWriteCallback.java:80) ~[cbas-server.jar:6.0.0-1480]
      at org.apache.asterix.utils.RebalanceUtil.rebalance(RebalanceUtil.java:220) ~[asterix-app.jar:6.0.0-1480]
      at org.apache.asterix.utils.RebalanceUtil.rebalance(RebalanceUtil.java:131) ~[asterix-app.jar:6.0.0-1480]
      at com.couchbase.analytics.control.rebalance.Rebalance.rebalanceDataset(Rebalance.java:426) ~[cbas-server.jar:6.0.0-1480]
      at com.couchbase.analytics.control.rebalance.Rebalance.rebalanceDatasets(Rebalance.java:251) ~[cbas-server.jar:6.0.0-1480]
      at com.couchbase.analytics.control.rebalance.Rebalance.lambda$doRebalance$3(Rebalance.java:183) ~[cbas-server.jar:6.0.0-1480]
      at org.apache.hyracks.api.util.InvokeUtil.tryWithCleanups(InvokeUtil.java:191) ~[hyracks-api.jar:6.0.0-1480]
      at com.couchbase.analytics.control.rebalance.Rebalance.doRebalance(Rebalance.java:179) ~[cbas-server.jar:6.0.0-1480]
      at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:139) [cbas-server.jar:6.0.0-1480]
      at com.couchbase.analytics.control.rebalance.Rebalance.doCall(Rebalance.java:74) [cbas-server.jar:6.0.0-1480]
      at com.couchbase.analytics.runtime.WriteLockCallable.call(WriteLockCallable.java:21) [cbas-connector.jar:6.0.0-1480]
      at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_181]
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_181]
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_181]
      at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
      2018-08-10T00:41:52.237-07:00 ERRO CBAS.servlet.RebalanceServlet [HttpExecutor(port:9111)-1] Rebalance 8e784bec9eff91cbfc5a5e51a50a734c failed

      Subsequent rebalances also failed with the same error.
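Since the error asks the user to try again later, it may help to record the "Mutations needed to catch up" figure on each retry and see whether the lag is shrinking or stuck. The following is a rough sketch only, not part of the reported test run. The names `mutation_lag`, `run_cli`, and `rebalance_with_retry` are hypothetical, and it assumes that the CLI exits with a non-zero status on failure and that its output contains the CBAS0001 message quoted above.

```python
import re
import subprocess
import time
from typing import Callable, List, Optional, Tuple

# Pattern based on the CBAS0001 message quoted in this ticket.
LAG_RE = re.compile(r"CBAS0001: .*?Mutations needed to catch up = (\d+)")


def mutation_lag(output: str) -> Optional[int]:
    """Return the reported mutation lag, or None if the message is absent."""
    match = LAG_RE.search(output)
    return int(match.group(1)) if match else None


def run_cli(cmd: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + proc.stderr


def rebalance_with_retry(
    cmd: List[str],
    run: Callable[[List[str]], Tuple[int, str]] = run_cli,
    attempts: int = 5,
    wait_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, List[Optional[int]]]:
    """Retry a rebalance command, recording the lag seen on each failure.

    Assumption: exit code 0 means the rebalance succeeded.
    """
    lags: List[Optional[int]] = []
    for attempt in range(attempts):
        code, output = run(cmd)
        if code == 0:
            return True, lags
        lags.append(mutation_lag(output))
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts - 1:
            sleep(wait_seconds)
    return False, lags


# Example invocation with the arguments from this ticket (not executed here):
# ok, lags = rebalance_with_retry([
#     "couchbase-cli", "rebalance", "-c", "172.23.104.16:8091",
#     "--server-remove", "172.23.104.23",
#     "-u", "Administrator", "-p", "password",
# ])
```

If the recorded lag values do not decrease between attempts, retrying is unlikely to help, which would match the repeated failures described here.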

            People

              michael.blow Michael Blow
              vikas.chaudhary Vikas Chaudhary