Couchbase Kubernetes / K8S-545

XDCR: Operator tries to add back a killed node using delta recovery and fails to reconcile

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 1.0.0
    • 1.0.0
    • operator

    Description

      Testcase: TestXdcrNodeDownDuringSetupAfterConfigure

      Scenario:

      1. After XDCR replication has started successfully, the XDCR source node (0001) goes down.
      2. A new node (0005) is created to replace the killed node.
      3. However, the operator also tries to add back the killed node 0001 using delta recovery.
      4. The rebalance is rejected with deltaRecoveryNotPossible, reconciliation fails, and the operator keeps retrying the rebalance on every reconcile loop.
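
The steps above suggest that delta recovery was requested for a node whose pod no longer existed. Delta recovery reuses the data already on the node, so it cannot succeed once that node is gone. The following is a minimal sketch of a guard for that decision. It is not the operator's actual code: `Member`, `Action`, `choose_recovery`, and the `pod_exists` flag are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"                      # node is healthy, nothing to do
    DELTA_RECOVERY = "delta_recovery"  # add the same node back, reusing its data
    REPLACE = "replace"                # remove the node and add a fresh one


@dataclass
class Member:
    name: str
    failed_over: bool  # the cluster has failed this node over
    pod_exists: bool   # assumption: the node's pod (and its data) is still available


def choose_recovery(member: Member) -> Action:
    """Pick a recovery action for one cluster member (illustrative only)."""
    if not member.failed_over:
        return Action.NONE
    if member.pod_exists:
        # The node can come back with its data intact, so delta recovery is possible.
        return Action.DELTA_RECOVERY
    # The node is gone; requesting delta recovery here would be rejected.
    return Action.REPLACE
```

Under these assumptions, node 0001 from the scenario (failed over, pod deleted) would be replaced rather than marked for delta recovery.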

      Operator log prints:

      time="2018-08-16T02:48:36Z" level=info msg="needs rebalance: true" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:48:37Z" level=info msg="Add back node `test-couchbase-9br7k-0001` is being marked for delta recovery" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:48:40Z" level=info msg="Creating a pod (test-couchbase-9br7k-0005) running Couchbase enterprise-5.5.0" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:24Z" level=info msg="added member (test-couchbase-9br7k-0005)" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:24Z" level=warning msg="unable to poll external addresses for pod test-couchbase-9br7k-0001" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:25Z" level=warning msg="rebalance: failed with error [Server Error 400 (test-couchbase-9br7k-0000.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Post http://test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091/controller/rebalance: dial tcp: lookup test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc on 10.96.0.10:53: no such host], [Server Error 400 (test-couchbase-9br7k-0002.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0003.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0004.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0005.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]] ...retrying" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:30Z" level=warning msg="rebalance: failed with error [Server Error 400 (test-couchbase-9br7k-0000.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Post http://test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091/controller/rebalance: dial tcp: lookup test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc on 10.96.0.10:53: no such host], [Server Error 400 (test-couchbase-9br7k-0002.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0003.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0004.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0005.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]] ...retrying" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:35Z" level=warning msg="rebalance: failed with error [Server Error 400 (test-couchbase-9br7k-0000.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Post http://test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091/controller/rebalance: dial tcp: lookup test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc on 10.96.0.10:53: no such host], [Server Error 400 (test-couchbase-9br7k-0002.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0003.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0004.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0005.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]] ...retrying" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:40Z" level=warning msg="rebalance: failed with error [Server Error 400 (test-couchbase-9br7k-0000.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Post http://test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091/controller/rebalance: dial tcp: lookup test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc on 10.96.0.10:53: no such host], [Server Error 400 (test-couchbase-9br7k-0002.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0003.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0004.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0005.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]] ...retrying" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:45Z" level=warning msg="rebalance: failed with error [Server Error 400 (test-couchbase-9br7k-0000.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Post http://test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091/controller/rebalance: dial tcp: lookup test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc on 10.96.0.10:53: no such host], [Server Error 400 (test-couchbase-9br7k-0002.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0003.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0004.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0005.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]] ...retrying" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:50Z" level=warning msg="rebalance: failed with error [Server Error 400 (test-couchbase-9br7k-0000.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Post http://test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091/controller/rebalance: dial tcp: lookup test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc on 10.96.0.10:53: no such host], [Server Error 400 (test-couchbase-9br7k-0002.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0003.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0004.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0005.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]] ...retrying" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:50Z" level=error msg="Could not Rebalance because requested delta recovery is not possible. You probably added more nodes to the cluster or changed server groups configuration: still failing after 5 retries: [Server Error 400 (test-couchbase-9br7k-0000.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Post http://test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091/controller/rebalance: dial tcp: lookup test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc on 10.96.0.10:53: no such host], [Server Error 400 (test-couchbase-9br7k-0002.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0003.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0004.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]], [Server Error 400 (test-couchbase-9br7k-0005.test-couchbase-9br7k.default.svc:8091/controller/rebalance): [deltaRecoveryNotPossible - requireDeltaRecovery was set to true but delta recovery cannot be performed]]" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:50Z" level=info msg="deleted pod (test-couchbase-9br7k-0001)" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:58Z" level=info msg="server config test_config_1: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:58Z" level=info msg="running members: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:58Z" level=info msg="cluster membership: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:58Z" level=info msg="active nodes: test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0000" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:58Z" level=info msg="pending add nodes: test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:58Z" level=info msg="unmanaged nodes: test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:58Z" level=info msg="is rebalancing: false" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:49:58Z" level=info msg="needs rebalance: true" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:03Z" level=info msg="Rebalance progress: 0.000000" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:11Z" level=error msg="failed to reconcile: Failed to rebalance: cluster reports rebalance incomplete" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:19Z" level=info msg="running members: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:19Z" level=info msg="cluster membership: test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005,test-couchbase-9br7k-0000" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:19Z" level=info msg="active nodes: test-couchbase-9br7k-0005,test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:19Z" level=info msg="unmanaged nodes: test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:19Z" level=info msg="is rebalancing: false" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:19Z" level=info msg="needs rebalance: true" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:25Z" level=info msg="Rebalance progress: 0.000000" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:33Z" level=error msg="failed to reconcile: Failed to rebalance: cluster reports rebalance incomplete" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:41Z" level=info msg="running members: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:41Z" level=info msg="cluster membership: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:41Z" level=info msg="active nodes: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:41Z" level=info msg="unmanaged nodes: test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:41Z" level=info msg="is rebalancing: false" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:41Z" level=info msg="needs rebalance: true" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:50:46Z" level=info msg="Rebalance progress: 0.000000" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:01Z" level=error msg="failed to reconcile: Failed to rebalance: cluster reports rebalance incomplete" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:09Z" level=info msg="running members: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:09Z" level=info msg="cluster membership: test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005,test-couchbase-9br7k-0000" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:09Z" level=info msg="active nodes: test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005,test-couchbase-9br7k-0000,test-couchbase-9br7k-0002" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:09Z" level=info msg="unmanaged nodes: test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:09Z" level=info msg="is rebalancing: false" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:09Z" level=info msg="needs rebalance: true" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:14Z" level=info msg="Rebalance progress: 0.000000" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:22Z" level=error msg="failed to reconcile: Failed to rebalance: cluster reports rebalance incomplete" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:31Z" level=info msg="running members: test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005,test-couchbase-9br7k-0000,test-couchbase-9br7k-0002" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:31Z" level=info msg="cluster membership: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:31Z" level=info msg="active nodes: test-couchbase-9br7k-0000,test-couchbase-9br7k-0002,test-couchbase-9br7k-0003,test-couchbase-9br7k-0004,test-couchbase-9br7k-0005" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:31Z" level=info msg="unmanaged nodes: test-couchbase-9br7k-0001.test-couchbase-9br7k.default.svc:8091" cluster-name=test-couchbase-9br7k module=cluster
      time="2018-08-16T02:51:31Z" level=info msg="is rebalancing: false" cluster-name=test-couchbase-9br7k module=cluster
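
To triage a log like the one above, it can help to split each line into its `key=value` fields and count entries per level. The sketch below assumes the lines follow the `key=value` / `key="quoted value"` layout seen in this log; `parse_line` and `count_by_level` are hypothetical helper names, not part of the operator.

```python
import re
from collections import Counter

# Matches key=value pairs where the value is either a double-quoted string
# (possibly containing escaped characters) or a run of non-space characters.
FIELD = re.compile(r'(\w[\w-]*)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line: str) -> dict:
    """Return the key/value fields of one log line as a dict."""
    fields = {}
    for key, value in FIELD.findall(line):
        if value.startswith('"'):
            value = value[1:-1]  # strip the surrounding quotes
        fields[key] = value
    return fields


def count_by_level(lines) -> Counter:
    """Count log entries per level, skipping blank lines."""
    return Counter(parse_line(l).get("level") for l in lines if l.strip())
```

Filtering the parsed entries on `level` and `msg` makes the repeating "failed to reconcile" cycle easy to spot.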
      

          People

            tommie Tommie McAfee (Inactive)
            ashwin.govindarajulu Ashwin Govindarajulu