Couchbase Kubernetes: K8S-615

Ephemeral pod with log PV: Seeing multiple rebalance failures when a server pod is killed

Details

    Description

      Operator image used: 1.1.0-108

      Scenario:

      1. Create a 3-member Couchbase (Cb) cluster with a log persistent volume (PV) defined for all pods.
      2. Kill Couchbase server pod cb-example-0001 using `kubectl delete pods/cb-example-0001`.

      Observation:

      A new pod, cb-example-0003, was created to replace cb-example-0001. However, before cb-example-0001 was removed, the cluster went through multiple rebalance failures.

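      Operator console output (the same cycle of auto-failover, planned removal of cb-example-0001 and failed rebalance repeats roughly every 25 seconds):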

      time="2018-10-05T06:34:39Z" level=warning msg="cb-example-0001 is unrecoverable: No volume mounts defined" cluster-name=cb-example module=cluster
      time="2018-10-05T06:34:39Z" level=info msg="planning removal of http://cb-example-0001.cb-example.ashwin.svc:8091" cluster-name=cb-example module=cluster
      time="2018-10-05T06:34:41Z" level=warning msg="unable to poll external addresses for pod cb-example-0001" cluster-name=cb-example module=cluster
      time="2018-10-05T06:34:42Z" level=info msg="Rebalance progress: 0.000000" cluster-name=cb-example module=cluster
      time="2018-10-05T06:34:46Z" level=info msg="Rebalance progress: 0.000000" cluster-name=cb-example module=cluster
      time="2018-10-05T06:34:54Z" level=error msg="failed to reconcile: Failed to rebalance: cluster reports rebalance incomplete" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="server config all_services: cb-example-0000,cb-example-0002" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="Cluster status: unbalanced" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="Node status:" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="┌─────────────────┬──────────────┬─────────────────────┐" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="│ Server          │ Class        │ Status              │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="├─────────────────┼──────────────┼─────────────────────┤" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="│ cb-example-0000 │ all_services │ managed+active      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="│ cb-example-0001 │ all_services │ managed+failed      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="│ cb-example-0002 │ all_services │ managed+active      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="│ cb-example-0003 │ all_services │ managed+pending_add │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info msg="└─────────────────┴──────────────┴─────────────────────┘" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:02Z" level=info cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:04Z" level=info msg="An auto-failover has taken place" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:04Z" level=warning msg="cb-example-0001 is unrecoverable: No volume mounts defined" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:04Z" level=info msg="planning removal of http://cb-example-0001.cb-example.ashwin.svc:8091" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:06Z" level=warning msg="unable to poll external addresses for pod cb-example-0001" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:07Z" level=info msg="Rebalance progress: 0.000000" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:11Z" level=info msg="Rebalance progress: 0.000000" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:19Z" level=error msg="failed to reconcile: Failed to rebalance: cluster reports rebalance incomplete" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="server config all_services: cb-example-0000,cb-example-0002" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="Cluster status: unbalanced" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="Node status:" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="┌─────────────────┬──────────────┬─────────────────────┐" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="│ Server          │ Class        │ Status              │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="├─────────────────┼──────────────┼─────────────────────┤" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="│ cb-example-0000 │ all_services │ managed+active      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="│ cb-example-0001 │ all_services │ managed+failed      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="│ cb-example-0002 │ all_services │ managed+active      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="│ cb-example-0003 │ all_services │ managed+pending_add │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info msg="└─────────────────┴──────────────┴─────────────────────┘" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:27Z" level=info cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:29Z" level=info msg="An auto-failover has taken place" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:29Z" level=warning msg="cb-example-0001 is unrecoverable: No volume mounts defined" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:29Z" level=info msg="planning removal of http://cb-example-0001.cb-example.ashwin.svc:8091" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:32Z" level=warning msg="unable to poll external addresses for pod cb-example-0001" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:33Z" level=info msg="Rebalance progress: 0.000000" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:37Z" level=info msg="Rebalance progress: 0.000000" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:45Z" level=error msg="failed to reconcile: Failed to rebalance: cluster reports rebalance incomplete" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="server config all_services: cb-example-0000,cb-example-0002" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="Cluster status: unbalanced" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="Node status:" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="┌─────────────────┬──────────────┬─────────────────────┐" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="│ Server          │ Class        │ Status              │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="├─────────────────┼──────────────┼─────────────────────┤" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="│ cb-example-0000 │ all_services │ managed+active      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="│ cb-example-0001 │ all_services │ managed+failed      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="│ cb-example-0002 │ all_services │ managed+active      │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="│ cb-example-0003 │ all_services │ managed+pending_add │" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info msg="└─────────────────┴──────────────┴─────────────────────┘" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:53Z" level=info cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:55Z" level=info msg="An auto-failover has taken place" cluster-name=cb-example module=cluster
      time="2018-10-05T06:35:55Z" level=warning msg="cb-example-0001 is unrecoverable: No volume mounts defined" cluster-name=cb-example module=cluster
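      To check mechanically that the Operator is stuck in a retry loop rather than making slow progress, the log above can be parsed. The following is a minimal sketch of a hypothetical helper (it is not part of the Operator or any Couchbase tooling); it assumes the logrus text format shown above (time="..." level=... msg="...") and that node-status table rows use the │ box-drawing separator.

```python
import re
from datetime import datetime

# Matches one record of the logrus text format shown in the Operator output:
#   time="..." level=... msg="..." cluster-name=... module=...
# msg is optional because the Operator also prints a blank info line.
LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)"\s+level=(?P<level>\w+)(?:\s+msg="(?P<msg>[^"]*)")?'
)
# Matches a row of the "Node status" table: │ server │ class │ status │
# (border rows use ─ and corner characters, so they never match).
ROW_RE = re.compile(r"│\s*(\S+)\s*│\s*(\S+)\s*│\s*(\S+)\s*│")


def parse_line(line):
    """Return (timestamp, level, msg), or None if the line is not a log record."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%Y-%m-%dT%H:%M:%SZ")
    return ts, m.group("level"), m.group("msg") or ""


def rebalance_failures(lines):
    """Timestamps of every 'Failed to rebalance' reconcile error."""
    out = []
    for line in lines:
        rec = parse_line(line)
        if rec and rec[1] == "error" and "Failed to rebalance" in rec[2]:
            out.append(rec[0])
    return out


def retry_periods(timestamps):
    """Seconds between consecutive failures, i.e. the period of the retry loop."""
    return [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]


def last_node_status(lines):
    """Map each server to its status in the most recent Node status table."""
    status = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        m = ROW_RE.search(rec[2])
        if m and m.group(1) != "Server":  # skip the table header row
            status[m.group(1)] = m.group(3)
    return status
```

      Run over the log above, rebalance_failures finds three errors (06:34:54, 06:35:19, 06:35:45), retry_periods returns [25.0, 26.0], and last_node_status still reports cb-example-0001 as managed+failed and cb-example-0003 as managed+pending_add, i.e. no cycle makes progress.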
      

            People

              Mike Wiederhold (mikew, inactive)
              Ashwin Govindarajulu (ashwin.govindarajulu)
              Votes: 0
              Watchers: 2
