Couchbase Kubernetes / K8S-77

Operator breaks if a node is manually removed

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 0.7.0
    • None
    • operator
    • None

    Description

      If you manually rebalance a node out of a cluster, the operator keeps checking for that node but never adds it back in.

      Eventually the round-robin of REST requests reaches the removed node, and the requests fail with 404s because that node is no longer part of a cluster.
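
The failure described above comes from rotating REST requests across every node the operator still tracks. The ticket does not say how the fix was implemented, so the following is only a minimal sketch of one possible mitigation, not the operator's actual code: restrict the rotation to nodes the cluster itself reports as active. The helper name `round_robin_targets` and its arguments are hypothetical; the node names are taken from the log in this ticket.

```python
from itertools import cycle


def round_robin_targets(running, active):
    """Return an endless rotation over the running nodes that are also active.

    Hypothetical helper: `running` is every node the operator tracks and
    `active` is the set the cluster itself reports as members.
    """
    active_set = set(active)
    # Keep the original order of `running`, dropping nodes that left the cluster.
    eligible = [node for node in running if node in active_set]
    if not eligible:
        raise ValueError("no active nodes available for REST requests")
    return cycle(eligible)


running = ["cb-example-0000", "cb-example-0001", "cb-example-0002"]
active = ["cb-example-0000", "cb-example-0001"]

targets = round_robin_targets(running, active)
first_four = [next(targets) for _ in range(4)]
print(first_four)
```

With this filter the rotation never lands on the removed node, so the status request would not be sent to a node that answers 404.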

      Snippet of operator logs:

      time="2017-12-31T00:42:16Z" level=info msg="Start reconciling" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="server config all_services: cb-example-0000,cb-example-0001" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="running members: cb-example-0000,cb-example-0001,cb-example-0002" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="cluster membership: cb-example-0001,cb-example-0002,cb-example-0000" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="active nodes: cb-example-0000,cb-example-0001" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="unknown nodes: cb-example-0002" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="Finish reconciling" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:24Z" level=info msg="Start reconciling" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:24Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:29Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:34Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:39Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:44Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:49Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:54Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:59Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:04Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:09Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:14Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:19Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:24Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:29Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:34Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:39Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:44Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:49Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:54Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:59Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:44:04Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:44:09Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:44:14Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
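
The first reconcile pass in the log reports running members, active nodes, and unknown nodes. A minimal sketch of how such a split could be computed, assuming "unknown" means "running but not active" (an interpretation of the log lines, not something the ticket states), with a hypothetical function name `classify_members`:

```python
def classify_members(running, active):
    """Split tracked nodes into active and unknown groups.

    Assumption: a node is 'unknown' when the operator sees it running
    but the cluster does not list it as an active member.
    """
    running_set, active_set = set(running), set(active)
    return {
        "active": sorted(running_set & active_set),
        "unknown": sorted(running_set - active_set),
    }


# Values copied from the "running members" and "active nodes" log lines.
running = ["cb-example-0000", "cb-example-0001", "cb-example-0002"]
active = ["cb-example-0000", "cb-example-0001"]

result = classify_members(running, active)
print(result["unknown"])  # ['cb-example-0002']
```

What the operator should then do with an unknown node (add it back or delete it) is a design decision this sketch leaves open.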
      

            People

              mikew Mike Wiederhold [X] (Inactive)
              matt.carabine Matt Carabine (Inactive)