Couchbase Kubernetes / K8S-77

Operator breaks if a node is manually removed


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Fix Version/s: 0.7.0
    • Component/s: operator

    Description

      If you manually rebalance a node out of a cluster, the operator keeps checking for the node's existence but never adds it back in.

      Eventually the operator's round-robin REST requests hit the removed node, which answers with 404s because it is no longer part of a cluster.

      Snippet of operator logs:

      time="2017-12-31T00:42:16Z" level=info msg="Start reconciling" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="server config all_services: cb-example-0000,cb-example-0001" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="running members: cb-example-0000,cb-example-0001,cb-example-0002" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="cluster membership: cb-example-0001,cb-example-0002,cb-example-0000" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="active nodes: cb-example-0000,cb-example-0001" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="unknown nodes: cb-example-0002" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:16Z" level=info msg="Finish reconciling" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:24Z" level=info msg="Start reconciling" cluster-name=cb-example module=cluster
      time="2017-12-31T00:42:24Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:29Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:34Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:39Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:44Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:49Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:54Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:42:59Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:04Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:09Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:14Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:19Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:24Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:29Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:34Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:39Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:44Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:49Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:54Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:43:59Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:44:04Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:44:09Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      time="2017-12-31T00:44:14Z" level=warning msg="cluster status: failed with error Code: 404, Error:  ...retrying" cluster-name=cb-example module=retryutil
      


          Activity


            Mike Wiederhold (Inactive) added a comment:
            We don't allow manual removal of nodes and don't have plans to support it right now. We can revisit this topic in the future, but at the moment we want Kubernetes to control the cluster entirely. Users should not generally be making any changes to the cluster other than updating the Kubernetes configuration.


            Mike Wiederhold (Inactive) added a comment:
            Actually, reopening this, since the operator should handle this case and not break.

            People

              Mike Wiederhold (Inactive)
              Matt Carabine (Inactive)
              Votes: 0
              Watchers: 1
