  1. Couchbase Kubernetes
  2. K8S-577

Stop rebalance if progress = "none"

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.0
    • Component/s: operator
    • Labels: None

    Description

      Sometimes Couchbase reports that a rebalance is running:

      "rebalanceStatus":"running"

      even though the rebalance is not actually progressing:

      {"status":"none"}

       

      This causes the operator to skip the reconcile loop. It assumes the rebalance is in progress and will eventually finish, when in fact it never will:

      time="2018-09-14T23:17:20Z" level=error msg="failed to reconcile: Skipping reconcile loop because the cluster is currently rebalancing" cluster-name=cb-example2 module=cluster
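      The condition described above can be sketched as a small check over the two JSON payloads quoted in this issue. This is only an illustration, not the operator's actual fix: the function names, the "skip"/"reconcile" return values, and the stop_rebalance callback are hypothetical, and the sketch assumes the two payloads have already been fetched from the cluster.

```python
import json
from typing import Callable


def rebalance_is_stalled(pool_json: str, progress_json: str) -> bool:
    """Return True when the cluster says a rebalance is running
    but the progress report says nothing is happening."""
    pool = json.loads(pool_json)
    progress = json.loads(progress_json)
    return (
        pool.get("rebalanceStatus") == "running"
        and progress.get("status") == "none"
    )


def reconcile_guard(
    pool_json: str,
    progress_json: str,
    stop_rebalance: Callable[[], None],
) -> str:
    """Decide whether a reconcile loop should run (hypothetical helper).

    stop_rebalance stands in for whatever call the caller uses to
    cancel a rebalance; it is an assumption of this sketch.
    """
    if rebalance_is_stalled(pool_json, progress_json):
        # The rebalance will never finish on its own, so stop it
        # and let reconciliation proceed.
        stop_rebalance()
        return "reconcile"
    if json.loads(pool_json).get("rebalanceStatus") == "running":
        # A genuinely progressing rebalance: wait for it.
        return "skip"
    return "reconcile"
```

      With this shape, a rebalance that is still making progress is skipped as before, while the stalled case triggers a stop instead of being skipped indefinitely.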

       

        Activity

          tommie Tommie McAfee created issue -
          simon.murray Simon Murray made changes -
          Field Original Value New Value
          Rank Ranked higher
          simon.murray Simon Murray made changes -
          Rank Ranked higher

          Rebalance was running

          time="2018-09-27T20:24:44Z" level=info msg="Rebalance progress: 40.000000" cluster-name=cb-example module=cluster
          time="2018-09-27T20:24:48Z" level=info msg="Rebalance progress: 40.000000" cluster-name=cb-example module=cluster 

          Then a Pod is killed

           time="2018-09-27T20:24:50Z" level=info msg="killed pod cb-example-0001 for selector app=couchbase" module=chaos

          Cluster reports rebalance is incomplete

          time="2018-09-27T20:24:59Z" level=info msg="Rebalance progress: unknown" cluster-name=cb-example module=cluster
          time="2018-09-27T20:25:12Z" level=error msg="failed to reconcile: Failed to rebalance: cluster reports rebalance incomplete" cluster-name=cb-example module=cluster 

          ns_server reports rebalance died unexpectedly

          [ns_server:error,2018-09-27T20:24:50.101Z,ns_1@cb-example-0000.cb-example.default.svc:service_agent-cbas<0.386.0>:service_agent:handle_info:235]Rebalancer <13615.27110.0> died unexpectedly: noconnection 

          Consequently rebalanceStatus remains as 'running' although progress reports "none"
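          When reproducing this, it can help to pull the progress samples out of the operator log and see where they stop being numeric. The following is a rough sketch, assuming the key=value log format shown in the excerpts above; parse_logfmt and rebalance_progress are hypothetical helper names, not part of the operator.

```python
import shlex
from typing import Dict, Iterable, List, Optional, Tuple

PROGRESS_PREFIX = "Rebalance progress: "


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split a key=value log line into a dict, honouring double quotes."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def rebalance_progress(lines: Iterable[str]) -> List[Tuple[Optional[str], Optional[float]]]:
    """Return (timestamp, progress) pairs for rebalance progress messages.

    Progress is None when the message carries a non-numeric value
    such as "unknown".
    """
    samples = []
    for line in lines:
        fields = parse_logfmt(line)
        msg = fields.get("msg", "")
        if not msg.startswith(PROGRESS_PREFIX):
            continue
        raw = msg[len(PROGRESS_PREFIX):]
        try:
            value = float(raw)
        except ValueError:
            value = None
        samples.append((fields.get("time"), value))
    return samples
```

          Fed the operator lines quoted in this comment, the helper yields two numeric samples followed by a None entry at the point where the log reports "unknown".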

           

          tommie Tommie McAfee added a comment -
          tommie Tommie McAfee made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Closed [ 6 ]

          People

            Assignee: tommie Tommie McAfee
            Reporter: tommie Tommie McAfee