  1. Couchbase Server
  2. MB-37025

Analytics Rebalance Failure After Pod Down and Replacement

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • None
    • 6.0.3
    • analytics
    • Kubernetes 1.13, Operator 2.0
    • Untriaged
    • Unknown

    Description

      Test:

      A 3-node cluster is created with analytics enabled. Pods are then killed one at a time. After each kill, the node is reported as down and failed over, and the operator balances in a replacement.

      Observation:

      The first rebalance after a replacement fails to complete fairly consistently (for two of the three replacements in the run below), but a second rebalance succeeds.

      | NewMemberAdded      | New member test-couchbase-rv58w-0000 added to cluster              |
      | NewMemberAdded      | New member test-couchbase-rv58w-0001 added to cluster              |
      | NewMemberAdded      | New member test-couchbase-rv58w-0002 added to cluster              |
      | RebalanceStarted    | A rebalance has been started to balance data across the cluster    |
      | RebalanceCompleted  | A rebalance has completed                                          |
      | BucketCreated       | A new bucket `default` was created                                 |
      | MemberDown          | Existing member test-couchbase-rv58w-0000 down                     |
      | MemberFailedOver    | Existing member test-couchbase-rv58w-0000 failed over              |
      | NewMemberAdded      | New member test-couchbase-rv58w-0003 added to cluster              |
      | RebalanceStarted    | A rebalance has been started to balance data across the cluster    |
      | RebalanceIncomplete | A rebalance is incomplete                                          | <== event reason mismatch, expected 'MemberRemoved', actual 'RebalanceIncomplete'
      | MemberRemoved       | Existing member test-couchbase-rv58w-0000 removed from the cluster |
      | RebalanceStarted    | A rebalance has been started to balance data across the cluster    |
      | RebalanceCompleted  | A rebalance has completed                                          |
      | MemberDown          | Existing member test-couchbase-rv58w-0001 down                     |
      | MemberFailedOver    | Existing member test-couchbase-rv58w-0001 failed over              |
      | NewMemberAdded      | New member test-couchbase-rv58w-0004 added to cluster              |
      | RebalanceStarted    | A rebalance has been started to balance data across the cluster    |
      | MemberRemoved       | Existing member test-couchbase-rv58w-0001 removed from the cluster |
      | RebalanceCompleted  | A rebalance has completed                                          |
      | MemberDown          | Existing member test-couchbase-rv58w-0002 down                     |
      | MemberFailedOver    | Existing member test-couchbase-rv58w-0002 failed over              |
      | NewMemberAdded      | New member test-couchbase-rv58w-0005 added to cluster              |
      | RebalanceStarted    | A rebalance has been started to balance data across the cluster    |
      | RebalanceIncomplete | A rebalance is incomplete                                          |
      | MemberRemoved       | Existing member test-couchbase-rv58w-0002 removed from the cluster |
      | RebalanceStarted    | A rebalance has been started to balance data across the cluster    |
      | RebalanceCompleted  | A rebalance has completed                                          |
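
The mismatch flagged in the event table can be reproduced with a small comparison of event reasons. This is only an illustrative sketch, not the test suite's actual validator: `first_mismatch` is a hypothetical helper, the expected sequence is assumed to match the clean replacement of node 0001 above, and the actual sequence is the one recorded for node 0000.

```python
from itertools import zip_longest

def first_mismatch(expected, actual):
    """Return (index, expected_reason, actual_reason) for the first
    position where the two event sequences differ, or None if equal."""
    for i, (exp, act) in enumerate(zip_longest(expected, actual)):
        if exp != act:
            return i, exp, act
    return None

# Assumed expectation: the event reasons seen for the clean replacement of node 0001.
expected = [
    "MemberDown", "MemberFailedOver", "NewMemberAdded",
    "RebalanceStarted", "MemberRemoved", "RebalanceCompleted",
]

# Event reasons recorded for the replacement of node 0000.
actual = [
    "MemberDown", "MemberFailedOver", "NewMemberAdded",
    "RebalanceStarted", "RebalanceIncomplete", "MemberRemoved",
    "RebalanceStarted", "RebalanceCompleted",
]

print(first_mismatch(expected, actual))
```

The first difference is at the event following `RebalanceStarted`, where `MemberRemoved` was expected and `RebalanceIncomplete` was seen, which is the annotation shown in the table.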

      Expectation:

      Every rebalance completes on the first attempt.

      Timeline:

      Server reports node 0 (test-couchbase-rv58w-0000) as failed over:

      {"level":"info","ts":1574417906.722955,"logger":"couchbaseutil","msg":"Node status","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0000","version":"6.0.3","class":"test_config_1","managed":true,"status":"failed"}
      {"level":"info","ts":1574417906.722963,"logger":"couchbaseutil","msg":"Node status","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0001","version":"6.0.3","class":"test_config_1","managed":true,"status":"active"}
      {"level":"info","ts":1574417906.7229683,"logger":"couchbaseutil","msg":"Node status","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0002","version":"6.0.3","class":"test_config_1","managed":true,"status":"active"}

      The operator marks the pod as unrecoverable, deletes it, and creates a replacement (test-couchbase-rv58w-0003):

      {"level":"info","ts":1574417908.4882917,"logger":"cluster","msg":"Pod unrecoverable","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0000","reason":"No volume mounts defined"}
      {"level":"info","ts":1574417908.4883049,"logger":"cluster","msg":"Pod failed, deleting","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0000"}
      {"level":"info","ts":1574417909.7209096,"logger":"cluster","msg":"Creating pod","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0003","image":"couchbase/server:6.0.3"}

      The rebalance completes. We then poll Server's /pools/default endpoint about once per second for 10s to see whether the unbalanced status clears:

      {"level":"info","ts":1574417952.9106264,"logger":"couchbaseutil","msg":"Rebalancing","cluster":"default/test-couchbase-rv58w","progress":66.66666666666667}
      {"level":"debug","ts":1574417956.915534,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0003.test-couchbase-rv58w.default.svc:8091/pools/default/tasks","status":"200 OK","time_ms":3.624779}
      {"level":"debug","ts":1574417956.9389496,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":4.94814}
      {"level":"debug","ts":1574417957.9330237,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":3.774437}
      {"level":"debug","ts":1574417958.9282775,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":5.232787}
      {"level":"debug","ts":1574417959.929354,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":6.612067}
      {"level":"debug","ts":1574417960.9301717,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":6.776474}
      {"level":"debug","ts":1574417961.9228926,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0003.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":6.63882}
      {"level":"debug","ts":1574417962.9338944,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":5.510516}
      {"level":"debug","ts":1574417963.9275134,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":1.867191}
      {"level":"debug","ts":1574417964.9334671,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":7.705285}
      {"level":"debug","ts":1574417965.9380221,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":7.344765}
      {"level":"debug","ts":1574417966.9293282,"logger":"client","msg":"http","method":"GET","url":"http://test-couchbase-rv58w-0001.test-couchbase-rv58w.default.svc:8091/pools/default","status":"200 OK","time_ms":1.656747}

      On the next pass of the reconcile loop, the cluster is still flagged as unbalanced and node 0 is reported as unclustered:

      {"level":"info","ts":1574417967.091391,"logger":"couchbaseutil","msg":"Node status","cluster":"default/test-couchbase-rv58w","name":"test-couchbase-rv58w-0000","version":"6.0.3","class":"test_config_1","managed":true,"status":"unclustered"}

       

       

            People

              michael.blow Michael Blow
              simon.murray Simon Murray