Couchbase Kubernetes / K8S-597

Failover: MemberDown event not caught for one of the nodes during multi-node failover


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.0
    • Component/s: operator

    Description

       

      Test case: TestMultiNodeAutoFailover

      Operator logs:

      time="2018-09-25T17:39:31Z" level=info msg="Rebalance progress: 0.000000" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:04Z" level=info msg="Created bucket default" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:41Z" level=info msg="server config test_config_1: test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0008,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0003,test-couchbase-m7ng4-0004,test-couchbase-m7ng4-0005" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:41Z" level=info msg="running members: test-couchbase-m7ng4-0004,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0008,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0003" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:41Z" level=info msg="cluster membership: test-couchbase-m7ng4-0002,test-couchbase-m7ng4-0004,test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0003,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0008" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:41Z" level=info msg="active nodes: test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0008,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0003,test-couchbase-m7ng4-0004" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:41Z" level=info msg="down nodes: test-couchbase-m7ng4-0002" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:41Z" level=info msg="is rebalancing: false" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:41Z" level=info msg="needs rebalance: false" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:42Z" level=warning msg="test-couchbase-m7ng4-0002 is unrecoverable: No volume mounts defined" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:42Z" level=warning msg="Waiting for auto-failover of down node `test-couchbase-m7ng4-0002`" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:40:42Z" level=error msg="failed to reconcile: Unable to reconcile cluster because some nodes are down" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:05Z" level=info msg="server config test_config_1: test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0008" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:05Z" level=info msg="running members: test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0008" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:05Z" level=info msg="cluster membership: test-couchbase-m7ng4-0002,test-couchbase-m7ng4-0004,test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0003,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0008" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:05Z" level=info msg="active nodes: test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0008,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0005" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:05Z" level=info msg="down nodes: test-couchbase-m7ng4-0004" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:05Z" level=info msg="failed nodes: test-couchbase-m7ng4-0002,test-couchbase-m7ng4-0003" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:05Z" level=info msg="is rebalancing: false" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:05Z" level=info msg="needs rebalance: true" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:06Z" level=warning msg="test-couchbase-m7ng4-0004 is unrecoverable: No volume mounts defined" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:06Z" level=warning msg="Waiting for auto-failover of down node `test-couchbase-m7ng4-0004`" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:06Z" level=error msg="failed to reconcile: Unable to reconcile cluster because some nodes are down" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:15Z" level=info msg="server config test_config_1: test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0008,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0006" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:15Z" level=info msg="running members: test-couchbase-m7ng4-0008,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:15Z" level=info msg="cluster membership: test-couchbase-m7ng4-0006,test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0002,test-couchbase-m7ng4-0004,test-couchbase-m7ng4-0003,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0008,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:15Z" level=info msg="active nodes: test-couchbase-m7ng4-0007,test-couchbase-m7ng4-0008,test-couchbase-m7ng4-0000,test-couchbase-m7ng4-0001,test-couchbase-m7ng4-0005,test-couchbase-m7ng4-0006" cluster-name=test-couchbase-m7ng4 module=cluster
      time="2018-09-25T17:41:15Z" level=info msg="failed nodes: test-couchbase-m7ng4-0002,test-couchbase-m7ng4-0003,test-couchbase-m7ng4-0004" cluster-name=test-couchbase-m7ng4 module=cluster
      

       

      Events:

       

      BucketCreated      | A new bucket `default` was created                                 
      MemberDown         | Existing member test-couchbase-m7ng4-0002 down                     
      MemberDown         | Existing member test-couchbase-m7ng4-0004 down                     
      MemberFailedOver   | Existing member test-couchbase-m7ng4-0002 failed over              | <== no set members matched
      MemberFailedOver   | Existing member test-couchbase-m7ng4-0003 failed over              
      MemberFailedOver   | Existing member test-couchbase-m7ng4-0004 failed over              
      NewMemberAdded     | New member test-couchbase-m7ng4-0009 added to cluster              
      NewMemberAdded     | New member test-couchbase-m7ng4-0010 added to cluster              
      NewMemberAdded     | New member test-couchbase-m7ng4-0011 added to cluster              
      RebalanceStarted   | A rebalance has been started to balance data across the cluster
      MemberRemoved      | Existing member test-couchbase-m7ng4-0002 removed from the cluster
      MemberRemoved      | Existing member test-couchbase-m7ng4-0003 removed from the cluster
      MemberRemoved      | Existing member test-couchbase-m7ng4-0004 removed from the cluster
      RebalanceCompleted | A rebalance has completed                                          
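      The missing event can be confirmed from the status log lines: test-couchbase-m7ng4-0003 is listed as active at 17:40:41 and under "failed nodes" at 17:41:05, but never under "down nodes". Below is a minimal Python sketch of that check. It assumes (not confirmed by the operator source) that a MemberDown event is raised only when a reconcile pass observes a node in the down state. The Snapshot type and missed_member_down helper are hypothetical, and node names are shortened to their numeric suffix.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Node states reported by one reconcile pass (hypothetical type)."""
    time: str
    down: set[str] = field(default_factory=set)
    failed: set[str] = field(default_factory=set)


def missed_member_down(snapshots: list[Snapshot]) -> list[str]:
    """Return nodes that reached 'failed' without ever being observed as 'down'.

    Assumption: MemberDown is only emitted when a pass sees a node as down,
    so any node returned here would have no MemberDown event.
    """
    seen_down: set[str] = set()
    missed: list[str] = []
    for snap in snapshots:
        seen_down |= snap.down
        for node in sorted(snap.failed):
            if node not in seen_down and node not in missed:
                missed.append(node)
    return missed


# Node states transcribed from the 17:40:41, 17:41:05 and 17:41:15 log blocks.
snapshots = [
    Snapshot("17:40:41", down={"0002"}),
    Snapshot("17:41:05", down={"0004"}, failed={"0002", "0003"}),
    Snapshot("17:41:15", failed={"0002", "0003", "0004"}),
]
print(missed_member_down(snapshots))  # ['0003']
```

      Only 0003 is reported, which matches the event list: 0002 and 0004 each have a MemberDown event, while 0003 appears only as MemberFailedOver.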


          Activity

            Simon Murray added a comment:

            Can you increase the auto-failover timeout to ~30 seconds? It's entirely possible that, given an 8-second reconcile, your 10-second auto-failover timeout is too aggressive for this test.
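            The race can be pictured with a simplified timing model. A node goes down at some moment, and Couchbase Server fails it over once the auto-failover timeout elapses. The operator only sees the node as "down" if a reconcile pass runs inside that window. Otherwise the node goes straight from active to failed, and no MemberDown event is raised. In this sketch the failover_window_observed helper is hypothetical and the failover is assumed to fire exactly at the timeout, which is a simplification. The reconcile times are the three logged passes (17:40:41, 17:41:05 and 17:41:15) expressed as seconds after the first. The down time is an assumed value, not taken from the logs.

```python
def failover_window_observed(poll_times: list[float], down_at: float, timeout: float) -> bool:
    """True if some reconcile pass runs while the node is down but not yet failed over.

    Simplified model (assumption): the node goes down at `down_at` and is
    auto-failed over exactly `timeout` seconds later, so it is visible as
    'down' only during the half-open window [down_at, down_at + timeout).
    """
    return any(down_at <= t < down_at + timeout for t in poll_times)


# Logged reconcile passes, in seconds after 17:40:41.
polls = [0.0, 24.0, 34.0]

# Hypothetical: the node goes down 5 s after the first pass.
print(failover_window_observed(polls, down_at=5.0, timeout=10.0))  # False: MemberDown missed
print(failover_window_observed(polls, down_at=5.0, timeout=30.0))  # True: seen as down at t=24
```

            With a 10-second timeout, the assumed failure is failed over between the passes at 0 s and 24 s, so no pass ever sees the node as down. With 30 seconds, the pass at 24 s falls inside the window.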

            Ashwin Govindarajulu added a comment:

            Updated the default auto-failover timeout to 30 seconds.

            Review: http://review.couchbase.org/c/99971

            Ashwin Govindarajulu added a comment:

            After updating the auto-failover timeout to 30 seconds, the test works fine. Closing this ticket.

            People

              Assignee: Ashwin Govindarajulu
              Reporter: Ashwin Govindarajulu
              Votes: 0
              Watchers: 2
