Couchbase Server / MB-37842

Partition-failover of the first node in the cluster fails with "500 Internal server error; config sync failed"


Details

    • Improvement
    • Resolution: Not a Bug
    • Major
    • 7.1.0
    • Cheshire-Cat
    • ns_server

    Description

      The steps to reproduce this bug are as follows:

      1. Create a 4-node cluster (10.112.194.101, 10.112.194.102, 10.112.194.103, 10.112.194.104), with 10.112.194.101 being the node that initiates cluster creation.
      2. Isolate the two nodes 10.112.194.101 and 10.112.194.102 from each other. This introduces a network partition in which these two nodes cannot
        communicate with each other but can still communicate with all other nodes.
        To do this, execute the following commands on node 1 and node 2, respectively:

      iptables -A INPUT -s 10.112.194.102 -j DROP 

      iptables -A INPUT -s 10.112.194.101 -j DROP

       3. Perform a hard failover of the first node with a REST call to the third node. For example, on node 1 execute:

      curl -v -X POST -u Administrator:password http://10.112.194.103:8091/controller/failOver -d 'otpNode=ns_1@10.112.194.101'

      The failover fails with the error in the title ("500 Internal server error; config sync failed"). Screenshots are attached.
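      The three steps above can be sketched as a single bash script. This is a minimal sketch, not part of the original report: it assumes passwordless root SSH to nodes 1 and 2 (hypothetical; the report runs the iptables commands directly on each node), that the host running it can reach node 3 on port 8091, and the Administrator:password credentials from step 3. By default it only prints the commands (DRY_RUN=1). The heal function, which removes the DROP rules with iptables -D, is a hypothetical cleanup step the report does not include.

```bash
#!/usr/bin/env bash
# Sketch of the MB-37842 reproduction steps (partition, then hard failover).
# Assumptions (hypothetical, not from the report): passwordless root SSH to
# NODE1/NODE2, and that this host can reach NODE3 on port 8091.
set -euo pipefail

NODE1="${NODE1:-10.112.194.101}"  # first node; the one being failed over
NODE2="${NODE2:-10.112.194.102}"  # node partitioned away from NODE1
NODE3="${NODE3:-10.112.194.103}"  # node that receives the failover REST call
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print commands; 0 = execute them

# Print a command, and execute it only when DRY_RUN=0.
run() {
  printf '+ %s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Step 2: each node drops inbound packets from the other (same rules as above).
partition() {
  run ssh "root@$NODE1" iptables -A INPUT -s "$NODE2" -j DROP
  run ssh "root@$NODE2" iptables -A INPUT -s "$NODE1" -j DROP
}

# Hypothetical cleanup (not in the report): -D deletes the rules added by -A.
heal() {
  run ssh "root@$NODE1" iptables -D INPUT -s "$NODE2" -j DROP
  run ssh "root@$NODE2" iptables -D INPUT -s "$NODE1" -j DROP
}

# Step 3: hard failover of NODE1 through NODE3's REST API. Without --fail,
# curl exits 0 on an HTTP 500, so the status code is printed via -w instead.
failover() {
  run curl -sS -u Administrator:password -w '\nHTTP %{http_code}\n' \
    -X POST "http://$NODE3:8091/controller/failOver" \
    -d "otpNode=ns_1@$NODE1"
}

main() {
  partition
  failover || true  # still heal the partition if curl itself errors out
  heal
}

# Run main only when executed directly, not when sourced.
if [ "${BASH_SOURCE[0]}" = "$0" ]; then
  main
fi
```

      For example, DRY_RUN=0 bash repro.sh (script name hypothetical) runs all three steps against the cluster and then removes the DROP rules. Per the report, the printed status should show the 500 error when NODE1 is the first node of the cluster.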

      I originally found this intermittent bug while running the jepsen-durability-misc-daily-new tests (http://qa.sc.couchbase.com/job/jepsen-durability-misc-daily-new/) with the partition-failover workload, in runs where the failed-over node happened to be the first node in the cluster.
      The nemesis crashes because the failover fails with the error above, which surfaces as an "unknown" error. The configuration for that run is as follows:

      workload=partition-failover,node-count=6,replicas=2,no-autofailover,kv-timeout=30,durability=0:100:0:0

      Note that the failover may fail or succeed depending on whether the failed-over node is the first node of the cluster.


          People

            Sumedh Basarkod (sumedh.basarkod, Inactive)