
MB-45086: [Chronicle] Provide a better REST API / UI error message when the current chronicle leader is among the quorum failover nodes (was: Unsafe failover of nodes involving the orchestrator fails with error not_in_peers)


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version: 7.0.0
    • Fix Version: Cheshire-Cat
    • Component: ns_server
    • Environment: CentOS 7 64-bit; Couchbase EE 7.0.0-4721

    Description

      Script to Repro

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/durability_volume.ini rerun=False,get-cbcollect-info=True -t bucket_collections.collections_quorum_loss.CollectionsQuorumLoss.test_quorum_loss_failover,nodes_init=5,bucket_spec=multi_bucket.buckets_all_membase_for_quorum_loss,replicas=3,num_node_failures=3,failover_orchestrator=True,quota_percent=80,GROUP=P2'

      Steps to Reproduce
      1. Create a 5-node cluster:

      +----------------+----------+-----------------------+---------------+--------------+
      | Nodes          | Services | Version               | CPU           | Status       |
      +----------------+----------+-----------------------+---------------+--------------+
      | 172.23.105.215 | kv       | 7.0.0-4721-enterprise | 3.56150543066 | Cluster node |
      | 172.23.105.217 | None     |                       |               | <--- IN ---  |
      | 172.23.105.219 | None     |                       |               | <--- IN ---  |
      | 172.23.105.220 | None     |                       |               | <--- IN ---  |
      | 172.23.106.237 | None     |                       |               | <--- IN ---  |
      +----------------+----------+-----------------------+---------------+--------------+

      2. Create a bucket with 3 replicas
      3. Hard (unsafe) failover of nodes .215 (orchestrator), .217, and .219, all at once, by making a REST call to .237 (see the sketch below)
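
      A minimal sketch of the REST call from step 3, written with Python requests; it mirrors the failing request logged further below (same endpoint, node IPs, allowUnsafe flag, and Administrator:password credentials), not the test framework's own client:

      import requests

      # Ask .237 to unsafely fail over the three nodes, including the orchestrator .215
      resp = requests.post(
          "http://172.23.106.237:8091/controller/failOver",
          auth=("Administrator", "password"),
          data=[
              ("otpNode", "ns_1@172.23.105.215"),   # orchestrator / chronicle leader
              ("otpNode", "ns_1@172.23.105.217"),
              ("otpNode", "ns_1@172.23.105.219"),
              ("allowUnsafe", "true"),              # quorum-loss (unsafe) failover
          ],
      )
      print(resp.status_code, resp.text)            # in this run: 500, not_in_peers error below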

      2021-03-19 01:37:19,776 | test  | INFO    | MainThread | [collections_quorum_loss:test_quorum_loss_failover:272] Failing over nodes explicitly [ip:172.23.105.215 port:8091 ssh_username:root, ip:172.23.105.217 port:8091 ssh_username:root, ip:172.23.105.219 port:8091 ssh_username:root]

      The failover fails with an unexpected server error:

      2021-03-19 01:37:20,200 | test  | ERROR   | pool-1-thread-17 | [rest_client:_http_request:748] POST http://172.23.106.237:8091/controller/failOver body: otpNode=ns_1%40172.23.105.215&otpNode=ns_1%40172.23.105.217&otpNode=ns_1%40172.23.105.219&allowUnsafe=true headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n', 'Content-Type': 'application/x-www-form-urlencoded'} error: 500 reason: status: 500, content: Unexpected server error: {error,
                                   {not_in_peers,'ns_1@172.23.105.215',
                                       ['ns_1@172.23.105.220',
                                        'ns_1@172.23.106.237']}} Unexpected server error: {error,
                                   {not_in_peers,'ns_1@172.23.105.215',
                                       ['ns_1@172.23.105.220',
                                        'ns_1@172.23.106.237']}} auth: Administrator:password
      

      Note that no failures were induced on any of the nodes; the failover set simply includes the current chronicle leader (.215), so .215 is not in the remaining peer set ['ns_1@172.23.105.220', 'ns_1@172.23.106.237'] and chronicle rejects the quorum-loss failover with not_in_peers.

      On previous builds, this seems to work fine.

      On .215, error.log shows:

      [ns_server:error,2021-03-19T01:37:20.188-07:00,ns_1@172.23.105.215:<0.6898.0>:chronicle_master:handle_call:175]Unsuccesfull quorum loss failover. ({not_in_peers,'ns_1@172.23.105.215',
                                           ['ns_1@172.23.105.220',
                                            'ns_1@172.23.106.237']}).




    People

      Assignee: Sumedh Basarkod (Inactive)
      Reporter: Sumedh Basarkod (Inactive)


