Details
Bug
Resolution: Fixed
Critical
Cheshire-Cat
Centos 7 64 bit; Couchbase EE 7.0.0-4721
Untriaged
Centos 64-bit
1
Yes
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/durability_volume.ini rerun=False,get-cbcollect-info=True -t bucket_collections.collections_quorum_loss.CollectionsQuorumLoss.test_quorum_loss_failover,nodes_init=5,bucket_spec=multi_bucket.buckets_all_membase_for_quorum_loss,replicas=3,num_node_failures=3,failover_orchestrator=True,quota_percent=80,GROUP=P2'
Steps to Reproduce
1. Create a 5 node cluster
+----------------+----------+-----------------------+---------------+--------------+
| Nodes          | Services | Version               | CPU           | Status       |
+----------------+----------+-----------------------+---------------+--------------+
| 172.23.105.215 | kv       | 7.0.0-4721-enterprise | 3.56150543066 | Cluster node |
| 172.23.105.217 | None     |                       |               | <--- IN ---  |
| 172.23.105.219 | None     |                       |               | <--- IN ---  |
| 172.23.105.220 | None     |                       |               | <--- IN ---  |
| 172.23.106.237 | None     |                       |               | <--- IN ---  |
+----------------+----------+-----------------------+---------------+--------------+
2. Create a bucket with 3 replicas
3. Perform a hard unsafe failover of nodes .215 (orchestrator), .217, and .219 all at once by making a REST call to .237
2021-03-19 01:37:19,776 | test | INFO | MainThread | [collections_quorum_loss:test_quorum_loss_failover:272] Failing over nodes explicitly [ip:172.23.105.215 port:8091 ssh_username:root, ip:172.23.105.217 port:8091 ssh_username:root, ip:172.23.105.219 port:8091 ssh_username:root]
The request fails with an unexpected server error:
2021-03-19 01:37:20,200 | test | ERROR | pool-1-thread-17 | [rest_client:_http_request:748] POST http://172.23.106.237:8091/controller/failOver body: otpNode=ns_1%40172.23.105.215&otpNode=ns_1%40172.23.105.217&otpNode=ns_1%40172.23.105.219&allowUnsafe=true headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n', 'Content-Type': 'application/x-www-form-urlencoded'} error: 500 reason: status: 500, content: Unexpected server error: {error,
{not_in_peers,'ns_1@172.23.105.215',
['ns_1@172.23.105.220',
'ns_1@172.23.106.237']}} Unexpected server error: {error,
{not_in_peers,'ns_1@172.23.105.215',
['ns_1@172.23.105.220',
'ns_1@172.23.106.237']}} auth: Administrator:password
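For reference, the request body in the log above can be rebuilt with a few lines of Python. This is only a sketch for reproducing the payload outside the test framework: the helper name `build_unsafe_failover_body` is hypothetical, and only the parameter names (`otpNode`, `allowUnsafe`) and node names are taken from the logged request.

```python
from urllib.parse import urlencode


def build_unsafe_failover_body(otp_nodes):
    """Form-encode a body for POST /controller/failOver (hypothetical helper).

    Each node is sent as a repeated otpNode parameter, followed by
    allowUnsafe=true, matching the request seen in the test log.
    """
    params = [("otpNode", node) for node in otp_nodes]
    params.append(("allowUnsafe", "true"))
    return urlencode(params)


# Nodes failed over in this report: .215 (orchestrator), .217, .219
nodes = [
    "ns_1@172.23.105.215",
    "ns_1@172.23.105.217",
    "ns_1@172.23.105.219",
]
body = build_unsafe_failover_body(nodes)
print(body)
```

The resulting string would be sent with `Content-Type: application/x-www-form-urlencoded` to `http://172.23.106.237:8091/controller/failOver`, as in the logged request.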
Note that no failures were induced on any of the nodes.
This appears to work fine on previous builds.
From error.log on .215:
[ns_server:error,2021-03-19T01:37:20.188-07:00,ns_1@172.23.105.215:<0.6898.0>:chronicle_master:handle_call:175]Unsuccesfull quorum loss failover. ({not_in_peers,'ns_1@172.23.105.215',
['ns_1@172.23.105.220',
'ns_1@172.23.106.237']}).
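When triaging logs like this, it can help to pull the node and the peer list out of the `not_in_peers` term programmatically. The following is a rough sketch, not part of ns_server or the test framework: the regex and the helper name `parse_not_in_peers` are assumptions based only on the shape of the term shown in this report.

```python
import re

# Matches {not_in_peers,'<node>',[<quoted peers>]} even when split across lines.
NOT_IN_PEERS = re.compile(r"\{not_in_peers,\s*'([^']+)',\s*\[([^\]]*)\]")


def parse_not_in_peers(text):
    """Return (node, peers) from a not_in_peers error term, or None if absent."""
    match = NOT_IN_PEERS.search(text)
    if match is None:
        return None
    node = match.group(1)
    peers = re.findall(r"'([^']+)'", match.group(2))
    return node, peers


sample = """Unsuccesfull quorum loss failover. ({not_in_peers,'ns_1@172.23.105.215',
['ns_1@172.23.105.220',
'ns_1@172.23.106.237']})."""

result = parse_not_in_peers(sample)
print(result)
```

For the error in this report, the reported node is .215 and the peer list contains only .220 and .237, the two nodes that were not being failed over.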
For Gerrit Dashboard: MB-45086

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
152722,5 | MB-45086: Better message for some quorum failover errors | master | ns_server | MERGED | +2 | +1 |