Details
- Bug
- Resolution: Fixed
- Critical
- 6.6.2, 6.5.2, Cheshire-Cat
- Centos 7 64 bit; Couchbase EE 7.0.0-4721
- Untriaged
- Centos 64-bit
- 1
- No
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/durability_volume.ini rerun=False -t bucket_collections.collections_quorum_loss.CollectionsQuorumLoss.test_quorum_loss_failover,nodes_init=5,bucket_spec=multi_bucket.buckets_all_membase_for_quorum_loss,replicas=3,failover_action=firewall,num_node_failures=3,quota_percent=80,GROUP=P1'
Steps to Repro
1. Initialize a 5-node cluster
2021-03-18 05:53:37,832 | test | INFO | pool-1-thread-7 | [table_view:display:72] Rebalance Overview
+----------------+----------+-----------------------+---------------+--------------+
| Nodes          | Services | Version               | CPU           | Status       |
+----------------+----------+-----------------------+---------------+--------------+
| 172.23.105.215 | kv       | 7.0.0-4721-enterprise | 4.46293494705 | Cluster node |
| 172.23.105.217 | None     |                       |               | <--- IN ---  |
| 172.23.105.219 | None     |                       |               | <--- IN ---  |
| 172.23.105.220 | None     |                       |               | <--- IN ---  |
| 172.23.106.237 | None     |                       |               | <--- IN ---  |
+----------------+----------+-----------------------+---------------+--------------+
2. Create 2 buckets
2021-03-18 05:55:56,053 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
+-------------------------------------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
| Bucket                              | Type      | Replicas | Durability | TTL | Items  | RAM Quota  | RAM Used  | Disk Used |
+-------------------------------------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
| A9%1Zc1YY4wOO3%lD-48-277000         | couchbase | 3        | none       | 0   | 175000 | 4194304000 | 576985160 | 844152192 |
| DMOCEALabmUOiFZj6L_ccczdP-48-189000 | couchbase | 3        | none       | 0   | 175000 | 4194304000 | 637234472 | 802156547 |
+-------------------------------------+-----------+----------+------------+-----+--------+------------+-----------+-----------+
3. Induce a firewall on a majority (3 of 5) of the nodes
2021-03-18 05:55:56,092 | test | INFO | MainThread | [collections_quorum_loss:test_quorum_loss_failover:261] Inducing failure firewall on nodes: [ip:172.23.105.217 port:8091 ssh_username:root, ip:172.23.105.219 port:8091 ssh_username:root, ip:172.23.105.220 port:8091 ssh_username:root]
4. Perform a quorum-loss failover (allowUnsafe=true) of the above nodes
2021-03-18 05:56:58,549 | test | INFO | MainThread | [collections_quorum_loss:test_quorum_loss_failover:266] Failing over nodes explicitly [ip:172.23.105.217 port:8091 ssh_username:root, ip:172.23.105.219 port:8091 ssh_username:root, ip:172.23.105.220 port:8091 ssh_username:root]
2021-03-18 05:57:15,174 | test | ERROR | pool-1-thread-14 | [rest_client:_http_request:748] POST http://172.23.105.215:8091/controller/failOver body: otpNode=ns_1%40172.23.105.217&otpNode=ns_1%40172.23.105.219&otpNode=ns_1%40172.23.105.220&allowUnsafe=true headers: {'Accept': '*/*', 'Connection': 'close', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==\n', 'Content-Type': 'application/x-www-form-urlencoded'} error: 500 reason: unknown ["Unexpected server error, request logged."] auth: Administrator:password
2021-03-18 05:57:15,177 | test | ERROR | pool-1-thread-14 | [rest_client:fail_over:1291] [u'ns_1@172.23.105.217', u'ns_1@172.23.105.219', u'ns_1@172.23.105.220'] - Failover error: ["Unexpected server error, request logged."]
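For anyone trying to replay step 4 outside the test framework, the logged request can be rebuilt roughly as follows. This is only a sketch: `build_failover_request` is a hypothetical helper (not part of testrunner), and the host, node names, and credentials are copied from the log lines above. It constructs the request but does not send it.

```python
import base64
from urllib.parse import urlencode

# Values copied from the step 4 log lines.
ORCHESTRATOR = "172.23.105.215"
FIREWALLED_IPS = ["172.23.105.217", "172.23.105.219", "172.23.105.220"]


def build_failover_request(orchestrator, failed_ips, user, password):
    """Return (url, body, headers) for a multi-node unsafe failover.

    Hypothetical helper for illustration only; it mirrors the logged
    request and does not contact the cluster.
    """
    url = "http://%s:8091/controller/failOver" % orchestrator
    # One otpNode field per failed node, then allowUnsafe=true,
    # in the same order as the logged body.
    fields = [("otpNode", "ns_1@%s" % ip) for ip in failed_ips]
    fields.append(("allowUnsafe", "true"))
    body = urlencode(fields)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    headers = {
        "Authorization": "Basic " + token,
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return url, body, headers


url, body, headers = build_failover_request(
    ORCHESTRATOR, FIREWALLED_IPS, "Administrator", "password"
)
print(url)
print(body)
```

The resulting body and `Authorization` header match the ones in the log, so the tuple can be handed to any HTTP client to resend the request against a test cluster.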
The failover request fails with an "Unexpected server error, request logged." error (HTTP 500).
The cluster also becomes unusable: teardown fails and leaves orphaned buckets, and the UI shows errors such as "Unfinished failover of nodes was found".
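For context on why step 3 amounts to quorum loss, here is a small check using the node counts from the test parameters (`nodes_init=5`, `num_node_failures=3`). It assumes a plain strict-majority rule; `has_quorum` is an illustrative name, and the server's actual quorum logic may differ.

```python
def has_quorum(total_nodes, reachable_nodes):
    """True if the reachable nodes form a strict majority of the cluster.

    Assumption: simple majority rule, used here only for illustration.
    """
    return reachable_nodes > total_nodes // 2


TOTAL_NODES = 5   # nodes_init=5
NUM_FAILED = 3    # num_node_failures=3

survivors = TOTAL_NODES - NUM_FAILED
print("survivors:", survivors)
print("quorum:", has_quorum(TOTAL_NODES, survivors))
```

With 2 of 5 nodes reachable the check is false, which is why the test issues the failover with `allowUnsafe=true`.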
Logs attached.
Issue Links
- backports to: MB-46253 [Backport to 6.6.3] Replicators sometimes get stuck during failover (Closed)