Details
-
Bug
-
Resolution: Fixed
-
Blocker
-
Cheshire-Cat
-
6.6.2-9588 ----> 7.0.0-4979
-
Untriaged
-
1
-
No
Description
Steps to Repro
This is essentially an upgrade of the system test cluster.
1. Start a 6.6.2 system test longevity run.
2. It has the following cluster setup:
- 9 data nodes
- 3 analytics nodes
- 3 eventing nodes
- 4 indexing nodes
- 3 search nodes
- 3 query nodes
3. It has 10 buckets, FTS indexes, analytics datasets, 2i indexes, and eventing functions.
4. We swap-rebalance 6 nodes (1 data, 1 index, 1 analytics, 1 fts, 1 query, 1 eventing) running 6.6.2-9588 with nodes running 7.0.0-4979. This works fine.
5. Fail over one 6.6.2-9588 fts node: 172.23.106.207
6. Fail over one 6.6.2-9588 n1ql node: 172.23.106.191
7. Now try to gracefully fail over one 6.6.2-9588 node: 172.23.105.90
8. At this point I hit MB-45767.
9. To proceed with the upgrade of the cluster at this point, I do a multi-node hard failover of the following nodes:
- 172.23.105.90
- 172.23.105.62
- 172.23.105.118
- 172.23.105.25
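The multi-node hard failover in step 9 could be scripted roughly as follows. This is only a sketch, not the procedure actually used in the test run: it assumes `couchbase-cli` is on the PATH, that nodes listen on the default port 8091, and that 172.23.104.244 is a reachable cluster node. The variable names `CLUSTER`, `CB_USER`, `CB_PASS` and `DRY_RUN`, and the credential defaults, are hypothetical placeholders. By default it only prints the command.

```shell
#!/bin/sh
# Sketch: multi-node hard failover of the four nodes from step 9.
# ASSUMPTIONS: CLUSTER, CB_USER, CB_PASS and DRY_RUN are illustrative names;
# the credentials and port 8091 are placeholders, not values from the report.
NODES="172.23.105.90 172.23.105.62 172.23.105.118 172.23.105.25"
CLUSTER="${CLUSTER:-172.23.104.244:8091}"
CB_USER="${CB_USER:-Administrator}"
CB_PASS="${CB_PASS:-password}"
DRY_RUN="${DRY_RUN:-1}"

# Join the node list into host:port,host:port,...
server_list() {
  list=""
  for n in $NODES; do
    list="${list:+$list,}${n}:8091"
  done
  printf '%s\n' "$list"
}

# Print the failover command (DRY_RUN=1) or execute it (DRY_RUN=0).
hard_failover() {
  set -- couchbase-cli failover -c "$CLUSTER" -u "$CB_USER" -p "$CB_PASS" \
    --server-failover "$(server_list)" --hard
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

hard_failover
```

Set `DRY_RUN=0` only after checking the printed command against the cluster in question.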
10. Run the following commands on all of these nodes (172.23.105.90, 172.23.105.62, 172.23.105.118, 172.23.105.25, 172.23.106.207, 172.23.106.191):
systemctl stop couchbase-server
rpm -U http://172.23.126.166/builds/latestbuilds/couchbase-server/cheshire-cat/4979/couchbase-server-enterprise-7.0.0-4979-centos7.x86_64.rpm
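Running the two commands from step 10 on six nodes by hand is error-prone, so a small loop can help. The sketch below assumes passwordless SSH as `root` to each node, which the report does not state. `DRY_RUN`, `upgrade_node` and `upgrade_all` are hypothetical names. By default it only prints what it would run.

```shell
#!/bin/sh
# Sketch: stop the server and install the 7.0.0-4979 RPM on each failed-over node.
# ASSUMPTION: root SSH access to every node; DRY_RUN is an illustrative switch.
NODES="172.23.105.90 172.23.105.62 172.23.105.118 172.23.105.25 172.23.106.207 172.23.106.191"
RPM_URL="http://172.23.126.166/builds/latestbuilds/couchbase-server/cheshire-cat/4979/couchbase-server-enterprise-7.0.0-4979-centos7.x86_64.rpm"
DRY_RUN="${DRY_RUN:-1}"

# Upgrade a single node; $1 is the node address.
upgrade_node() {
  remote_cmd="systemctl stop couchbase-server && rpm -U $RPM_URL"
  if [ "$DRY_RUN" = "1" ]; then
    echo "ssh root@$1 '$remote_cmd'"
  else
    ssh "root@$1" "$remote_cmd"
  fi
}

# Upgrade every node in turn and stop at the first failure.
upgrade_all() {
  for node in $NODES; do
    upgrade_node "$node" || { echo "upgrade failed on $node" >&2; return 1; }
  done
}

upgrade_all
```

Stopping at the first failure keeps a partially upgraded node from going unnoticed before the recovery and rebalance that follow.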
Now I recover all the nodes and rebalance. The rebalance works for every node except 172.23.105.90, which is a kv node. I retried the rebalance multiple times, hoping to continue upgrading the cluster, but every attempt failed with the following error (see rebalanceReport (1).json):
172.23.104.244 - 8:32:33 AM 19 Apr, 2021
Rebalance exited with reason {pre_rebalance_janitor_run_failed,"DISTRICT",
{error,
{config_sync_failed,push,
{error,[{'ns_1@172.23.106.225',timeout}]}}}}.
Rebalance Operation Id = 3c8c387d7a88daf60ffe335be82d46c4
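When the rebalance is retried several times, it helps to pull out which node the config push timed out on in each failure message. The filter below is a minimal sketch that assumes the error text has the `{'<node>',timeout}` shape shown above. `timed_out_nodes` is a hypothetical helper name, and `grep -o` is assumed to be available (GNU and BSD grep both provide it).

```shell
#!/bin/sh
# Sketch: list the nodes reported as timed out in a rebalance failure message.
# ASSUMPTION: the message contains entries of the form {'ns_1@<ip>',timeout}.
timed_out_nodes() {
  # Keep only the quoted-node/timeout pairs, then strip the quotes and suffix.
  grep -o "'[^']*',timeout" | sed "s/',timeout//; s/^'//"
}

# Example input: the failure message quoted in this report.
timed_out_nodes <<'EOF'
Rebalance exited with reason {pre_rebalance_janitor_run_failed,"DISTRICT",
{error,
{config_sync_failed,push,
{error,[{'ns_1@172.23.106.225',timeout}]}}}}.
EOF
```

For the message in this report the filter prints `ns_1@172.23.106.225`.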
It would be good to have a workaround for this so that I can continue to upgrade the cluster.
cbcollect_info attached. See also MB-45646 and MB-45767.
Issue Links
For Gerrit Dashboard: MB-45769 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
151825,3 | MB-45769 allow chronicle_compat:get_snapshot to use ns_config that | master | ns_server | Status: MERGED | +2 | +1 |
151826,3 | MB-45769 eliminate extra calls of ns_config:get from api's that | master | ns_server | Status: MERGED | +2 | +1 |
151827,4 | MB-45769 do not do additional ns_config:get on mixed clusters when | master | ns_server | Status: MERGED | +2 | +1 |
151828,4 | MB-45769 do not refetch snapshot in ns_bucket:failover_warnings | master | ns_server | Status: MERGED | +2 | +1 |
151829,4 | MB-45769 get rid of extra ns_config:get when calculating | master | ns_server | Status: MERGED | +2 | +1 |