Details
-
Bug
-
Resolution: Fixed
-
Major
-
6.6.3
-
None
-
6.6.2-9588 --> 6.6.3-9796
-
Untriaged
-
Centos 64-bit
-
1
-
No
Description
Steps to Repro
1. Run the following 6.6.2 longevity test for 4 days.
./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
2. At this point we have a 27-node cluster (3 analytics, 3 index, 3 fts, 3 query, 6 index, 9 data).
3. Do a swap rebalance of 6 nodes (1 of each service type). Worked fine.
4. Do a failover (graceful for kv) of 6 nodes (1 of each service type). Do an upgrade, a recovery (delta for kv), and a rebalance. Worked fine.
5. Do a failover (graceful for kv) of 6 nodes (1 of each service type). This graceful failover on the kv node (172.23.105.164) failed as shown below.
"completionMessage": "Graceful failover exited with reason {mover_crashed,\n {unexpected_exit,\n {'EXIT',<0.1591.25>,\n {failed_to_update_vbucket_map,\n \"NEW_ORDER\",977,\n {error,\n [{'ns_1@172.23.106.54',\n {exit,\n {{nodedown,'ns_1@172.23.106.54'},\n {gen_server,call,\n [{ns_config_rep,\n 'ns_1@172.23.106.54'},\n synchronize_everything,\n infinity]}}}}]}}}}}."
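For triage, the node reported as down can be pulled out of a saved copy of the failure message mechanically. The following is only a minimal sketch: the function name `extract_nodedown_nodes` and the file name `failover_message.txt` are hypothetical and not part of any Couchbase tooling, and it assumes the message text has been saved locally.

```shell
# Hypothetical helper: print each distinct node named in a {nodedown,'<node>'} tuple.
# Reads the saved message text on stdin.
extract_nodedown_nodes() {
    grep -o "nodedown,'[^']*'" \
        | sed "s/^nodedown,'//; s/'\$//" \
        | sort -u
}

# Usage (failover_message.txt is an assumed local copy of the message):
# extract_nodedown_nodes < failover_message.txt
```

This matches only on the literal `nodedown,'...'` pattern, so it says which node was unreachable during the config synchronization, not why.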
This reminds me of the bugs I ran into during the 6.6.2 -> 7.0.0 upgrade because of bloated metakv tombstones, notably MB-46778 and MB-46787. Not sure if it's the same issue, though.
Some important things to note:
1. This is the first time we are doing a system upgrade from 6.6.2 -> 6.6.3, so there is no baseline to speak of. This test was first done in 7.0.0, using the 6.6.2 -> 7.0.0 upgrade.
2. The number of metakv tombstones is:
[root@localhost ~]# curl --silent -u Administrator:password http://localhost:8091/diag/eval -d 'ns_config:get()' | grep '_deleted' | wc -l
17539
[root@localhost ~]#
Please note these are organically created tombstones, unlike the ones we used to create in 7.0.0 with a shell script to test metakv purging for the system test upgrade. No changes were made to the longevity test. These were written during the MH time frame, possibly around 2 years ago.
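The pipeline in note 2 counts lines that contain `_deleted`, so it would undercount if several markers ever landed on one line. A sketch of an occurrence-based count follows, assuming the `ns_config:get()` output has been saved to a file. The function name `count_deleted_markers` and the file name `ns_config_dump.txt` are hypothetical.

```shell
# Hypothetical helper: count every occurrence of the `_deleted` marker on stdin,
# rather than the number of lines that contain it.
count_deleted_markers() {
    grep -o '_deleted' | wc -l | tr -d ' '
}

# Usage (ns_config_dump.txt is an assumed saved copy of the diag/eval output):
# count_deleted_markers < ns_config_dump.txt
```

Either count is still approximate, since any key or value that merely contains the substring `_deleted` is included.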
cbcollect_info attached.