Details
Bug
Resolution: Fixed
Major
7.0.2
6.6.3-9808 -> 7.0.2-6668
Triaged
1
No
Description
Steps to Repro
1. Run the following longevity script on 6.6.3 for 5 days.
./sequoia -client 172.23.104.254:2375 -provider file:centos_second_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.3-9808 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
At this point the cluster should have 27 nodes (9 KV, 6 index, 3 analytics, 3 FTS, 3 eventing and 3 N1QL).
2. Create 10k metakv tombstones. This has been part of our testing since MB-44838 was fixed. We used to create around 25k in total for CC; here we have reduced it to around 12k.
#!/bin/bash
# Create and then delete each metakv key so that every key leaves a tombstone.
for i in {0..10000}
do
  curl -X PUT -u Administrator:password "http://localhost:8091/_metakv/key${i}" -d 'value=foo1'
  curl -X DELETE -v -u Administrator:password "http://localhost:8091/_metakv/key${i}"
done
3. Swap rebalance 6 nodes, 1 of each service, with 7.0.2 nodes. The rebalance completes successfully.
4. Fail over 6 of the 6.6.3 nodes, 1 of each service (graceful failover for KV), upgrade these nodes to 7.0.2, recover all 6 nodes (delta recovery for KV) and rebalance.
5. Repeat step 4 until all the nodes in the cluster are upgraded to 7.0.2.
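One round of step 4 could be scripted roughly as below. This is only a sketch: the report does not say whether the UI, REST API or CLI was used. The function name `print_round`, the environment variables and the node addresses are hypothetical. It prints the `couchbase-cli` commands rather than running them, and it assumes the non-KV nodes are failed over hard and recovered with full recovery.

```shell
#!/bin/bash
# Sketch only: prints the couchbase-cli commands for one
# failover / upgrade / recovery round instead of executing them.
# Cluster address and credentials are placeholders.

CLUSTER="${CLUSTER:-http://localhost:8091}"
CB_USER="${CB_USER:-Administrator}"
CB_PASS="${CB_PASS:-password}"
CLI="/opt/couchbase/bin/couchbase-cli"

# print_round KV_NODE OTHER_NODE...
print_round() {
  kv_node="$1"; shift
  auth="-c $CLUSTER -u $CB_USER -p $CB_PASS"

  # Graceful failover (the CLI default) for the KV node.
  echo "$CLI failover $auth --server-failover $kv_node"
  for node in "$@"; do
    # Assumption: the remaining services are failed over hard.
    echo "$CLI failover $auth --server-failover $node --hard"
  done

  echo "# ... upgrade the failed-over nodes to 7.0.2 here ..."

  # Delta recovery for KV; assumption: full recovery for the rest.
  echo "$CLI recovery $auth --server-recovery $kv_node --recovery-type delta"
  for node in "$@"; do
    echo "$CLI recovery $auth --server-recovery $node --recovery-type full"
  done

  echo "$CLI rebalance $auth"
}
```

For example, `print_round kv1.example:8091 index1.example:8091 fts1.example:8091` prints the plan for one KV node and two other nodes; review the output before running any of it.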
After the upgrade, I validated that all the metakv tombstones were purged, and then enabled IPv4-only and enforced TLS using the commands below.
[root@localhost logs]# grep 'ns_config tombstone' debug.log
[ns_server:debug,2021-09-14T04:26:34.722-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 11869 ns_config tombstone(s) up to timestamp 63798837690. Tombstones:
[ns_server:debug,2021-09-14T04:30:38.296-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798837934. Tombstones:
[ns_server:debug,2021-09-14T04:31:41.731-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 127 ns_config tombstone(s) up to timestamp 63798837998. Tombstones:
[ns_server:debug,2021-09-14T04:39:42.419-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838481. Tombstones:
[ns_server:debug,2021-09-14T04:40:43.040-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838542. Tombstones:
[root@localhost logs]#
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli ip-family -c http://localhost:8091 -u Administrator -p password --set --ipv4only
Switched IP family for node: http://172.23.106.134:8091
Switched IP family for node: http://172.23.106.136:8091
Switched IP family for node: http://172.23.106.137:8091
Switched IP family for node: http://172.23.106.138:8091
Switched IP family for node: http://172.23.120.58:8091
Switched IP family for node: http://172.23.120.73:8091
Switched IP family for node: http://172.23.120.74:8091
Switched IP family for node: http://172.23.120.75:8091
Switched IP family for node: http://172.23.120.77:8091
Switched IP family for node: http://172.23.120.81:8091
Switched IP family for node: http://172.23.120.86:8091
Switched IP family for node: http://172.23.121.118:8091
Switched IP family for node: http://172.23.121.77:8091
Switched IP family for node: http://172.23.123.24:8091
Switched IP family for node: http://172.23.123.25:8091
Switched IP family for node: http://172.23.123.26:8091
Switched IP family for node: http://172.23.123.31:8091
Switched IP family for node: http://172.23.123.32:8091
Switched IP family for node: http://172.23.123.33:8091
Switched IP family for node: http://172.23.96.122:8091
Switched IP family for node: http://172.23.96.14:8091
Switched IP family for node: http://172.23.96.243:8091
Switched IP family for node: http://172.23.97.105:8091
Switched IP family for node: http://172.23.97.148:8091
Switched IP family for node: http://172.23.97.149:8091
Switched IP family for node: http://172.23.97.150:8091
Switched IP family for node: http://172.23.97.151:8091
SUCCESS: Switched IP family of the cluster
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli node-to-node-encryption -c http://localhost:8091 -u Administrator -p password --enable
Turned on encryption for node: http://172.23.106.134:8091
Turned on encryption for node: http://172.23.106.136:8091
Turned on encryption for node: http://172.23.106.137:8091
Turned on encryption for node: http://172.23.106.138:8091
Turned on encryption for node: http://172.23.120.58:8091
Turned on encryption for node: http://172.23.120.73:8091
Turned on encryption for node: http://172.23.120.74:8091
Turned on encryption for node: http://172.23.120.75:8091
Turned on encryption for node: http://172.23.120.77:8091
Turned on encryption for node: http://172.23.120.81:8091
Turned on encryption for node: http://172.23.120.86:8091
Turned on encryption for node: http://172.23.121.118:8091
Turned on encryption for node: http://172.23.121.77:8091
Turned on encryption for node: http://172.23.123.24:8091
Turned on encryption for node: http://172.23.123.25:8091
Turned on encryption for node: http://172.23.123.26:8091
Turned on encryption for node: http://172.23.123.31:8091
Turned on encryption for node: http://172.23.123.32:8091
Turned on encryption for node: http://172.23.123.33:8091
Turned on encryption for node: http://172.23.96.122:8091
Turned on encryption for node: http://172.23.96.14:8091
Turned on encryption for node: http://172.23.96.243:8091
Turned on encryption for node: http://172.23.97.105:8091
Turned on encryption for node: http://172.23.97.148:8091
Turned on encryption for node: http://172.23.97.149:8091
Turned on encryption for node: http://172.23.97.150:8091
Turned on encryption for node: http://172.23.97.151:8091
SUCCESS: Switched node-to-node encryption on
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli setting-security -c http://localhost:8091 -u Administrator -p password --set --cluster-encryption-level strict
SUCCESS: Security settings updated
[root@localhost logs]#
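The tombstone check at the top of this session can be totalled rather than read by eye. The helper below is a minimal sketch: `sum_purged` is a hypothetical name, and it assumes the `Purged N ns_config tombstone(s)` wording shown in the log lines above.

```shell
# sum_purged: read log lines on stdin and total the counts from
# "Purged N ns_config tombstone(s)" messages. Prints 0 if none match.
sum_purged() {
  sed -n 's/.*Purged \([0-9][0-9]*\) ns_config tombstone(s).*/\1/p' |
    awk '{ total += $1 } END { print total + 0 }'
}
```

Run as `sum_purged < debug.log` on a node. For the five lines shown above the total is 11869 + 1 + 127 + 1 + 1 = 11999, in line with the roughly 12k tombstones expected.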
At this point I noticed that the Rebalance button was enabled, although as far as I know there was nothing to rebalance. I ran a rebalance anyway. The first attempt failed with an error that some nodes were down (possibly because services restart when IPv4-only and enforce-TLS are set), and the next attempt failed with an eventing hang (tracked by MB-48449). This bug is to figure out why the Rebalance button was enabled in the first place. I don't remember whether the button was already enabled after the upgrade but before IPv4-only and enforce-TLS were enabled.
cbcollect_info is attached. This is the first time we are running this system test upgrade on 7.0.2, so there is no baseline and no last working build.
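To pin down when the cluster starts asking for a rebalance on a future run, the `balanced` field returned by `GET /pools/default` could be recorded after each step. The sketch below assumes that this field is one of the inputs behind the Rebalance button, which this report does not confirm. `is_balanced` is a hypothetical helper, and it uses a crude text match rather than a real JSON parser.

```shell
# is_balanced: read a /pools/default JSON document on stdin and print
# the value of the first "balanced" field (true or false), or "unknown"
# if the field is not present.
is_balanced() {
  value=$(grep -Eo '"balanced"[[:space:]]*:[[:space:]]*(true|false)' |
    head -n 1 | sed 's/.*:[[:space:]]*//') || true
  echo "${value:-unknown}"
}
```

For example: `curl -s -u Administrator:password http://localhost:8091/pools/default | is_balanced`, run once after the upgrade and again after each of the IP-family and encryption commands.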