Details
-
Bug
-
Status: Closed
-
Critical
-
Resolution: Duplicate
-
7.0.2
-
6.6.3-9808 -> 7.0.2-6668
-
Untriaged
-
Centos 64-bit
-
1
-
No
Description
Steps to Repro
1. Run the following longevity script on 6.6.3 for 5 days.
./sequoia -client 172.23.104.254:2375 -provider file:centos_second_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.3-9808 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
At this point the cluster should have 27 nodes (9 KV, 6 index, 3 analytics, 3 FTS, 3 eventing and 3 N1QL).
2. Create 10k metakv tombstones. This has been part of our testing since MB-44838 was fixed. We used to create around 25k in total for CC; here we have reduced that to around 12k.
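#!/bin/bash
# Brace expansion ({0..10000}) needs bash, not plain sh.
# Create and then delete each metakv key so that it leaves a tombstone behind.
for i in {0..10000}
do
    curl -X PUT -u Administrator:password "http://localhost:8091/_metakv/key${i}" -d 'value=foo1'
    curl -X DELETE -v -u Administrator:password "http://localhost:8091/_metakv/key${i}"
done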
3. Swap rebalance 6 nodes, 1 of each service, with 7.0.2 nodes. The rebalance goes through successfully.
4. Fail over 6 of the 6.6.3 nodes, 1 of each service (graceful failover for KV), upgrade these nodes to 7.0.2, recover all 6 nodes (delta recovery for KV) and rebalance.
5. Repeat step 4 until all the nodes in the cluster are upgraded to 7.0.2.
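One round of step 4 can be sketched as a dry run that only prints the couchbase-cli calls. This is a rough illustration, not the script used in this test: the node names are hypothetical, `plan_round` is a made-up helper, and it assumes hard failover and full recovery for the non-KV services (the steps above only state that KV uses graceful failover and delta recovery). The flags are the ones I understand couchbase-cli to accept, so check them against the installed version before running anything.

```bash
#!/bin/bash
# Dry-run sketch: prints the commands for one upgrade round, runs nothing.
CLUSTER="http://localhost:8091"      # same endpoint as used elsewhere in this ticket
CREDS="-u Administrator -p password"

# plan_round NODE=SERVICE [NODE=SERVICE ...]
plan_round() {
    local entry node service hard rtype
    for entry in "$@"; do
        node="${entry%%=*}"
        service="${entry##*=}"
        hard=""
        # Assumption: only KV nodes are failed over gracefully (the CLI default).
        [ "$service" = "kv" ] || hard=" --hard"
        echo "couchbase-cli failover -c $CLUSTER $CREDS --server-failover $node$hard"
    done
    echo "# upgrade the failed-over nodes to 7.0.2 here (package install, not shown)"
    for entry in "$@"; do
        node="${entry%%=*}"
        service="${entry##*=}"
        rtype="full"
        # Assumption: delta recovery is only used for KV nodes.
        [ "$service" = "kv" ] && rtype="delta"
        echo "couchbase-cli recovery -c $CLUSTER $CREDS --server-recovery $node --recovery-type $rtype"
    done
    echo "couchbase-cli rebalance -c $CLUSTER $CREDS"
}
```

For example, `plan_round kv-node.example:8091=kv fts-node.example:8091=fts` prints two failover commands, the upgrade placeholder, two recovery commands and one rebalance.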
After the upgrade I validated that all the metakv tombstones were purged, then enabled IPv4 only and enforced TLS using the commands below.
[root@localhost logs]# grep 'ns_config tombstone' debug.log
[ns_server:debug,2021-09-14T04:26:34.722-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 11869 ns_config tombstone(s) up to timestamp 63798837690. Tombstones:
[ns_server:debug,2021-09-14T04:30:38.296-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798837934. Tombstones:
[ns_server:debug,2021-09-14T04:31:41.731-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 127 ns_config tombstone(s) up to timestamp 63798837998. Tombstones:
[ns_server:debug,2021-09-14T04:39:42.419-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838481. Tombstones:
[ns_server:debug,2021-09-14T04:40:43.040-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838542. Tombstones:
[root@localhost logs]#
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli ip-family -c http://localhost:8091 -u Administrator -p password --set --ipv4only
Switched IP family for node: http://172.23.106.134:8091
Switched IP family for node: http://172.23.106.136:8091
Switched IP family for node: http://172.23.106.137:8091
Switched IP family for node: http://172.23.106.138:8091
Switched IP family for node: http://172.23.120.58:8091
Switched IP family for node: http://172.23.120.73:8091
Switched IP family for node: http://172.23.120.74:8091
Switched IP family for node: http://172.23.120.75:8091
Switched IP family for node: http://172.23.120.77:8091
Switched IP family for node: http://172.23.120.81:8091
Switched IP family for node: http://172.23.120.86:8091
Switched IP family for node: http://172.23.121.118:8091
Switched IP family for node: http://172.23.121.77:8091
Switched IP family for node: http://172.23.123.24:8091
Switched IP family for node: http://172.23.123.25:8091
Switched IP family for node: http://172.23.123.26:8091
Switched IP family for node: http://172.23.123.31:8091
Switched IP family for node: http://172.23.123.32:8091
Switched IP family for node: http://172.23.123.33:8091
Switched IP family for node: http://172.23.96.122:8091
Switched IP family for node: http://172.23.96.14:8091
Switched IP family for node: http://172.23.96.243:8091
Switched IP family for node: http://172.23.97.105:8091
Switched IP family for node: http://172.23.97.148:8091
Switched IP family for node: http://172.23.97.149:8091
Switched IP family for node: http://172.23.97.150:8091
Switched IP family for node: http://172.23.97.151:8091
SUCCESS: Switched IP family of the cluster
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli node-to-node-encryption -c http://localhost:8091 -u Administrator -p password --enable
Turned on encryption for node: http://172.23.106.134:8091
Turned on encryption for node: http://172.23.106.136:8091
Turned on encryption for node: http://172.23.106.137:8091
Turned on encryption for node: http://172.23.106.138:8091
Turned on encryption for node: http://172.23.120.58:8091
Turned on encryption for node: http://172.23.120.73:8091
Turned on encryption for node: http://172.23.120.74:8091
Turned on encryption for node: http://172.23.120.75:8091
Turned on encryption for node: http://172.23.120.77:8091
Turned on encryption for node: http://172.23.120.81:8091
Turned on encryption for node: http://172.23.120.86:8091
Turned on encryption for node: http://172.23.121.118:8091
Turned on encryption for node: http://172.23.121.77:8091
Turned on encryption for node: http://172.23.123.24:8091
Turned on encryption for node: http://172.23.123.25:8091
Turned on encryption for node: http://172.23.123.26:8091
Turned on encryption for node: http://172.23.123.31:8091
Turned on encryption for node: http://172.23.123.32:8091
Turned on encryption for node: http://172.23.123.33:8091
Turned on encryption for node: http://172.23.96.122:8091
Turned on encryption for node: http://172.23.96.14:8091
Turned on encryption for node: http://172.23.96.243:8091
Turned on encryption for node: http://172.23.97.105:8091
Turned on encryption for node: http://172.23.97.148:8091
Turned on encryption for node: http://172.23.97.149:8091
Turned on encryption for node: http://172.23.97.150:8091
Turned on encryption for node: http://172.23.97.151:8091
SUCCESS: Switched node-to-node encryption on
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli setting-security -c http://localhost:8091 -u Administrator -p password --set --cluster-encryption-level strict
SUCCESS: Security settings updated
[root@localhost logs]#
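As a cross-check on the `grep 'ns_config tombstone'` output earlier in this transcript, the purge counts can be totaled with a small helper. This is a minimal sketch and was not part of the original test run; the function name `sum_purged_tombstones` is made up for illustration, and it assumes the `Purged N ns_config tombstone(s)` line format seen in debug.log above.

```bash
#!/bin/bash
# Sum the N values from "Purged N ns_config tombstone(s)" lines read on stdin.
# Lines that do not match are ignored; prints 0 when nothing matches.
sum_purged_tombstones() {
    awk '
        match($0, /Purged [0-9]+ ns_config tombstone/) {
            # The matched text is "Purged <count> ns_config tombstone".
            split(substr($0, RSTART, RLENGTH), parts, " ")
            total += parts[2]
        }
        END { print total + 0 }
    '
}
```

Run it as `sum_purged_tombstones < debug.log`. For the five purge lines shown above the counts add up to 11999 (11869 + 1 + 127 + 1 + 1).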
At this point I noticed that the Rebalance button was enabled. I am not sure why, as there was nothing to rebalance as far as I know (this is tracked in MB-48448). However, when I ran the rebalance, it failed as shown below.
ns_1@172.23.106.136 5:47:32 AM 14 Sep, 2021
Rebalance exited with reason {service_rebalance_failed,eventing,
{worker_died,
{'EXIT',<0.29362.951>,
{rebalance_failed,
{service_error,
<<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}.
Rebalance Operation Id = f38dda25a6ccfe31114529f5d88b0753
cbcollect_info attached. This is the first time we are running this system test upgrade on 7.0.2, so there is no baseline and no last working build.
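When triaging the collected logs, the failing service and its error text can be pulled out of a multi-line failure reason like the one above. The helper below is only a sketch: `parse_rebalance_failure` is a hypothetical name, and it assumes the reason has the `{service_rebalance_failed,<service>, ... <<"message">>}` shape seen in this ticket.

```bash
#!/bin/bash
# Read a "Rebalance exited with reason ..." message on stdin and print
# "service=<name> error=<message>". Prints nothing if the shape differs.
parse_rebalance_failure() {
    # Join the wrapped Erlang term into one line, then re-add a final newline.
    { tr -d '\n'; echo; } |
        sed -n 's/.*{service_rebalance_failed,\([a-z0-9_]*\),.*<<"\(.*\)">>.*/service=\1 error=\2/p'
}
```

Fed the failure reason from this ticket, it prints `service=eventing error=eventing rebalance hasn't made progress for past 1200 secs`.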
Attachments
Activity
Field | Original Value | New Value |
---|---|---|
Description |
+Steps to Repro+
1. Run the following longevity script on 6.6.3 for 5 days. {noformat} ./sequoia -client 172.23.104.254:2375 -provider file:centos_second_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.3-9808 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true {noformat} At this point it should have a 27 node cluster ( 9 Kv, 6 Index, 3 analytics, 3 fts, 3 eventing and 3 n1ql) 2. Create 10k metakv tombstones. This has been part of our testing since {noformat} #!/bin/sh for i in {0..10000} do `curl -X PUT -u Administrator:password http://localhost:8091/_metakv/key{$i} -d 'value=foo1'` `curl -X DELETE -v -u Administrator:password http://localhost:8091/_metakv/key{$i}` done {noformat} 3. Swap rebalance 6 nodes , 1 of each service with that of 7.0.2 nodes. Rebalance goes through successfully. 4. Failover 6 nodes(6.6.3 nodes)1 of each service(kv is graceful failover), Upgrade these nodes to 7.0.2, do a recovery of all the 6 node(kv is delta recovery) and rebalance. 5. Repeat step no 4, until all the nodes in cluster are upgrade. After upgrade I validated that all the metakv tombstones are purged and enabled IPv4 only + enforce tls using the below commands. {noformat} [root@localhost logs]# grep 'ns_config tombstone' debug.log [ns_server:debug,2021-09-14T04:26:34.722-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 11869 ns_config tombstone(s) up to timestamp 63798837690. Tombstones: [ns_server:debug,2021-09-14T04:30:38.296-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798837934. 
Tombstones: [ns_server:debug,2021-09-14T04:31:41.731-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 127 ns_config tombstone(s) up to timestamp 63798837998. Tombstones: [ns_server:debug,2021-09-14T04:39:42.419-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838481. Tombstones: [ns_server:debug,2021-09-14T04:40:43.040-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838542. Tombstones: [root@localhost logs]# [root@localhost logs]# /opt/couchbase/bin/couchbase-cli ip-family -c http://localhost:8091 -u Administrator -p password --set --ipv4only Switched IP family for node: http://172.23.106.134:8091 Switched IP family for node: http://172.23.106.136:8091 Switched IP family for node: http://172.23.106.137:8091 Switched IP family for node: http://172.23.106.138:8091 Switched IP family for node: http://172.23.120.58:8091 Switched IP family for node: http://172.23.120.73:8091 Switched IP family for node: http://172.23.120.74:8091 Switched IP family for node: http://172.23.120.75:8091 Switched IP family for node: http://172.23.120.77:8091 Switched IP family for node: http://172.23.120.81:8091 Switched IP family for node: http://172.23.120.86:8091 Switched IP family for node: http://172.23.121.118:8091 Switched IP family for node: http://172.23.121.77:8091 Switched IP family for node: http://172.23.123.24:8091 Switched IP family for node: http://172.23.123.25:8091 Switched IP family for node: http://172.23.123.26:8091 Switched IP family for node: http://172.23.123.31:8091 Switched IP family for node: http://172.23.123.32:8091 Switched IP family for node: http://172.23.123.33:8091 Switched IP family for node: http://172.23.96.122:8091 Switched IP family for node: http://172.23.96.14:8091 Switched IP family for node: http://172.23.96.243:8091 Switched IP family for node: 
http://172.23.97.105:8091 Switched IP family for node: http://172.23.97.148:8091 Switched IP family for node: http://172.23.97.149:8091 Switched IP family for node: http://172.23.97.150:8091 Switched IP family for node: http://172.23.97.151:8091 SUCCESS: Switched IP family of the cluster [root@localhost logs]# /opt/couchbase/bin/couchbase-cli node-to-node-encryption -c http://localhost:8091 -u Administrator -p password --enable Turned on encryption for node: http://172.23.106.134:8091 Turned on encryption for node: http://172.23.106.136:8091 Turned on encryption for node: http://172.23.106.137:8091 Turned on encryption for node: http://172.23.106.138:8091 Turned on encryption for node: http://172.23.120.58:8091 Turned on encryption for node: http://172.23.120.73:8091 Turned on encryption for node: http://172.23.120.74:8091 Turned on encryption for node: http://172.23.120.75:8091 Turned on encryption for node: http://172.23.120.77:8091 Turned on encryption for node: http://172.23.120.81:8091 Turned on encryption for node: http://172.23.120.86:8091 Turned on encryption for node: http://172.23.121.118:8091 Turned on encryption for node: http://172.23.121.77:8091 Turned on encryption for node: http://172.23.123.24:8091 Turned on encryption for node: http://172.23.123.25:8091 Turned on encryption for node: http://172.23.123.26:8091 Turned on encryption for node: http://172.23.123.31:8091 Turned on encryption for node: http://172.23.123.32:8091 Turned on encryption for node: http://172.23.123.33:8091 Turned on encryption for node: http://172.23.96.122:8091 Turned on encryption for node: http://172.23.96.14:8091 Turned on encryption for node: http://172.23.96.243:8091 Turned on encryption for node: http://172.23.97.105:8091 Turned on encryption for node: http://172.23.97.148:8091 Turned on encryption for node: http://172.23.97.149:8091 Turned on encryption for node: http://172.23.97.150:8091 Turned on encryption for node: http://172.23.97.151:8091 SUCCESS: Switched 
node-to-node encryption on [root@localhost logs]# /opt/couchbase/bin/couchbase-cli setting-security -c http://localhost:8091 -u Administrator -p password --set --cluster-encryption-level strict SUCCESS: Security settings updated [root@localhost logs]# {noformat} At this point I noticed that Rebalance button was enabled. Not sure why this was enabled as we had nothing to rebalance afaik(this is tracked through + ns_1@172.23.106.136 5:47:32 AM 14 Sep, 2021 + {noformat} Rebalance exited with reason {service_rebalance_failed,eventing, {worker_died, {'EXIT',<0.29362.951>, {rebalance_failed, {service_error, <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}. Rebalance Operation Id = f38dda25a6ccfe31114529f5d88b0753 {noformat} cbcollect_info attached. This the first time we are running this system test upgrade on 7.0.2, hence there is no baseline as such and no last working build. |
+Steps to Repro+
1. Run the following longevity script on 6.6.3 for 5 days. {noformat} ./sequoia -client 172.23.104.254:2375 -provider file:centos_second_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.3-9808 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true {noformat} At this point it should have a 27 node cluster ( 9 Kv, 6 Index, 3 analytics, 3 fts, 3 eventing and 3 n1ql) 2. Create 10k metakv tombstones. This has been part of our testing since {noformat} #!/bin/sh for i in {0..10000} do `curl -X PUT -u Administrator:password http://localhost:8091/_metakv/key{$i} -d 'value=foo1'` `curl -X DELETE -v -u Administrator:password http://localhost:8091/_metakv/key{$i}` done {noformat} 3. Swap rebalance 6 nodes , 1 of each service with that of 7.0.2 nodes. Rebalance goes through successfully. 4. Failover 6 nodes(6.6.3 nodes)1 of each service(kv is graceful failover), Upgrade these nodes to 7.0.2, do a recovery of all the 6 node(kv is delta recovery) and rebalance. 5. Repeat step no 4, until all the nodes in cluster are upgrade. After upgrade I validated that all the metakv tombstones are purged and enabled IPv4 only + enforce tls using the below commands. {noformat} [root@localhost logs]# grep 'ns_config tombstone' debug.log [ns_server:debug,2021-09-14T04:26:34.722-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 11869 ns_config tombstone(s) up to timestamp 63798837690. Tombstones: [ns_server:debug,2021-09-14T04:30:38.296-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798837934. 
Tombstones: [ns_server:debug,2021-09-14T04:31:41.731-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 127 ns_config tombstone(s) up to timestamp 63798837998. Tombstones: [ns_server:debug,2021-09-14T04:39:42.419-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838481. Tombstones: [ns_server:debug,2021-09-14T04:40:43.040-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838542. Tombstones: [root@localhost logs]# [root@localhost logs]# /opt/couchbase/bin/couchbase-cli ip-family -c http://localhost:8091 -u Administrator -p password --set --ipv4only Switched IP family for node: http://172.23.106.134:8091 Switched IP family for node: http://172.23.106.136:8091 Switched IP family for node: http://172.23.106.137:8091 Switched IP family for node: http://172.23.106.138:8091 Switched IP family for node: http://172.23.120.58:8091 Switched IP family for node: http://172.23.120.73:8091 Switched IP family for node: http://172.23.120.74:8091 Switched IP family for node: http://172.23.120.75:8091 Switched IP family for node: http://172.23.120.77:8091 Switched IP family for node: http://172.23.120.81:8091 Switched IP family for node: http://172.23.120.86:8091 Switched IP family for node: http://172.23.121.118:8091 Switched IP family for node: http://172.23.121.77:8091 Switched IP family for node: http://172.23.123.24:8091 Switched IP family for node: http://172.23.123.25:8091 Switched IP family for node: http://172.23.123.26:8091 Switched IP family for node: http://172.23.123.31:8091 Switched IP family for node: http://172.23.123.32:8091 Switched IP family for node: http://172.23.123.33:8091 Switched IP family for node: http://172.23.96.122:8091 Switched IP family for node: http://172.23.96.14:8091 Switched IP family for node: http://172.23.96.243:8091 Switched IP family for node: 
http://172.23.97.105:8091 Switched IP family for node: http://172.23.97.148:8091 Switched IP family for node: http://172.23.97.149:8091 Switched IP family for node: http://172.23.97.150:8091 Switched IP family for node: http://172.23.97.151:8091 SUCCESS: Switched IP family of the cluster [root@localhost logs]# /opt/couchbase/bin/couchbase-cli node-to-node-encryption -c http://localhost:8091 -u Administrator -p password --enable Turned on encryption for node: http://172.23.106.134:8091 Turned on encryption for node: http://172.23.106.136:8091 Turned on encryption for node: http://172.23.106.137:8091 Turned on encryption for node: http://172.23.106.138:8091 Turned on encryption for node: http://172.23.120.58:8091 Turned on encryption for node: http://172.23.120.73:8091 Turned on encryption for node: http://172.23.120.74:8091 Turned on encryption for node: http://172.23.120.75:8091 Turned on encryption for node: http://172.23.120.77:8091 Turned on encryption for node: http://172.23.120.81:8091 Turned on encryption for node: http://172.23.120.86:8091 Turned on encryption for node: http://172.23.121.118:8091 Turned on encryption for node: http://172.23.121.77:8091 Turned on encryption for node: http://172.23.123.24:8091 Turned on encryption for node: http://172.23.123.25:8091 Turned on encryption for node: http://172.23.123.26:8091 Turned on encryption for node: http://172.23.123.31:8091 Turned on encryption for node: http://172.23.123.32:8091 Turned on encryption for node: http://172.23.123.33:8091 Turned on encryption for node: http://172.23.96.122:8091 Turned on encryption for node: http://172.23.96.14:8091 Turned on encryption for node: http://172.23.96.243:8091 Turned on encryption for node: http://172.23.97.105:8091 Turned on encryption for node: http://172.23.97.148:8091 Turned on encryption for node: http://172.23.97.149:8091 Turned on encryption for node: http://172.23.97.150:8091 Turned on encryption for node: http://172.23.97.151:8091 SUCCESS: Switched 
node-to-node encryption on [root@localhost logs]# /opt/couchbase/bin/couchbase-cli setting-security -c http://localhost:8091 -u Administrator -p password --set --cluster-encryption-level strict SUCCESS: Security settings updated [root@localhost logs]# {noformat} At this point I noticed that Rebalance button was enabled. Not sure why this was enabled as we had nothing to rebalance afaik(this is tracked through +ns_1@172.23.106.136 5:47:32 AM 14 Sep, 2021 + {noformat} Rebalance exited with reason {service_rebalance_failed,eventing, {worker_died, {'EXIT',<0.29362.951>, {rebalance_failed, {service_error, <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}. Rebalance Operation Id = f38dda25a6ccfe31114529f5d88b0753 {noformat} cbcollect_info attached. This the first time we are running this system test upgrade on 7.0.2, hence there is no baseline as such and no last working build. |
Description |
+Steps to Repro+
1. Run the following longevity script on 6.6.3 for 5 days. {noformat} ./sequoia -client 172.23.104.254:2375 -provider file:centos_second_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.3-9808 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true {noformat} At this point it should have a 27 node cluster ( 9 Kv, 6 Index, 3 analytics, 3 fts, 3 eventing and 3 n1ql) 2. Create 10k metakv tombstones. This has been part of our testing since {noformat} #!/bin/sh for i in {0..10000} do `curl -X PUT -u Administrator:password http://localhost:8091/_metakv/key{$i} -d 'value=foo1'` `curl -X DELETE -v -u Administrator:password http://localhost:8091/_metakv/key{$i}` done {noformat} 3. Swap rebalance 6 nodes , 1 of each service with that of 7.0.2 nodes. Rebalance goes through successfully. 4. Failover 6 nodes(6.6.3 nodes)1 of each service(kv is graceful failover), Upgrade these nodes to 7.0.2, do a recovery of all the 6 node(kv is delta recovery) and rebalance. 5. Repeat step no 4, until all the nodes in cluster are upgrade. After upgrade I validated that all the metakv tombstones are purged and enabled IPv4 only + enforce tls using the below commands. {noformat} [root@localhost logs]# grep 'ns_config tombstone' debug.log [ns_server:debug,2021-09-14T04:26:34.722-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 11869 ns_config tombstone(s) up to timestamp 63798837690. Tombstones: [ns_server:debug,2021-09-14T04:30:38.296-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798837934. 
Tombstones: [ns_server:debug,2021-09-14T04:31:41.731-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 127 ns_config tombstone(s) up to timestamp 63798837998. Tombstones: [ns_server:debug,2021-09-14T04:39:42.419-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838481. Tombstones: [ns_server:debug,2021-09-14T04:40:43.040-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838542. Tombstones: [root@localhost logs]# [root@localhost logs]# /opt/couchbase/bin/couchbase-cli ip-family -c http://localhost:8091 -u Administrator -p password --set --ipv4only Switched IP family for node: http://172.23.106.134:8091 Switched IP family for node: http://172.23.106.136:8091 Switched IP family for node: http://172.23.106.137:8091 Switched IP family for node: http://172.23.106.138:8091 Switched IP family for node: http://172.23.120.58:8091 Switched IP family for node: http://172.23.120.73:8091 Switched IP family for node: http://172.23.120.74:8091 Switched IP family for node: http://172.23.120.75:8091 Switched IP family for node: http://172.23.120.77:8091 Switched IP family for node: http://172.23.120.81:8091 Switched IP family for node: http://172.23.120.86:8091 Switched IP family for node: http://172.23.121.118:8091 Switched IP family for node: http://172.23.121.77:8091 Switched IP family for node: http://172.23.123.24:8091 Switched IP family for node: http://172.23.123.25:8091 Switched IP family for node: http://172.23.123.26:8091 Switched IP family for node: http://172.23.123.31:8091 Switched IP family for node: http://172.23.123.32:8091 Switched IP family for node: http://172.23.123.33:8091 Switched IP family for node: http://172.23.96.122:8091 Switched IP family for node: http://172.23.96.14:8091 Switched IP family for node: http://172.23.96.243:8091 Switched IP family for node: 
http://172.23.97.105:8091 Switched IP family for node: http://172.23.97.148:8091 Switched IP family for node: http://172.23.97.149:8091 Switched IP family for node: http://172.23.97.150:8091 Switched IP family for node: http://172.23.97.151:8091 SUCCESS: Switched IP family of the cluster [root@localhost logs]# /opt/couchbase/bin/couchbase-cli node-to-node-encryption -c http://localhost:8091 -u Administrator -p password --enable Turned on encryption for node: http://172.23.106.134:8091 Turned on encryption for node: http://172.23.106.136:8091 Turned on encryption for node: http://172.23.106.137:8091 Turned on encryption for node: http://172.23.106.138:8091 Turned on encryption for node: http://172.23.120.58:8091 Turned on encryption for node: http://172.23.120.73:8091 Turned on encryption for node: http://172.23.120.74:8091 Turned on encryption for node: http://172.23.120.75:8091 Turned on encryption for node: http://172.23.120.77:8091 Turned on encryption for node: http://172.23.120.81:8091 Turned on encryption for node: http://172.23.120.86:8091 Turned on encryption for node: http://172.23.121.118:8091 Turned on encryption for node: http://172.23.121.77:8091 Turned on encryption for node: http://172.23.123.24:8091 Turned on encryption for node: http://172.23.123.25:8091 Turned on encryption for node: http://172.23.123.26:8091 Turned on encryption for node: http://172.23.123.31:8091 Turned on encryption for node: http://172.23.123.32:8091 Turned on encryption for node: http://172.23.123.33:8091 Turned on encryption for node: http://172.23.96.122:8091 Turned on encryption for node: http://172.23.96.14:8091 Turned on encryption for node: http://172.23.96.243:8091 Turned on encryption for node: http://172.23.97.105:8091 Turned on encryption for node: http://172.23.97.148:8091 Turned on encryption for node: http://172.23.97.149:8091 Turned on encryption for node: http://172.23.97.150:8091 Turned on encryption for node: http://172.23.97.151:8091 SUCCESS: Switched 
node-to-node encryption on [root@localhost logs]# /opt/couchbase/bin/couchbase-cli setting-security -c http://localhost:8091 -u Administrator -p password --set --cluster-encryption-level strict SUCCESS: Security settings updated [root@localhost logs]# {noformat} At this point I noticed that Rebalance button was enabled. Not sure why this was enabled as we had nothing to rebalance afaik(this is tracked through +ns_1@172.23.106.136 5:47:32 AM 14 Sep, 2021 + {noformat} Rebalance exited with reason {service_rebalance_failed,eventing, {worker_died, {'EXIT',<0.29362.951>, {rebalance_failed, {service_error, <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}. Rebalance Operation Id = f38dda25a6ccfe31114529f5d88b0753 {noformat} cbcollect_info attached. This the first time we are running this system test upgrade on 7.0.2, hence there is no baseline as such and no last working build. |
+Steps to Repro+
1. Run the following longevity script on 6.6.3 for 5 days. {noformat} ./sequoia -client 172.23.104.254:2375 -provider file:centos_second_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.3-9808 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true {noformat} At this point it should have a 27 node cluster ( 9 Kv, 6 Index, 3 analytics, 3 fts, 3 eventing and 3 n1ql) 2. Create 10k metakv tombstones. This has been part of our testing since {noformat} #!/bin/sh for i in {0..10000} do `curl -X PUT -u Administrator:password http://localhost:8091/_metakv/key{$i} -d 'value=foo1'` `curl -X DELETE -v -u Administrator:password http://localhost:8091/_metakv/key{$i}` done {noformat} 3. Swap rebalance 6 nodes , 1 of each service with that of 7.0.2 nodes. Rebalance goes through successfully. 4. Failover 6 nodes(6.6.3 nodes)1 of each service(kv is graceful failover), Upgrade these nodes to 7.0.2, do a recovery of all the 6 node(kv is delta recovery) and rebalance. 5. Repeat step no 4, until all the nodes in cluster are upgrade. After upgrade I validated that all the metakv tombstones are purged and enabled IPv4 only + enforce tls using the below commands. {noformat} [root@localhost logs]# grep 'ns_config tombstone' debug.log [ns_server:debug,2021-09-14T04:26:34.722-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 11869 ns_config tombstone(s) up to timestamp 63798837690. Tombstones: [ns_server:debug,2021-09-14T04:30:38.296-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798837934. 
Tombstones:
[ns_server:debug,2021-09-14T04:31:41.731-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 127 ns_config tombstone(s) up to timestamp 63798837998. Tombstones:
[ns_server:debug,2021-09-14T04:39:42.419-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838481. Tombstones:
[ns_server:debug,2021-09-14T04:40:43.040-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838542. Tombstones:
[root@localhost logs]#
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli ip-family -c http://localhost:8091 -u Administrator -p password --set --ipv4only
Switched IP family for node: http://172.23.106.134:8091
Switched IP family for node: http://172.23.106.136:8091
Switched IP family for node: http://172.23.106.137:8091
Switched IP family for node: http://172.23.106.138:8091
Switched IP family for node: http://172.23.120.58:8091
Switched IP family for node: http://172.23.120.73:8091
Switched IP family for node: http://172.23.120.74:8091
Switched IP family for node: http://172.23.120.75:8091
Switched IP family for node: http://172.23.120.77:8091
Switched IP family for node: http://172.23.120.81:8091
Switched IP family for node: http://172.23.120.86:8091
Switched IP family for node: http://172.23.121.118:8091
Switched IP family for node: http://172.23.121.77:8091
Switched IP family for node: http://172.23.123.24:8091
Switched IP family for node: http://172.23.123.25:8091
Switched IP family for node: http://172.23.123.26:8091
Switched IP family for node: http://172.23.123.31:8091
Switched IP family for node: http://172.23.123.32:8091
Switched IP family for node: http://172.23.123.33:8091
Switched IP family for node: http://172.23.96.122:8091
Switched IP family for node: http://172.23.96.14:8091
Switched IP family for node: http://172.23.96.243:8091
Switched IP family for node: http://172.23.97.105:8091
Switched IP family for node: http://172.23.97.148:8091
Switched IP family for node: http://172.23.97.149:8091
Switched IP family for node: http://172.23.97.150:8091
Switched IP family for node: http://172.23.97.151:8091
SUCCESS: Switched IP family of the cluster
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli node-to-node-encryption -c http://localhost:8091 -u Administrator -p password --enable
Turned on encryption for node: http://172.23.106.134:8091
Turned on encryption for node: http://172.23.106.136:8091
Turned on encryption for node: http://172.23.106.137:8091
Turned on encryption for node: http://172.23.106.138:8091
Turned on encryption for node: http://172.23.120.58:8091
Turned on encryption for node: http://172.23.120.73:8091
Turned on encryption for node: http://172.23.120.74:8091
Turned on encryption for node: http://172.23.120.75:8091
Turned on encryption for node: http://172.23.120.77:8091
Turned on encryption for node: http://172.23.120.81:8091
Turned on encryption for node: http://172.23.120.86:8091
Turned on encryption for node: http://172.23.121.118:8091
Turned on encryption for node: http://172.23.121.77:8091
Turned on encryption for node: http://172.23.123.24:8091
Turned on encryption for node: http://172.23.123.25:8091
Turned on encryption for node: http://172.23.123.26:8091
Turned on encryption for node: http://172.23.123.31:8091
Turned on encryption for node: http://172.23.123.32:8091
Turned on encryption for node: http://172.23.123.33:8091
Turned on encryption for node: http://172.23.96.122:8091
Turned on encryption for node: http://172.23.96.14:8091
Turned on encryption for node: http://172.23.96.243:8091
Turned on encryption for node: http://172.23.97.105:8091
Turned on encryption for node: http://172.23.97.148:8091
Turned on encryption for node: http://172.23.97.149:8091
Turned on encryption for node: http://172.23.97.150:8091
Turned on encryption for node: http://172.23.97.151:8091
SUCCESS: Switched node-to-node encryption on
[root@localhost logs]# /opt/couchbase/bin/couchbase-cli setting-security -c http://localhost:8091 -u Administrator -p password --set --cluster-encryption-level strict
SUCCESS: Security settings updated
[root@localhost logs]#
{noformat}
At this point I noticed that the Rebalance button was enabled. Not sure why it was enabled, as we had nothing to rebalance afaik (this is tracked through

+ns_1@172.23.106.136 5:47:32 AM 14 Sep, 2021+
{noformat}
Rebalance exited with reason {service_rebalance_failed,eventing, {worker_died, {'EXIT',<0.29362.951>, {rebalance_failed, {service_error, <<"eventing rebalance hasn't made progress for past 1200 secs">>}}}}}.
Rebalance Operation Id = f38dda25a6ccfe31114529f5d88b0753
{noformat}
cbcollect_info attached.
This is the first time we are running this system test upgrade on 7.0.2, hence there is no baseline and no last working build. |
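The tombstone validation above was done by reading the grep output by eye. A minimal sketch of how the purge counts could be totalled from the same `debug.log` lines is shown below; the function name `total_purged` and the regular expression are illustrative assumptions, not part of ns_server or any Couchbase tooling, and the sample holds only the three purge lines quoted in this part of the ticket.

```python
import re

# Assumed pattern, derived only from the log lines quoted in this ticket.
PURGE_RE = re.compile(r"Purged (\d+) ns_config tombstone\(s\) up to timestamp (\d+)")


def total_purged(log_lines):
    """Sum the tombstone counts reported by tombstone_agent purge messages."""
    total = 0
    for line in log_lines:
        match = PURGE_RE.search(line)
        if match:
            total += int(match.group(1))
    return total


# Three of the purge lines from the grep output, copied verbatim.
sample = [
    "[ns_server:debug,2021-09-14T04:31:41.731-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 127 ns_config tombstone(s) up to timestamp 63798837998. Tombstones:",
    "[ns_server:debug,2021-09-14T04:39:42.419-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838481. Tombstones:",
    "[ns_server:debug,2021-09-14T04:40:43.040-07:00,ns_1@172.23.106.134:tombstone_agent<0.984.0>:tombstone_agent:purge:195]Purged 1 ns_config tombstone(s) up to timestamp 63798838542. Tombstones:",
]

if __name__ == "__main__":
    print(total_purged(sample))
```

In practice the lines would come from `open("debug.log")` rather than a hard-coded list; comparing the total against the number of keys created in step 2 gives a rough check that the purge covered them.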
Assignee | Balakumaran Gopal [ balakumaran.gopal ] | Jeelan Poola [ jeelan.poola ] |
Labels | system_test_upgrade upgrade |
Assignee | Jeelan Poola [ jeelan.poola ] | Ankit Prabhu [ ankit.prabhu ] |
Assignee | Ankit Prabhu [ ankit.prabhu ] | Abhishek Jindal [ abhishek.jindal ] |
Summary | [System Test] - Eventing rebalance hangs post 6.6.3 -> 7.0.2 upgrade | [System Test] - Eventing rebalance fails after 1200secs consistently post enforce-tls unless functions are paused/resumed |
Assignee | Abhishek Jindal [ abhishek.jindal ] | Balakumaran Gopal [ balakumaran.gopal ] |
VERIFICATION STEPS | Dup of MB-48334 | |
Resolution | Duplicate [ 3 ] | |
Status | Open [ 1 ] | Resolved [ 5 ] |
Status | Resolved [ 5 ] | Closed [ 6 ] |
Noticed errors like these in the eventing log:
172.23.96.122
2021-09-14T07:15:56.942-07:00 [Error] Consumer::getOpCallback [worker_timer_op_1:/tmp/127.0.0.1:8091_1_1550114910.sock:9871] Bucket fetch failed for key: <ud>eventing::1550114910::timer_op::vb::797</ud>, err: unambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},"OperationID":"Get","Opaque":"0x0","TimeObserved":2500111201,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""}
2021-09-14T07:15:56.942-07:00 [Error] Consumer::getOpCallback [worker_timer_op_0:/tmp/127.0.0.1:8091_0_1550114910.sock:9887] Bucket fetch failed for key: <ud>eventing::1550114910::timer_op::vb::683</ud>, err: unambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},"OperationID":"Get","Opaque":"0x0","TimeObserved":2500136088,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""}
2021-09-14T07:15:56.942-07:00 [Error] Consumer::addOwnershipHistorySECallback [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_1030245617.sock:9793] Key: eventing::1030245617::bucket_op_curl::vb::944, subdoc operation failed while performing ownership entry app post STREAMEND, err: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"MutateIn","Opaque":"0x0","TimeObserved":2500086758,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""}
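To see which callbacks are hitting which kind of timeout, lines like the ones above can be tallied. The following is a rough sketch only: the regular expressions are assumptions based solely on the three lines quoted here, `summarize` is a hypothetical helper name, and the sample lines are the quoted ones with the trailing JSON detail trimmed.

```python
import re
from collections import Counter

# Assumed patterns, inferred from the quoted eventing log lines only.
CALLBACK_RE = re.compile(r"\[Error\] (\S+)")
TIMEOUT_RE = re.compile(r"err: ((?:un)?ambiguous timeout)")


def summarize(lines):
    """Count (callback, timeout kind) pairs across eventing error lines."""
    counts = Counter()
    for line in lines:
        callback = CALLBACK_RE.search(line)
        timeout = TIMEOUT_RE.search(line)
        if callback and timeout:
            counts[(callback.group(1), timeout.group(1))] += 1
    return counts


# The three quoted lines, with the JSON payload after "|" trimmed.
sample = [
    "2021-09-14T07:15:56.942-07:00 [Error] Consumer::getOpCallback [worker_timer_op_1:/tmp/127.0.0.1:8091_1_1550114910.sock:9871] Bucket fetch failed for key: <ud>eventing::1550114910::timer_op::vb::797</ud>, err: unambiguous timeout",
    "2021-09-14T07:15:56.942-07:00 [Error] Consumer::getOpCallback [worker_timer_op_0:/tmp/127.0.0.1:8091_0_1550114910.sock:9887] Bucket fetch failed for key: <ud>eventing::1550114910::timer_op::vb::683</ud>, err: unambiguous timeout",
    "2021-09-14T07:15:56.942-07:00 [Error] Consumer::addOwnershipHistorySECallback [worker_bucket_op_curl_2:/tmp/127.0.0.1:8091_2_1030245617.sock:9793] Key: eventing::1030245617::bucket_op_curl::vb::944, subdoc operation failed while performing ownership entry app post STREAMEND, err: ambiguous timeout",
]

if __name__ == "__main__":
    for (callback, kind), n in sorted(summarize(sample).items()):
        print(f"{callback}: {kind} x{n}")
```

Run against the full eventing log from the cbcollect, a tally like this would show whether the timeouts are concentrated in bucket reads (`Get`) or in the ownership-history writes (`MutateIn`).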