Details
Bug
Resolution: Fixed
Critical
7.1.0
Untriaged
1
Yes
Description
Build : 7.1.0-2543
Test : -test tests/integration/neo/test_neo_couchstore_milestone4.yml -scope tests/integration/neo/scope_couchstore.yml
Iteration : 1st and 2nd
Scale : 3
In the first iteration, a rebalance was run to perform a hard failover, full recovery, and add-back of KV node 172.23.105.107. This rebalance took 12+ hrs to complete. The Eventing stage alone ran from 06:36:55 to 18:44:16 (about 12 hrs 7 min), as seen in the rebalance report (rebalance_report_20220403T014416.json, attached):
"eventing" : {
  "completedTime" : "2022-04-02T18:44:16.254-07:00",
  "perNodeProgress" : {
    "ns_1@172.23.104.67" : 1,
    "ns_1@172.23.120.107" : 1,
    "ns_1@172.23.96.192" : 1
  },
  "startTime" : "2022-04-02T06:36:55.090-07:00",
  "timeTaken" : 43641164,
  "totalProgress" : 100
}
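As a quick cross-check of the figures in the report excerpt above, the following minimal Python sketch recomputes the Eventing stage duration from startTime and completedTime. The dictionary only copies three fields from the excerpt; the helper name elapsed_ms is made up for illustration, and treating timeTaken as milliseconds is an assumption that happens to match the timestamp difference.

```python
from datetime import datetime, timedelta

# Fields copied from the "eventing" section of the rebalance report excerpt.
eventing = {
    "startTime": "2022-04-02T06:36:55.090-07:00",
    "completedTime": "2022-04-02T18:44:16.254-07:00",
    "timeTaken": 43641164,  # assumed to be milliseconds
}

def elapsed_ms(stage):
    """Return completedTime - startTime in whole milliseconds (hypothetical helper)."""
    start = datetime.fromisoformat(stage["startTime"])
    end = datetime.fromisoformat(stage["completedTime"])
    # Integer division by a 1 ms timedelta avoids floating-point rounding.
    return (end - start) // timedelta(milliseconds=1)

ms = elapsed_ms(eventing)
hours = ms / 3_600_000

print(f"eventing stage: {ms} ms (~{hours:.2f} h)")
print("matches timeTaken:", ms == eventing["timeTaken"])
```

The same check could be applied to the other service sections of the report to confirm which stage dominates the total rebalance time.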
Logs covering this occurrence:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.104.137.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.104.155.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.104.157.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.104.5.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.104.67.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.104.69.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.104.70.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.105.107.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.105.111.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.105.168.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.106.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.106.188.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.108.103.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.120.107.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.120.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.121.117.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.123.28.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.96.148.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.96.192.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.96.251.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.96.252.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.96.253.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.97.119.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.97.121.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.97.122.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.97.239.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.99.20.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.99.21.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1648957940/collectinfo-2022-04-03T035222-ns_1%40172.23.99.25.zip
In the 2nd iteration, a rebalance operation is currently in progress. Three nodes (kv, index, query) were auto-failed over simultaneously (multi-node failover) and are now being rebalanced back in. This rebalance has been running for the last 5+ hrs. The Eventing phase is taking a long time here too (4+ hrs so far).
[2022-04-03T08:31:08-07:00, sequoiatools/cbutil:667091] /cbinit.py 172.23.106.100 root couchbase stop
[2022-04-03T08:31:28-07:00, sequoiatools/cbutil:7144a4] /cbinit.py 172.23.123.28 root couchbase stop
[2022-04-03T08:31:38-07:00, sequoiatools/cbutil:085d27] /cbinit.py 172.23.104.137 root couchbase stop
[2022-04-03T08:31:44-07:00, sequoiatools/cmd:70243a] 10
[2022-04-03T08:32:00-07:00, sequoiatools/couchbase-cli:7.1:d32689] rebalance -c 172.23.108.103:8091 -u Administrator -p password
[2022-04-03T08:59:30-07:00, sequoiatools/cmd:d26d5a] 60
[2022-04-03T09:00:36-07:00, sequoiatools/cmd:4c4e4c] 60
[2022-04-03T09:01:42-07:00, sequoiatools/cbutil:6f46a5] /cbinit.py 172.23.106.100,172.23.123.28,172.23.104.137 root couchbase start
[2022-04-03T09:01:49-07:00, sequoiatools/cmd:df307e] 120
[2022-04-03T09:03:55-07:00, sequoiatools/couchbase-cli:7.1:3d1e9a] server-add -c 172.23.108.103:8091 --server-add https://172.23.106.100 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
[2022-04-03T09:04:12-07:00, sequoiatools/couchbase-cli:7.1:ca0529] server-add -c 172.23.108.103:8091 --server-add https://172.23.123.28 -u Administrator -p password --server-add-username Administrator --server-add-password password --services index
[2022-04-03T09:04:25-07:00, sequoiatools/couchbase-cli:7.1:be3732] server-add -c 172.23.108.103:8091 --server-add https://172.23.104.137 -u Administrator -p password --server-add-username Administrator --server-add-password password --services query
→ Error occurred on container - sequoiatools/couchbase-cli:7.1:[server-add -c 172.23.108.103:8091 --server-add https://172.23.104.137 -u Administrator -p password --server-add-username Administrator --server-add-password password --services query]
docker logs be3732
docker start be3732
=ERROR: Prepare join failed. Node is already part of cluster.
[2022-04-03T09:04:32-07:00, sequoiatools/couchbase-cli:7.1:787fbc] rebalance -c 172.23.108.103:8091 -u Administrator -p password
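To read timings off the step log above, a small Python sketch along the following lines may help. It parses the bracketed "[timestamp, container] command" lines and measures the gap between the first rebalance step and the step that follows it. The regular expression and the parse_steps helper are illustrative assumptions, not part of the test framework, and the gap is only an upper bound on the rebalance duration if the framework runs steps one after another, which is also an assumption.

```python
import re
from datetime import datetime

# Matches lines of the form "[<ISO timestamp>, <container id>] <command>".
STEP_RE = re.compile(r"^\[(?P<ts>[^,\]]+), (?P<container>[^\]]+)\] (?P<cmd>.*)$")

def parse_steps(lines):
    """Return (timestamp, container, command) tuples for lines that match (hypothetical helper)."""
    steps = []
    for line in lines:
        m = STEP_RE.match(line.strip())
        if m:
            steps.append((datetime.fromisoformat(m["ts"]), m["container"], m["cmd"]))
    return steps

# Four lines copied from the step log in this ticket.
log = [
    "[2022-04-03T08:31:44-07:00, sequoiatools/cmd:70243a] 10",
    "[2022-04-03T08:32:00-07:00, sequoiatools/couchbase-cli:7.1:d32689] rebalance -c 172.23.108.103:8091 -u Administrator -p password",
    "[2022-04-03T08:59:30-07:00, sequoiatools/cmd:d26d5a] 60",
    "[2022-04-03T09:04:32-07:00, sequoiatools/couchbase-cli:7.1:787fbc] rebalance -c 172.23.108.103:8091 -u Administrator -p password",
]

steps = parse_steps(log)
rebalances = [s for s in steps if s[2].startswith("rebalance ")]

# Gap between the first rebalance step and the next logged step.
gap = steps[2][0] - steps[1][0]

print("rebalance steps found:", len(rebalances))
print("first rebalance to next step:", gap)
```

Under the sequential-steps assumption, the first rebalance (after the three nodes were stopped) finished within about 27.5 minutes, while the second rebalance, started at 09:04:32, is the one reported as still running.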
The following logs were collected about 1 hr after the rebalance started. The Eventing nodes are 172.23.104.5, 172.23.104.67, and 172.23.96.192.
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.104.137.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.104.155.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.104.5.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.104.67.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.104.69.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.104.70.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.105.107.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.105.111.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.105.168.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.106.100.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.106.188.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.108.103.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.120.107.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.120.245.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.121.117.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.123.28.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.96.148.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.96.192.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.96.251.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.96.252.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.96.253.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.97.119.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.97.121.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.97.122.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.99.11.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.99.20.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1649006126/collectinfo-2022-04-03T171529-ns_1%40172.23.99.25.zip
This is a regression since RC3: the issue was never seen in any of the earlier builds in the longevity test.