Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Fix Version: Cheshire-Cat
- Affects Version: 6.6.2-9588 -> 7.0.0-5275
- Triage: Untriaged
- Operating System: Centos 64-bit
- 1
- No
Description
Script to Repro
1. Run the following 6.6.2 longevity test for 3-4 days. At the end of it we will have a 27-node cluster.
./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
2. Run the script [^create_drop.sh] on the 6.6.2 nodes in the cluster. The same script was also run on the 7.0.0 nodes that would later be brought into the cluster via swap rebalance for the upgrade.
3. Swap rebalance six 6.6.2 nodes (one of each service) with 7.0.0 nodes.
4. Gracefully fail over six nodes (one of each service), upgrade them, perform a recovery, and start a rebalance.
5. Repeat: gracefully fail over six more nodes (one of each service), upgrade them, perform a recovery, and start a rebalance.
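Steps 4 and 5 above can be sketched with couchbase-cli. This is an illustrative sketch only: the cluster address, node list, credentials, and the DRY_RUN wrapper are assumptions, not part of the actual test harness.

```shell
#!/bin/sh
# Hedged sketch of the failover -> upgrade -> recovery -> rebalance cycle.
# CLUSTER, NODES, and credentials are placeholders; DRY_RUN=1 (the default
# here) only prints each command instead of executing it.
CLUSTER="172.23.106.100:8091"   # any node already in the cluster (assumed)
NODES="172.23.106.239"          # one node per service in the real test (assumed)

run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi; }

for node in $NODES; do
  # Graceful failover (couchbase-cli failover is graceful unless forced/hard)
  run couchbase-cli failover -c "$CLUSTER" -u Administrator -p password \
      --server-failover "$node:8091"
  # ... upgrade the Couchbase Server package on $node here ...
  # Mark the node for recovery before rebalancing it back in
  run couchbase-cli recovery -c "$CLUSTER" -u Administrator -p password \
      --server-recovery "$node:8091" --recovery-type delta
done
run couchbase-cli rebalance -c "$CLUSTER" -u Administrator -p password
```

With DRY_RUN left at 1 the script prints the commands it would run, which is convenient for reviewing the sequence before pointing it at a live cluster.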
The rebalance at step 5 failed; that failure is tracked by bug MB-46778. If the following errors are related to the same issue, we can close this bug off.
172.23.106.239 : fts
/opt/couchbase/var/lib/couchbase/logs/fts.log.8.gz:2021-06-07T11:21:42.392-07:00 [WARN] ns_server: retrieve partition seqs: gocouchbase_utils: CouchbaseBucket connection failed, server: http://127.0.0.1:8091, poolName: default, bucketName: default, sourceParams: "{}", err: Get http://127.0.0.1:8091/pools: net/http: request canceled (Client.Timeout exceeded while awaiting headers), please check that your authUser and authPassword are correct and that your couchbase cluster ("http://127.0.0.1:8091") is available -- cbft.RunSourcePartitionSeqs() at ns_server.go:1118
/opt/couchbase/var/lib/couchbase/logs/fts.log.8.gz:2021-06-07T11:22:03.539-07:00 [WARN] ns_server: retrieve partition seqs: gocouchbase_utils: CouchbaseBucket connection failed, server: http://127.0.0.1:8091, poolName: default, bucketName: default, sourceParams: "{}", err: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused, please check that your authUser and authPassword are correct and that your couchbase cluster ("http://127.0.0.1:8091") is available -- cbft.RunSourcePartitionSeqs() at ns_server.go:1118
ns_1@172.23.106.239 1:52:11 PM 7 Jun, 2021
Service 'fts' exited with status 137. Restarting. Messages:
2021-06-07T13:52:06.955-07:00 [WARN] feed_dcp_gocouchbase: OnError, name: good_state_7871405847e4f01f_6ddbfb54, bucketName: default, bucketUUID: , err: pkt.Receive, err: EOF -- cbgt.(*DCPFeed).OnError() at feed_dcp_gocouchbase.go:393
2021-06-07T13:52:06.956-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.105.61:11210, name: fts:good_state_7871405847e4f01f_6ddbfb54-77c5312a, traces: vb: 0 => 94 (3x);
2021-06-07T13:52:06.956-07:00 [WARN] feed_dcp_gocouchbase: OnError, name: social_1ecade980c50afc4_6ddbfb54, bucketName: default, bucketUUID: , err: pkt.Receive, err: EOF -- cbgt.(*DCPFeed).OnError() at feed_dcp_gocouchbase.go:393
2021-06-07T13:52:06.956-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.105.102:11210, name: fts:social_1ecade980c50afc4_6ddbfb54-33f03d9a, traces: vb: 0 => 94 (3x);
2021-06-07T13:52:06.956-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
2021-06-07T13:52:06.956-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
2021-06-07T13:52:06.956-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
Err EOF
Err EOF
2021-06-07T13:52:06.790-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
ns_1@172.23.106.239 11:20:28 AM 7 Jun, 2021
Service 'fts' exited with status 137. Restarting. Messages:
2021-06-07T11:20:20.311-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.104.244:11210, name: fts:social_1ecade980c50afc4_f4e0a48a-373647b5, traces: vb: 0 => 94 (3x);
2021-06-07T11:20:21.166-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.104.245:11210, name: fts:social_1ecade980c50afc4_6ddbfb54-754d6aff, traces: vb: 0 => 94 (3x);
2021-06-07T11:20:20.909-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
2021-06-07T11:20:21.188-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
2021-06-07T11:20:19.871-07:00 [WARN] feed_dcp_gocouchbase: OnError, name: social_1ecade980c50afc4_f4e0a48a, bucketName: default, bucketUUID: , err: pkt.Receive, err: EOF -- cbgt.(*DCPFeed).OnError() at feed_dcp_gocouchbase.go:393
2021-06-07T11:20:23.891-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.104.214:11210, name: fts:social_1ecade980c50afc4_f4e0a48a-373647b5, traces: vb: 0 => 94 (3x);
2021-06-07T11:20:20.197-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
Err EOF
cbcollect_info attached. This was not seen on the last system test upgrade we ran, from 6.6.2-9588 -> 7.0.0-5226.
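For context on the restart messages above: exit statuses above 128 conventionally mean the process died from signal (status - 128), so status 137 indicates the fts process received SIGKILL (9), commonly the kernel OOM killer or a supervisor force-kill, rather than crashing on its own. A quick sketch, with the log path taken from the report:

```shell
#!/bin/sh
# Decode the "exited with status 137" messages: 137 - 128 = 9, i.e. SIGKILL.
status=137
sig=$((status - 128))
echo "killed by signal $sig ($(kill -l $sig))"   # -> killed by signal 9 (KILL)

# Pull every such restart out of the rotated fts logs (path from the report):
# zgrep -h "exited with status 137" /opt/couchbase/var/lib/couchbase/logs/fts.log*.gz
```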