Details
- Bug
- Resolution: Fixed
- Critical
- Cheshire-Cat
- 6.6.2-9588 -> 7.0.0-5275
- Untriaged
- Centos 64-bit
- 1
- No
Description
Script to Repro
1. Run the following 6.6.2 longevity test for 3-4 days. The cluster will have 27 nodes at the end of it.
./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
2. Swap-rebalance 6 of the 6.6.2 nodes (1 of each service) with 7.0.0 nodes.
3. Gracefully fail over 6 nodes (1 of each service), upgrade them, recover them, and start a rebalance.
Somewhere between steps 2 and 3, I noticed the following panic.
ns_log 000 ns_1@172.23.106.54 4:15:35 AM 7 Jun, 2021
Service 'goxdcr' exited with status 2. Restarting. Messages:
2021-06-07T04:15:33.998-07:00 INFO GOXDCR.GenericSupervisor: Adding child AdminportSupervisor to supervisor ReplicationManagerSupervisor
2021-06-07T04:15:33.998-07:00 INFO GOXDCR.ReplMgr: ReplicationManager is running
2021-06-07T04:15:33.998-07:00 INFO GOXDCR.HttpServer: [xdcr:127.0.0.1:9998] new http server xdcr 127.0.0.1:9998 /
2021-06-07T04:15:33.998-07:00 INFO GOXDCR.AdminPort: http server started 127.0.0.1:9998 !
2021-06-07T04:15:33.998-07:00 INFO GOXDCR.HttpServer: [xdcr:127.0.0.1:9998] starting ...
panic: Cannot continue without retrieving remote cluster reference

goroutine 15 [running]:
github.com/couchbase/goxdcr/metadata_svc.(*CollectionsManifestAgent).populateRemoteClusterRefOnce(0xc0001c6340, 0x1, 0x0)
/tmp/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/collections_manifest_service.go:724 +0x12b
created by github.com/couchbase/goxdcr/metadata_svc.(*CollectionsManifestAgent).Start
/tmp/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/collections_manifest_service.go:757 +0x353
cbcollect_info attached.
Attachments
For Gerrit Dashboard: MB-46771

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
155245,3 | MB-46771 - increase timeout due to remote bootstrap node not responding before forcing a panic restart | master | goxdcr | MERGED | +2 | +1 |
155268,2 | MB-46771 - increase timeout due to remote bootstrap node not responding before forcing a panic restart | cheshire-cat | goxdcr | MERGED | +2 | +1 |
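
The merged changes are described as increasing the timeout allowed for an unresponsive remote bootstrap node before forcing a panic restart. The following Go sketch illustrates that general pattern only; it is not the goxdcr code. Every name and value in it (`retryUntil`, `flakyFetcher`, `fetchFunc`, the timeout and interval constants, the `"remote-ref"` string) is hypothetical and chosen for illustration. It assumes the retrieval can be retried at a fixed interval and that the caller panics only once the deadline has passed.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// All names and values below are hypothetical; none are taken from goxdcr.
const (
	retryInterval = 10 * time.Millisecond
	shortTimeout  = 25 * time.Millisecond
	longTimeout   = 2 * time.Second
)

var errNotReady = errors.New("remote bootstrap node not responding")

// fetchFunc stands in for whatever call retrieves the remote cluster reference.
type fetchFunc func() (string, error)

// retryUntil calls fetch repeatedly until it succeeds or until another
// retry would run past the deadline derived from timeout.
func retryUntil(fetch fetchFunc, timeout, interval time.Duration) (string, error) {
	deadline := time.Now().Add(timeout)
	for {
		ref, err := fetch()
		if err == nil {
			return ref, nil
		}
		// Give up if the next attempt would start after the deadline.
		if time.Now().Add(interval).After(deadline) {
			return "", fmt.Errorf("gave up after %v: %w", timeout, err)
		}
		time.Sleep(interval)
	}
}

// flakyFetcher simulates a remote node that fails the first `failures`
// calls and then responds.
func flakyFetcher(failures int) fetchFunc {
	calls := 0
	return func() (string, error) {
		calls++
		if calls <= failures {
			return "", errNotReady
		}
		return "remote-ref", nil
	}
}

func main() {
	// The simulated remote node responds only on the fifth attempt.
	if _, err := retryUntil(flakyFetcher(4), shortTimeout, retryInterval); err != nil {
		// With the short timeout the caller would be forced to panic here.
		fmt.Println("short timeout:", err)
	}

	ref, err := retryUntil(flakyFetcher(4), longTimeout, retryInterval)
	if err != nil {
		panic("Cannot continue without retrieving remote cluster reference")
	}
	fmt.Println("long timeout:", ref)
}
```

With the short timeout the simulated node never gets enough attempts and the caller would have to restart; with the longer one the same slow node is tolerated. The trade-off in this sketch is that a genuinely dead remote node takes longer to surface as a failure.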