Details
- Bug
- Resolution: Fixed
- Critical
- 6.6.2
- 6.6.2-9588 ----> 7.0.0-4979
- Untriaged
- 1
- Unknown
- CX Sprint 251
Description
Steps to Repro
This is essentially an upgrade of the system test cluster.
1. Start a 6.6.2 system test longevity run.
2. The cluster setup is as follows:
- 9 data nodes
- 3 analytics nodes
- 3 eventing nodes
- 4 indexing nodes
- 3 search nodes
- 3 query nodes
3. It has 10 buckets, fts indexes, analytics datasets, 2i indexes, eventing functions.
4. Do a swap rebalance of 6 nodes (1 data, 1 index, 1 analytics, 1 fts, 1 query, 1 eventing), swapping 6.6.2-9588 nodes for 7.0.0-4979 nodes. This works fine.
5. Fail over one fts node on 6.6.2-9588 - 172.23.106.207
6. Fail over one n1ql node on 6.6.2-9588 - 172.23.106.191
7. Now try to gracefully fail over one more 6.6.2-9588 node - 172.23.105.90
8. At this point I hit MB-45767 and later MB-45769.
However, at this point, to unblock myself and to complete the upgrade of the entire cluster, I decided to do an offline upgrade (rpm -U http://172.23.126.166/builds/latestbuilds/couchbase-server/cheshire-cat/4979/couchbase-server-enterprise-7.0.0-4979-centos7.x86_64.rpm) of all the remaining 6.6.2 nodes in the cluster, one after another.
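The offline-upgrade loop above can be sketched as follows. This is a dry-run illustration only: the node list is trimmed to the last node named in this report, the RPM URL is the one quoted above, and the stop/upgrade/start command sequence is the standard service-based offline-upgrade pattern, not the exact commands I ran.

```python
# Dry-run sketch of the per-node offline upgrade described above.
# Assumptions: commands would be run (e.g. over ssh) on each remaining
# 6.6.2 node; the node list here is illustrative, not the full cluster.
RPM_URL = (
    "http://172.23.126.166/builds/latestbuilds/couchbase-server/"
    "cheshire-cat/4979/couchbase-server-enterprise-7.0.0-4979-centos7.x86_64.rpm"
)

def offline_upgrade_commands(rpm_url):
    """Command sequence for one node: stop the service, upgrade the RPM in
    place, start the service again."""
    return [
        "systemctl stop couchbase-server",
        "rpm -U %s" % rpm_url,
        "systemctl start couchbase-server",
    ]

# Last remaining 6.6.2 node from the report; the real run covered all
# remaining 6.6.2 nodes one after another.
remaining_662_nodes = ["172.23.104.15"]

plan = []
for node in remaining_662_nodes:
    for cmd in offline_upgrade_commands(RPM_URL):
        plan.append("%s: %s" % (node, cmd))
        print(plan[-1])
```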
Things worked fine until I did the offline upgrade of the last 6.6.2 node in the cluster (172.23.104.15); at that point every node except 172.23.104.15 and one other node, 172.23.104.244, went down and is no longer accessible.
172.23.105.61 had a lot of cbas exits:
2021-04-20 02:47:46,377 - systestmon - WARNING - *** 192 occurences of exited with status keyword found on 172.23.105.61 ***
2021-04-20 02:47:46,377 - systestmon - DEBUG - [user:info,2021-04-20T02:30:05.659-07:00,ns_1@172.23.105.61:<0.611.0>:ns_log:crash_consumption_loop:63]Service 'cbas' exited with status 2. Restarting. Messages:
2021-04-20 02:47:46,377 - systestmon - DEBUG - [user:info,2021-04-20T02:30:11.036-07:00,ns_1@172.23.105.61:<0.611.0>:ns_log:crash_consumption_loop:63]Service 'cbas' exited with status 2. Restarting. Messages:

Stack trace - ns_1@172.23.105.61 - 3:24:46 AM 20 Apr, 2021
Service 'cbas' exited with status 2. Restarting. Messages:
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/net/http/transport.go:1575 +0xb0d

goroutine 49 [select]:
net/http.(*persistConn).writeLoop(0xc00018c000)
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/net/http/transport.go:2205 +0x123
created by net/http.(*Transport).dialConn
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/net/http/transport.go:1576 +0xb32

*** end; calling os.Exit()...

panic: error setRequestAuth(): Unable to find given hostport in cbauth database: `172.23.105.62:8095'

goroutine 1 [running]:
github.com/couchbase/clog.Panicf(0x9a63a5, 0x1a, 0xc000100bc0, 0x1, 0x1)
/tmp/workspace/couchbase-server-unix/godeps/src/github.com/couchbase/clog/clog.go:362 +0xec
main.(*Mgr).isNodeAuthorized(0xc00035a000, 0x7fff070827ad, 0x20, 0xc00010d000)
goproj/src/github.com/couchbase/cbas/cbas/manager.go:1509 +0x8ee
main.main.func8()
goproj/src/github.com/couchbase/cbas/cbas/start.go:346 +0x11c
main.startingLock(0x9c24b8)
goproj/src/github.com/couchbase/cbas/cbas/start.go:458 +0x76
main.main()
goproj/src/github.com/couchbase/cbas/cbas/start.go:345 +0x17f0
I have attached cbcollects from only 3 nodes; I was unable to collect from the UI for any other nodes except these 2 that are up:
172.23.104.15 - one of the nodes that is up
172.23.104.244 - the other node that is up
172.23.105.61 - this node is down, but I collected logs manually by logging in, as it had the 192 cbas exits noted above.
Attachments
Issue Links
- is a backport of MB-45799 [Upgrade] - Service 'cbas' exited with status 2. Restarting. Messages: (Closed)