Couchbase Server / MB-46929

[BP 6.6.3][Upgrade] - Service 'cbas' exited with status 2. Restarting. Messages:

Details

    • Untriaged
    • 1
    • Unknown
    • CX Sprint 251

    Description

      Steps to Repro
      This is essentially an upgrade of the system test cluster.

      1. Start a 6.6.2 system test longevity run.
      2. It has the following cluster setup:

      • 9 data nodes
      • 3 analytics nodes
      • 3 eventing nodes
      • 4 indexing nodes
      • 3 search nodes
      • 3 query nodes

      3. It has 10 buckets, fts indexes, analytics datasets, 2i indexes, and eventing functions.
      4. We do a swap rebalance of 6 nodes (1 data, 1 index, 1 analytics, 1 fts, 1 query, 1 eventing), replacing nodes running 6.6.2-9588 with nodes running 7.0.0-4979. This works fine.
      5. Fail over one 6.6.2-9588 fts node - 172.23.106.207
      6. Fail over one 6.6.2-9588 n1ql node - 172.23.106.191
      7. Now try to gracefully fail over one 6.6.2-9588 node - 172.23.105.90
      8. At this point I ran into MB-45767 and later MB-45769.

      However, to unblock myself and complete the upgrade of the entire cluster, I decided at this point to do an offline upgrade (rpm -U http://172.23.126.166/builds/latestbuilds/couchbase-server/cheshire-cat/4979/couchbase-server-enterprise-7.0.0-4979-centos7.x86_64.rpm) of all the remaining 6.6.2 nodes in the cluster, one after another.

      Things worked fine until I did an offline upgrade of the last 6.6.2 node in the cluster (172.23.104.15). After that, every node except 172.23.104.15 and one other node, 172.23.104.244, went down and is no longer accessible.

      172.23.105.61 had a lot of cbas exits:

      2021-04-20 02:47:46,377 - systestmon - WARNING - *** 192 occurences of exited with status keyword found on 172.23.105.61 ***
      2021-04-20 02:47:46,377 - systestmon - DEBUG - [user:info,2021-04-20T02:30:05.659-07:00,ns_1@172.23.105.61:<0.611.0>:ns_log:crash_consumption_loop:63]Service 'cbas' exited with status 2. Restarting. Messages:
      2021-04-20 02:47:46,377 - systestmon - DEBUG - [user:info,2021-04-20T02:30:11.036-07:00,ns_1@172.23.105.61:<0.611.0>:ns_log:crash_consumption_loop:63]Service 'cbas' exited with status 2. Restarting. Messages:
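      For anyone who wants to re-check the exit count on a node without systestmon, here is a minimal sketch that counts these messages per service and exit status. It is not how systestmon itself works (its implementation is not shown in this ticket); the function name count_service_exits and the regular expression are illustrative assumptions based only on the message format quoted above.

```python
import re
from collections import Counter

# Assumed pattern, derived from the ns_server message quoted above:
#   Service 'cbas' exited with status 2. Restarting. Messages:
EXIT_RE = re.compile(r"Service '([^']+)' exited with status (\d+)")


def count_service_exits(lines):
    """Return a Counter keyed by (service, exit_status) for matching lines."""
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# The two sample lines from this ticket, with the systestmon prefix dropped.
sample = [
    "[user:info,2021-04-20T02:30:05.659-07:00,ns_1@172.23.105.61:<0.611.0>:"
    "ns_log:crash_consumption_loop:63]Service 'cbas' exited with status 2. "
    "Restarting. Messages:",
    "[user:info,2021-04-20T02:30:11.036-07:00,ns_1@172.23.105.61:<0.611.0>:"
    "ns_log:crash_consumption_loop:63]Service 'cbas' exited with status 2. "
    "Restarting. Messages:",
]

exit_counts = count_service_exits(sample)
```

      To run it against a real log, pass an open file object instead of the sample list; the log file location depends on the installation and is not assumed here.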
      

      Stack trace - ns_1@172.23.105.61 - 3:24:46 AM 20 Apr, 2021

      Service 'cbas' exited with status 2. Restarting. Messages:
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/net/http/transport.go:1575 +0xb0d
       
      goroutine 49 [select]:
      net/http.(*persistConn).writeLoop(0xc00018c000)
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/net/http/transport.go:2205 +0x123
      created by net/http.(*Transport).dialConn
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.13.7/go/src/net/http/transport.go:1576 +0xb32
       
      *** end; calling os.Exit()...
       
      panic: error setRequestAuth(): Unable to find given hostport in cbauth database: `172.23.105.62:8095'
       
      goroutine 1 [running]:
      github.com/couchbase/clog.Panicf(0x9a63a5, 0x1a, 0xc000100bc0, 0x1, 0x1)
      /tmp/workspace/couchbase-server-unix/godeps/src/github.com/couchbase/clog/clog.go:362 +0xec
      main.(*Mgr).isNodeAuthorized(0xc00035a000, 0x7fff070827ad, 0x20, 0xc00010d000)
      goproj/src/github.com/couchbase/cbas/cbas/manager.go:1509 +0x8ee
      main.main.func8()
      goproj/src/github.com/couchbase/cbas/cbas/start.go:346 +0x11c
      main.startingLock(0x9c24b8)
      goproj/src/github.com/couchbase/cbas/cbas/start.go:458 +0x76
      main.main()
      goproj/src/github.com/couchbase/cbas/cbas/start.go:345 +0x17f0
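
      With 192 exits on this node, it may help to check whether every panic names the same hostport. The sketch below pulls the hostport out of the panic line; it assumes all occurrences use exactly the wording of the panic shown above (backtick before the hostport, single quote after it), and extract_missing_hostport is a hypothetical helper name, not part of any Couchbase tooling.

```python
import re

# Assumed format, copied from the panic line in the stack trace above.
PANIC_RE = re.compile(
    r"Unable to find given hostport in cbauth database: `([^']+)'"
)


def extract_missing_hostport(line):
    """Return the hostport named in a cbauth panic line, or None if absent."""
    match = PANIC_RE.search(line)
    return match.group(1) if match else None


panic_line = (
    "panic: error setRequestAuth(): Unable to find given hostport in "
    "cbauth database: `172.23.105.62:8095'"
)
missing = extract_missing_hostport(panic_line)
```

      Collecting the returned values into a set across all crash messages would show whether one peer or several are missing from the cbauth database.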
      

      I have attached cbcollects from only 3 nodes. I was unable to collect from the UI for any node other than the 2 that are up.

      172.23.104.15 - one of the nodes that is up
      172.23.104.244 - the other node that is up
      172.23.105.61 - This node is down, but I collected from it manually by logging in, since it had 192 cbas exits.

      Cluster is messed up at this point in time and looks like

            People

              Balakumaran.Gopal Balakumaran Gopal
              Balakumaran.Gopal Balakumaran Gopal