  Couchbase Server / MB-44374

[System Test] : query service (cbq-engine) crashes with status 1


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Critical
    • Affects Version/s: 7.0.0
    • Fix Version/s: Cheshire-Cat
    • Component/s: query

    Description

      Build : 7.0.0-4454
      Test : -test tests/integration/cheshirecat/test_cheshirecat_kv_gsi_coll_xdcr_backup_sgw_fts_itemct_txns_eventing.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml
      Scale : 2
      Iteration : 1st

      On 172.23.97.149, cbq-engine crashed with status 1 after restarting, having previously been OOM-killed (exit status 137). From the diag logs, the sequence of events for 172.23.97.149 between 2021-02-14T18:13:48 and 2021-02-14T18:14:01 is shown below, followed by a short illustrative sketch of the failing startup check.

      2021-02-14T18:13:48.041-08:00, mb_master:0:info:message(ns_1@172.23.97.149) - Haven't heard from a higher priority node or a master, so I'm taking over.
      2021-02-14T18:13:48.554-08:00, auto_failover:0:info:message(ns_1@172.23.97.149) - Enabled auto-failover with timeout 120 and max count 1
      2021-02-14T18:13:51.748-08:00, ns_log:0:info:message(ns_1@172.23.97.149) - Service 'query' exited with status 137. Restarting. Messages:
      2021-02-14T18:13:30.669-08:00 [INFO] Using plain authentication for user <ud>@cbq-engine</ud> 
      2021-02-14T18:13:30.669-08:00 [ERROR] Connection Error: EOF. Refreshing bucket default (vbid:651,keys:<ud>[00000000000000000002]</ud>) 
      2021-02-14T18:13:30.789-08:00 [ERROR] Connection Error: EOF. Refreshing bucket default (vbid:151,keys:<ud>[00000000000000000013]</ud>) 
      2021-02-14T18:13:31.099-08:00 [Info] GSIC[default/default-_default-_default-1613352171532214613] bd919ecf-d0d9-4b40-9ae4-6a75e0c2f828 new temp file ... /opt/couchbase/var/lib/couchbase/tmp/scan-results36217754674507
      2021-02-14T18:13:31.244-08:00 [INFO] Using plain authentication for user <ud>@cbq-engine</ud> 
      2021-02-14T18:13:31.250-08:00 [INFO] Using plain authentication for user <ud>@cbq-engine</ud> 
      2021-02-14T18:13:31.254-08:00 [Info] GSIC[default/default-_default-_default-1613352171532214613] request(c3ae516e-0505-4027-9849-80ead181d214) removing temp file /opt/couchbase/var/lib/couchbase/tmp/scan-results36217135173788 ...
      2021/02/14 18:13:31 audit: unable to get real userid from request: no web credentials found in request
       
      2021-02-14T18:13:52.284-08:00, ns_log:0:info:message(ns_1@172.23.97.149) - Service 'query' exited with status 1. Restarting. Messages:
      2021/02/14 18:13:52 revrpc: Got error (unexpected EOF) and will retry in 1s
      2021/02/14 18:13:52 WARN: cfg_metakv: RunObserveChildren, err: Get http://127.0.0.1:8091/_metakv/fts/cbgt/cfg/?feed=continuous: CBAuth database is stale: last reason: unexpected EOF -- cbgt.NewCfgMetaKv.func1() at cfg_metakv.go:155
      _time=2021-02-14T18:13:52.270-08:00 _level=ERROR _msg=Unable to initialize cbauth. Error CBAuth database is stale: last reason: unexpected EOF 
      _time=2021-02-14T18:13:52.270-08:00 _level=WARN _msg=Unable to parse url <ud>http://127.0.0.1:8091</ud>: Unusable url http://127.0.0.1:8091. No user information. 
      _time=2021-02-14T18:13:52.270-08:00 _level=WARN _msg=Unable to initialize cbAuth, access to couchbase buckets may be restricted 
      _time=2021-02-14T18:13:52.270-08:00 _level=ERROR _msg=Cannot connect url http://127.0.0.1:8091 - cause: Get http://127.0.0.1:8091/pools: dial tcp 127.0.0.1:8091: connect: connection refused 
      _time=2021-02-14T18:13:52.270-08:00 _level=ERROR _msg=Shutting down. 
       
      2021-02-14T18:13:52.531-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.123.31) - Node 'ns_1@172.23.123.31' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.532-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.123.26) - Node 'ns_1@172.23.123.26' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.532-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.123.32) - Node 'ns_1@172.23.123.32' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.533-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.123.24) - Node 'ns_1@172.23.123.24' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.533-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.96.122) - Node 'ns_1@172.23.96.122' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.533-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.123.25) - Node 'ns_1@172.23.123.25' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.534-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.121.77) - Node 'ns_1@172.23.121.77' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.534-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.106.134) - Node 'ns_1@172.23.106.134' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                          {shutdown,
                                                                                           transport_closed}}]
      2021-02-14T18:13:52.534-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.120.74) - Node 'ns_1@172.23.120.74' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.534-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.120.77) - Node 'ns_1@172.23.120.77' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.536-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.120.86) - Node 'ns_1@172.23.120.86' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.537-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.120.81) - Node 'ns_1@172.23.120.81' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:52.537-08:00, ns_node_disco:5:warning:node down(ns_1@172.23.96.243) - Node 'ns_1@172.23.96.243' saw that node 'ns_1@172.23.97.149' went down. Details: [{nodedown_reason,
                                                                                         {shutdown,
                                                                                          transport_closed}}]
      2021-02-14T18:13:57.473-08:00, ns_node_disco:4:info:node up(ns_1@172.23.123.32) - Node 'ns_1@172.23.123.32' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.474-08:00, ns_node_disco:4:info:node up(ns_1@172.23.96.243) - Node 'ns_1@172.23.96.243' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.475-08:00, ns_node_disco:4:info:node up(ns_1@172.23.106.134) - Node 'ns_1@172.23.106.134' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.476-08:00, ns_node_disco:4:info:node up(ns_1@172.23.120.74) - Node 'ns_1@172.23.120.74' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.477-08:00, ns_node_disco:4:info:node up(ns_1@172.23.120.81) - Node 'ns_1@172.23.120.81' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.479-08:00, ns_node_disco:4:info:node up(ns_1@172.23.123.31) - Node 'ns_1@172.23.123.31' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.479-08:00, ns_node_disco:4:info:node up(ns_1@172.23.96.122) - Node 'ns_1@172.23.96.122' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.479-08:00, ns_node_disco:4:info:node up(ns_1@172.23.123.24) - Node 'ns_1@172.23.123.24' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.479-08:00, ns_node_disco:4:info:node up(ns_1@172.23.121.77) - Node 'ns_1@172.23.121.77' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.482-08:00, ns_node_disco:4:info:node up(ns_1@172.23.120.77) - Node 'ns_1@172.23.120.77' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.485-08:00, ns_node_disco:4:info:node up(ns_1@172.23.123.25) - Node 'ns_1@172.23.123.25' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.485-08:00, ns_node_disco:4:info:node up(ns_1@172.23.120.86) - Node 'ns_1@172.23.120.86' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:13:57.531-08:00, ns_node_disco:4:info:node up(ns_1@172.23.123.26) - Node 'ns_1@172.23.123.26' saw that node 'ns_1@172.23.97.149' came up. Tags: []
      2021-02-14T18:14:00.516-08:00, mb_master:0:info:message(ns_1@172.23.106.134) - Haven't heard from a higher priority node or a master, so I'm taking over.
      2021-02-14T18:14:00.523-08:00, ns_log:0:info:message(ns_1@172.23.97.149) - Service 'query' exited with status 1. Restarting. Messages:
      2021/02/14 18:13:57 revrpc: Got error (dial tcp 127.0.0.1:8091: connect: connection refused) and will retry in 1s
      2021/02/14 18:13:57 WARN: cfg_metakv: RunObserveChildren, err: Get http://127.0.0.1:8091/_metakv/fts/cbgt/cfg/?feed=continuous: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused -- cbgt.NewCfgMetaKv.func1() at cfg_metakv.go:155
      _time=2021-02-14T18:13:57.476-08:00 _level=ERROR _msg=Unable to initialize cbauth. Error CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused 
      _time=2021-02-14T18:13:57.476-08:00 _level=WARN _msg=Unable to parse url <ud>http://127.0.0.1:8091</ud>: Unusable url http://127.0.0.1:8091. No user information. 
      _time=2021-02-14T18:13:57.476-08:00 _level=WARN _msg=Unable to initialize cbAuth, access to couchbase buckets may be restricted 
      _time=2021-02-14T18:13:57.477-08:00 _level=ERROR _msg=Cannot connect url http://127.0.0.1:8091 - cause: Get http://127.0.0.1:8091/pools: dial tcp 127.0.0.1:8091: connect: connection refused 
      _time=2021-02-14T18:13:57.477-08:00 _level=ERROR _msg=Shutting down. 
       
      2021-02-14T18:14:00.545-08:00, auto_failover:0:info:message(ns_1@172.23.106.134) - Enabled auto-failover with timeout 120 and max count 1
      2021-02-14T18:14:01.281-08:00, menelaus_web_sup:1:info:web start ok(ns_1@172.23.97.149) - Couchbase Server has started on web port 8091 on node 'ns_1@172.23.97.149'. Version: "7.0.0-4454-enterprise".
      2021-02-14T18:14:01.533-08:00, mb_master:0:warning:message(ns_1@172.23.97.149) - Somebody thinks we're master. Not forcing mastership takover over ourselves
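
      For context on the two exit codes above: status 137 is 128 + 9 (SIGKILL), consistent with the process being OOM-killed, while status 1 is an ordinary error exit. The sketch below is a minimal, hypothetical illustration (not cbq-engine source; the endpoint, timeout, and log wording are assumptions taken from the excerpt) of the startup behaviour visible in the log: the service probes ns_server's REST endpoint on 127.0.0.1:8091 and shuts down with status 1 when the connection is refused, after which ns_server restarts it.

      // probe_ns_server.go: hypothetical sketch of a startup reachability check.
      package main

      import (
          "fmt"
          "net/http"
          "os"
          "time"
      )

      func main() {
          // Illustrative endpoint and timeout; cbq-engine's real startup path differs.
          const poolsURL = "http://127.0.0.1:8091/pools"

          client := &http.Client{Timeout: 5 * time.Second}
          resp, err := client.Get(poolsURL)
          if err != nil {
              // Mirrors the shape of the log lines above:
              // "Cannot connect url http://127.0.0.1:8091 - cause: ... connection refused"
              // followed by "Shutting down." and an exit with status 1.
              fmt.Fprintf(os.Stderr, "Cannot connect url %s - cause: %v\n", poolsURL, err)
              fmt.Fprintln(os.Stderr, "Shutting down.")
              os.Exit(1)
          }
          resp.Body.Close()
          fmt.Println("ns_server REST endpoint is reachable; continuing startup.")
      }

      In the timeline above, 127.0.0.1:8091 was briefly refusing connections because ns_server itself was restarting on the node, so each query restart attempt exited with status 1 until the web server came back up at 18:14:01.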

            People

              Assignee: isha Isha Kandaswamy (Inactive)
              Reporter: mihir.kamdar Mihir Kamdar (Inactive)