Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-45430

[System Test] FTS crash-restart happened 15 times on 1 node

    XMLWordPrintable

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • 7.0.0
    • Cheshire-Cat
    • fts

    Description

      Build : 7.0.0-4797
      Test : -test tests/integration/cheshirecat/test_cheshirecat_longevity_no_eventing_scale3.yml -scope tests/integration/cheshirecat/scope_cheshirecat_with_backup.yml (w/o Eventing)
      Scale : 3
      Iteration : 3rd
      Day : 4th

      cbft crashed and restarted 15 times on node 172.23.96.122 between 2021-04-01T06:14 & 2021-04-01T06:16

      [root@localhost logs]# zgrep -i "cbft started" fts.log* | grep 2021-04-01T06
      fts.log:2021-04-01T06:14:36.128-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:14:41.344-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:14:46.533-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:14:51.727-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:14:57.001-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:02.188-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:07.434-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:12.657-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:17.823-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:23.014-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:28.214-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:33.408-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:38.608-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:43.827-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      fts.log:2021-04-01T06:15:49.048-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      

      2021-04-01T06:13:36.669-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:13:36.674-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:13:36.674-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:13:36.674-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:13:36.686-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:13:44.208-07:00 [WARN] grpc_client: Query() returned error from host: 172.23.106.134:19130, err: grpc_client: query got status code: 429, resp: &bleve.SearchResult{Status:(*bleve.SearchStatus)(0xc3c65f7840), Request:(*bleve.SearchRequest)(0xc4ae793ec0), Hits:search.DocumentMatchCollection(nil), Total:0x0, MaxScore:0, Took:0, Facets:search.FacetResults(nil)}, err: rpc error: code = ResourceExhausted desc = grpc_server: Search query reject on not enough quota: query request rejected -- cbft.(*GrpcClient).SearchInContext.func1() at grpc_client.go:159
      2021-04-01T06:13:44.208-07:00 [INFO] bleve: remoteClient: 172.23.106.134:19130 query reject, statusCode: 429, err: <nil>
      2021-04-01T06:14:14.892-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:16.887-07:00 [INFO] cfg_metakv: metaKVCallback, path: /fts/cbgt/cfg/curMetaKvPlanKey, key: curMetaKvPlanKey, deletion: false
      2021-04-01T06:14:15.235-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:15.285-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:15.103-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:15.610-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:15.610-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:14.769-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:14.792-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:14.611-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021-04-01T06:14:26.925-07:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:615
      2021/04/01 06:14:36 revrpc: Got error (dial tcp 127.0.0.1:8091: connect: connection refused) and will retry in 1s
      2021-04-01T06:14:36.128-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      2021-04-01T06:14:36.128-07:00 [INFO] main: file descriptor limit current: 200000 max: 200000
      2021-04-01T06:14:36.128-07:00 [INFO]   -authType="cbauth"
      2021-04-01T06:14:36.128-07:00 [INFO]   -bindGrpc="172.23.96.122:9130,0.0.0.0:9130"
      2021-04-01T06:14:36.128-07:00 [INFO]   -bindGrpcSsl="172.23.96.122:19130,0.0.0.0:19130"
      2021-04-01T06:14:36.128-07:00 [INFO]   -bindHttp="172.23.96.122:8094,0.0.0.0:8094"
      2021-04-01T06:14:36.128-07:00 [INFO]   -bindHttps=":18094"
      2021-04-01T06:14:36.128-07:00 [INFO]   -cfgConnect="metakv"
      2021-04-01T06:14:36.128-07:00 [INFO]   -container=""
      2021-04-01T06:14:36.135-07:00 [INFO]   -dataDir="/data/couchbase/@fts"
      2021-04-01T06:14:36.136-07:00 [INFO]   -extras="172.23.96.122:8091"
      2021-04-01T06:14:36.136-07:00 [INFO]   -help="false"
      2021-04-01T06:14:36.136-07:00 [INFO]   -options="startCheckServer=skip,slowQueryLogTimeout=5s,defaultMaxPartitionsPerPIndex=171,bleveMaxResultWindow=10000,failoverAssignAllPrimaries=false,hideUI=true,cbaudit=true,ipv6=false,ftsMemoryQuota=20837376000,maxReplicasAllowed=3,bucketTypesAllowed=membase:ephemeral,http2=true,vbuckets=1024"
      2021-04-01T06:14:36.136-07:00 [INFO]   -register="wanted"
      2021-04-01T06:14:36.136-07:00 [INFO]   -server="http://127.0.0.1:8091"
      2021-04-01T06:14:36.136-07:00 [INFO]   -staticDir="static"
      2021-04-01T06:14:36.136-07:00 [INFO]   -staticETag=""
      2021-04-01T06:14:36.136-07:00 [INFO]   -tags="feed,janitor,pindex,queryer,cbauth_service"
      2021-04-01T06:14:36.136-07:00 [INFO]   -tlsCertFile="/opt/couchbase/var/lib/couchbase/config/memcached-cert.pem"
      2021-04-01T06:14:36.136-07:00 [INFO]   -tlsKeyFile="/opt/couchbase/var/lib/couchbase/config/memcached-key.pem"
      2021-04-01T06:14:36.136-07:00 [INFO]   -uuid="637f82a2a3f563d712e8f81ccc161282"
      2021-04-01T06:14:36.136-07:00 [INFO]   -version="false"
      2021-04-01T06:14:36.136-07:00 [INFO]   -weight="1"
      2021-04-01T06:14:36.136-07:00 [INFO]   GOMAXPROCS=8
      2021-04-01T06:14:36.136-07:00 [INFO] main: registered bleve stores
      2021-04-01T06:14:36.136-07:00 [INFO]   gtreap
      2021-04-01T06:14:36.136-07:00 [INFO]   boltdb
      2021-04-01T06:14:36.136-07:00 [INFO]   metrics
      2021-04-01T06:14:36.136-07:00 [INFO]   goleveldb
      2021-04-01T06:14:36.136-07:00 [INFO]   moss
      2021-04-01T06:14:36.136-07:00 [INFO] main: curr dir: "/opt/couchbase/var/lib/couchbase"
      2021-04-01T06:14:36.136-07:00 [INFO] main: data dir: "/data/couchbase/@fts"
      2021-04-01T06:14:36.145-07:00 [INFO] cbauth: key: fts/remoteClients registered for tls config updates
      2021-04-01T06:14:36.145-07:00 [INFO] app_herder: memQuota: 20837376000, appQuota: 20837376000, indexQuota: 15628032000, queryQuota: 18753638400
      2021-04-01T06:14:36.146-07:00 [INFO] main: custom jsoniter json implementation enabled
      2021-04-01T06:14:36.146-07:00 [WARN] cfg_metakv: RunObserveChildren, err: Get http://127.0.0.1:8091/_metakv/fts/cbgt/cfg/?feed=continuous: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused -- cbgt.NewCfgMetaKv.func1() at cfg_metakv.go:155
      2021-04-01T06:14:36.146-07:00 [WARN] main: failed to start audit service with err: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused -- main.mainStart() at main.go:474
      2021-04-01T06:14:36.146-07:00 [INFO] cbauth: received  security change notification, code: 3
      2021-04-01T06:14:36.146-07:00 [WARN] cbauth: GetTLSConfig failed, err: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused -- cbgt.(*SecurityContext).refreshConfig() at cbauth.go:135
      2021-04-01T06:14:36.146-07:00 [INFO] cbauth: key: fts/grpc-ssl registered for tls config updates
      2021-04-01T06:14:36.146-07:00 [INFO] init_grpc: GrpcServer Started at "0.0.0.0:9130", proto: "tcp4"
      2021-04-01T06:14:36.148-07:00 [INFO] init_grpc: GrpcServer Started at "0.0.0.0:19130", proto: "tcp4"
      2021-04-01T06:14:36.150-07:00 [INFO] main: ctl starting, dryRun: false, waitForMemberNodes: 30, verbose: 3, maxConcurrentPartitionMovesPerNode: 1
      2021-04-01T06:14:36.150-07:00 [WARN] error in appending certificates to transport's tlsConfig -- cbft.setupHttp2ClientLOCKED() at pindex_bleve.go:2530
      2021-04-01T06:14:36.150-07:00 [INFO] ctl: run, CurrentMemberNodes err Get http://127.0.0.1:8091/_metakv/fts/cbgt/cfg/nodeDefs-wanted/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused
      2021-04-01T06:14:36.150-07:00 [FATA] main: ctl.StartCtl, err: Get http://127.0.0.1:8091/_metakv/fts/cbgt/cfg/nodeDefs-wanted/: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused -- main.main() at main.go:234
      2021/04/01 06:14:41 revrpc: Got error (dial tcp 127.0.0.1:8091: connect: connection refused) and will retry in 1s
      2021-04-01T06:14:41.344-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      2021-04-01T06:14:41.345-07:00 [INFO] main: file descriptor limit current: 200000 max: 200000
      

      The other FTS node in the cluster is 172.23.106.134.

      Attachments

        1. screenshot-1.png
          screenshot-1.png
          46 kB
        2. screenshot-2.png
          screenshot-2.png
          44 kB
        3. screenshot-3.png
          screenshot-3.png
          29 kB
        4. screenshot-4.png
          screenshot-4.png
          33 kB
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            mihir.kamdar Mihir Kamdar (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty