Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-45258

[System Test] Service 'fts' exited with status 137: Terminating abnormally

    XMLWordPrintable

Details

    • Untriaged
    • 1
    • Unknown

    Description

      Build: 7.0.0-4766
      Componet Clusterops test: -test tests/fts/cheshire-cat/test_fts_clusterops_cheshire_cat_coll_crud.yml -scope tests/fts/cheshire-cat/scope_fts_cheshire_cat.yml
      Test Cycle: 1

      Test with index create/drop loop.

      Test steps:

      1. there are 5 buckets, out of which 20 static fts indexes are created on collections of 3 buckets. Mutations are going on these collections
      2. For the collections on other 2 buckets, we create and drop indexes and no mutations are going on these collections.
      3. Continuously run queries on the indexes of collections of bucket1 and bucket2
      4. wait for 15 mins
      5. kill cbft on 172.23.97.217 and wait for 15 mins
      6. add fts node 172.23.107.4 and start rebalance and wait for 15 mins which failed with : MB-45254
      7. Test continued. Kill cbft on 172.23.107.5 and wait for 15 mins
      8. Rebalance to remove kv node: 172.23.97.232

      Seeing below on node 172.23.107.2 and 172.23.97.217

      info.log

       
      [ns_server:error,2021-03-24T17:18:47.047-07:00,ns_1@172.23.107.2:service_status_keeper_worker<0.8117.0>:rest_utils:get_json:63]Request to (fts) api/nsstatus with headers [] failed: {error,timeout}
      [ns_server:error,2021-03-24T17:18:47.048-07:00,ns_1@172.23.107.2:service_status_keeper-fts<0.8122.0>:service_status_keeper:handle_cast:109]Service service_fts returned incorrect status
      [ns_server:info,2021-03-24T17:18:52.749-07:00,ns_1@172.23.107.2:<0.22630.147>:diag_handler:log_all_dcp_stats:197]logging dcp stats
      [ns_server:info,2021-03-24T17:18:53.409-07:00,ns_1@172.23.107.2:<0.22630.147>:diag_handler:log_all_dcp_stats:201]end of logging dcp stats
      [ns_server:warn,2021-03-24T17:18:53.802-07:00,ns_1@172.23.107.2:timer_lag_recorder<0.7871.0>:timer_lag_recorder:report_missed_msgs:53]Skipped 3 'check_time' messages
      [ns_server:info,2021-03-24T17:18:55.017-07:00,ns_1@172.23.107.2:ns_config_rep<0.22880.147>:ns_config_rep:pull_one_node:359]Pulling config from: 'ns_1@172.23.107.3'
      [ns_server:error,2021-03-24T17:19:03.809-07:00,ns_1@172.23.107.2:service_status_keeper_worker<0.8117.0>:rest_utils:get_json:63]Request to (fts) api/nsstatus with headers [] failed: {error,timeout}
      [ns_server:warn,2021-03-24T17:19:20.737-07:00,ns_1@172.23.107.2:timer_lag_recorder<0.7871.0>:timer_lag_recorder:report_missed_msgs:53]Skipped 1 'check_time' messages
      [ns_server:info,2021-03-24T17:19:24.856-07:00,ns_1@172.23.107.2:ns_config_rep<0.22880.147>:ns_config_rep:pull_one_node:359]Pulling config from: 'ns_1@172.23.97.235'
      [ns_server:error,2021-03-24T17:19:24.933-07:00,ns_1@172.23.107.2:service_status_keeper-fts<0.8122.0>:service_status_keeper:handle_cast:109]Service service_fts returned incorrect status
      [ns_server:error,2021-03-24T17:19:28.002-07:00,ns_1@172.23.107.2:service_agent-fts<0.8865.0>:service_agent:handle_info:291]Lost json rpc connection for service fts, reason shutdown. Terminating.
      [ns_server:error,2021-03-24T17:19:28.014-07:00,ns_1@172.23.107.2:service_agent-fts<0.8865.0>:service_agent:terminate:312]Terminating abnormally
      [user:info,2021-03-24T17:19:28.018-07:00,ns_1@172.23.107.2:<0.7872.0>:ns_log:crash_consumption_loop:69]Service 'fts' exited with status 137. Restarting. Messages: 

        

      in fts.log, we see while restarting fts its been seen "CBAuth database is stale" and crashed with "Service 'fts' exited with status 1"

       2021-03-24T17:19:21.359-07:00 [INFO] app_herder: persister progress, indexes: 367, waiting: 82
      2021-03-24T17:19:34.662-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      2021-03-24T17:19:34.662-07:00 [INFO] main: file descriptor limit current: 200000 max: 200000
      2021-03-24T17:19:34.663-07:00 [INFO]   -authType="cbauth"
      2021-03-24T17:19:34.663-07:00 [INFO]   -bindGrpc="172.23.107.2:9130,0.0.0.0:9130"
      2021-03-24T17:19:34.663-07:00 [INFO]   -bindGrpcSsl="172.23.107.2:19130,0.0.0.0:19130"
      2021-03-24T17:19:34.663-07:00 [INFO]   -bindHttp="172.23.107.2:8094,0.0.0.0:8094"
      2021-03-24T17:19:34.663-07:00 [INFO]   -bindHttps=":18094"
      2021-03-24T17:19:34.663-07:00 [INFO]   -cfgConnect="metakv"
      2021-03-24T17:19:34.663-07:00 [INFO]   -container=""
      2021-03-24T17:19:34.663-07:00 [INFO]   -dataDir="/data/@fts"
      2021-03-24T17:19:34.663-07:00 [INFO]   -extras="172.23.107.2:8091"
      2021-03-24T17:19:34.663-07:00 [INFO]   -help="false"
      2021-03-24T17:19:34.663-07:00 [INFO]   -options="startCheckServer=skip,slowQueryLogTimeout=5s,defaultMaxPartitionsPerPIndex=171,bleveMaxResultWindow=10000,failoverAssignAllPrimaries=false,hideUI=true,cbaudit=true,ipv6=false,ftsMemoryQuota=17145856000,maxReplicasAllowed=3,bucketTypesAllowed=membase:ephemeral,http2=true,vbuckets=1024"
      2021-03-24T17:19:34.663-07:00 [INFO]   -register="wanted"
      2021-03-24T17:19:34.663-07:00 [INFO]   -server="http://127.0.0.1:8091"
      2021-03-24T17:19:34.663-07:00 [INFO]   -staticDir="static"
      2021-03-24T17:19:34.663-07:00 [INFO]   -staticETag=""
      2021-03-24T17:19:34.663-07:00 [INFO]   -tags="feed,janitor,pindex,queryer,cbauth_service"
      2021-03-24T17:19:34.663-07:00 [INFO]   -tlsCertFile="/opt/couchbase/var/lib/couchbase/config/memcached-cert.pem"
      2021-03-24T17:19:34.663-07:00 [INFO]   -tlsKeyFile="/opt/couchbase/var/lib/couchbase/config/memcached-key.pem"
      2021-03-24T17:19:34.663-07:00 [INFO]   -uuid="a8d092168910204cb96eb762ec06923a"
      2021-03-24T17:19:34.663-07:00 [INFO]   -version="false"
      2021-03-24T17:19:34.663-07:00 [INFO]   -weight="1"
      2021-03-24T17:19:34.663-07:00 [INFO]   GOMAXPROCS=8
      2021-03-24T17:19:34.663-07:00 [INFO] main: registered bleve stores
      2021-03-24T17:19:34.663-07:00 [INFO]   gtreap
      2021-03-24T17:19:34.663-07:00 [INFO]   boltdb
      2021-03-24T17:19:34.663-07:00 [INFO]   metrics
      2021-03-24T17:19:34.663-07:00 [INFO]   goleveldb
      2021-03-24T17:19:34.663-07:00 [INFO]   moss
      2021-03-24T17:19:34.663-07:00 [INFO] main: curr dir: "/opt/couchbase/var/lib/couchbase"
      2021-03-24T17:19:34.663-07:00 [INFO] main: data dir: "/data/@fts"
      2021-03-24T17:20:34.653-07:00 [INFO] cbauth: key: fts/remoteClients registered for tls config updates
      2021-03-24T17:20:34.654-07:00 [INFO] app_herder: memQuota: 17145856000, appQuota: 17145856000, indexQuota: 12859392000, queryQuota: 15431270400
      2021-03-24T17:20:34.654-07:00 [INFO] main: custom jsoniter json implementation enabled
      2021-03-24T17:20:34.654-07:00 [WARN] main: failed to start audit service with err: CBAuth database is stale. Was never updated yet. -- main.mainStart() at main.go:474
      2021-03-24T17:20:34.654-07:00 [WARN] cfg_metakv: RunObserveChildren, err: Get http://127.0.0.1:8091/_metakv/fts/cbgt/cfg/?feed=continuous: CBAuth database is stale. Was never updated yet. -- cbgt.NewCfgMetaKv.func1() at cfg_metakv.go:155
      2021-03-24T17:20:34.654-07:00 [INFO] cbauth: received  security change notification, code: 3
      2021-03-24T17:20:34.654-07:00 [WARN] cbauth: GetTLSConfig failed, err: CBAuth database is stale. Was never updated yet. -- cbgt.(*SecurityContext).refreshConfig() at cbauth.go:135
      2021-03-24T17:20:34.655-07:00 [INFO] cbauth: key: fts/grpc-ssl registered for tls config updates
      2021-03-24T17:20:34.655-07:00 [INFO] init_grpc: GrpcServer Started at "0.0.0.0:9130", proto: "tcp4"
      2021-03-24T17:20:34.658-07:00 [INFO] main: ctl starting, dryRun: false, waitForMemberNodes: 30, verbose: 3, maxConcurrentPartitionMovesPerNode: 1
      2021-03-24T17:20:34.658-07:00 [WARN] error in appending certificates to transport's tlsConfig -- cbft.setupHttp2ClientLOCKED() at pindex_bleve.go:2557
      2021-03-24T17:20:34.658-07:00 [INFO] ctl: run, CurrentMemberNodes err Get http://127.0.0.1:8091/_metakv/fts/cbgt/cfg/nodeDefs-wanted/: CBAuth database is stale. Was never updated yet.
      2021-03-24T17:20:34.658-07:00 [FATA] main: ctl.StartCtl, err: Get http://127.0.0.1:8091/_metakv/fts/cbgt/cfg/nodeDefs-wanted/: CBAuth database is stale. Was never updated yet. -- main.main() at main.go:234
      2021-03-24T17:20:35.005-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
      2021-03-24T17:20:35.005-07:00 [INFO] main: file descriptor limit current: 200000 max: 200000
      2021-03-24T17:20:35.005-07:00 [INFO]   -authType="cbauth"
      2021-03-24T17:20:35.005-07:00 [INFO]   -bindGrpc="172.23.107.2:9130,0.0.0.0:9130"
      2021-03-24T17:20:35.005-07:00 [INFO]   -bindGrpcSsl="172.23.107.2:19130,0.0.0.0:19130"
       

      cbcollect logs:

      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.107.2.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.107.3.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.107.4.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.107.5.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.97.215.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.97.217.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.97.227.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.97.232.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.97.235.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1616643988/collectinfo-2021-03-25T034631-ns_1%40172.23.97.236.zip

      ########## Cluster config ##################
      ######  fts : 5 ===== > [172.23.107.2:8091 172.23.107.3:8091 172.23.107.5:8091 172.23.97.216:8091 172.23.97.217:8091]  ###########
      ######  kv : 4 ===== > [172.23.97.215:8091 172.23.97.232:8091 172.23.97.235:8091 172.23.97.237:8091]  ###########
      ######  n1ql : 2 ===== > [172.23.97.227:8091 172.23.97.236:8091]  ########### 

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              abhinav Abhi Dangeti
              girish.benakappa Girish Benakappa
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty