Couchbase Server / MB-46783

[Upgrade] - Service 'fts' exited with status 137. Restarting. Messages seen during upgrade


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.0.2, 7.1.0
    • Cheshire-Cat
    • fts
    • 6.6.2-9588 -> 7.0.0-5275
    • Untriaged
    • Centos 64-bit
    • 1
    • No

    Description

      Script to Repro
      1. Run the following 6.6.2 longevity test for 3-4 days. The cluster will have 27 nodes at the end of it.

      ./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
      

      2. Run the attached script create_drop.sh on the 6.6.2 nodes in the cluster. The script was also run on the 7.0.0 nodes that were brought into the cluster via swap rebalance for the upgrade.
      3. Swap-rebalance six 6.6.2 nodes (one of each service) with 7.0.0 nodes.
      4. Gracefully fail over six nodes (one of each service), upgrade them, perform recovery, and start a rebalance.
      5. Gracefully fail over six nodes (one of each service), upgrade them, perform recovery, and start a rebalance.

      Rebalance at step 5 failed; this is tracked by MB-46778. If the errors below are related to that failure, this bug can be closed.

      172.23.106.239 : fts

      /opt/couchbase/var/lib/couchbase/logs/fts.log.8.gz:2021-06-07T11:21:42.392-07:00 [WARN] ns_server: retrieve partition seqs: gocouchbase_utils: CouchbaseBucket connection failed, server: http://127.0.0.1:8091, poolName: default, bucketName: default, sourceParams: "{}", err: Get http://127.0.0.1:8091/pools: net/http: request canceled (Client.Timeout exceeded while awaiting headers), please check that your authUser and authPassword are correct and that your couchbase cluster ("http://127.0.0.1:8091") is available -- cbft.RunSourcePartitionSeqs() at ns_server.go:1118
      /opt/couchbase/var/lib/couchbase/logs/fts.log.8.gz:2021-06-07T11:22:03.539-07:00 [WARN] ns_server: retrieve partition seqs: gocouchbase_utils: CouchbaseBucket connection failed, server: http://127.0.0.1:8091, poolName: default, bucketName: default, sourceParams: "{}", err: CBAuth database is stale: last reason: dial tcp 127.0.0.1:8091: connect: connection refused, please check that your authUser and authPassword are correct and that your couchbase cluster ("http://127.0.0.1:8091") is available -- cbft.RunSourcePartitionSeqs() at ns_server.go:1118
      

      ns_1@172.23.106.239 1:52:11 PM   7 Jun, 2021

      Service 'fts' exited with status 137. Restarting. Messages:
      2021-06-07T13:52:06.955-07:00 [WARN] feed_dcp_gocouchbase: OnError, name: good_state_7871405847e4f01f_6ddbfb54, bucketName: default, bucketUUID: , err: pkt.Receive, err: EOF -- cbgt.(*DCPFeed).OnError() at feed_dcp_gocouchbase.go:393
      2021-06-07T13:52:06.956-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.105.61:11210, name: fts:good_state_7871405847e4f01f_6ddbfb54-77c5312a, traces: vb: 0 => 94 (3x);
      2021-06-07T13:52:06.956-07:00 [WARN] feed_dcp_gocouchbase: OnError, name: social_1ecade980c50afc4_6ddbfb54, bucketName: default, bucketUUID: , err: pkt.Receive, err: EOF -- cbgt.(*DCPFeed).OnError() at feed_dcp_gocouchbase.go:393
      2021-06-07T13:52:06.956-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.105.102:11210, name: fts:social_1ecade980c50afc4_6ddbfb54-33f03d9a, traces: vb: 0 => 94 (3x);
      2021-06-07T13:52:06.956-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
      2021-06-07T13:52:06.956-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
      2021-06-07T13:52:06.956-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
      Err EOF
      Err EOF
      2021-06-07T13:52:06.790-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
      

      ns_1@172.23.106.239 11:20:28 AM   7 Jun, 2021

      Service 'fts' exited with status 137. Restarting. Messages:
      2021-06-07T11:20:20.311-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.104.244:11210, name: fts:social_1ecade980c50afc4_f4e0a48a-373647b5, traces: vb: 0 => 94 (3x);
      2021-06-07T11:20:21.166-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.104.245:11210, name: fts:social_1ecade980c50afc4_6ddbfb54-754d6aff, traces: vb: 0 => 94 (3x);
      2021-06-07T11:20:20.909-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
      2021-06-07T11:20:21.188-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
      2021-06-07T11:20:19.871-07:00 [WARN] feed_dcp_gocouchbase: OnError, name: social_1ecade980c50afc4_f4e0a48a, bucketName: default, bucketUUID: , err: pkt.Receive, err: EOF -- cbgt.(*DCPFeed).OnError() at feed_dcp_gocouchbase.go:393
      2021-06-07T11:20:23.891-07:00 [INFO] cbdatasource: receiver closed, server: 172.23.104.214:11210, name: fts:social_1ecade980c50afc4_f4e0a48a-373647b5, traces: vb: 0 => 94 (3x);
      2021-06-07T11:20:20.197-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: pkt.Receive, err: EOF
      Err EOF
      

      cbcollect_info logs are attached. This was not seen in the previous system-test upgrade from 6.6.2-9588 -> 7.0.0-5226.

            People

              Balakumaran.Gopal Balakumaran Gopal
              Balakumaran.Gopal Balakumaran Gopal