Couchbase Server / MB-31667

FTS System Test : Nodes running out of disk space and causing rebalance failures

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 6.5.0
    • 6.0.0
    • fts
    • centos2 cluster

    Description

      Build : 6.0.0-1693 (RC4) (was also seen in RC3, but couldn't confirm due to lack of resources)
      Test : -test tests/fts/test_fts_alice_component.yml -scope tests/fts/scope_component_fts.yml
      Scale : 1

      Rebalance operations in the FTS system test are failing because some nodes run out of disk space. The same failure appeared in RC3 but could not be investigated further for lack of resources.

      The cluster is live and available for debugging : http://172.23.96.206:8091

      This issue was not seen on RC2. This could also be related to MB-31405.
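
A quick way to confirm which nodes are short on space before starting a rebalance is to check the free fraction of the filesystem that holds the FTS data. The following is a minimal sketch only, not part of the system test framework: the function name `disk_headroom` and the 10% threshold are assumptions made for illustration, and the path `/data/@fts` mentioned in the usage note is taken from the log excerpts in this ticket.

```python
import shutil


def disk_headroom(path, min_free_fraction=0.10):
    """Return (free_fraction, ok) for the filesystem that holds `path`.

    `min_free_fraction` is an illustrative threshold, not a Couchbase setting.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    # Guard against pseudo-filesystems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return free_fraction, free_fraction >= min_free_fraction
```

On an FTS node this could be called as `disk_headroom("/data/@fts")`; a result whose second element is `False` would indicate that the node is below the chosen threshold.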

      Log Excerpts

      [root@localhost logs]# cat error.log | grep -i "rebalance exited" -a5 -b5
      7647-                                         [{file,
      7696-                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
      7836-                                          {line,92}]}]}}
      7893-[ns_server:error,2018-10-15T22:20:19.781-07:00,ns_1@172.23.96.206:service_rebalancer-fts<0.3722.56>:service_agent:process_bad_results:810]Service call unset_rebalancer (service fts) failed on some nodes:
      8097-[{'ns_1@172.23.96.206',nack}]
      8127:[user:error,2018-10-15T22:20:19.782-07:00,ns_1@172.23.96.206:<0.22786.0>:ns_orchestrator:do_log_rebalance_completion:1117]Rebalance exited with reason {service_rebalance_failed,fts,
      8309-                                 {lost_connection,shutdown}}
      8370-[ns_server:error,2018-10-15T22:20:20.225-07:00,ns_1@172.23.96.206:service_stats_collector-fts<0.8908.0>:rest_utils:get_json_local:63]Request to (fts) api/nsstats failed: {error,
      8548-                                      {econnrefused,
      8601-                                       [{lhttpc_client,send_request,1,
      8672-                                         [{file,
      --
      225890-                                         [{file,
      225939-                                           "/home/couchbase/jenkins/workspace/couchbase-server-unix/couchdb/src/lhttpc/lhttpc_client.erl"},
      226079-                                          {line,92}]}]}}
      

      [root@localhost logs]# zgrep -i "2018-10-15T22:20" fts.log* | grep -i "FATA"
      fts.log.9.gz:2018-10-15T22:20:18.767-07:00 [FATA] scorch AsyncError, treating this as fatal, err: got err persisting snapshot: error persisting segment: open /data/@fts/social_70fa7eefa8e4f81e_6ddbfb54.pindex/store/0000000114cf.zap: no space left on device, stack dump:  -- main.initBleveOptions.func1() at init_bleve.go:91
      fts.log.9.gz:2018-10-15T22:20:25.484-07:00 [FATA] scorch AsyncError, treating this as fatal, err: got err persisting snapshot: open /data/@fts/st_index_scorch_14d99cdd094405bc_f4e0a48a.pindex/store/000000008225.zap: no space left on device, stack dump:  -- main.initBleveOptions.func1() at init_bleve.go:91
      fts.log.9.gz:2018-10-15T22:20:31.014-07:00 [FATA] moss OnError, treating this as fatal, err: write /data/@fts/good_state_731de917f63d2eb4_f4e0a48a.pindex/store/data-0000000000000002.moss: no space left on device, stack dump:  -- main.initMossOptions.func1() at init_moss.go:69
      fts.log.9.gz:2018-10-15T22:20:37.884-07:00 [FATA] moss OnError, treating this as fatal, err: write /data/@fts/good_state_731de917f63d2eb4_f4e0a48a.pindex/store/data-0000000000000005.moss: no space left on device, stack dump:  -- main.initMossOptions.func1() at init_moss.go:69
      fts.log.9.gz:2018-10-15T22:20:41.279-07:00 [FATA] moss OnError, treating this as fatal, err: open /data/@fts/good_state_731de917f63d2eb4_f4e0a48a.pindex/store/data-0000000000000008.moss: no space left on device, stack dump:  -- main.initMossOptions.func1() at init_moss.go:69
      fts.log.9.gz:2018-10-15T22:20:50.152-07:00 [FATA] moss OnError, treating this as fatal, err: open /data/@fts/good_state_731de917f63d2eb4_f4e0a48a.pindex/store/data-0000000000000009.moss: no space left on device, stack dump:  -- main.initMossOptions.func1() at init_moss.go:69
      fts.log.9.gz:2018-10-15T22:20:59.504-07:00 [FATA] moss OnError, treating this as fatal, err: write /data/@fts/good_state_731de917f63d2eb4_f4e0a48a.pindex/store/data-0000000000000005.moss: no space left on device, stack dump:  -- main.initMossOptions.func1() at init_moss.go:69
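
To see which pindexes are hitting the disk-full condition and through which storage engine, the FATA lines above can be grouped by engine and pindex name. This is a rough sketch under stated assumptions: the regular expression is inferred only from the lines in this excerpt, not from any FTS log format specification, and the function name `count_enospc_by_pindex` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the FATA lines in this ticket (an assumption, not a
# documented log format): engine name, then a path of the form
# .../<pindex>.pindex/store/<file>, then the ENOSPC message.
FATA_RE = re.compile(
    r"\[FATA\] (?P<engine>scorch|moss) .*?"
    r"/(?P<pindex>[^/]+)\.pindex/store/\S+: no space left on device"
)


def count_enospc_by_pindex(lines):
    """Count 'no space left on device' FATA lines per (engine, pindex)."""
    counts = Counter()
    for line in lines:
        match = FATA_RE.search(line)
        if match:
            counts[(match.group("engine"), match.group("pindex"))] += 1
    return counts
```

The function accepts any iterable of text lines, so a rotated log could be read with `gzip.open("fts.log.9.gz", "rt")` from the standard library and passed in directly.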
      

            People

              mihir.kamdar Mihir Kamdar (Inactive)