  1. Couchbase Server
  2. MB-25592

rebalance not progressing (or very slowly)

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 5.0.0
    • 5.0.0
    • test-execution
    • 5.0.0-3456

    Description

      This could be a duplicate of MB-25434; if so, please mark it as such.  I'm opening a separate issue after speaking with Arun because we couldn't tell whether the rebalance was actually stuck or just progressing slowly.  The test is the same, and the repro steps and details are here: https://github.com/couchbaselabs/sequoia/blob/master/tests/integration/README.md

       

      After 4 hours, one of the vbuckets still hasn't updated, judging by the docs_left_updater_loop entries in the ns_server log:

      [ns_server:debug,2017-08-07T11:51:44.550-07:00,ns_1@172.23.108.103:<0.28713.0>:ns_rebalance_observer:docs_left_updater_loop:318]Starting docs_left_updater_loop:"default"
      [ns_server:debug,2017-08-07T11:51:49.551-07:00,ns_1@172.23.108.103:<0.28713.0>:ns_rebalance_observer:docs_left_updater_loop:318]Starting docs_left_updater_loop:"default"
       
      ...
       
      [ns_server:debug,2017-08-07T13:55:54.550-07:00,ns_1@172.23.108.103:<0.28713.0>:ns_rebalance_observer:docs_left_updater_loop:318]Starting docs_left_updater_loop:"default"
      [ns_server:debug,2017-08-07T13:55:59.550-07:00,ns_1@172.23.108.103:<0.28713.0>:ns_rebalance_observer:docs_left_updater_loop:318]Starting docs_left_updater_loop:"default"
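
      To quantify how long the observer has been looping, the timestamps can be pulled out of these entries. A minimal sketch, assuming the line layout shown in the excerpt above (the regex is inferred from these lines, not from an official ns_server log grammar, and `updater_loop_times` is a hypothetical helper name):

```python
import re
from datetime import datetime

# Pattern inferred from the excerpt in this ticket; treat it as an assumption.
ENTRY = re.compile(
    r'\[ns_server:debug,(?P<ts>[^,]+),[^\]]*docs_left_updater_loop:\d+\]'
    r'Starting docs_left_updater_loop:"(?P<bucket>[^"]+)"'
)


def updater_loop_times(lines, bucket="default"):
    """Return the timestamps of docs_left_updater_loop entries for one bucket."""
    times = []
    for line in lines:
        match = ENTRY.search(line)
        if match and match.group("bucket") == bucket:
            # Timestamps look like 2017-08-07T11:51:44.550-07:00 (ISO 8601).
            times.append(datetime.fromisoformat(match.group("ts")))
    return times


def loop_span(times):
    """Time between the first and last entry, or None if there are fewer than two."""
    if len(times) < 2:
        return None
    return times[-1] - times[0]
```

      Running `loop_span(updater_loop_times(open(path)))` over the log gives the span covered by the entries. On its own this only shows that the loop keeps restarting every few seconds, not whether any vbucket is moving.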

       

      We are also seeing some I/O errors in memcached (172.23.97.237):

      2017-08-07T11:46:04.793935-07:00 WARNING (WAREHOUSE) CouchKVStore::openDB: error:no such file [No such file or directory], name:/data/WAREHOUSE/270.couch.1, option:2, fileRev:1
      2017-08-07T11:46:04.794116-07:00 WARNING (WAREHOUSE) doDcpVbTakeoverStats: exception while getting num persisted deletes for vbucket:270 - treating as 0 deletes. Details: CouchKVStore::getNumPersistedDeletes:Failed to open database file for vBucket = 270 rev = 1 with error:no such file: Input/output error
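
      To list which vbucket files memcached failed to open, the openDB warnings can be filtered out of the log. A rough sketch, assuming the message layout shown in the warning above and that the leading number in the file name (270 in 270.couch.1) is the vbucket id, which the second warning appears to confirm. `parse_open_db_failure` is a hypothetical helper name:

```python
import re

# Layout inferred from the single warning quoted in this ticket (an assumption).
OPEN_DB_FAILURE = re.compile(
    r"\((?P<bucket>[^)]+)\) CouchKVStore::openDB: error:(?P<error>[^,]+), "
    r"name:(?P<path>[^,]*/(?P<vb>\d+)\.couch\.\d+), "
    r"option:\d+, fileRev:(?P<rev>\d+)"
)


def parse_open_db_failure(line):
    """Return details of a CouchKVStore::openDB failure, or None for other lines."""
    match = OPEN_DB_FAILURE.search(line)
    if match is None:
        return None
    return {
        "bucket": match.group("bucket"),
        "vbucket": int(match.group("vb")),
        "file_rev": int(match.group("rev")),
        "path": match.group("path"),
        "error": match.group("error"),
    }
```

      Collecting the results per node would show whether the missing files are limited to vbucket 270 or affect other vbuckets in the bucket as well.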

       

      We are not sure whether this behavior is caused by the xattrs that sync_gateway adds during the test.  The next step is to re-run without the sg load, unless there are other recommendations.

       

      The collect from 2 hours ago is attached; an updated collect is pending.

          People

            tommie Tommie McAfee (Inactive)