  1. Couchbase Server
  2. MB-16155

rebalance-in fails to wait for bucket deletion after graceful failover.


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.1.0
    • 4.0.0
    • couchbase-bucket
    • Security Level: Public
    • Sherlock RC4 4.0.0-4047 - This symptom *probably* existed before Sherlock RC1; we have only just got to the bottom of triaging it.
    • Untriaged
    • Centos 64-bit
    • Unknown
    • KV: Sep 14 - Oct 2

    Description

      The test first loads 100M documents and performs a graceful failover. That step succeeds.

      The test then adds the node (.14) back and starts a rebalance. The rebalance does not complete.

      (If I then manually trigger rebalance again, it is fine.)

      (Also, if run with 10M documents in total, the test passes.)

      (The 100M case is very reproducible on Ares.)

      A REST call to pools/default/tasks returned this:

      {u'status': u'notRunning', u'statusIsStale': False, u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.', u'type': u'rebalance', u'masterRequestTimedOut': False}
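
For anyone scripting a check around this symptom, here is a minimal sketch of how a test harness might detect the failed rebalance from the decoded response. It assumes the body of pools/default/tasks decodes to a list of task dicts shaped like the one quoted above, and that a rebalance task with status `notRunning` plus a non-empty `errorMessage` indicates a failure. The function name `find_rebalance_failure` is hypothetical and is not part of any Couchbase client.

```python
def find_rebalance_failure(tasks):
    """Return the errorMessage of a failed rebalance task, or None.

    `tasks` is assumed to be the decoded JSON body of
    GET pools/default/tasks: a list of dicts like the one in this report.
    """
    for task in tasks:
        if task.get("type") != "rebalance":
            continue
        # Assumption: 'notRunning' together with an errorMessage means the
        # last rebalance failed rather than finished cleanly.
        if task.get("status") == "notRunning" and task.get("errorMessage"):
            return task["errorMessage"]
    return None


# The task dict observed in this report, wrapped in a list.
observed = [{
    "status": "notRunning",
    "statusIsStale": False,
    "errorMessage": "Rebalance failed. See logs for detailed reason. "
                    "You can try rebalance again.",
    "type": "rebalance",
    "masterRequestTimedOut": False,
}]

print(find_rebalance_failure(observed))
```

A harness could call this after each poll and, given that a manual second rebalance succeeds here, decide whether to retry or to collect logs first.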

      Here is a log snippet from the console:

      Failed to wait deletion of some buckets on some nodes: [{'ns_1@172.23.96.14',
      {'EXIT',
      {old_buckets_shutdown_wait_failed, ["bucket-1"]}}}]
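
When triaging many runs, it can help to pull the node and bucket names out of this console message automatically. The sketch below is one way to do that with a regular expression. It assumes the message always has the shape shown above (a list of `{Node, {'EXIT', {old_buckets_shutdown_wait_failed, [Buckets]}}}` tuples); the helper name `parse_bucket_wait_failure` is hypothetical.

```python
import re

# Matches one {'node', {'EXIT', {old_buckets_shutdown_wait_failed, [...]}
# entry; group 1 is the node name, group 2 the raw bucket list contents.
_ENTRY_RE = re.compile(
    r"\{\s*'([^']+)'\s*,\s*\{\s*'EXIT'\s*,\s*"
    r"\{\s*old_buckets_shutdown_wait_failed\s*,\s*\[([^\]]*)\]"
)


def parse_bucket_wait_failure(message):
    """Map each failing node to the buckets it could not shut down in time."""
    # Collapse line breaks so the multi-line console output matches as one string.
    flat = " ".join(message.split())
    return {
        node: re.findall(r'"([^"]*)"', buckets)
        for node, buckets in _ENTRY_RE.findall(flat)
    }


console_message = """Failed to wait deletion of some buckets on some nodes: [{'ns_1@172.23.96.14',
{'EXIT',
{old_buckets_shutdown_wait_failed, ["bucket-1"]}}}]"""

print(parse_bucket_wait_failure(console_message))
```

The regular expression is deliberately narrow: it only recognises the `old_buckets_shutdown_wait_failed` exit reason seen in this report and returns an empty dict for anything else.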

      Here is something possibly relevant in the ns_server.debug.log on the .14 node:

      [ns_server:error,2015-08-25T00:47:09.552-07:00,ns_1@172.23.96.14:timeout_diag_logger<0.129.0>:timeout_diag_logger:do_diag:105]Got timeout {slow_bucket_stop,{{single_bucket_kv_sup,"bucket-1"},
      <0.369.0>,supervisor,
      [single_bucket_kv_sup]}}

      Attachments

        1. 172.23.96.11.zip
          6.90 MB
        2. 172.23.96.12.zip
          6.66 MB
        3. 172.23.96.13.zip
          6.63 MB
        4. 172.23.96.14.zip
          6.25 MB


            People

              dkao David Kao (Inactive)