  1. Couchbase Server
  2. MB-7272 memcached/ep-engine crashes in flusher or other paths when it receives a shutdown message from ns-server
  3. MB-7110

[system test] rebalance failed due to "Failed to wait deletion of some buckets on some nodes"


Details

    • Technical task
    • Resolution: Fixed
    • Blocker
    • 2.0.1
    • 2.0
    • ns_server
    • Security Level: Public
    • centos 6.2 64bit build 2.0.0-1931

    Description

      Cluster information:

      • 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 32 GB RAM and a 400 GB SSD.
      • 24.8 GB RAM allocated to Couchbase Server on each node
      • SSD formatted as ext4 and mounted on /data
      • Each server has its own SSD drive; no disk is shared with another server.
      • Create a cluster of 6 nodes with Couchbase Server 2.0.0-1931 installed
      • Link to manifest file: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1931-rel.rpm.manifest.xml
      • The cluster has 2 buckets, default and saslbucket (12 GB each, with 1 replica), set up with 64 vbuckets.
      • Each bucket has one design doc with 2 views (d1 for default, d11 for saslbucket)

      10.6.2.37
      10.6.2.38
      10.6.2.44
      10.6.2.45
      10.6.2.42
      10.6.2.43

      • Load 20 million items into each bucket. Each key has a size of 1024 bytes.
      • After loading completes, wait for initial indexing to finish.
      • After initial indexing is done, mutate all items with sizes from 1024 to 1512 bytes.
      • Query all 4 views from the 2 docs.
      • Add node 44 and rebalance. Passed.
      • Add node 45 and rebalance. Passed.
      • Check that auto-failover is enabled on the cluster.
      • Turn on the firewall on node 40:
        iptables -A INPUT -p tcp -i eth0 --dport 1000:60000 -j REJECT
        iptables -A OUTPUT -p tcp -o eth0 --sport 1000:60000 -j REJECT
      • Node 40 went down as expected.
      • Auto-failover kicked in after one minute.
      • Disable the firewall on node 40. The cluster saw node 40 come back up.
      • Add node 40 back to the cluster and rebalance. Within a few seconds, the rebalance failed with the error:

      [rebalance:error,2012-11-06T0:41:48.498,ns_1@10.6.2.37:<0.4077.2612>:ns_rebalancer:do_wait_buckets_shutdown:204]Failed to wait deletion of some buckets on some nodes: [{'ns_1@10.6.2.40',
      {'EXIT',
      {old_buckets_shutdown_wait_failed, ["default"]}}}]

      [user:info,2012-11-06T0:41:48.500,ns_1@10.6.2.37:<0.14641.0>:ns_orchestrator:handle_info:319]Rebalance exited with reason {buckets_shutdown_wait_failed,
      [{'ns_1@10.6.2.40',
      {'EXIT',
      {old_buckets_shutdown_wait_failed, ["default"]}}}]}
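
When triaging collected logs for this failure, it can help to pull out which node failed to shut down which old buckets. The following is a minimal sketch, not part of ns_server: the helper name `failed_bucket_shutdowns` and the regular expression are illustrative assumptions based only on the shape of the error term quoted above.

```python
import re

# Hypothetical helper (not an ns_server API): matches the
# {'<node>', {'EXIT', {old_buckets_shutdown_wait_failed, [...]}}} term
# as it appears in the log excerpt from this report.
NODE_FAILURE = re.compile(
    r"\{'(?P<node>[^']+)',\s*"
    r"\{'EXIT',\s*"
    r"\{old_buckets_shutdown_wait_failed,\s*"
    r"\[(?P<buckets>[^\]]*)\]"
)


def failed_bucket_shutdowns(log_text):
    """Return {node: [bucket, ...]} for every shutdown-wait failure found."""
    failures = {}
    for match in NODE_FAILURE.finditer(log_text):
        # Bucket names are double-quoted strings inside the Erlang list.
        buckets = re.findall(r'"([^"]*)"', match.group("buckets"))
        failures.setdefault(match.group("node"), []).extend(buckets)
    return failures


# The rebalance:error entry from this report, used as sample input.
sample = """[rebalance:error,2012-11-06T0:41:48.498,ns_1@10.6.2.37:<0.4077.2612>:ns_rebalancer:do_wait_buckets_shutdown:204]Failed to wait deletion of some buckets on some nodes: [{'ns_1@10.6.2.40',
{'EXIT',
{old_buckets_shutdown_wait_failed, ["default"]}}}]"""

print(failed_bucket_shutdowns(sample))
```

For the entry above, this reports node ns_1@10.6.2.40 with the bucket "default". The pattern tolerates line breaks between the nested tuples, since the term is wrapped across lines in the log.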

      Link to collect info of all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201211/8nodes-ci-1931-reb-failed-undelete-old-bucket-20121106-121536.tgz
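
To repeat the firewall step from the reproduction, the block and unblock commands can be generated from one place so that the two stay in sync. This is a sketch under assumptions: the report does not say how the firewall was disabled, so deleting the same two rules with `iptables -D` is a guess at one reasonable way to do it, and the function name `firewall_commands` is hypothetical.

```python
# Port range and interface are taken from the iptables commands in this report.
PORT_RANGE = "1000:60000"
INTERFACE = "eth0"


def firewall_commands(action):
    """Build the two iptables commands; action is 'block' or 'unblock'."""
    # -A appends a rule; -D deletes the rule matching the same specification.
    # Using -D to unblock is an assumption, not something the report states.
    flag = {"block": "-A", "unblock": "-D"}[action]
    return [
        f"iptables {flag} INPUT -p tcp -i {INTERFACE} --dport {PORT_RANGE} -j REJECT",
        f"iptables {flag} OUTPUT -p tcp -o {INTERFACE} --sport {PORT_RANGE} -j REJECT",
    ]


# Print the commands rather than running them; they require root on the node.
for command in firewall_commands("block") + firewall_commands("unblock"):
    print(command)
```

The sketch only prints the commands, so it is safe to run anywhere; executing them on the target node is left to the operator.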

          People

            jin Jin Lim (Inactive)
            thuan Thuan Nguyen