Couchbase Server / MB-8303

[system test] Rebalance in hangs with some initial vbucket movement due to bucket state bouncing between not ready and active

Details

    • Bug
    • Resolution: Duplicate
    • Major
    • 2.1.0
    • 2.1.0
    • couchbase-bucket, ns_server
    • Security Level: Public
    • None
    • build 2.0.2-800-rel
    • Centos 64-bit

    Description

      Cluster IP is 172.23.105.23.
      1. Create an 8-node cluster; each node has 12G RAM and an HDD.
      2. Create 2 buckets, default and saslbucket, with memory quotas of 6G and 4G.
      3. Run the KV use case for 2 days:
      load 35M items into each bucket, bring the resident ratio to 70%~80%, then access the data at 15k ops/sec with 5% creates, 5% deletes, 5% expires, 5% updates, and 80% gets for several hours.
      4. Then, with the same workload, run some rebalance operations. While rebalancing in 2 nodes, the rebalance hangs after several vbucket movements (17 vbuckets for saslbucket).
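
For reference, the per-operation rates implied by the workload in step 3 can be worked out with a short sketch. It assumes the "80 gets" figure means 80%, which makes the mix sum to 100%. The names `MIX_PERCENT` and `ops_per_second` are illustrative only and are not part of any test harness used here.

```python
# Operation mix from step 3, in percent. Reading "80 gets" as 80% is an assumption.
MIX_PERCENT = {"create": 5, "delete": 5, "expire": 5, "update": 5, "get": 80}
TOTAL_OPS_PER_SEC = 15_000  # "15k ops/sec" from the description


def ops_per_second(total, mix):
    """Split a total ops/sec budget across operation types by percentage."""
    if sum(mix.values()) != 100:
        raise ValueError("operation mix must sum to 100%")
    return {op: total * pct // 100 for op, pct in mix.items()}


RATES = ops_per_second(TOTAL_OPS_PER_SEC, MIX_PERCENT)

if __name__ == "__main__":
    for op, rate in RATES.items():
        print(f"{op}: {rate} ops/sec")
```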

      Checking the log shows many bucket state changes between not ready and active:

      root@cola-s10305:/opt/couchbase/var/lib/couchbase/logs# tail -f error.1
      <<"replication_building_1013_'ns_1@172.23.105.33'">>} took too long: 1196997 us
      [ns_server:error,2013-05-16T15:13:17.111,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call {stats,<<>>} took too long: 556521 us
      [ns_server:error,2013-05-16T15:19:19.834,ns_1@172.23.105.32:<0.11769.0>:ns_memcached:verify_report_long_call:294]call {get_tap_docs_estimate,1013, <<"replication_building_1013_'ns_1@172.23.105.33'">>} took too long: 725657 us
      [ns_server:error,2013-05-16T15:22:18.618,ns_1@172.23.105.32:ns_memcached-saslbucket<0.11752.0>:ns_memcached:handle_info:671]handle_info(ensure_bucket,..) took too long: 661061 us
      [ns_server:error,2013-05-16T15:23:35.127,ns_1@172.23.105.32:<0.11704.0>:ns_memcached:verify_report_long_call:294]call {stats,<<>>} took too long: 568954 us
      [ns_server:error,2013-05-16T15:25:59.291,ns_1@172.23.105.32:<0.11705.0>:ns_memcached:verify_report_long_call:294]call topkeys took too long: 1217646 us
      [ns_server:error,2013-05-16T15:30:17.553,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]
      [ns_server:error,2013-05-16T15:30:37.540,ns_1@172.23.105.32:ns_doctor<0.9998.0>:ns_doctor:update_status:234]The following buckets became not ready on node 'ns_1@172.23.105.23': ["saslbucket"], those of them are active ["saslbucket"]
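
To see how often these two symptoms occur across a full error log, a small script along the following lines could tally them. This is only a sketch: the regular expressions are written against the line shapes visible in the excerpt above, not against any documented ns_server log format, and `summarize` is a hypothetical helper name.

```python
import re

# Slow-call reports end with "took too long: <N> us" (shape taken from the excerpt).
SLOW_CALL = re.compile(r"took too long: (\d+) us")
# ns_doctor reports of buckets that became not ready on a node.
NOT_READY = re.compile(r"became not ready on node '([^']+)': \[([^\]]*)\]")
# Timestamp field at the start of an ns_server error line.
TIMESTAMP = re.compile(r"\[ns_server:error,([^,]+),")


def summarize(lines):
    """Return (slow call durations in us, not-ready events) found in the lines."""
    slow_us = []
    not_ready = []
    for line in lines:
        slow = SLOW_CALL.search(line)
        if slow:
            slow_us.append(int(slow.group(1)))
        event = NOT_READY.search(line)
        if event:
            ts = TIMESTAMP.search(line)
            buckets = re.findall(r'"([^"]+)"', event.group(2))
            not_ready.append((ts.group(1) if ts else None, event.group(1), buckets))
    return slow_us, not_ready


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py < error.1
    durations, events = summarize(sys.stdin)
    print(f"slow calls: {len(durations)}")
    if durations:
        print(f"longest slow call: {max(durations)} us")
    for ts, node, buckets in events:
        print(f"{ts} not ready on {node}: {', '.join(buckets)}")
```

Because the duration pattern matches only the trailing "took too long" phrase, it still counts entries whose text was wrapped onto a following line when pasted.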

      diag links:
      https://s3.amazonaws.com/bugdb/jira/MB-8303/8nodes_202-800_rebalance_hang_20130516-143321.tgz

          People

            Aliaksey Artamonau (Inactive)
            Chisheng Hong (Inactive)