Couchbase Server / MB-6707

[system test] Rebalance does not stop when clicking the "Stop Rebalance" button

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Component/s: ns_server, UI
    • Security Level: Public
    • Environment:
      CentOS 6.2 64-bit, build 2.0.0-1746

    Description

      Cluster information:

      • 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 32 GB RAM and a 400 GB SSD.
      • The SSD is formatted as ext4 and mounted on /data.
      • Each server has its own drive; no disk is shared between servers.
      • Load 15 million items into both buckets.
      • The cluster has 2 buckets, default (11 GB) and saslbucket (11 GB), with consistent views enabled. Together the 2 buckets use only 68% of total system RAM.
      • Each bucket has one design doc with 2 views (d1 in default, d11 in saslbucket).
      • Create a cluster of 4 nodes running Couchbase Server 2.0.0-1746:
        10.6.2.37
        10.6.2.38
        10.6.2.39
        10.6.2.40
      • Data path: /data
      • View path: /data
      • Add 4 nodes to the cluster and rebalance:
        10.6.2.42
        10.6.2.43
        10.6.2.44
        10.6.2.45
      • Rebalance hangs. Filed bug MB-6706.
      • Try to stop the rebalance by clicking the "Stop Rebalance" button. The cluster does not stop rebalancing.

      Diags from all nodes: https://s3.amazonaws.com/packages.couchbase/diag-logs/orange/201209/8nodes-diags-1746-not-able-stop-rebalance-20120920.tgz


        Activity

        Karan Kumar (Inactive) added a comment:

        I think it's related to MB-6706.

        Can you link both these bugs?

        Thuan Nguyen added a comment:

        I am promoting this bug to Blocker since it blocks my testing. I tried stopping the rebalance from the UI, with couchbase-cli, and even by shutting down Couchbase Server on one node, but the rebalance does not stop.

        Karan Kumar (Inactive) added a comment:

        This happens while there is no load and there are no views. The cluster is essentially idle.

        Aleksey Kondratenko (Inactive) added a comment:

        Ketaki, your case may be very different. Please attach diags.

        Aleksey Kondratenko (Inactive) added a comment:

        Unfortunately there's nothing I can see in these diags. Can you please try again?

        Aleksey Kondratenko (Inactive) added a comment (edited):

        I was referring to system logs. Sorry, I meant system tests.

        I have a few commits in Gerrit that improve diag grabbing.

        I'd like the experiment to be retried with those fixes, so that I get diags that tell me why this happens.

        Thuan Nguyen added a comment:

        Hit this bug again in build 2.0.0-1862 during the system test.

        Manifest file of build 2.0.0-1862: http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1862-rel.rpm.manifest.xml

        Collect info from all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201210/8nodes-col-info-1862-reb-failed-Partition-not-in-active-nor-passive-set-20121017-233606.tgz
        Aleksey Kondratenko (Inactive) added a comment:

        The "hung" rebalance in MB-6953 is actually this issue. The rebalance failed (you can see that in the logs), but the termination code waits until all movers have died. One mover is blocked waiting for an index update on a slow node and does not pay attention to the exit signal from its parent. A fix is coming.

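The termination problem described in the last comment can be illustrated with a small sketch. ns_server is written in Erlang and its actual fix is not shown in this issue, so the Python below only illustrates the general pattern; the names Mover, stop_rebalance, index_ready and poll_interval are hypothetical. It shows a mover that waits for an index update in short slices and re-checks its parent's stop signal between them, so a stop request can terminate it even when the index on a slow node never catches up.

```python
import threading


class Mover(threading.Thread):
    """Hypothetical stand-in for a rebalance mover that must wait for an
    index update on another node before it can finish its move."""

    def __init__(self, index_ready, stop, poll_interval=0.05):
        super().__init__(daemon=True)
        self.index_ready = index_ready      # set when the (possibly slow) node finishes indexing
        self.stop = stop                    # shared stop signal from the parent rebalance
        self.poll_interval = poll_interval  # how often the stop signal is re-checked
        self.outcome = None

    def run(self):
        # A bare self.index_ready.wait() with no timeout would block forever on
        # a slow node and never notice self.stop, which is the failure mode
        # described in this issue. Waiting in short slices lets the mover
        # react to the parent's stop signal between checks.
        while True:
            if self.index_ready.wait(self.poll_interval):
                self.outcome = "moved"
                return
            if self.stop.is_set():
                self.outcome = "stopped"
                return


def stop_rebalance(movers, stop, timeout=1.0):
    """Signal all movers to stop, then wait a bounded time for each to exit.

    Returns True only if every mover has actually terminated."""
    stop.set()
    for mover in movers:
        mover.join(timeout)
    return not any(mover.is_alive() for mover in movers)


if __name__ == "__main__":
    stop = threading.Event()
    slow_node_index = threading.Event()   # never set: the slow node never catches up
    fast_node_index = threading.Event()
    fast_node_index.set()                 # index already up to date

    movers = [Mover(slow_node_index, stop), Mover(fast_node_index, stop)]
    for mover in movers:
        mover.start()

    print(stop_rebalance(movers, stop))          # True
    print([mover.outcome for mover in movers])   # ['stopped', 'moved']
```

Polling with a short timeout is the simplest way to make this wait interruptible. An event-driven alternative is to wait on both conditions at once (in Erlang, for example, a single receive that matches either the index-update message or the parent's exit message), at the cost of a slightly more involved design.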

          People

          • Assignee: Aleksey Kondratenko (Inactive)
          • Reporter: Thuan Nguyen