Couchbase Server / MB-6550

[longevity] Rebalance hangs after failover and node removal because of a memory leak on a couple of nodes


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version/s: 2.0-beta
    • Fix Version/s: 2.0-beta
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Environment: CentOS 6.2 64-bit

    Description

      Cluster information:

      • 11 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 10 GB RAM and a 150 GB disk
      • 8 GB RAM allocated to Couchbase Server on each node (80% of total system memory)
      • Disk format is ext3 on both the data and root filesystems
      • Each server has its own drive; no disk is shared with other servers
      • Load 9 million items into both buckets
      • The cluster has 2 buckets: default (3 GB) and saslbucket (3 GB)
      • Each bucket has one design doc with 2 views (default: d1, saslbucket: d11)
      • Add one more design doc, d2, with 2 views to the default bucket
      • Start the cluster with 10 nodes running Couchbase Server 2.0.0-1663:
        10.3.121.13
        10.3.121.14
        10.3.121.15
        10.3.121.16
        10.3.121.17
        10.3.121.20
        10.3.121.22
        10.3.121.24
        10.3.121.25
        10.3.121.23
      • Data path /data
      • View path /data
      • In the last run, I did a swap rebalance: removed node 13 and added node 26.
      • Node 26 then failed due to a physical failure. I failed over node 26 and rebalanced.
      • The rebalance failed with known issue MB-6497 at the end of rebalancing saslbucket.
      • Node 22 went down after running out of disk space. Failed over node 22.
      • Removed node 13 and started a rebalance at 19:26:35 - Wed Sep 5, 2012.
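The per-node memory figures above can be checked with a little arithmetic. This is a minimal sketch, assuming the 3 GB bucket sizes are per-node RAM quotas; the function name `memory_budget` is hypothetical and only illustrates the breakdown.

```python
# Per-node figures from the cluster description.
# Assumption: the 3 GB bucket sizes are per-node RAM quotas.
TOTAL_RAM_GB = 10
SERVER_QUOTA_GB = 8
BUCKET_QUOTAS_GB = {"default": 3, "saslbucket": 3}


def memory_budget(total_ram_gb, server_quota_gb, bucket_quotas_gb):
    """Break down how one node's RAM is divided (hypothetical helper)."""
    allocated = sum(bucket_quotas_gb.values())
    if allocated > server_quota_gb:
        raise ValueError("bucket quotas exceed the server quota")
    return {
        # Share of physical RAM given to Couchbase Server.
        "server_quota_pct": 100 * server_quota_gb / total_ram_gb,
        # RAM reserved by all bucket quotas together.
        "bucket_quota_gb": allocated,
        # Server quota not yet assigned to any bucket.
        "unallocated_server_quota_gb": server_quota_gb - allocated,
        # RAM left for the OS and other processes.
        "outside_server_quota_gb": total_ram_gb - server_quota_gb,
    }


print(memory_budget(TOTAL_RAM_GB, SERVER_QUOTA_GB, BUCKET_QUOTAS_GB))
```

Under that assumption, the two buckets reserve 6 GB of the 8 GB server quota, and 2 GB of physical RAM sits outside the quota.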

      Log entry: Bucket "default" rebalance does not seem to be swap rebalance (module ns_vbucket_mover000, node ns_1@10.3.121.14, 19:26:35 - Wed Sep 5, 2012)

      The rebalance is still hung as of Thu Sep 6 19:25:29 PDT 2012.
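How long the rebalance had been stuck follows directly from the two timestamps in the report. A minimal sketch, assuming both timestamps are in the same local time zone (only the second is stamped PDT):

```python
from datetime import datetime

# Timestamps transcribed from the report.
# Assumption: both are in the same local time zone (PDT).
rebalance_start = datetime(2012, 9, 5, 19, 26, 35)  # Wed Sep 5, 2012
last_check = datetime(2012, 9, 6, 19, 25, 29)       # Thu Sep 6, 2012 PDT

# Subtracting two datetimes yields a timedelta.
hang_duration = last_check - rebalance_start
print(hang_duration)  # 23:58:54
```

That is just over a minute short of a full 24 hours.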

      CPU and memory stats for beam.smp and memcached (Vm = virtual memory size, Rm = resident memory size, CPU = % CPU)

      10.3.121.15
      Vm: 2796m Rm: 613m CPU: 13.7 beam.smp
      Vm: 6091m Rm: 4.2g CPU: 9.8 memcached
      10.3.121.13
      Vm: 1845m Rm: 338m CPU: 9.9 beam.smp
      Vm: 1230m Rm: 1.0g CPU: 2.0 memcached
      10.3.121.23
      Vm: 2443m Rm: 652m CPU: 9.8 beam.smp
      Vm: 4969m Rm: 3.4g CPU: 7.9 memcached
      10.3.121.24
      Vm: 3304m Rm: 907m CPU: 19.4 beam.smp
      Vm: 5440m Rm: 4.0g CPU: 3.9 memcached
      10.3.121.14
      Vm: 3462m Rm: 665m CPU: 30.7 beam.smp
      Vm: 6329m Rm: 4.1g CPU: 5.1 memcached
      10.3.121.16
      Vm: 2702m Rm: 642m CPU: 13.2 beam.smp
      Vm: 4845m Rm: 3.5g CPU: 5.0 memcached
      10.3.121.17
      Vm: 4498m Rm: 1.4g CPU: 91.2 beam.smp
      Vm: 5359m Rm: 3.6g CPU: 1.7 memcached
      10.3.121.20
      Vm: 3793m Rm: 1.0g CPU: 11.7 beam.smp
      Vm: 5356m Rm: 3.7g CPU: 1.7 memcached
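Ranking the nodes by memcached resident memory or beam.smp CPU is easier once the per-node output is parsed. This is a minimal sketch, assuming the `m`/`g` suffixes mean MiB/GiB as in `top`; `ProcStat`, `to_mb` and `parse_stats` are hypothetical names, and `RAW` is a copy of the stats as collected.

```python
import re
from typing import List, NamedTuple, Optional


class ProcStat(NamedTuple):
    node: Optional[str]
    process: str
    vm_mb: float
    rm_mb: float
    cpu_pct: float


# Matches lines such as "Vm: 2796m Rm: 613m CPU: 13.7 beam.smp".
LINE = re.compile(r"Vm:\s*(\S+)\s+Rm:\s*(\S+)\s+CPU:\s*([\d.]+)\s+(\S+)")


def to_mb(size: str) -> float:
    """Convert '613m' or '4.2g' to MiB (assumes m = MiB, g = GiB)."""
    multipliers = {"m": 1.0, "g": 1024.0}
    unit = size[-1].lower()
    if unit not in multipliers:
        raise ValueError(f"unexpected size unit in {size!r}")
    return float(size[:-1]) * multipliers[unit]


def parse_stats(text: str) -> List[ProcStat]:
    """Parse node headers, each followed by per-process Vm/Rm/CPU lines."""
    stats: List[ProcStat] = []
    node: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE.match(line)
        if match:
            vm, rm, cpu, proc = match.groups()
            stats.append(ProcStat(node, proc, to_mb(vm), to_mb(rm), float(cpu)))
        else:
            node = line  # any other non-empty line is taken as a node address
    return stats


RAW = """
10.3.121.15
Vm: 2796m Rm: 613m CPU: 13.7 beam.smp
Vm: 6091m Rm: 4.2g CPU: 9.8 memcached
10.3.121.13
Vm: 1845m Rm: 338m CPU: 9.9 beam.smp
Vm: 1230m Rm: 1.0g CPU: 2.0 memcached
10.3.121.23
Vm: 2443m Rm: 652m CPU: 9.8 beam.smp
Vm: 4969m Rm: 3.4g CPU: 7.9 memcached
10.3.121.24
Vm: 3304m Rm: 907m CPU: 19.4 beam.smp
Vm: 5440m Rm: 4.0g CPU: 3.9 memcached
10.3.121.14
Vm: 3462m Rm: 665m CPU: 30.7 beam.smp
Vm: 6329m Rm: 4.1g CPU: 5.1 memcached
10.3.121.16
Vm: 2702m Rm: 642m CPU: 13.2 beam.smp
Vm: 4845m Rm: 3.5g CPU: 5.0 memcached
10.3.121.17
Vm: 4498m Rm: 1.4g CPU: 91.2 beam.smp
Vm: 5359m Rm: 3.6g CPU: 1.7 memcached
10.3.121.20
Vm: 3793m Rm: 1.0g CPU: 11.7 beam.smp
Vm: 5356m Rm: 3.7g CPU: 1.7 memcached
"""

stats = parse_stats(RAW)
memcached = [s for s in stats if s.process == "memcached"]
beam = [s for s in stats if s.process == "beam.smp"]
print(max(memcached, key=lambda s: s.rm_mb).node)  # 10.3.121.15
print(max(beam, key=lambda s: s.cpu_pct).node)     # 10.3.121.17
```

On this data, memcached on 10.3.121.15 has the largest resident set (4.2g), and beam.smp on 10.3.121.17 stands out at 91.2% CPU.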

      Swap stats in MB
      Total Used Free
      10.3.121.15
      Swap: 5199 1815 3384
      10.3.121.13
      Swap: 5199 10 5189
      10.3.121.22
      Swap: 5199 15 5184
      10.3.121.14
      Swap: 5199 2503 2696
      10.3.121.23
      Swap: 5199 1037 4162
      10.3.121.24
      Swap: 5199 1543 3656
      10.3.121.17
      Swap: 5199 2156 3043
      10.3.121.16
      Swap: 5199 1156 4043
      10.3.121.20
      Swap: 5199 1949 3250
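The swap figures are easier to compare as a percentage of total swap. A minimal sketch that transcribes the numbers above and checks that used + free equals total on every node; `swap_report` is a hypothetical helper.

```python
# Swap figures (MB) transcribed from the report: node -> (total, used, free).
SWAP_MB = {
    "10.3.121.15": (5199, 1815, 3384),
    "10.3.121.13": (5199, 10, 5189),
    "10.3.121.22": (5199, 15, 5184),
    "10.3.121.14": (5199, 2503, 2696),
    "10.3.121.23": (5199, 1037, 4162),
    "10.3.121.24": (5199, 1543, 3656),
    "10.3.121.17": (5199, 2156, 3043),
    "10.3.121.16": (5199, 1156, 4043),
    "10.3.121.20": (5199, 1949, 3250),
}


def swap_report(swap):
    """Return (node, % swap used) pairs sorted from highest to lowest usage."""
    rows = []
    for node, (total, used, free) in swap.items():
        # Sanity check: used + free should account for all swap.
        if used + free != total:
            raise ValueError(f"{node}: used + free != total")
        rows.append((node, round(100 * used / total, 1)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


for node, pct in swap_report(SWAP_MB):
    print(f"{node}: {pct}% swap used")
```

10.3.121.14 is the heaviest swap user at about 48% (2503 of 5199 MB), while 10.3.121.13 and 10.3.121.22 barely touch swap.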

      Diagnostic logs (diags) from all nodes:
      https://s3.amazonaws.com/packages.couchbase/diag-logs/orange/201209/9nodes-1663-reb-hang-20120906.tgz


          People

            Assignee: Chiyoung Seo (Inactive)
            Reporter: Thuan Nguyen