  1. Couchbase Server
  2. MB-6799

[RN 2.0.1][system test] view index disk size grows too big during rebalance

    Details

    • Flagged:
      Release Note

      Description

      Cluster information:

      • 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 32 GB RAM and a 400 GB SSD.
      • 24.8 GB RAM allocated to Couchbase Server on each node
      • SSD formatted as ext4 and mounted on /data
      • Each server has its own SSD drive; no disk is shared between servers.
      • Create a cluster of 6 nodes running Couchbase Server 2.0.0-1781
      • The cluster has 2 buckets, default (12 GB) and saslbucket (12 GB).
      • Each bucket has one design doc with 2 views (d1 in default, d11 in saslbucket)
      • Consistent views are enabled on the cluster (the default)

      10.6.2.37
      10.6.2.38
      10.6.2.44
      10.6.2.45
      10.6.2.42
      10.6.2.43

      • Load 14 million items into both buckets. Each key ranges in size from 512 bytes to 1024 bytes
      • Query all 4 views from the 2 design docs

      10.6.2.39
      10.6.2.40

      • Data path /data
      • View path /data

      Manifest info from build 1781
      http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1781-rel.rpm.manifest.xml

      • Add 2 nodes (39 and 40) and rebalance. During the rebalance, reboot nodes 42 and 43. The rebalance failed as expected.
      • After the nodes finished warmup, rebalance again. The rebalance failed with bug MB-6490 on node 44.
      • Fail over node 44 and rebalance
      • Monitor disk usage on all nodes. Nodes 45 and 37 have the largest disk usage

      Thuans-MacBook-Pro:testrunner thuan$ python scripts/ssh.py -i ../ini/10-c-long.ini "df -kh | grep data"
      10.6.2.44
      394G 468M 394G 1% /data
      10.6.2.39
      394G 44G 331G 12% /data
      10.6.2.42
      394G 69G 326G 18% /data
      10.6.2.40
      394G 55G 319G 15% /data
      10.6.2.45
      394G 346G 48G 88% /data
      10.6.2.43
      394G 110G 284G 28% /data
      10.6.2.37
      394G 299G 76G 80% /data
      10.6.2.38
      394G 184G 191G 50% /data
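
The comparison above can be done mechanically. The following is a minimal sketch, not part of the test harness: it parses the pasted `df` output and lists the nodes at or above a usage threshold. The function names and the 75% threshold are illustrative assumptions.

```python
import re

# Output of `df -kh | grep data` for each node, copied from this report.
DF_REPORT = """\
10.6.2.44
394G 468M 394G 1% /data
10.6.2.39
394G 44G 331G 12% /data
10.6.2.42
394G 69G 326G 18% /data
10.6.2.40
394G 55G 319G 15% /data
10.6.2.45
394G 346G 48G 88% /data
10.6.2.43
394G 110G 284G 28% /data
10.6.2.37
394G 299G 76G 80% /data
10.6.2.38
394G 184G 191G 50% /data
"""

IP_LINE = re.compile(r"\d+\.\d+\.\d+\.\d+")


def parse_df_report(text):
    """Return {node_ip: use_percent} from alternating IP / df lines."""
    usage = {}
    node = None
    for line in text.splitlines():
        line = line.strip()
        if IP_LINE.fullmatch(line):
            node = line
        elif node and line:
            # Fields as pasted: size, used, available, use%, mount point.
            _size, _used, _avail, pct, _mount = line.split()
            usage[node] = int(pct.rstrip("%"))
    return usage


def nodes_over(usage, threshold_pct):
    """Nodes at or above the threshold, fullest first."""
    hot = [ip for ip, pct in usage.items() if pct >= threshold_pct]
    return sorted(hot, key=lambda ip: -usage[ip])


if __name__ == "__main__":
    usage = parse_df_report(DF_REPORT)
    for ip in nodes_over(usage, 75):  # 75% is an arbitrary example threshold
        print(ip, f"{usage[ip]}%")
```

With the numbers above, this singles out nodes 45 and 37, matching the observation in the report.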

      • Then check the index file sizes on all nodes. The replica index file on node 45 is far larger than on the other nodes: 114 GB.

      total 2.4G
      -rw-r--r--. 1 couchbase couchbase 686M Oct 2 14:36 main_ae72f9d24da5d9368eed3fb3519c1687.view.21
      -rw-r--r--. 1 couchbase couchbase 1.7G Oct 2 14:38 replica_ae72f9d24da5d9368eed3fb3519c1687.view.57
      drwxr-xr-x. 2 couchbase couchbase 4.0K Oct 1 14:59 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
      drwxr-xr-x. 2 couchbase couchbase 4.0K Oct 1 11:03 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica

      10.6.2.43
      total 2.1G
      -rw-r--r--. 1 couchbase couchbase 674M Oct 2 14:33 main_ae72f9d24da5d9368eed3fb3519c1687.view.16
      -rw-r--r--. 1 couchbase couchbase 1.4G Oct 2 14:37 replica_ae72f9d24da5d9368eed3fb3519c1687.view.22
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica

      10.6.2.39
      total 2.4G
      -rw-r--r--. 1 couchbase couchbase 702M Oct 2 14:36 main_ae72f9d24da5d9368eed3fb3519c1687.view.10
      -rw-r--r--. 1 couchbase couchbase 1.8G Oct 2 14:40 replica_ae72f9d24da5d9368eed3fb3519c1687.view.52
      drwxr-xr-x. 2 couchbase couchbase 4.0K Oct 1 14:44 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
      drwxr-xr-x. 2 couchbase couchbase 4.0K Oct 1 11:03 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica

      10.6.2.45
      total 132G
      -rw-r--r--. 1 couchbase couchbase 18G Oct 2 14:40 main_ae72f9d24da5d9368eed3fb3519c1687.view.13
      -rw-r--r--. 1 couchbase couchbase 114G Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.72
      -rw-r--r--. 1 couchbase couchbase 4.0M Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.72.compact
      -rw-r--r--. 1 couchbase couchbase 0 Oct 2 14:40 replica_ae72f9d24da5d9368eed3fb3519c1687.view.log
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 30 01:51 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 19:29 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica

      10.6.2.42
      total 12G
      -rw-r--r--. 1 couchbase couchbase 620M Oct 2 14:41 main_ae72f9d24da5d9368eed3fb3519c1687.view.18
      -rw-r--r--. 1 couchbase couchbase 11G Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.18
      -rw-r--r--. 1 couchbase couchbase 27M Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.18.compact
      -rw-r--r--. 1 couchbase couchbase 0 Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.log
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica

      10.6.2.38
      total 2.1G
      -rw-r--r--. 1 couchbase couchbase 682M Oct 2 14:38 main_ae72f9d24da5d9368eed3fb3519c1687.view.11
      -rw-r--r--. 1 couchbase couchbase 1.4G Oct 2 14:40 replica_ae72f9d24da5d9368eed3fb3519c1687.view.12
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:21 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica

      10.6.2.37
      total 67G
      -rw-r--r--. 1 couchbase couchbase 4.1G Oct 2 14:36 main_ae72f9d24da5d9368eed3fb3519c1687.view.12
      -rw-r--r--. 1 couchbase couchbase 63G Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.16
      -rw-r--r--. 1 couchbase couchbase 9.8M Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.16.compact
      -rw-r--r--. 1 couchbase couchbase 0 Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.log
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
      drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica
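
To make the per-node listings easier to compare, here is a rough sketch that converts the human-readable sizes to bytes and computes the replica-to-main index ratio for each node. The sizes are copied from the listings above (the first listing has no node address, so it is left out). The helper names are illustrative, and treating K/M/G as powers of 1024 is an assumption about how `ls -h` rounded the values.

```python
# Multipliers for the size suffixes printed by `ls -h` (assumed binary units).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(size):
    """Convert an `ls -h` size such as '686M', '1.7G' or '0' to bytes."""
    if size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)


# (node, main index size, replica index size) from the listings in this report.
INDEX_SIZES = [
    ("10.6.2.43", "674M", "1.4G"),
    ("10.6.2.39", "702M", "1.8G"),
    ("10.6.2.45", "18G", "114G"),
    ("10.6.2.42", "620M", "11G"),
    ("10.6.2.38", "682M", "1.4G"),
    ("10.6.2.37", "4.1G", "63G"),
]


def replica_to_main_ratios(rows):
    """Return {node: replica_size / main_size}."""
    return {node: to_bytes(rep) / to_bytes(main) for node, main, rep in rows}


def largest_replica(rows):
    """Node whose replica index file is largest in absolute size."""
    return max(rows, key=lambda row: to_bytes(row[2]))[0]


if __name__ == "__main__":
    ratios = replica_to_main_ratios(INDEX_SIZES)
    for node, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{node}: replica is {ratio:.1f}x the main index")
    print("largest replica file:", largest_replica(INDEX_SIZES))
```

Node 45 has the largest replica file in absolute terms. The ratio view also shows that the three nodes with a pending `.compact` file (45, 42 and 37) are the ones whose replica index is several times larger than the main index.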

      • In the couchdb log of node 45, index compaction started at Tue Oct 02 2012 13:21:14 and stopped at 2 percent (see logs couchdb.9 and couchdb.10)

      In log couchdb.9:
      [couchdb:debug,2012-10-02T13:21:16.593,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,10000},
      {design_documents,[<<"_design/d11">>]},
      {indexer_type,replica},
      {original_target,{[{type,bucket}]}},
      {progress,0},
      {set,<<"saslbucket">>},
      {signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
      {started_on,1349209274},
      {total_changes,6339225},
      {trigger_type,scheduled},
      {type,view_compaction},
      {updated_on,1349209276}]

      [couchdb:debug,2012-10-02T13:21:23.124,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,40000},
      {design_documents,[<<"_design/d11">>]},
      {indexer_type,replica},
      {original_target,{[{type,bucket}]}},
      {progress,0},
      {set,<<"saslbucket">>},
      {signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
      {started_on,1349209274},
      {total_changes,6339225},
      {trigger_type,scheduled},
      {type,view_compaction},
      {updated_on,1349209283}]

      [couchdb:debug,2012-10-02T13:21:30.203,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,70000},
      {design_documents,[<<"_design/d11">>]},
      {indexer_type,replica},
      {original_target,{[{type,bucket}]}},
      {progress,1},
      {set,<<"saslbucket">>},
      {signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
      {started_on,1349209274},
      {total_changes,6339225},
      {trigger_type,scheduled},
      {type,view_compaction},
      {updated_on,1349209290}]

      [couchdb:debug,2012-10-02T13:21:42.138,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,130000},
      {design_documents,[<<"_design/d11">>]},
      {indexer_type,replica},
      {original_target,{[{type,bucket}]}},
      {progress,2},
      {set,<<"saslbucket">>},
      {signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
      {started_on,1349209274},
      {total_changes,6339225},
      {trigger_type,scheduled},
      {type,view_compaction},
      {updated_on,1349209302}]

      [couchdb:debug,2012-10-02T13:21:52.828,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,140000},
      {design_documents,[<<"_design/d11">>]},
      {indexer_type,replica},
      {original_target,{[{type,bucket}]}},
      {progress,2},
      {set,<<"saslbucket">>},
      {signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
      {started_on,1349209274},
      {total_changes,6339225},
      {trigger_type,scheduled},
      {type,view_compaction},
      {updated_on,1349209312}]

      In log couchdb.10:

      [couchdb:debug,2012-10-02T13:22:13.490,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,150000},
      {design_documents,[<<"_design/d11">>]},
      {indexer_type,replica},
      {original_target,{[{type,bucket}]}},
      {progress,2},
      {set,<<"saslbucket">>},
      {signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
      {started_on,1349209274},
      {total_changes,6339225},
      {trigger_type,scheduled},
      {type,view_compaction},
      {updated_on,1349209333}]

      [couchdb:debug,2012-10-02T13:22:34.317,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,160000},
      {design_documents,[<<"_design/d11">>]},
      {indexer_type,replica},
      {original_target,{[{type,bucket}]}},
      {progress,2},
      {set,<<"saslbucket">>},
      {signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
      {started_on,1349209274},
      {total_changes,6339225},
      {trigger_type,scheduled},
      {type,view_compaction},
      {updated_on,1349209354}]
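
The task-status entries above carry enough information to check the progress value and see how quickly the compactor was advancing. The sketch below is illustrative only. It assumes `progress` is the integer floor of `changes_done * 100 / total_changes`, which is not confirmed by the source but agrees with all seven entries, and it treats `started_on` and `updated_on` as Unix epoch seconds.

```python
from datetime import datetime, timezone

TOTAL_CHANGES = 6339225

# (updated_on, changes_done, logged progress) from the log entries above.
UPDATES = [
    (1349209276, 10000, 0),
    (1349209283, 40000, 0),
    (1349209290, 70000, 1),
    (1349209302, 130000, 2),
    (1349209312, 140000, 2),
    (1349209333, 150000, 2),
    (1349209354, 160000, 2),
]


def utc(epoch):
    """Format an epoch-seconds value as a UTC timestamp."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def percent_done(changes_done, total_changes):
    """Assumed derivation of the `progress` field: floor of the percentage."""
    return changes_done * 100 // total_changes


def rates(updates):
    """Changes processed per second between consecutive status updates."""
    pairs = zip(updates, updates[1:])
    return [(c2 - c1) / (t2 - t1) for (t1, c1, _), (t2, c2, _) in pairs]


if __name__ == "__main__":
    print("started_on:", utc(1349209274), "UTC")
    for updated_on, done, logged in UPDATES:
        print(utc(updated_on), done, percent_done(done, TOTAL_CHANGES), logged)
    print([round(r) for r in rates(UPDATES)])
```

The `started_on` value converts to 20:21:14 UTC, so the 13:21:14 timestamps in the log appear to be local time at UTC-7. The rate drops from several thousand changes per second in the first updates to under 500 per second in the last two, which is consistent with compaction stalling at 2 percent.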



      [root@localhost logs]# grep "started_on,1349209274}" couchdb.*
      couchdb.10: {started_on,1349209274},
      couchdb.10: {started_on,1349209274},
      couchdb.9: {started_on,1349209274},
      couchdb.9: {started_on,1349209274},
      couchdb.9: {started_on,1349209274},
      couchdb.9: {started_on,1349209274},
      couchdb.9: {started_on,1349209274},
      couchdb.9: {started_on,1349209274},

      [root@localhost logs]# ls -latrh | grep couchdb
      -rw-r--r--. 1 couchbase couchbase 13 Sep 28 20:12 couchdb.siz
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 05:06 couchdb.13
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 05:36 couchdb.14
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 06:04 couchdb.15
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 06:33 couchdb.16
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 06:59 couchdb.17
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 07:29 couchdb.18
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 07:55 couchdb.19
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 08:22 couchdb.20
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 09:15 couchdb.1
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 10:13 couchdb.2
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 10:36 couchdb.3
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 11:04 couchdb.4
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 11:36 couchdb.5
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 12:05 couchdb.6
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 12:28 couchdb.7
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 12:55 couchdb.8
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 13:21 couchdb.9
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 13:46 couchdb.10
      -rw-r--r--. 1 couchbase couchbase 170 Oct 2 14:12 couchdb.idx
      -rw-r--r--. 1 couchbase couchbase 10M Oct 2 14:12 couchdb.11
      -rw-r--r--. 1 couchbase couchbase 9.7M Oct 2 14:37 couchdb.12

      • Then compaction restarted at Tue Oct 02 2012 13:47:05 and again stopped at 2 percent (see logs couchdb.10 and couchdb.11)

      [root@localhost logs]# grep "started_on,1349210733" couchdb.*
      couchdb.10: {started_on,1349210733},
      couchdb.10: {started_on,1349210733},
      couchdb.10: {started_on,1349210733},
      couchdb.10: {started_on,1349210733},
      couchdb.10: {started_on,1349210733},
      couchdb.11: {started_on,1349210733},
      couchdb.11: {started_on,1349210733},
      couchdb.11: {started_on,1349210733},
      couchdb.11: {started_on,1349210733},
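
Instead of grepping for one `started_on` value at a time, the rotated logs can be scanned once and the status updates grouped by compaction run. This is a minimal sketch with hypothetical function names; the sample text is a shortened stand-in for the real log contents, not a copy of them.

```python
import re
from collections import Counter

# Matches the started_on field of a task-status entry, e.g. {started_on,1349209274}
STARTED_ON = re.compile(r"\{started_on,(\d+)\}")


def count_updates(named_texts):
    """Count status updates per (started_on epoch, log file name)."""
    counts = Counter()
    for name, text in named_texts.items():
        for epoch in STARTED_ON.findall(text):
            counts[(int(epoch), name)] += 1
    return counts


def runs(counts):
    """Distinct compaction runs, identified by their started_on value."""
    return sorted({epoch for epoch, _name in counts})


# Shortened stand-in for the log contents (illustrative, not the real files).
SAMPLE = {
    "couchdb.9": "{started_on,1349209274},\n{started_on,1349209274},\n",
    "couchdb.10": "{started_on,1349209274},\n{started_on,1349210733},\n",
    "couchdb.11": "{started_on,1349210733},\n",
}

if __name__ == "__main__":
    counts = count_updates(SAMPLE)
    for epoch in runs(counts):
        files = {name: n for (e, name), n in counts.items() if e == epoch}
        print(epoch, files)
```

To run it against real files, one option is to build the input as `{p.name: p.read_text(errors="replace") for p in pathlib.Path(".").glob("couchdb.*")}` from the logs directory.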

      Link to the collect_info archive for all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201210/8nodes-col-info-1781-rebalance-hang-20121002-114333.tgz


        Activity

        pavelpaulau Pavel Paulau added a comment -

        From a performance-testing perspective, the issue is fixed.

        ketaki Ketaki Gangal added a comment -

        Tested on build 2.0.1-140. View index disk size grows by around 5-6 times during rebalance-in.

        This is much lower than what was seen earlier though.

        farshid Farshid Ghods (Inactive) added a comment -

        More tests were run and confirmed that index disk size growth is now bounded by 6x on Linux.

        If different behavior is seen in any of the system test runs, the ticket will be reopened.

        kzeller kzeller added a comment -

        Added to 2.0.1 RN:

        During rebalance, index files were growing to an unnecessarily large size. This has
        been fixed.


          People

          • Assignee:
            kzeller kzeller
            Reporter:
            thuan Thuan Nguyen