Details
Type: Bug
Resolution: Fixed
Priority: Blocker
Version: 2.0
Security Level: Public
Environment: CentOS 6.2 64-bit, build 2.0.0-1781
Description
Cluster information:
- 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 32 GB of RAM and a 400 GB SSD.
- 24.8 GB of RAM is allocated to Couchbase Server on each node.
- The SSD is formatted as ext4 and mounted on /data.
- Each server has its own SSD; no disk is shared between servers.
- Create a cluster of 6 nodes running Couchbase Server 2.0.0-1781.
- The cluster has 2 buckets: default (12 GB) and saslbucket (12 GB).
- Each bucket has one design doc with 2 views (design doc d1 on default, d11 on saslbucket).
- Consistent views are enabled on the cluster (the default).
10.6.2.37
10.6.2.38
10.6.2.44
10.6.2.45
10.6.2.42
10.6.2.43
- Load 14 million items into both buckets. Each item is 512 to 1024 bytes in size.
- Query all 4 views of the 2 design docs.
10.6.2.39
10.6.2.40
- Data path: /data
- View path: /data
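As a rough point of reference for the disk and index sizes reported below, the raw volume of the loaded items can be bounded with simple arithmetic. This is a minimal sketch that assumes 14 million items per bucket and counts item bytes only (no keys, metadata, replicas, or storage overhead); the constant and function names are illustrative.

```python
# Rough bounds on the raw item volume of the load step.
# Assumptions (not stated in the report): 14 million items *per bucket*,
# item bytes only; keys, metadata, replicas and storage overhead are ignored.
ITEMS_PER_BUCKET = 14_000_000
MIN_ITEM_BYTES = 512
MAX_ITEM_BYTES = 1024
GIB = 1024 ** 3


def raw_volume_gib(items, item_bytes):
    """Return items * item_bytes expressed in GiB."""
    return items * item_bytes / GIB


if __name__ == "__main__":
    low = raw_volume_gib(ITEMS_PER_BUCKET, MIN_ITEM_BYTES)
    mean = raw_volume_gib(ITEMS_PER_BUCKET, (MIN_ITEM_BYTES + MAX_ITEM_BYTES) / 2)
    high = raw_volume_gib(ITEMS_PER_BUCKET, MAX_ITEM_BYTES)
    print(f"per bucket: {low:.1f} to {high:.1f} GiB (mean ~{mean:.1f} GiB)")
```

Running it prints `per bucket: 6.7 to 13.4 GiB (mean ~10.0 GiB)`, which is consistent with the 12 GB bucket quotas.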
Manifest for build 1781:
http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_64_2.0.0-1781-rel.rpm.manifest.xml
- Add 2 nodes (39 and 40) and rebalance. During the rebalance, reboot nodes 42 and 43. The rebalance fails, as expected.
- After the nodes finish warmup, rebalance again. The rebalance fails on node 44 with bug MB-6490.
- Fail over node 44 and rebalance.
- Monitoring disk usage on all nodes shows that nodes 45 and 37 use the most space on /data (88% and 80%):
Thuans-MacBook-Pro:testrunner thuan$ python scripts/ssh.py -i ../ini/10-c-long.ini "df -kh | grep data"
10.6.2.44
394G 468M 394G 1% /data
10.6.2.39
394G 44G 331G 12% /data
10.6.2.42
394G 69G 326G 18% /data
10.6.2.40
394G 55G 319G 15% /data
10.6.2.45
394G 346G 48G 88% /data
10.6.2.43
394G 110G 284G 28% /data
10.6.2.37
394G 299G 76G 80% /data
10.6.2.38
394G 184G 191G 50% /data
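The ranking of nodes by disk usage can be reproduced from the df output above. The sketch below is a hypothetical helper (not part of testrunner) that assumes the exact layout shown: a host line followed by a df line whose columns are Size, Used, Avail, Use% and mount point.

```python
# Rank nodes by /data usage from the "df -kh | grep data" output above.
# Assumes the layout shown in this report: a host line, then a df line.
DF_OUTPUT = """\
10.6.2.44
394G 468M 394G 1% /data
10.6.2.39
394G 44G 331G 12% /data
10.6.2.42
394G 69G 326G 18% /data
10.6.2.40
394G 55G 319G 15% /data
10.6.2.45
394G 346G 48G 88% /data
10.6.2.43
394G 110G 284G 28% /data
10.6.2.37
394G 299G 76G 80% /data
10.6.2.38
394G 184G 191G 50% /data
"""


def parse_df(text):
    """Map each host to its /data Use% (as an int) from host/df line pairs."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    usage = {}
    for host, df_line in zip(lines[0::2], lines[1::2]):
        fields = df_line.split()  # Size, Used, Avail, Use%, Mounted on
        usage[host] = int(fields[3].rstrip("%"))
    return usage


def top_nodes(usage, n=2):
    """Return the n (host, Use%) pairs with the highest usage."""
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    print(top_nodes(parse_df(DF_OUTPUT)))
```

With the data above it prints `[('10.6.2.45', 88), ('10.6.2.37', 80)]`.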
- Checking the index file sizes on all nodes shows that the replica index file on node 45 is far too large: 114 GB, compared with 1.4 to 1.8 GB on most other nodes (node 42: 11 GB; node 37: 63 GB).
total 2.4G
-rw-r--r--. 1 couchbase couchbase 686M Oct 2 14:36 main_ae72f9d24da5d9368eed3fb3519c1687.view.21
-rw-r--r--. 1 couchbase couchbase 1.7G Oct 2 14:38 replica_ae72f9d24da5d9368eed3fb3519c1687.view.57
drwxr-xr-x. 2 couchbase couchbase 4.0K Oct 1 14:59 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
drwxr-xr-x. 2 couchbase couchbase 4.0K Oct 1 11:03 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica
10.6.2.43
total 2.1G
-rw-r--r--. 1 couchbase couchbase 674M Oct 2 14:33 main_ae72f9d24da5d9368eed3fb3519c1687.view.16
-rw-r--r--. 1 couchbase couchbase 1.4G Oct 2 14:37 replica_ae72f9d24da5d9368eed3fb3519c1687.view.22
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica
10.6.2.39
total 2.4G
-rw-r--r--. 1 couchbase couchbase 702M Oct 2 14:36 main_ae72f9d24da5d9368eed3fb3519c1687.view.10
-rw-r--r--. 1 couchbase couchbase 1.8G Oct 2 14:40 replica_ae72f9d24da5d9368eed3fb3519c1687.view.52
drwxr-xr-x. 2 couchbase couchbase 4.0K Oct 1 14:44 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
drwxr-xr-x. 2 couchbase couchbase 4.0K Oct 1 11:03 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica
10.6.2.45
total 132G
-rw-r--r--. 1 couchbase couchbase 18G Oct 2 14:40 main_ae72f9d24da5d9368eed3fb3519c1687.view.13
-rw-r--r--. 1 couchbase couchbase 114G Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.72
-rw-r--r--. 1 couchbase couchbase 4.0M Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.72.compact
-rw-r--r--. 1 couchbase couchbase 0 Oct 2 14:40 replica_ae72f9d24da5d9368eed3fb3519c1687.view.log
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 30 01:51 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 19:29 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica
10.6.2.42
total 12G
-rw-r--r--. 1 couchbase couchbase 620M Oct 2 14:41 main_ae72f9d24da5d9368eed3fb3519c1687.view.18
-rw-r--r--. 1 couchbase couchbase 11G Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.18
-rw-r--r--. 1 couchbase couchbase 27M Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.18.compact
-rw-r--r--. 1 couchbase couchbase 0 Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.log
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica
10.6.2.38
total 2.1G
-rw-r--r--. 1 couchbase couchbase 682M Oct 2 14:38 main_ae72f9d24da5d9368eed3fb3519c1687.view.11
-rw-r--r--. 1 couchbase couchbase 1.4G Oct 2 14:40 replica_ae72f9d24da5d9368eed3fb3519c1687.view.12
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:21 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica
10.6.2.37
total 67G
-rw-r--r--. 1 couchbase couchbase 4.1G Oct 2 14:36 main_ae72f9d24da5d9368eed3fb3519c1687.view.12
-rw-r--r--. 1 couchbase couchbase 63G Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.16
-rw-r--r--. 1 couchbase couchbase 9.8M Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.16.compact
-rw-r--r--. 1 couchbase couchbase 0 Oct 2 14:41 replica_ae72f9d24da5d9368eed3fb3519c1687.view.log
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_main
drwxr-xr-x. 2 couchbase couchbase 4.0K Sep 29 13:20 tmp_ae72f9d24da5d9368eed3fb3519c1687_replica
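To put the 114 GB figure in context, the replica index sizes from the listings above can be compared against their median. This is a sketch under stated assumptions: sizes are `ls -h` values (1024-based K/M/G), only listings that carry a host line are included, and the 5x-median threshold is an arbitrary choice for illustration.

```python
import statistics

# 1024-based suffixes as printed by `ls -h`.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

# Replica index file sizes copied by hand from the listings above
# (only listings with a host line). Illustrative input, not collected live.
REPLICA_SIZES = {
    "10.6.2.43": "1.4G",
    "10.6.2.39": "1.8G",
    "10.6.2.45": "114G",
    "10.6.2.42": "11G",
    "10.6.2.38": "1.4G",
    "10.6.2.37": "63G",
}


def to_bytes(size):
    """Convert an `ls -h` style size such as '1.4G', '686M' or '0' to bytes."""
    if size[-1] in UNITS:
        return float(size[:-1]) * UNITS[size[-1]]
    return float(size)


def outliers(sizes, factor=5.0):
    """Return hosts whose size exceeds `factor` times the median, largest first."""
    values = {host: to_bytes(s) for host, s in sizes.items()}
    median = statistics.median(values.values())
    flagged = [host for host, v in values.items() if v > factor * median]
    return sorted(flagged, key=lambda host: values[host], reverse=True)


if __name__ == "__main__":
    print(outliers(REPLICA_SIZES))
```

For these values the median is about 6.4 GiB and the sketch flags `['10.6.2.45', '10.6.2.37']`, the same two nodes that lead in disk usage.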
- The couchdb log on node 45 shows that replica index compaction started at Tue Oct 02 2012 13:21:14 and stalled at 2 percent (see logs couchdb.9 and couchdb.10).
In log couchdb.9:
[couchdb:debug,2012-10-02T13:21:16.593,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [
{design_documents,[<<"_design/d11">>]},
{indexer_type,replica},
{original_target,{[{type,bucket}]}},
{progress,0},
{set,<<"saslbucket">>},
{signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
{started_on,1349209274},
{total_changes,6339225},
{trigger_type,scheduled},
{type,view_compaction},
{updated_on,1349209276}]
[couchdb:debug,2012-10-02T13:21:23.124,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,40000},
{design_documents,[<<"_design/d11">>]},
{indexer_type,replica},
{original_target,{[{type,bucket}]}},
{progress,0},
{set,<<"saslbucket">>},
{signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
{started_on,1349209274},
{total_changes,6339225},
{trigger_type,scheduled},
{type,view_compaction},
{updated_on,1349209283}]
[couchdb:debug,2012-10-02T13:21:30.203,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,70000},
{design_documents,[<<"_design/d11">>]},
{indexer_type,replica},
{original_target,{[{type,bucket}]}},
{progress,1},
{set,<<"saslbucket">>},
{signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
{started_on,1349209274},
{total_changes,6339225},
{trigger_type,scheduled},
{type,view_compaction},
{updated_on,1349209290}]
[couchdb:debug,2012-10-02T13:21:42.138,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,130000},
{design_documents,[<<"_design/d11">>]},
{indexer_type,replica},
{original_target,{[{type,bucket}]}},
{set,<<"saslbucket">>},
{signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
{started_on,1349209274},
{total_changes,6339225},
{trigger_type,scheduled},
{type,view_compaction},
{updated_on,1349209302}]
[couchdb:debug,2012-10-02T13:21:52.828,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,140000},
{design_documents,[<<"_design/d11">>]},
{indexer_type,replica},
{original_target,{[{type,bucket}]}},
{progress,2},
{set,<<"saslbucket">>},
{signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
{started_on,1349209274},
{total_changes,6339225},
{trigger_type,scheduled},
{type,view_compaction},
{updated_on,1349209312}]
In log couchdb.10:
[couchdb:debug,2012-10-02T13:22:13.490,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,150000},
{design_documents,[<<"_design/d11">>]},
{indexer_type,replica},
{original_target,{[{type,bucket}]}},
{progress,2},
{set,<<"saslbucket">>},
{signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
{started_on,1349209274},
{total_changes,6339225},
{trigger_type,scheduled},
{type,view_compaction},
{updated_on,1349209333}]
[couchdb:debug,2012-10-02T13:22:34.317,ns_1@10.6.2.45:couch_task_status:couch_log:debug:36]New task status for <0.11726.933>: [{changes_done,160000},
{design_documents,[<<"_design/d11">>]},
{indexer_type,replica},
{original_target,{[{type,bucket}]}},
{progress,2},
{set,<<"saslbucket">>},
{signature, <<"ae72f9d24da5d9368eed3fb3519c1687">>},
{started_on,1349209274},
{total_changes,6339225},
{trigger_type,scheduled},
{type,view_compaction},
{updated_on,1349209354}]
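The stall at 2 percent follows from the counters in these entries. Assuming `progress` is the integer percentage changes_done * 100 div total_changes (an assumption about how the task status is computed, though it agrees with every entry above), the sketch below recomputes it; at 160,000 of 6,339,225 changes the compaction had covered only about 2.5% of its work.

```python
TOTAL_CHANGES = 6_339_225  # {total_changes,...} in every entry above

# (changes_done, progress) pairs as they appear in couchdb.9 and couchdb.10
REPORTED = [(40_000, 0), (70_000, 1), (140_000, 2), (150_000, 2), (160_000, 2)]


def progress_pct(changes_done, total_changes):
    """Assumed formula: integer percentage, rounded down."""
    return changes_done * 100 // total_changes


if __name__ == "__main__":
    for done, logged in REPORTED:
        computed = progress_pct(done, TOTAL_CHANGES)
        print(f"{done:>7} of {TOTAL_CHANGES}: computed {computed}%, logged {logged}%")
```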
[root@localhost logs]# grep "started_on,1349209274}" couchdb.*
couchdb.10: {started_on,1349209274},
couchdb.10: {started_on,1349209274},
couchdb.9: {started_on,1349209274},
couchdb.9: {started_on,1349209274},
couchdb.9: {started_on,1349209274},
couchdb.9: {started_on,1349209274},
couchdb.9: {started_on,1349209274},
couchdb.9: {started_on,1349209274},
[root@localhost logs]# ls -latrh | grep couchdb
-rw-r--r--. 1 couchbase couchbase 13 Sep 28 20:12 couchdb.siz
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 05:06 couchdb.13
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 05:36 couchdb.14
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 06:04 couchdb.15
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 06:33 couchdb.16
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 06:59 couchdb.17
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 07:29 couchdb.18
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 07:55 couchdb.19
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 08:22 couchdb.20
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 09:15 couchdb.1
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 10:13 couchdb.2
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 10:36 couchdb.3
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 11:04 couchdb.4
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 11:36 couchdb.5
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 12:05 couchdb.6
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 12:28 couchdb.7
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 12:55 couchdb.8
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 13:21 couchdb.9
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 13:46 couchdb.10
-rw-r--r--. 1 couchbase couchbase 170 Oct 2 14:12 couchdb.idx
-rw-r--r--. 1 couchbase couchbase 10M Oct 2 14:12 couchdb.11
-rw-r--r--. 1 couchbase couchbase 9.7M Oct 2 14:37 couchdb.12
- Compaction then restarted at Tue Oct 02 2012 13:47:05 and again stalled at 2 percent (see logs couchdb.10 and couchdb.11).
[root@localhost logs]# grep "started_on,1349210733" couchdb.*
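couchdb.10: {started_on,1349210733},
couchdb.10: {started_on,1349210733},
couchdb.10: {started_on,1349210733},
couchdb.10: {started_on,1349210733},
couchdb.10: {started_on,1349210733},
couchdb.11: {started_on,1349210733},
couchdb.11: {started_on,1349210733},
couchdb.11: {started_on,1349210733},
couchdb.11: {started_on,1349210733},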
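The `started_on` and `updated_on` fields are Unix timestamps in seconds. The local times quoted in this report line up with UTC-7 (for example, `updated_on` 1349209276 corresponds to the log line stamped 13:21:16.593), so the sketch below assumes a fixed PDT offset to convert them; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the report's wall-clock times are Pacific Daylight Time (UTC-7).
PDT = timezone(timedelta(hours=-7), "PDT")


def started_on_to_local(ts, tz=PDT):
    """Format a task status timestamp (Unix seconds) in the given time zone."""
    return datetime.fromtimestamp(ts, tz).strftime("%a %b %d %Y %H:%M:%S %Z")


if __name__ == "__main__":
    for ts in (1349209274, 1349210733):
        print(ts, "->", started_on_to_local(ts))
```

`started_on_to_local(1349209274)` returns `Tue Oct 02 2012 13:21:14 PDT`, matching the first compaction start above; the restart's value, 1349210733, converts to `Tue Oct 02 2012 13:45:33 PDT`.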
Link to the collect_info archives for all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201210/8nodes-col-info-1781-rebalance-hang-20121002-114333.tgz