Details
- Bug
- Resolution: Fixed
- Major
- 2.0
- Security Level: Public
- centos 6.2 64bit build 2.0.0-1777
Description
Cluster information:
- 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
- Each server has 32 GB RAM and a 400 GB SSD disk.
- 24.8 GB RAM allocated to Couchbase Server on each node
- SSD disk formatted as ext4, mounted on /data
- Each server has its own SSD drive; no disk is shared with other servers.
- Create a cluster of 6 nodes with Couchbase Server 2.0.0-1777 installed
- Cluster has 2 buckets, default (12GB) and saslbucket (12GB).
- Each bucket has one design doc with 2 views (d1 for default, d11 for saslbucket)
10.6.2.37
10.6.2.38
10.6.2.39
10.6.2.40
10.6.2.42
10.6.2.43
- Load 18 million items into both buckets. Each key has a size from 512 bytes to 1024 bytes
- Query all 4 views from the 2 design docs
10.6.2.44
10.6.2.45
- Do a swap rebalance: add nodes 44 and 45 and remove nodes 39 and 40. The rebalance starts with saslbucket
- The rebalance hung. I then tried to stop it by shutting down Couchbase Server on node 45 and node 37.
- During warmup, many popup errors appeared saying "Write Commit Failure. Disk write failed for item in Bucket "saslbucket" on node 10.6.2.45." and likewise for node 37
- I also see this error on the default bucket (not rebalanced yet):
Write Commit Failure. Disk write failed for item in Bucket "default" on node 10.6.2.43. (repeated 19 times)
Write Commit Failure. Disk write failed for item in Bucket "default" on node 10.6.2.42.
Thu Sep 27 18:04:12.339644 PDT 3: Trying to connect to mccouch: "localhost:11213"
Thu Sep 27 18:04:12.340980 PDT 3: Connected to mccouch: "localhost:11213"
Thu Sep 27 18:04:12.356072 PDT 3: Extension support isn't implemented in this version of bucket_engine
Thu Sep 27 18:04:12.358478 PDT 3: Failed to load mutation log, falling back to key dump
Thu Sep 27 18:04:16.138683 PDT 3: metadata loaded in 3790 ms
Thu Sep 27 18:04:27.107375 PDT 3: warmup completed in 14 s
Thu Sep 27 18:11:12.045850 PDT 3: Warning: couchstore_open_db failed, name=/data/saslbucket/23.couch.1 option=0 rev=1 retried=2 error=no such file [none]
Thu Sep 27 18:11:12.045915 PDT 3: Warning: failed to open database, vbucketId = 23 fileRev = 1 numDocs = 7460
Thu Sep 27 18:11:12.045933 PDT 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 23 rev = 1
Thu Sep 27 18:11:12.045964 PDT 3: Fatal error in persisting SET ``0-00121344a46d01ce'' on vb 23!!! Requeue it...
Thu Sep 27 18:11:12.045991 PDT 3: Fatal error in persisting SET ``0-00269625a3507145'' on vb 23!!! Requeue it...
Thu Sep 27 18:11:12.046007 PDT 3: Fatal error in persisting SET ``0-003ecbc258a7c34d'' on vb 23!!! Requeue it...
Thu Sep 27 18:11:12.046023 PDT 3: Fatal error in persisting SET ``0-003ef2a0296ef6f5'' on vb 23!!! Requeue it...
Thu Sep 27 18:11:12.046040 PDT 3: Fatal error in persisting SET ``0-0040dd9ab132b92d'' on vb 23!!! Requeue it...
Thu Sep 27 18:11:12.046070 PDT 3: Fatal error in persisting SET ``0-0048a96c38e2fca4'' on vb 23!!! Requeue it...
Thu Sep 27 18:11:12.046099 PDT 3: Fatal error in persisting SET ``0-004b9ef9f133455f'' on vb 23!!! Requeue it...
Thu Sep 27 18:11:12.046119 PDT 3: Fatal error in persisting SET ``0-004f3f81045ab9c1'' on vb 23!!! Requeue it...
Thu Sep 27 18:11:12.046141 PDT 3: Fatal error in persisting SET ``0-0067a77180cb6614'' on vb 23!!! Requeue it...
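To see which vbuckets are affected, the "Fatal error in persisting" lines in the memcached log can be tallied per vbucket. The following is a minimal sketch, assuming every failure line has the same format as the excerpt above. The function name `count_persist_failures` and the regular expression are illustrative and not part of any Couchbase tool.

```python
import re
from collections import Counter

# Assumed line format, taken from the log excerpt in this report:
#   ... Fatal error in persisting SET ``<key>'' on vb <id>!!! Requeue it...
PERSIST_FAIL = re.compile(r"Fatal error in persisting (\w+) ``(.+?)'' on vb (\d+)")


def count_persist_failures(lines):
    """Return a Counter mapping vbucket id -> number of failed persist attempts."""
    counts = Counter()
    for line in lines:
        match = PERSIST_FAIL.search(line)
        if match:
            counts[int(match.group(3))] += 1
    return counts


# Sample lines copied from the log above; the last one is not a failure line.
sample = [
    "Thu Sep 27 18:11:12.045964 PDT 3: Fatal error in persisting SET ``0-00121344a46d01ce'' on vb 23!!! Requeue it...",
    "Thu Sep 27 18:11:12.045991 PDT 3: Fatal error in persisting SET ``0-00269625a3507145'' on vb 23!!! Requeue it...",
    "Thu Sep 27 18:04:27.107375 PDT 3: warmup completed in 14 s",
]
print(count_persist_failures(sample))
```

To run it against a full log, pass an open file object instead of `sample`. A vbucket with a high count points to a database file that could not be opened, as with /data/saslbucket/23.couch.1 here.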
Link to collect info of all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201209/8nodes-col-info-1777-disk-write-failed-during-warmup-20120927-184620.tgz
Link to memcached log of all nodes: https://s3.amazonaws.com/packages.couchbase/memcached/orange/2_0_0/201209/8nodes-memcached-log-1777-disk-write-failed-during-warmup-20120927.tgz
The cluster is in a failed state now.