  1. Couchbase Server
  2. MB-6770

[system test] disk write commit failed during warmup

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 2.0
    • Affects Version/s: 2.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Environment: centos 6.2 64bit build 2.0.0-1777

    Description

      Cluster information:

      • 8 CentOS 6.2 64-bit servers, each with a 4-core CPU
      • Each server has 32 GB RAM and a 400 GB SSD disk.
      • 24.8 GB RAM allocated to Couchbase Server on each node
      • SSD disk formatted as ext4, mounted on /data
      • Each server has its own SSD drive; no disk is shared with another server.
      • Create a cluster of 6 nodes with Couchbase Server 2.0.0-1777 installed
      • The cluster has 2 buckets, default (12 GB) and saslbucket (12 GB).
      • Each bucket has one design doc with 2 views (d1 for default, d11 for saslbucket)

      10.6.2.37
      10.6.2.38
      10.6.2.39
      10.6.2.40
      10.6.2.42
      10.6.2.43

      • Load 18 million items into both buckets. Each key has a size from 512 bytes to 1024 bytes.
      • Query all 4 views from both design docs

      10.6.2.44
      10.6.2.45

      • Do a swap rebalance: add nodes 44 and 45 and remove nodes 39 and 40. The rebalance starts with saslbucket.
      • The rebalance hung. I then tried to stop it by shutting down Couchbase Server on node 45 and node 37.
      • During warmup, many popup errors appeared saying "Write Commit Failure. Disk write failed for item in Bucket "saslbucket" on node 10.6.2.45.", and the same for node 37.
      • I also see this error on the default bucket (not rebalanced yet):
        Write Commit Failure. Disk write failed for item in Bucket "default" on node 10.6.2.43. (repeated 19 times)
        Write Commit Failure. Disk write failed for item in Bucket "default" on node 10.6.2.42.

      Thu Sep 27 18:04:12.339644 PDT 3: Trying to connect to mccouch: "localhost:11213"
      Thu Sep 27 18:04:12.340980 PDT 3: Connected to mccouch: "localhost:11213"
      Thu Sep 27 18:04:12.356072 PDT 3: Extension support isn't implemented in this version of bucket_engine
      Thu Sep 27 18:04:12.358478 PDT 3: Failed to load mutation log, falling back to key dump
      Thu Sep 27 18:04:16.138683 PDT 3: metadata loaded in 3790 ms
      Thu Sep 27 18:04:27.107375 PDT 3: warmup completed in 14 s
      Thu Sep 27 18:11:12.045850 PDT 3: Warning: couchstore_open_db failed, name=/data/saslbucket/23.couch.1 option=0 rev=1 retried=2 error=no such file [none]
      Thu Sep 27 18:11:12.045915 PDT 3: Warning: failed to open database, vbucketId = 23 fileRev = 1 numDocs = 7460
      Thu Sep 27 18:11:12.045933 PDT 3: Warning: commit failed, cannot save CouchDB docs for vbucket = 23 rev = 1
      Thu Sep 27 18:11:12.045964 PDT 3: Fatal error in persisting SET ``0-00121344a46d01ce'' on vb 23!!! Requeue it...
      Thu Sep 27 18:11:12.045991 PDT 3: Fatal error in persisting SET ``0-00269625a3507145'' on vb 23!!! Requeue it...
      Thu Sep 27 18:11:12.046007 PDT 3: Fatal error in persisting SET ``0-003ecbc258a7c34d'' on vb 23!!! Requeue it...
      Thu Sep 27 18:11:12.046023 PDT 3: Fatal error in persisting SET ``0-003ef2a0296ef6f5'' on vb 23!!! Requeue it...
      Thu Sep 27 18:11:12.046040 PDT 3: Fatal error in persisting SET ``0-0040dd9ab132b92d'' on vb 23!!! Requeue it...
      Thu Sep 27 18:11:12.046070 PDT 3: Fatal error in persisting SET ``0-0048a96c38e2fca4'' on vb 23!!! Requeue it...
      Thu Sep 27 18:11:12.046099 PDT 3: Fatal error in persisting SET ``0-004b9ef9f133455f'' on vb 23!!! Requeue it...
      Thu Sep 27 18:11:12.046119 PDT 3: Fatal error in persisting SET ``0-004f3f81045ab9c1'' on vb 23!!! Requeue it...
      Thu Sep 27 18:11:12.046141 PDT 3: Fatal error in persisting SET ``0-0067a77180cb6614'' on vb 23!!! Requeue it...
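      For triage, the memcached log excerpt above can be summarized per vbucket. The following is a minimal sketch, not part of the original report: the function name `summarize` and both regular expressions are hypothetical and assume that log lines have exactly the shape shown in the excerpt. It collects the couchstore files that could not be opened and counts how many SETs were requeued for each vbucket.

```python
import re
from collections import Counter

# Assumed line shapes, copied from the excerpt above; other builds may log differently.
OPEN_FAILED = re.compile(
    r"couchstore_open_db failed, name=(?P<path>\S+)\s.*?error=(?P<error>.+?)\s\["
)
PERSIST_FAILED = re.compile(
    r"Fatal error in persisting SET ``(?P<key>[^']+)'' on vb (?P<vb>\d+)"
)


def summarize(lines):
    """Return (files that failed to open -> error text, vbucket id -> requeued SET count)."""
    failed_files = {}
    requeued = Counter()
    for line in lines:
        match = OPEN_FAILED.search(line)
        if match:
            failed_files[match.group("path")] = match.group("error")
            continue
        match = PERSIST_FAILED.search(line)
        if match:
            requeued[int(match.group("vb"))] += 1
    return failed_files, requeued


# Three lines taken verbatim from the excerpt above.
sample = [
    "Thu Sep 27 18:11:12.045850 PDT 3: Warning: couchstore_open_db failed, "
    "name=/data/saslbucket/23.couch.1 option=0 rev=1 retried=2 error=no such file [none]",
    "Thu Sep 27 18:11:12.045964 PDT 3: Fatal error in persisting SET "
    "``0-00121344a46d01ce'' on vb 23!!! Requeue it...",
    "Thu Sep 27 18:11:12.045991 PDT 3: Fatal error in persisting SET "
    "``0-00269625a3507145'' on vb 23!!! Requeue it...",
]

failed_files, requeued = summarize(sample)
for path, error in failed_files.items():
    print(f"could not open {path}: {error}")
for vb, count in sorted(requeued.items()):
    print(f"vb {vb}: {count} SETs requeued")
```

      To run it against a full log, pass an open file object instead of `sample`, since the function only iterates over lines.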

      Link to collect_info from all nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/orange/2_0_0/201209/8nodes-col-info-1777-disk-write-failed-during-warmup-20120927-184620.tgz

      Link to memcached logs from all nodes: https://s3.amazonaws.com/packages.couchbase/memcached/orange/2_0_0/201209/8nodes-memcached-log-1777-disk-write-failed-during-warmup-20120927.tgz

      The cluster is in a failed state now.

          People

            chiyoung Chiyoung Seo (Inactive)
            thuan Thuan Nguyen