  1. Couchbase Server
  2. MB-7245

[windows] "Write Commit Failure. Disk write failed" error appears during incremental rebalance-in, then memcached exits with status 255

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: None
    • Security Level: Public
    • Labels:
      None
    • Environment:

      Description

      To reproduce, run with -t rebalancetests.IncrementalRebalanceInTests.test_load,replica=2,delete-ratio=0.6,expiry-ratio=0.2,do-stop=True
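
      For readers comparing this run against other rebalance test runs, the -t argument can be split into its test path and parameters. This is a minimal sketch that assumes the argument is a dotted test path followed by comma-separated key=value pairs, as it appears in this report; parse_test_spec is a hypothetical helper, not part of the test framework.

```python
def parse_test_spec(spec):
    """Split a '-t' argument into (test_path, params).

    Assumes the layout seen in this report: a dotted test path,
    then comma-separated key=value pairs. Values are kept as strings.
    """
    path, *pairs = spec.split(",")
    params = dict(pair.split("=", 1) for pair in pairs)
    return path, params


spec = ("rebalancetests.IncrementalRebalanceInTests.test_load,"
        "replica=2,delete-ratio=0.6,expiry-ratio=0.2,do-stop=True")
test_path, params = parse_test_spec(spec)
```

      Here params holds replica, delete-ratio, expiry-ratio and do-stop as string values, which makes it easy to diff the settings of two runs.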

      The cluster was incrementally rebalanced from 1 to 5 nodes. The rebalance from 4 to 5 nodes failed, with errors persisting DELETE keys:

      2012-11-21 04:34:03.664 ns_orchestrator:4:info:message(ns_1@10.1.3.82) - Starting rebalance, KeepNodes = ['ns_1@10.1.3.83','ns_1@10.1.3.86',
      'ns_1@10.1.3.84','ns_1@10.1.3.85',
      'ns_1@10.1.3.82'], EjectNodes = []

      ...

      2012-11-21 04:38:31.133 menelaus_web_alerts_srv:1:info:message(ns_1@10.1.3.82) - Write Commit Failure. Disk write failed for item in Bucket "bucket-0" on node 10.1.3.82.
      2012-11-21 04:38:36.758 ns_port_server:0:info:message(ns_1@10.1.3.82) - Port server memcached on node 'ns_1@10.1.3.82' exited with status 255. Restarting. Messages: Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``35394-8cb6bb7'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``35937-8cb6bb7'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``35989-ed566c9'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``37231-8cb6bb7'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``3736-ed566c9'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``3788-8cb6bb7'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``37892-8cb6bb7'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``38337-8cb6bb7'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``38994-8cb6bb7'' on vb 758!!! Requeue it...
      Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``39772-8cb6bb7'' on vb 758!!! Requeue it...

      ...

      Wed Nov 21 04:38:36.275195 Pacific Standard Time 3: TAP (Producer) eq_tapq:replication_ns_1@10.1.3.84 - Sending TAP_OPAQUE with command "complete_vb_filter_change" and vbucket 0

      ...

      2012-11-21 04:38:36.836 ns_memcached:4:info:message(ns_1@10.1.3.82) - Control connection to memcached on 'ns_1@10.1.3.82' disconnected: {badmatch,
      {error,
      closed}}
      2012-11-21 04:38:42.586 ns_vbucket_mover:0:critical:message(ns_1@10.1.3.82) - <0.18589.34> exited with {unexpected_exit,
      {'EXIT',<0.18653.34>,
      {{wait_checkpoint_persisted_failed,"bucket-0",759,
      6,
      [{'ns_1@10.1.3.82',
      {'EXIT',
      {badmatch,{error,closed,
      {gen_server,call,
      ['ns_memcached-bucket-0',
      {wait_for_checkpoint_persistence,759,6},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-bucket-0','ns_1@10.1.3.82'},
      {if_rebalance,<0.22501.33>,
      {wait_checkpoint_persisted,759,6}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-', 5}]}}}

      Logs are attached.
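
      When going through the attached logs, the requeue messages can be summarized per vbucket to see whether the failures are confined to a single vbucket (vb 758 in the excerpt above). This is a minimal sketch that assumes the message format quoted in this report; summarize_persist_failures and the regular expression are illustrative, not part of any Couchbase tool.

```python
import re
from collections import Counter

# Matches the requeue messages quoted in this report: the key is wrapped
# in `` ... '' and followed by the vbucket id.
PERSIST_FAILURE = re.compile(
    r"Fatal error in persisting (?P<op>\w+) ``(?P<key>[^']+)'' on vb (?P<vb>\d+)"
)


def summarize_persist_failures(lines):
    """Count failed persistence attempts per (operation, vbucket)."""
    counts = Counter()
    for line in lines:
        # finditer is used because one log line may carry several messages.
        for match in PERSIST_FAILURE.finditer(line):
            counts[(match.group("op"), int(match.group("vb")))] += 1
    return counts


# Two lines copied from the excerpt in the description.
sample = [
    "Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``35937-8cb6bb7'' on vb 758!!! Requeue it...",
    "Wed Nov 21 04:38:28.556445 Pacific Standard Time 3: Fatal error in persisting DELETE ``35989-ed566c9'' on vb 758!!! Requeue it...",
]
counts = summarize_persist_failures(sample)
```

      The same function can be fed the lines of an extracted diag file; a single dominant (operation, vbucket) pair would point at one vbucket file rather than a node-wide disk problem.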


        Activity

        iryna iryna added a comment -

        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.82-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.83-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.84-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.85-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.86-diag.txt.gz
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.82-cbcollect.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.83-cbcollect.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.84-cbcollect.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.85-cbcollect.zip
        https://s3.amazonaws.com/bugdb/jira/MB-7245/a243a966/10.1.3.86-cbcollect.zip

        chiyoung Chiyoung Seo added a comment -

        This issue is a duplicate of

        http://www.couchbase.com/issues/browse/MB-7246

        which constantly caused the rebalance failures.


          People

          • Assignee: chiyoung Chiyoung Seo
          • Reporter: iryna iryna
          • Votes: 0
          • Watchers: 0

            Dates

            • Created:
              Updated:
              Resolved:
