Couchbase Server / MB-5858

Rebalance hangs with exception "replicator_died"


Details

    Description

      The failure occurs in the following test on a 5-node cluster (http://qa.hq.northscale.net/job/centos-failover-tests/48/):
      ./testrunner -i /tmp/failover.ini get-logs=True -t failovertests.FailoverTests.test_failover_normal,replica=2,load_ratio=10

      In the diagnostics from the master node 10.1.3.114, the rebalance starts at [2012-07-09 17:46:03]:

      [user:info] [2012-07-09 17:46:03] [ns_1@10.1.3.114:<0.1951.0>:ns_orchestrator:idle:399] Starting rebalance, KeepNodes = ['ns_1@10.1.3.114','ns_1@10.1.3.118',
      'ns_1@10.1.3.116'], EjectNodes = []

      At [2012-07-09 18:02:44], the following error and crash reports are logged:

      [ns_server:error] [2012-07-09 18:02:44] [ns_1@10.1.3.114:<0.28201.4>:ns_replicas_builder:build_replicas_main:109] Got premature exit from one of ebucketmigrators: {'EXIT',<19197.13149.3>,
      {badmatch,{error,timeout}}}

      [error_logger:error] [2012-07-09 18:02:44] [ns_1@10.1.3.114:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ebucketmigrator_srv:init/1
      pid: <19197.13149.3>
      registered_name: []
      exception error: no match of right hand side value {error,timeout}
      in function mc_client_binary:cmd_binary_vocal_recv/5
      in call from mc_client_binary:set_vbucket/3
      in call from ebucketmigrator_srv:'init/1-lc$^0/1-0'/3
      in call from ebucketmigrator_srv:init/1
      ancestors: [<0.28201.4>,<0.28200.4>,<0.19639.4>,<0.19596.4>]
      messages: []
      links: [#Port<19197.299960>,<0.28201.4>,#Port<19197.299959>]
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 610
      stack_size: 24
      reductions: 1757
      neighbours:

      =========================CRASH REPORT=========================
      crasher:
      initial call: erlang:apply/2
      pid: <0.28201.4>
      registered_name: []
      exception exit: {replicator_died,
      {'EXIT',<19197.13149.3>,{badmatch,{error,timeout}}}}
      in function ns_replicas_builder:'build_replicas_main/6-fun-0'/2
      in call from ns_replicas_builder:observe_wait_all_done_tail/5
      in call from ns_replicas_builder:observe_wait_all_done/5
      in call from ns_replicas_builder:'build_replicas_main/6-fun-1'/8
      in call from ns_replicas_builder:try_with_maybe_ignorant_after/2
      in call from ns_replicas_builder:build_replicas_main/6
      ancestors: [<0.28200.4>,<0.19639.4>,<0.19596.4>]
      messages: []
      links: [<0.28200.4>,<0.28204.4>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 2584
      stack_size: 24
      reductions: 241661

      Diagnostics are attached. The Jenkins cluster has been left in the same state in case it is needed for diagnosis.

      Attachments

        1. 10.1.3.114-8091-diag.txt.gz (6.11 MB)
        2. 10.1.3.115-8091-diag.txt.gz (1.50 MB)
        3. 10.1.3.116-8091-diag.txt.gz (6.73 MB)
        4. 10.1.3.117-8091-diag.txt.gz (1.90 MB)
        5. 10.1.3.118-8091-diag.txt.gz (5.02 MB)

          People

            alkondratenko Aleksey Kondratenko (Inactive)
            deepkaran.salooja Deepkaran Salooja