Couchbase Server / MB-7949

[windows] Rebalance exited with reason wait_checkpoint_persisted_failed due to nodedown {error,closed}

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
    • Environment:
      2.0.1-179-rel

      Description

      Rebalance exited with reason wait_checkpoint_persisted_failed due to nodedown {error,closed}

      Test to reproduce
      ./testrunner -i failover.ini -t autofailovertests.AutoFailoverTests.test_invalid_timeouts,replicas=3,keys-count=1000000

      The initial rebalance fails while data load is in progress.
      This looks similar to http://www.couchbase.com/issues/browse/MB-7111, but that issue involves a lot of data plus views.

      [error_logger:error,2013-03-20T11:11:23.024,ns_1@10.139.30.84:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.29707.38>
      registered_name: []
      exception exit: {unexpected_exit,
      {'EXIT',<0.29721.38>,
      {{wait_checkpoint_persisted_failed,"default",807,2,
      [{'ns_1@10.144.87.52',
      {'EXIT',
      {{nodedown,'ns_1@10.144.87.52'},
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.144.87.52'},
      {if_rebalance,<0.27396.37>,
      {wait_checkpoint_persisted,807,2}},
      infinity]}}}}]},
      [{ns_single_vbucket_mover, '-wait_checkpoint_persisted_many/5-fun-1-',5}]}}}
      in function ns_single_vbucket_mover:spawn_and_wait/1
      in call from ns_single_vbucket_mover:mover_inner/6
      in call from misc:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.27396.37>,<0.26604.37>]
      messages: [{'EXIT',<0.27396.37>,
      {{bulk_set_vbucket_state_failed,
      [{'ns_1@10.139.30.84',
      {'EXIT',
      {{{{unexpected_reason,
      badmatch,{error,closed,
      [{mc_binary,quick_stats_recv,3},
      {mc_binary,quick_stats_loop,5},
      {mc_binary,quick_stats,5},
      {mc_client_binary, get_zero_open_checkpoint_vbuckets,3},
      {ebucketmigrator_srv,handle_call,3},
      {gen_server,handle_msg,5},

      [error_logger:error,2013-03-20T11:11:23.243,ns_1@10.139.30.84:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_vbucket_mover:init/1
      pid: <0.27396.37>
      registered_name: []
      exception exit: {{bulk_set_vbucket_state_failed,
      [{'ns_1@10.139.30.84',
      {'EXIT',
      {{{{unexpected_reason,
      badmatch,{error,closed,
      [{mc_binary,quick_stats_recv,3},
      {mc_binary,quick_stats_loop,5},
      {mc_binary,quick_stats,5},
      {mc_client_binary, get_zero_open_checkpoint_vbuckets,3},
      {ebucketmigrator_srv,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}},
      [{misc,executing_on_new_process,1},
      {tap_replication_manager,change_vbucket_filter,4},
      {tap_replication_manager, '-do_set_incoming_replication_map/3-lc$^5/1-5-', 2},
      {tap_replication_manager, do_set_incoming_replication_map,3},
      {tap_replication_manager,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},

      Attaching testrunner logs and collect_info.zip
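
When triaging crash reports like the two above, it can help to pull out the failing bucket and the node reported as down before reading the full stack. The following is a minimal sketch, not part of ns_server or testrunner: the function name `summarize_crash` is hypothetical, the regular expressions only cover the term shapes quoted in this report, and reading the two integers after the bucket name as vbucket and checkpoint id is an assumption based on the `{wait_checkpoint_persisted,807,2}` call in the log.

```python
import re

# Patterns for the two Erlang terms quoted in this report (assumed shapes).
NODEDOWN_RE = re.compile(r"\{nodedown,\s*'([^']+)'\}")
CHECKPOINT_RE = re.compile(
    r'\{wait_checkpoint_persisted_failed,\s*"([^"]+)",\s*(\d+),\s*(\d+)'
)


def summarize_crash(report: str) -> dict:
    """Extract bucket, vbucket, checkpoint id and down nodes from one crash report.

    Hypothetical helper: fields that cannot be found are left as None or empty.
    """
    summary = {"bucket": None, "vbucket": None, "checkpoint": None, "down_nodes": []}
    match = CHECKPOINT_RE.search(report)
    if match:
        summary["bucket"] = match.group(1)
        # Assumption: first integer is the vbucket, second the checkpoint id.
        summary["vbucket"] = int(match.group(2))
        summary["checkpoint"] = int(match.group(3))
    # De-duplicate node names, since nodedown can appear more than once per report.
    summary["down_nodes"] = sorted(set(NODEDOWN_RE.findall(report)))
    return summary
```

Feeding it the first crash report from this ticket should yield bucket "default" and the single down node 'ns_1@10.144.87.52', which is the information the duplicate decision against MB-7902 rests on.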


          Activity

          deepkaran.salooja Deepkaran Salooja added a comment -

          collect_info and testrunner logs posted under https://s3.amazonaws.com/bugdb/jira/MB-7949/

          ketaki Ketaki Gangal added a comment -

          How frequently do we see these errors? Does rebalance eventually succeed? Or is there a workaround?

          siri Sriram Melkote added a comment -

          As this is due to nodedown and we're tracking unexpected node disconnects in MB-7902, I'll close this as a duplicate. If they turn out to be distinct, please reopen.

          maria Maria McDuff (Inactive) added a comment -

          MB-7902

            People

            • Assignee: siri Sriram Melkote
            • Reporter: deepkaran.salooja Deepkaran Salooja
            • Votes: 0
            • Watchers: 3
