Couchbase Server - MB-5343

memcached dropped connections: Rebalance failed due to replicator_died: exited (ns_single_vbucket_mover)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.8.1-release-candidate
    • Fix Version/s: 2.0
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Environment:
      Ubuntu 64 bit
      181-831-rel

      Description

      Failing test:
      rebalancetests.RebalanceInOutWithParallelLoad.test_load,get-logs:True,replica:2,num_nodes:7

      [ns_server:info] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.15105.17>:ns_replicas_builder:kill_a_bunch_of_tap_names:209] Killed the following tap names on 'ns_1@10.1.3.109': [<<"replication_building_62_'ns_1@10.1.3.112'">>]
      [ns_server:info] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.15104.17>:ns_single_vbucket_mover:mover_inner:88] Got exit message (parent is <0.13796.17>). Exiting...
      {'EXIT',<0.15105.17>,{replicator_died,{'EXIT',<16541.21868.10>,normal}}}
      [ns_server:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.15104.17>:ns_replicas_builder:sync_shutdown_many:147] Shutdown of the following failed: [{<0.15105.17>,{replicator_died,{'EXIT',<16541.21868.10>,normal}}}]
      [error_logger:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: erlang:apply/2
      pid: <0.15105.17>
      registered_name: []
      exception exit: {replicator_died,{'EXIT',<16541.21868.10>,normal}}
      in function ns_replicas_builder:'build_replicas_main/6-fun-0'/1
      in call from ns_replicas_builder:observe_wait_all_done_tail/5
      in call from ns_replicas_builder:observe_wait_all_done/5
      in call from ns_replicas_builder:'build_replicas_main/6-fun-1'/8
      in call from ns_replicas_builder:try_with_maybe_ignorant_after/2
      in call from ns_replicas_builder:build_replicas_main/6
      ancestors: [<0.15104.17>,<0.13796.17>,<0.13763.17>]
      messages: [{'EXIT',<16541.21868.10>,normal}]
      links: [<0.15104.17>]
      dictionary: []
      trap_exit: true
      status: running
      heap_size: 121393
      stack_size: 24
      reductions: 12423
      neighbours:

      [ns_server:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.15104.17>:ns_replicas_builder:try_with_maybe_ignorant_after:68] Eating exception from ignorant after-block:
      {error,{badmatch,[{<0.15105.17>,
                         {replicator_died,{'EXIT',<16541.21868.10>,normal}}}]},
       [{ns_replicas_builder,sync_shutdown_many,1},
        {ns_replicas_builder,try_with_maybe_ignorant_after,2},
        {ns_single_vbucket_mover,mover,6},
        {proc_lib,init_p_do_apply,3}]}
      [rebalance:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.13796.17>:ns_vbucket_mover:handle_info:158] <0.15104.17> exited with {exited,
      {'EXIT',<0.15105.17>,
      {replicator_died,
      {'EXIT',<16541.21868.10>,normal}}}}
      [ns_server:info] [2012-05-21 8:17:31] [ns_1@10.1.3.109:<0.9348.1>:ns_port_server:log:161] memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Schedule the backfill for vbucket 61
      memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_VBUCKET_SET with vbucket 61 and state "pending"
      memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 61
      memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Backfill is completed with VBuckets 61,
      memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_OPAQUE with command "close_backfill" and vbucket 61
      memcached<0.9348.1>: Vbucket <61> is going dead.
      memcached<0.9348.1>: TAP (Producer) eq_tapq:rebalance_61 - Sending TAP_VBUCKET_SET with vbucket 61 and state "active"
      memcached<0.9348.1>: TAP takeover is completed. Disconnecting tap stream <eq_tapq:rebalance_61>
      memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Schedule the backfill for vbucket 62
      memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
      memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
      memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 62

      [error_logger:error] [2012-05-21 8:17:31] [ns_1@10.1.3.109:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_single_vbucket_mover:mover/6
      pid: <0.15104.17>
      registered_name: []
      exception exit: {exited,
      {'EXIT',<0.15105.17>,
      {replicator_died,
      {'EXIT',<16541.21868.10>,normal}}}}
      in function ns_single_vbucket_mover:mover_inner/6
      in call from ns_replicas_builder:try_with_maybe_ignorant_after/2
      in call from ns_single_vbucket_mover:mover/6
      ancestors: [<0.13796.17>,<0.13763.17>]
      messages: []
      links: [<0.13796.17>]
      dictionary: [{cleanup_list,[<0.15105.17>]}]
      trap_exit: true
      status: running
      heap_size: 987
      stack_size: 24
      reductions: 4014

      Attachments

      1. 10.5.2.13-8091-diag.txt.gz
        2.10 MB
        Andrei Baranouski
      2. 10.5.2.14-8091-diag.txt.gz
        950 kB
        Andrei Baranouski
      3. 10.5.2.15-8091-diag.txt.gz
        955 kB
        Andrei Baranouski
      4. 10.5.2.16-8091-diag.txt.gz
        954 kB
        Andrei Baranouski
      5. 10.5.2.18-8091-diag.txt.gz
        977 kB
        Andrei Baranouski
      6. 10.5.2.19-8091-diag.txt.gz
        1.73 MB
        Andrei Baranouski
      7. 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.82-diag.gz
        17.11 MB
        Karan Kumar
      8. 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.83-diag.gz
        14.09 MB
        Karan Kumar
      9. 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.84-diag.gz
        18.24 MB
        Karan Kumar
      10. 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.85-diag.gz
        17.31 MB
        Karan Kumar
      11. 2ae93467-b468-4d0e-966e-7f14cdc3bb1f-10.1.3.86-diag.gz
        19.52 MB
        Karan Kumar
      12. 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.111-diag.gz
        2.46 MB
        Karan Kumar
      13. 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.112-diag.gz
        6.90 MB
        Karan Kumar
      14. 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.119-diag.gz
        1.21 MB
        Karan Kumar
      15. 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.120-diag.gz
        884 kB
        Karan Kumar
      16. 32326eea-21d4-4ef0-bf82-233b681a5336-10.1.3.121-diag.gz
        1.64 MB
        Karan Kumar
      17. f9bf72eb-4450-44a3-ae0e-f6328be40592-10.1.3.109-diag.gz
        7.07 MB
        Karan Kumar
      18. f9bf72eb-4450-44a3-ae0e-f6328be40592-10.1.3.111-diag.gz
        3.69 MB
        Karan Kumar
      19. f9bf72eb-4450-44a3-ae0e-f6328be40592-10.1.3.112-diag.gz
        4.41 MB
        Karan Kumar
      20. f9bf72eb-4450-44a3-ae0e-f6328be40592-10.1.3.119-diag.gz
        3.93 MB
        Karan Kumar
      21. f9bf72eb-4450-44a3-ae0e-f6328be40592-10.1.3.120-diag.gz
        2.55 MB
        Karan Kumar
      22. f9bf72eb-4450-44a3-ae0e-f6328be40592-10.1.3.121-diag.gz
        4.30 MB
        Karan Kumar
      23. f9bf72eb-4450-44a3-ae0e-f6328be40592-10.1.3.160-diag.gz
        2.63 MB
        Karan Kumar

        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        ep-engine closed the tap-producer side of the socket. We don't expect this for replica-building sockets. Here are the relevant logs from ebucketmigrator (running on the destination node):

        [ns_server:info] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:init:181] Setting {"10.1.3.112",11209} vbucket 62 to state replica
        [rebalance:debug] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:init:186] CheckpointIdsDict:
        {dict,64,16,16,8,80,48,
         {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
         {{[[0|1],[16|1],[32|1],[48|1]],
        [[3|1],[19|1],[35|1],[51|1]],
        [[6|1],[22|1],[38|1],[54|1]],
        [[9|1],[25|1],[41|1],[57|1]],
        [[12|1],[28|1],[44|1],[60|1]],
        [[15|1],[31|1],[47|1],[63|1]],
        [[2|1],[18|1],[34|1],[50|1]],
        [[5|1],[21|1],[37|1],[53|1]],
        [[8|1],[24|1],[40|1],[56|1]],
        [[11|1],[27|1],[43|1],[59|1]],
        [[14|1],[30|1],[46|1],[62|1]],
        [[1|1],[17|1],[33|1],[49|1]],
        [[4|1],[20|1],[36|1],[52|1]],
        [[7|1],[23|1],[39|1],[55|1]],
        [[10|1],[26|1],[42|1],[58|1]],
        [[13|1],[29|1],[45|1],[61|1]]}}}
        [ns_server:debug] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:init:209] killing tap named: replication_building_62_'ns_1@10.1.3.112'
        [rebalance:debug] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:init:247] upstream_sender pid: <0.21869.10>
        [rebalance:info] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:process_upstream:447] Initial stream for vbucket 62
        [rebalance:info] [2012-05-21 8:17:31] [ns_1@10.1.3.112:<0.21868.10>:ebucketmigrator_srv:do_confirm_sent_messages:315] Got close ack!

        And these are the logs from memcached on the source node (.109):

        memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Schedule the backfill for vbucket 62
        memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "opaque_enable_auto_nack" and vbucket 0
        memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "enable_checkpoint_sync" and vbucket 0
        memcached<0.9348.1>: TAP (Producer) eq_tapq:replication_building_62_'ns_1@10.1.3.112' - Sending TAP_OPAQUE with command "initial_vbucket_stream" and vbucket 62
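
        For illustration, the {replicator_died,{'EXIT',Pid,normal}} shape above is what falls out when a process that traps exits treats any 'EXIT' from one of its replicators as fatal while replicas are still being built, even when the exit reason is 'normal'. A minimal Erlang sketch of that pattern (module, function, and message names are invented for illustration; this is not the actual ns_server code):

        %% Illustrative sketch only -- not the real ns_server code.
        -module(replica_builder_sketch).
        -export([build_replicas/1]).

        %% Spawn one replicator per destination and wait for all of them
        %% to report completion; any earlier exit is treated as fatal.
        build_replicas(ReplicatorFuns) ->
            process_flag(trap_exit, true),
            Pids = [spawn_link(F) || F <- ReplicatorFuns],
            wait_for_replicators(Pids).

        wait_for_replicators([]) ->
            ok;
        wait_for_replicators(Pids) ->
            receive
                {replica_built, Pid} ->
                    %% Normal completion: this replicator finished its backfill.
                    wait_for_replicators(lists:delete(Pid, Pids));
                {'EXIT', Pid, _Reason} = ExitMsg ->
                    case lists:member(Pid, Pids) of
                        true ->
                            %% Any exit before completion is unexpected, even with
                            %% reason 'normal' -- which is what happens when the TAP
                            %% producer side of the socket is closed underneath us.
                            exit({replicator_died, ExitMsg});
                        false ->
                            exit({unexpected_exit, ExitMsg})
                    end
            end.

        Because exits are trapped, even a clean socket close by memcached surfaces as {replicator_died,{'EXIT',Pid,normal}}, which matches the rebalance failure reason reported above.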

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Passing down to Chiyoung. See the message above.

        chiyoung Chiyoung Seo added a comment -

        Karan,

        Did you test it with 64 or 1024 vbuckets?

        I'm running the same regression tests on my 6VM cluster, but would like to know if I can set it to 64 vbuckets to reproduce this issue more quickly.

        karan Karan Kumar (Inactive) added a comment -

        1024

        chiyoung Chiyoung Seo added a comment -

        I was trying to reproduce this issue. Is the following command correct?

        python testrunner -i my_vm.ini -t rebalancetests.RebalanceInOutWithParallelLoad.test_load,get-logs=True,replica=2,num_nodes=6 -p skip-cleanup=True

        karan Karan Kumar (Inactive) added a comment -

        Yes. Are you having issues? Let me know.

        karan Karan Kumar (Inactive) added a comment -

        Hitting the issue on Windows as well, on 181-871-rel.
        Attaching the logs.

        karan Karan Kumar (Inactive) added a comment -

        [ns_server:error] [2012-06-07 1:44:48] [ns_1@10.1.3.82:<0.30696.89>:ns_replicas_builder:build_replicas_main:99] Got premature exit from one of ebucketmigrators: {'EXIT',<16944.21895.27>,normal}

        [ns_server:info] [2012-06-07 1:44:48] [ns_1@10.1.3.82:<0.30696.89>:ns_replicas_builder:kill_a_bunch_of_tap_names:209] Killed the following tap names on 'ns_1@10.1.3.86': [<<"replication_building_291_'ns_1@10.1.3.87'">>,
        <<"replication_building_291_'ns_1@10.1.3.85'">>]
        [ns_server:info] [2012-06-07 1:44:48] [ns_1@10.1.3.82:<0.30695.89>:ns_single_vbucket_mover:mover_inner:88] Got exit message (parent is <0.25817.89>). Exiting...
        {'EXIT',<0.30696.89>,{replicator_died,{'EXIT',<16944.21895.27>,normal}}}
        [ns_server:error] [2012-06-07 1:44:48] [ns_1@10.1.3.82:<0.30695.89>:ns_replicas_builder:sync_shutdown_many:147] Shutdown of the following failed: [{<0.30696.89>,{replicator_died,{'EXIT',<16944.21895.27>,normal}}}]
        [ns_server:error] [2012-06-07 1:44:48] [ns_1@10.1.3.82:<0.30695.89>:ns_replicas_builder:try_with_maybe_ignorant_after:68] Eating exception from ignorant after-block:
        {error,{badmatch,[{<0.30696.89>,
                           {replicator_died,{'EXIT',<16944.21895.27>,normal}}}]},
        [{ns_replicas_builder,sync_shutdown_many,1},
        {ns_replicas_builder,try_with_maybe_ignorant_after,2},
        {ns_single_vbucket_mover,mover,6},
        {proc_lib,init_p_do_apply,3}]}


        =========================CRASH REPORT=========================
        crasher:
        initial call: erlang:apply/2
        pid: <0.30696.89>
        registered_name: []
        exception exit: {replicator_died,{'EXIT',<16944.21895.27>,normal}}
        in function ns_replicas_builder:'build_replicas_main/6-fun-0'/1
        in call from ns_replicas_builder:observe_wait_all_done_tail/5
        in call from ns_replicas_builder:observe_wait_all_done/5
        in call from ns_replicas_builder:'build_replicas_main/6-fun-1'/8
        in call from ns_replicas_builder:try_with_maybe_ignorant_after/2
        in call from ns_replicas_builder:build_replicas_main/6
        ancestors: [<0.30695.89>,<0.25817.89>,<0.25765.89>]
        messages: [{'EXIT',<16944.21895.27>,normal}]
        links: [<0.30695.89>]
        dictionary: []
        trap_exit: true
        status: running
        heap_size: 514229
        stack_size: 24
        reductions: 53630
        neighbours:

        =========================CRASH REPORT=========================
        crasher:
        initial call: ns_single_vbucket_mover:mover/6
        pid: <0.30764.89>
        registered_name: []
        exception exit: {exited,
        {'EXIT',<0.25817.89>,
        {exited,
        {'EXIT',<0.30696.89>,
        {replicator_died,{'EXIT',<16944.21895.27>,normal}}}}}}
        in function ns_single_vbucket_mover:mover_inner/6
        in call from ns_replicas_builder:try_with_maybe_ignorant_after/2
        in call from ns_single_vbucket_mover:mover/6
        ancestors: [<0.25817.89>,<0.25765.89>]
        messages: [{'EXIT',<0.25817.89>,
                    {exited,
                     {'EXIT',<0.30696.89>,
                      {replicator_died,{'EXIT',<16944.21895.27>,normal}}}}}]
        links: [<0.25817.89>,<0.30766.89>]
        dictionary: [{cleanup_list,[<0.30765.89>]}]
        trap_exit: true
        status: running
        heap_size: 2584
        stack_size: 24
        reductions: 1767
        neighbours:

        karan Karan Kumar (Inactive) added a comment -

        Hitting this occasionally on 181-907-rel

        chiyoung Chiyoung Seo added a comment -

        Can you upload the latest diag logs?

        karan Karan Kumar (Inactive) added a comment -

        We have logs from June 7th.

        karan Karan Kumar (Inactive) added a comment -

        Hitting this on 181-910-rel (latest)

        I will upload the logs.
        @Chiyoung, does it look to be an issue similar to memcached dropping connections?

        [ns_server:info] [2012-06-14 2:09:36] [ns_1@10.1.3.109:ns_port_memcached:ns_port_server:log:166] memcached<0.252.0>: Schedule cleanup of "eq_tapq:anon_23189"
        memcached<0.252.0>: TAP (Producer) eq_tapq:replication_building_94_'ns_1@10.1.3.112' - Clear the tap queues by force
        memcached<0.252.0>: TAP (Consumer) eq_tapq:anon_23186 - disconnected
        memcached<0.252.0>: Schedule cleanup of "eq_tapq:anon_23186"
        memcached<0.252.0>: TAP (Producer) eq_tapq:replication_ns_1@10.1.3.112 - disconnected, keep alive for 300 seconds
        memcached<0.252.0>: TAP (Consumer) eq_tapq:anon_23187 - disconnected
        memcached<0.252.0>: Schedule cleanup of "eq_tapq:anon_23187"

        [ns_server:info] [2012-06-14 2:09:36] [ns_1@10.1.3.109:<0.19238.14>:ns_replicas_builder:kill_a_bunch_of_tap_names:219] Killed the following tap names on 'ns_1@10.1.3.112': [<<"replication_building_606_'ns_1@10.1.3.109'">>,
        <<"replication_building_606_'ns_1@10.1.3.111'">>]
        [ns_server:info] [2012-06-14 2:09:36] [ns_1@10.1.3.109:<0.19238.14>:ns_replicas_builder:build_replicas_main:143] Got exit: {'EXIT',<0.19237.14>,shutdown}
        [ns_server:error] [2012-06-14 2:09:36] [ns_1@10.1.3.109:<0.19272.14>:ns_replicas_builder:build_replicas_main:109] Got premature exit from one of ebucketmigrators: {'EXIT',<16817.4853.4>,normal}
        ketaki Ketaki Gangal added a comment -

        Seeing the same issue on swap rebalance on the 181-916 build.
        Rebalance exited with reason {exited,
        {'EXIT',<0.12083.14>,
        {replicator_died,
        {'EXIT',<15753.26224.12>,normal}}}}

        Steps:
        1. A large cluster (18 nodes) with 3 buckets (50M, 22M, and 0.5M items each).
        2. Enable auto-failover.
        3. Remove the master node (.105), add node .126, and rebalance (a rough REST sketch of these steps follows below).

        Output:
        Rebalance fails with the "replicator_died" error.

        Logs attached at -
        https://s3.amazonaws.com/bugdb/jira/bug3-rebalance-181/bug3.tar
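
        For reference, the swap-rebalance steps above map onto the standard cluster REST API (/settings/autoFailover, /controller/addNode, /controller/rebalance) roughly as in the Erlang sketch below. Hostnames, credentials, and the node list are placeholders, and this is only an illustration of the steps, not a command sequence taken from this run:

        %% Illustrative sketch of the reproduction steps via the cluster REST API.
        %% All hostnames, node names, and credentials here are placeholders.
        -module(swap_rebalance_sketch).
        -export([run/0]).

        post(Path, Body) ->
            Auth = "Basic " ++ base64:encode_to_string("Administrator:password"),
            Url = "http://10.1.3.106:8091" ++ Path,
            {ok, {{_, 200, _}, _, _}} =
                httpc:request(post,
                              {Url, [{"Authorization", Auth}],
                               "application/x-www-form-urlencoded", Body},
                              [], []).

        run() ->
            ok = application:ensure_started(inets),
            %% Step 2: enable auto-failover (timeout value chosen arbitrarily).
            post("/settings/autoFailover", "enabled=true&timeout=120"),
            %% Step 3a: add the incoming node (.126).
            post("/controller/addNode",
                 "hostname=10.1.3.126&user=Administrator&password=password"),
            %% Step 3b: rebalance, ejecting the outgoing master node (.105).
            %% knownNodes must list the otpNode name of every node in the
            %% cluster; it is abbreviated to two entries here.
            post("/controller/rebalance",
                 "knownNodes=ns_1@10.1.3.105,ns_1@10.1.3.126"
                 "&ejectedNodes=ns_1@10.1.3.105").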

        ketaki Ketaki Gangal added a comment -

        JIRA error: not able to delete the above comment. Please ignore.

        peter peter added a comment -

        Let's see if we catch it in the 2.0 regression runs.

        andreibaranouski Andrei Baranouski added a comment -

        Got the same error on build-1554 for this test:
        ./testrunner -i andrei.ini -t view.createdeleteview.CreateDeleteViewTests.rebalance_in_and_out_with_ddoc_ops,ddoc_ops=delete,test_with_view=True,num_ddocs=2,num_views_per_ddoc=3,items=2000

        [error_logger:error] [2012-08-08 12:54:03] [ns_1@10.5.2.11:error_logger:ale_error_logger_handler:log_report:72]
        =========================CRASH REPORT=========================
        crasher:
        initial call: ns_single_vbucket_mover:mover/6
        pid: <0.17517.2>
        registered_name: []
        exception error: {bulk_set_vbucket_state_failed,
        [{'ns_1@10.5.2.19',
        {'EXIT',
        {{{normal,
        {gen_server,call,
        [<20615.11732.0>,
         {set_state,[],"k",
          [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,
           18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,
           33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,
           48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,
           63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,
           78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,
           93,94,95,96,97,98,99,100,101,102,103,104,105,
           106,108,109,110,111,112,113,114,115,116,117,
           118,119,120,121,122,123,124,125,126,127]},
         infinity]}},
        {gen_server,call,

        attached logs

        andreibaranouski Andrei Baranouski added a comment -

        It seems like another issue; I filed a separate bug: http://www.couchbase.com/issues/browse/MB-6160. Please disregard the previous comment.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Hm, less than a year has passed, but I'm failing to locate the relevant commits.


          People

          • Assignee: frank Frank Weigel (Inactive)
          • Reporter: karan Karan Kumar (Inactive)
          • Votes: 0
          • Watchers: 0


              Gerrit Reviews

              There are no open Gerrit changes