Couchbase Server / MB-6384

Inability to reach a node should not cause the entire per-bucket supervisor to fail [was: Rebalance 5->4 nodes fails with reason bulk_set_vbucket_state_failed]

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
    • Environment:
      CentOS, 64-bit, 4-core VMs, build #1620

      Description

      1. Rebalance in nodes (1->5)
      2. Load data (1M items); no views or ddocs are created
      3. Start rebalance out (5->4)
      4. Create 3 ddocs, 2 views per ddoc
      5. Rebalance fails

      2012-08-22 18:18:41.623 ns_orchestrator:4:info:message(ns_1@10.3.3.58) - Starting rebalance, KeepNodes = ['ns_1@10.3.3.64','ns_1@10.3.3.68',
      'ns_1@10.3.3.58','ns_1@10.3.3.71'], EjectNodes = ['ns_1@10.3.3.73']

      2012-08-22 18:18:41.933 ns_rebalancer:0:info:message(ns_1@10.3.3.58) - Started rebalancing bucket default
      2012-08-22 18:18:42.512 ns_vbucket_mover:0:info:message(ns_1@10.3.3.58) - Bucket "default" rebalance does not seem to be swap rebalance
      2012-08-22 18:18:45.428 ns_memcached:2:info:message(ns_1@10.3.3.73) - Shutting down bucket "default" on 'ns_1@10.3.3.73' for server shutdown
      2012-08-22 18:18:45.747 ns_orchestrator:2:info:message(ns_1@10.3.3.58) - Rebalance exited with reason {{bulk_set_vbucket_state_failed,
      [{'ns_1@10.3.3.64',
      {'EXIT',
      {killed,
      {gen_server,call,
      [{'janitor_agent-default','ns_1@10.3.3.64'},
      {if_rebalance,<0.22908.39>,
      {update_vbucket_state,820,replica,
      undefined,'ns_1@10.3.3.58'}},
      infinity]}}}}]},
      [{janitor_agent,bulk_set_vbucket_state,4},
      {ns_vbucket_mover,update_replication_post_move,3},
      {ns_vbucket_mover,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}

      1. 10.3.3.73-8091-diag.txt.gz (3.59 MB, Iryna)
      2. 10.3.3.71-8091-diag.txt.gz (5.83 MB, Iryna)
      3. 10.3.3.68-8091-diag.txt.gz (14.42 MB, Iryna)
      4. 10.3.3.64-8091-diag.txt.gz (6.19 MB, Iryna)
      5. 10.3.3.58-8091-diag.txt.gz (15.72 MB, Iryna)

        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        The root cause is the problem in MB-6385. But it causes the per-bucket supervisor on .64 to fail, because .73 deletes the bucket, incorrectly thinking there is a server shutdown.

        farshid Farshid Ghods (Inactive) added a comment -

        Regressions are marked as blockers.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Should be done as well

        thuan Thuan Nguyen added a comment -

        Integrated in github-ns-server-2-0 #453 (See http://qa.hq.northscale.net/job/github-ns-server-2-0/453/)
        MB-6384: don't shutdown bucket unless we're deleting it (Revision 2e7b50a5c0faa23a1f5367536e75358e105a0d19)
        MB-6384: changed replicators' supervision type to temporary (Revision b5ab81c848aef02d010062a5eb10361ed2965088)

        Result = SUCCESS
        Aliaksey Kandratsenka :
        Files :

        • src/ns_memcached.erl

        Aliaksey Kandratsenka :
        Files :

        • src/ns_vbm_new_sup.erl
        • src/replication_changes.erl
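
The second change above moves the replicators to `temporary` supervision. As a rough illustration of why that helps, here is a toy model in Python. It is not ns_server code: the class, the child names, and the limit of 3 restarts per 10 seconds are all made up for the example. It only mimics the general OTP rule that a supervisor which has to restart children more often than its configured intensity allows gives up and terminates along with everything under it, while a `temporary` child is never restarted and so never counts toward that limit.

```python
from dataclasses import dataclass, field


@dataclass
class ToySupervisor:
    """Toy stand-in for an OTP supervisor (hypothetical, for illustration only)."""
    max_restarts: int          # plays the role of OTP's MaxR
    window: float              # plays the role of OTP's MaxT, in seconds
    alive: bool = True
    restart_times: list = field(default_factory=list)
    children: dict = field(default_factory=dict)  # child name -> restart type

    def add_child(self, name, restart):
        if restart not in ("permanent", "temporary"):
            raise ValueError("unsupported restart type: %r" % (restart,))
        self.children[name] = restart

    def child_exited(self, name, now):
        """Handle an abnormal child exit at time `now`; return whether we survive."""
        if not self.alive:
            return False
        if self.children[name] == "temporary":
            # Temporary children are simply dropped: no restart, no bookkeeping.
            del self.children[name]
            return True
        # Permanent child: restart it and count the restart inside the window.
        self.restart_times = [t for t in self.restart_times
                              if now - t <= self.window]
        self.restart_times.append(now)
        if len(self.restart_times) > self.max_restarts:
            # Restart intensity exceeded: the supervisor and all children go down.
            self.alive = False
            self.children.clear()
        return self.alive


def simulate(restart, crashes=5):
    """Crash a replicator to an unreachable node once per second."""
    sup = ToySupervisor(max_restarts=3, window=10.0)
    sup.add_child("other_bucket_worker", "permanent")
    for second in range(crashes):
        # The owner (re)creates the replicator before each attempt.
        sup.add_child("replicator_to_unreachable_node", restart)
        if not sup.child_exited("replicator_to_unreachable_node", float(second)):
            break
    return sup


if __name__ == "__main__":
    for restart in ("permanent", "temporary"):
        sup = simulate(restart)
        print(restart, "-> supervisor alive:", sup.alive,
              "| remaining children:", sorted(sup.children))
```

In this model the `permanent` replicator takes the whole supervisor (and the unrelated worker) down on the fourth crash, while the `temporary` one leaves the rest of the tree untouched and leaves any retry policy to whoever started it.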
        iryna iryna added a comment -

        verified


          People

          • Assignee: alkondratenko Aleksey Kondratenko (Inactive)
          • Reporter: iryna iryna
          • Votes: 0
          • Watchers: 1

              Gerrit Reviews

              There are no open Gerrit changes