  1. Couchbase Server
  2. MB-6490

Rebalance failed with reason "Partition 687 not in active nor passive set" in add in node rebalance

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.0
    • Component/s: ns_server, view-engine
    • Security Level: Public
    • Labels:
      None
    • Environment:
      4-core VMs, CentOS 6.2 64-bit
      build #1653, build 2.0.0-1781

      Description

      Rebalance failed with error

      Rebalance exited with reason {{{{badmatch,
          {error,
           {error, <<"Partition 36 not in active nor passive set">>}}},
         [{capi_set_view_manager,handle_call,3},
          {gen_server,handle_msg,5},
          {gen_server,init_it,6},
          {proc_lib,init_p_do_apply,3}]},
        {gen_server,call,
         ['capi_set_view_manager-saslbucket',
          {wait_index_updated,36},
          infinity]}},
       {gen_server,call,
        [{'janitor_agent-saslbucket','ns_1@10.6.2.44'},
         {if_rebalance,<0.32719.854>,
          {wait_index_updated,36}},
         infinity]}}

      This happens with or without consistent views enabled.

        • In the orange cluster with build 2.0.0-1781, consistent views are enabled by default, and rebalance failed when adding 2 nodes to the cluster.
        • In Iryna's cluster, consistent views are disabled. She hit the same rebalance failure, as described below:

      index_aware_rebalance_disabled set false, 5 ddocs, 500K items
      4 nodes cluster, remove 2 nodes and add 1 node, start rebalance

      Rebalance exited with reason {{error,
         <<"Partition 687 not in active nor passive set">>},
        {gen_server,call,
         [{'janitor_agent-bucket-0', 'ns_1@10.3.121.120'},
          {if_rebalance,<0.14888.6>,
           {wait_index_updated,953}},
          infinity]}}
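
      When triaging several of these failures, it can help to pull the key fields out of the exit reason mechanically. The following is a minimal Python sketch, not part of ns_server or testrunner; the function name and the regular expressions are assumptions based only on the messages quoted in this ticket.

```python
import re

# Error text emitted by the view engine, as quoted in this ticket.
PARTITION_ERROR = re.compile(r"Partition (\d+) not in active nor passive set")
# Target of the outer gen_server call, e.g. {'janitor_agent-default','ns_1@10.3.3.115'}.
JANITOR_TARGET = re.compile(r"\{'janitor_agent-([^']+)',\s*'([^']+)'\}")
# vbucket ids passed to wait_index_updated.
WAIT_CALL = re.compile(r"\{wait_index_updated,(\d+)\}")


def summarize_rebalance_failure(reason: str) -> dict:
    """Extract partition, bucket, node and waited vbuckets from an exit reason."""
    partition = PARTITION_ERROR.search(reason)
    target = JANITOR_TARGET.search(reason)
    waits = WAIT_CALL.findall(reason)
    return {
        "partition": int(partition.group(1)) if partition else None,
        "bucket": target.group(1) if target else None,
        "node": target.group(2) if target else None,
        # Deduplicate: the same vbucket id can appear in nested calls.
        "waited_vbuckets": sorted({int(v) for v in waits}),
    }


if __name__ == "__main__":
    reason = """{{error,
      <<"Partition 687 not in active nor passive set">>},
     {gen_server,call,
      [{'janitor_agent-bucket-0', 'ns_1@10.3.121.120'},
       {if_rebalance,<0.14888.6>,
        {wait_index_updated,953}},
       infinity]}}"""
    print(summarize_rebalance_failure(reason))
```

      Comparing the reported partition with the waited vbucket makes mismatches such as 687 versus 953 in the trace above easy to spot.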

      1. 10.3.121.104-8091-diag.txt.gz
        16.80 MB
        Iryna
      2. 10.3.121.105-8091-diag.txt.gz
        16.61 MB
        Iryna
      3. 10.3.121.110-8091-diag.txt.gz
        16.22 MB
        Iryna
      4. 10.3.121.111-8091-diag.txt.gz
        15.93 MB
        Iryna
      5. 10.3.121.120-8091-diag.txt.gz
        15.72 MB
        Iryna
      6. 10.3.3.58-8091-diag.txt.gz
        15.31 MB
        Iryna
      7. 10.3.3.64-8091-diag.txt.gz
        13.99 MB
        Iryna
      8. 10.3.3.68-8091-diag.txt.gz
        15.40 MB
        Iryna
      9. 10.3.3.71-8091-diag.txt.gz
        15.34 MB
        Iryna
      10. 10.3.3.73-8091-diag.txt.gz
        15.77 MB
        Iryna
      11. narrowed.txt
        522 kB
        Aleksey Kondratenko
      12. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.58-diag.txt.gz
        15.43 MB
        Iryna
      13. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.64-diag.txt.gz
        16.23 MB
        Iryna
      14. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.68-diag.txt.gz
        15.12 MB
        Iryna
      15. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.71-diag.txt.gz
        14.49 MB
        Iryna
      16. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.73-diag.txt.gz
        16.86 MB
        Iryna

        Activity

        FilipeManana Filipe Manana (Inactive) added a comment -

        There's limited information in the logs, due to rotation.

        Looking at the file 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.58-diag.txt:

        The last occurrence of the error, at line 2416042, seems valid from the view engine's point of view. Above that line, none of the indexes has vbucket 624 in the active or passive state.

        Above that line I also see that ns_server marks vbucket 624 for cleanup in several indexes, but doesn't mark it as active/passive afterwards. Example at line 2410127:

        [views:info,2012-10-09T14:46:46.117,ns_1@10.3.3.58:'capi_set_view_manager-default':capi_set_view_manager:apply_index_states:472]
        couch_set_view:set_partition_states([<<"default">>,

        From what I can see, the error is valid and may be caused by bad coordination from ns_server.

        It would also help if ns_server logged the name of the respective index (design doc) when such an error happens. That makes troubleshooting easier when there are many indexes.
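
        For anyone repeating this kind of log inspection, locating the last occurrence of the error and its line number can be scripted. This is a rough Python sketch; the helper names are hypothetical, and it assumes only that the diag file is plain text (optionally gzipped) containing the error message quoted in this ticket.

```python
import gzip
import re
from typing import Iterable, Iterator, Optional, Tuple

# Error text as it appears in the diag logs attached to this ticket.
PARTITION_ERROR = re.compile(r"Partition (\d+) not in active nor passive set")


def find_partition_errors(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (line_number, partition) for every occurrence; lines are 1-indexed."""
    for lineno, line in enumerate(lines, start=1):
        for match in PARTITION_ERROR.finditer(line):
            yield lineno, int(match.group(1))


def last_partition_error(lines: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return the last (line_number, partition) hit, or None if there is none."""
    last = None
    for hit in find_partition_errors(lines):
        last = hit
    return last


def open_diag(path: str):
    """Open a diag file as text, transparently handling .gz attachments."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt", encoding="utf-8", errors="replace")
```

        A call such as `last_partition_error(open_diag(path))` streams the file line by line, so it should cope with diag files of the sizes attached here without loading them into memory.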

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        This appears to be a problem in waiting for the persisted checkpoint, which causes us to assume the vbucket is 'ready' too soon.
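
        To illustrate the kind of wait being described, here is a generic Python sketch of blocking until a persisted checkpoint reaches a target before treating a vbucket as ready. It is not the ns_server or ep-engine implementation; the function name, the `get_persisted_checkpoint` callback, and the timeout parameters are all hypothetical.

```python
import time
from typing import Callable


def wait_for_persisted_checkpoint(
    get_persisted_checkpoint: Callable[[], int],  # hypothetical stats callback
    target_checkpoint: int,
    timeout_s: float = 5.0,
    poll_interval_s: float = 0.01,
) -> bool:
    """Poll until the persisted checkpoint id reaches the target.

    Returns True once persistence has caught up, False on timeout. A caller
    should only treat the vbucket as ready after a True result; proceeding
    earlier is the "ready too soon" situation described above.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_persisted_checkpoint() >= target_checkpoint:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```

        The important property is that the comparison uses the persisted checkpoint rather than the open one, and that a timeout is reported as a failure instead of being treated as success.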

        alkondratenko Aleksey Kondratenko (Inactive) added a comment - http://review.couchbase.org/#/c/21552/ and http://review.couchbase.org/#/c/21553/
        iryna iryna added a comment -

        Reproduced in build 1850. Build manifest:
        <manifest>
          <remote name="couchbase" fetch="git://10.1.1.210/"/>
          <remote name="membase" fetch="git://10.1.1.210/"/>
          <remote name="apache" fetch="git://github.com/apache/"/>
          <remote name="erlang" fetch="git://github.com/erlang/"/>
          <default remote="couchbase" revision="master"/>
          <project name="tlm" path="tlm" revision="ab70f6d42f46621ec576889e57cb37ac2d64a84b"><copyfile src="Makefile.top" dest="Makefile"/></project>
          <project name="bucket_engine" path="bucket_engine" revision="70b3624abc697b7d18bf3d57f331b7674544e1e7"/>
          <project name="ep-engine" path="ep-engine" revision="25b403263ccd67ffe3205a474d8f93a21f2936d0"/>
          <project name="libconflate" path="libconflate" revision="2cc8eff8e77d497d9f03a30fafaecb85280535d6"/>
          <project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/>
          <project name="libvbucket" path="libvbucket" revision="00d3763593c116e8e5d97aa0b646c42885727398"/>
          <project name="membase-cli" path="membase-cli" revision="c82db287eab652d25116b042d4627a6931722a8e" remote="membase"/>
          <project name="memcached" path="memcached" revision="858731183b08cd6b72fa6e68c1fb4208cb87570d" remote="membase"/>
          <project name="moxi" path="moxi" revision="52a5fa887bfff0bf719c4ee5f29634dd8707500e"/>
          <project name="ns_server" path="ns_server" revision="65e7ebe2d45904e82e1226ddeca257a2cd9d5075"/>
          <project name="portsigar" path="portsigar" revision="1bc865e1622fb93a3fe0d1a4cdf18eb97ed9d600"/>
          <project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/>
          <project name="couchbase-examples" path="couchbase-examples" revision="21e6161a1d064979b5c6aa99cd34ccc41c9d7aca"/>
          <project name="couchbase-python-client" path="couchbase-python-client" revision="86b398e4fbc1f2e38d356e14df0c1bb4e3d2427b"/>
          <project name="couchdb" path="couchdb" revision="23cec9997b38ac82cab310b7560d01db529c1ae2"/>
          <project name="couchdbx-app" path="couchdbx-app" revision="d196377b5b1ba3ce25f1b92066e2741898b01a1e"/>
          <project name="couchstore" path="couchstore" revision="29579bd47f7c916c43116722b8f4962b4ea9fff0"/>
          <project name="geocouch" path="geocouch" revision="b0bd742551639c52030c070e5bf9390edbb536ba"/>
          <project name="mccouch" path="mccouch" revision="88701cc326bc3dde4ed072bb8441be83adcfb2a5"/>
          <project name="testrunner" path="testrunner" revision="48fc95d4e1009d0f40a2c4e2e59448dc3e4fcad3"/>
          <project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/>
          <project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/>
          <project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/>
          <project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/>
          <project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/>
          <project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/>
        </manifest>

        logs:
        http://qa.hq.northscale.net/job/centos-64-2.0-view-query-tests/516/artifact/logs/testrunner-12-Oct-16_14-53-26/d1387940-7fbd-4e43-91ef-460736b2e37d-10.3.3.114-diag.txt.gz
        http://qa.hq.northscale.net/job/centos-64-2.0-view-query-tests/516/artifact/logs/testrunner-12-Oct-16_14-53-26/d1387940-7fbd-4e43-91ef-460736b2e37d-10.3.3.115-diag.txt.gz
        http://qa.hq.northscale.net/job/centos-64-2.0-view-query-tests/516/artifact/logs/testrunner-12-Oct-16_14-53-26/d1387940-7fbd-4e43-91ef-460736b2e37d-10.3.3.121-diag.txt.gz
        http://qa.hq.northscale.net/job/centos-64-2.0-view-query-tests/516/artifact/logs/testrunner-12-Oct-16_14-53-26/d1387940-7fbd-4e43-91ef-460736b2e37d-10.3.3.122-diag.txt.gz

        2012-10-17 01:31:46.990 ns_orchestrator:2:info:message(ns_1@10.3.3.115) - Rebalance exited with reason {{{{badmatch,
            {error,
             {error, <<"Partition 672 not in active nor passive set">>}}},
           [{capi_set_view_manager,handle_call,3},
            {gen_server,handle_msg,5},
            {gen_server,init_it,6},
            {proc_lib,init_p_do_apply,3}]},
          {gen_server,call,
           ['capi_set_view_manager-default',
            {wait_index_updated,672},
            infinity]}},
         {gen_server,call,
          [{'janitor_agent-default','ns_1@10.3.3.115'},
           {if_rebalance,<0.7393.100>,
            {wait_index_updated,672}},
           infinity]}}

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Thanks for the report. I managed to understand what happened by looking at MB-6955. A fix is coming soon.


          People

          • Assignee:
            alkondratenko Aleksey Kondratenko (Inactive)
            Reporter:
            iryna iryna