  1. Couchbase Server
  2. MB-6490

Rebalance failed with reason "Partition 687 not in active nor passive set" in add in node rebalance

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.0
    • Component/s: ns_server, view-engine
    • Security Level: Public
    • Labels:
      None
    • Environment:
      4 cores VMs CentOS, centos 6.2 64bit
      build #1653, build 2.0.0-1781

      Description

      Rebalance failed with error

      Rebalance exited with reason {{{{badmatch,
      {error,
      {error,
      <<"Partition 36 not in active nor passive set">>}}},
      [{capi_set_view_manager,handle_call,3},
      {gen_server,handle_msg,5},
      {gen_server,init_it,6},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['capi_set_view_manager-saslbucket',
      {wait_index_updated,36},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-saslbucket','ns_1@10.6.2.44'},
      {if_rebalance,<0.32719.854>,
      {wait_index_updated,36}},
      infinity]}}

      with or without consistent views enabled.

        • In the orange cluster with build 2.0.0-1781, consistent views are enabled by default, and rebalance failed when adding 2 nodes to the cluster.
        • In Iryna's cluster, consistent views are disabled. Her rebalance failed with the same error, as she reported in the following:

      index_aware_rebalance_disabled set false, 5 ddocs, 500K items
      4 nodes cluster, remove 2 nodes and add 1 node, start rebalance

      Rebalance exited with reason {{error,
                                        <<"Partition 687 not in active nor passive set">>},
                                    {gen_server,call,
                                        [{'janitor_agent-bucket-0',
                                             'ns_1@10.3.121.120'},
                                         {if_rebalance,<0.14888.6>,
                                             {wait_index_updated,953}},
                                         infinity]}}

      1. 10.3.121.104-8091-diag.txt.gz
        16.80 MB
        Iryna
      2. 10.3.121.105-8091-diag.txt.gz
        16.61 MB
        Iryna
      3. 10.3.121.110-8091-diag.txt.gz
        16.22 MB
        Iryna
      4. 10.3.121.111-8091-diag.txt.gz
        15.93 MB
        Iryna
      5. 10.3.121.120-8091-diag.txt.gz
        15.72 MB
        Iryna
      6. 10.3.3.58-8091-diag.txt.gz
        15.31 MB
        Iryna
      7. 10.3.3.64-8091-diag.txt.gz
        13.99 MB
        Iryna
      8. 10.3.3.68-8091-diag.txt.gz
        15.40 MB
        Iryna
      9. 10.3.3.71-8091-diag.txt.gz
        15.34 MB
        Iryna
      10. 10.3.3.73-8091-diag.txt.gz
        15.77 MB
        Iryna
      11. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.58-diag.txt.gz
        15.43 MB
        Iryna
      12. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.64-diag.txt.gz
        16.23 MB
        Iryna
      13. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.68-diag.txt.gz
        15.12 MB
        Iryna
      14. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.71-diag.txt.gz
        14.49 MB
        Iryna
      15. 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.73-diag.txt.gz
        16.86 MB
        Iryna
      16. narrowed.txt
        522 kB
        Aleksey Kondratenko

        Activity

        iryna iryna created issue -
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Thanks. That's the bug I was seeing too. Diags should help me a lot.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Ok. That's a simple race.

        We start monitoring indexing of a vbucket once we've waited for it to be 'ready' inside ep-engine.

        The problem is that there's a gap between ep-engine getting the data in RAM and the same data being ready on disk. So we need to wait until the vbucket actually gets to disk.
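
The race described in this comment can be pictured with a toy model. This is only a sketch in Python, not ns_server or ep-engine code: the `VBucket` class, its `ram_seq` and `disk_seq` counters, and `wait_persisted` are hypothetical names chosen for illustration. It assumes an indexer that can only see persisted data, and shows why the monitor would have to wait for persistence rather than for in-memory readiness.

```python
import threading


class VBucket:
    """Toy vbucket: writes land in RAM first and reach disk later."""

    def __init__(self):
        self._cond = threading.Condition()
        self.ram_seq = 0   # highest sequence number held in memory
        self.disk_seq = 0  # highest sequence number flushed to disk

    def write(self, seq):
        with self._cond:
            self.ram_seq = seq
            self._cond.notify_all()

    def flush(self):
        # Simulates the background flusher catching up with RAM.
        with self._cond:
            self.disk_seq = self.ram_seq
            self._cond.notify_all()

    def ready_in_ram(self, seq):
        with self._cond:
            return self.ram_seq >= seq

    def wait_persisted(self, seq, timeout=5.0):
        # Block until the given sequence number is on disk (or time out).
        with self._cond:
            return self._cond.wait_for(lambda: self.disk_seq >= seq, timeout)


def indexable_from_disk(vb, seq):
    # The indexer in this model only sees what has been persisted.
    return vb.disk_seq >= seq


vb = VBucket()
vb.write(100)

# Racy check: the vbucket is "ready" in RAM, but the indexer cannot see it yet.
racy_ok = vb.ready_in_ram(100) and indexable_from_disk(vb, 100)

# Safe ordering: wait for persistence first, then start monitoring the index.
flusher = threading.Timer(0.05, vb.flush)
flusher.start()
persisted = vb.wait_persisted(100)
safe_ok = persisted and indexable_from_disk(vb, 100)
flusher.join()
```

In this model the racy check fails because the data is not yet on disk, while the check made after `wait_persisted` succeeds.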

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Done

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Field Original Value New Value
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        thuan Thuan Nguyen added a comment -

        Integrated in github-ns-server-2-0 #461 (See http://qa.hq.northscale.net/job/github-ns-server-2-0/461/)
        MB-6490: killed needless nesting in per-bucket supervisor (Revision b1ead4cf929528b6a590810de7d35704f6ff45d2)
        MB-6490: untangled ddoc replication from cb_generic_replication (Revision 91261dfe63c67ef3036da4ffa81b259b42dd24e8)
        MB-6490: untangled xdcr rdoc from cb_generic_replication_srv (Revision d0463831630262f78ce24caa31e44218c6814946)
        MB-6490: removed unused cb_generic_replication_srv (Revision 520ecd2708cab22796c5c32e43c7b5556617322e)
        MB-6490: replicate ddocs in capi_set_view_manager (Revision 0395d10743d743a5faef748885cd9c9f9565cab2)
        MB-6490: moved waiting for index updates to capi_set_view_manager (Revision 62802038b5864f6c26bbc007f5fa09d91806b880)

        Result = SUCCESS
        pwansch :
        Files :

        • src/ns_memcached_sup.erl
        • src/single_bucket_sup.erl

        pwansch :
        Files :

        • src/capi_ddoc_replication_srv.erl

        pwansch :
        Files :

        • src/xdc_rdoc_replication_srv.erl

        pwansch :
        Files :

        • src/cb_generic_replication_srv.erl

        pwansch :
        Files :

        • src/single_bucket_sup.erl
        • src/capi_ddoc_replication_srv.erl
        • src/capi_set_view_manager.erl

        pwansch :
        Files :

        • src/capi_set_view_manager.erl
        • src/janitor_agent.erl
        iryna iryna added a comment -

        verified

        iryna iryna made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Assignee Aleksey Kondratenko [ alkondratenko ] Iryna Mironava [ iryna ]
        iryna iryna added a comment -

        reproduced in 1707

        iryna iryna made changes -
        Resolution Fixed [ 1 ]
        Status Closed [ 6 ] Reopened [ 4 ]
        Assignee Iryna Mironava [ iryna ] Aleksey Kondratenko [ alkondratenko ]
        iryna iryna made changes -
        Attachment 10.3.3.58-8091-diag.txt.gz [ 14971 ]
        Attachment 10.3.3.64-8091-diag.txt.gz [ 14972 ]
        Attachment 10.3.3.68-8091-diag.txt.gz [ 14973 ]
        Attachment 10.3.3.71-8091-diag.txt.gz [ 14974 ]
        Attachment 10.3.3.73-8091-diag.txt.gz [ 14975 ]
        farshid Farshid Ghods (Inactive) added a comment -

        Promoting this to blocker since this happens more frequently now and it's easy to reproduce.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Narrowed down the last phase of things.

        capi_set_view_manager seemingly correctly added 842 to the passive state in all indexes.

        Then we start monitoring the index update and crash, which tells us 842 is neither active nor passive.

        Could be related to the ongoing cleanup of 842 and the fact that 842's passivation is pending.
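
One way to picture the hypothesis in this comment is a toy model of an index's partition sets. The sketch below is Python and purely illustrative: `IndexGroup`, `pending`, and `monitor_partition_update` are hypothetical names and do not reflect the actual couch_set_view code. It assumes that a state change requested during cleanup is parked as a pending transition, and shows how a monitoring request would then be rejected if only the active and passive sets are consulted.

```python
class IndexGroup:
    """Toy model of one index's partition bookkeeping."""

    def __init__(self):
        self.active = set()
        self.passive = set()
        self.cleanup = set()
        self.pending = {}  # partition -> target state, applied after cleanup

    def mark_cleanup(self, part):
        self.active.discard(part)
        self.passive.discard(part)
        self.cleanup.add(part)

    def request_state(self, part, state):
        # A partition still being cleaned up cannot change state yet,
        # so the request is parked as a pending transition.
        if part in self.cleanup:
            self.pending[part] = state
        else:
            getattr(self, state).add(part)

    def finish_cleanup(self, part):
        self.cleanup.discard(part)
        state = self.pending.pop(part, None)
        if state is not None:
            getattr(self, state).add(part)

    def monitor_partition_update(self, part, include_pending):
        known = self.active | self.passive
        if include_pending:
            known |= set(self.pending)
        if part not in known:
            raise ValueError(f"Partition {part} not in active nor passive set")
        return "monitoring"


group = IndexGroup()
group.mark_cleanup(842)
group.request_state(842, "passive")  # parked: cleanup is still running

try:
    group.monitor_partition_update(842, include_pending=False)
    strict_result = "monitoring"
except ValueError as err:
    strict_result = str(err)

lenient_result = group.monitor_partition_update(842, include_pending=True)
group.finish_cleanup(842)
```

In this model the strict check raises the familiar error while the passivation is pending, and the lenient check accepts the partition; once cleanup finishes, 842 ends up in the passive set.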

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Attachment narrowed.txt [ 15001 ]
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Farshid, you mentioned it's a blocker, but it's not according to the ticket.

        Please update and pass to Filipe. I need his attention here; from the logs it appears capi_set_view_manager is doing it right.

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Assignee Aleksey Kondratenko [ alkondratenko ] Farshid Ghods [ farshid ]
        farshid Farshid Ghods (Inactive) added a comment -

        Yes, this is a 2.0 blocker, not a 2.0 beta blocker.
        Will assign this to Filipe.
        Thanks for the triage.

        farshid Farshid Ghods (Inactive) added a comment -

        Per Alk's comments

        farshid Farshid Ghods (Inactive) made changes -
        Assignee Farshid Ghods [ farshid ] Filipe Manana [ filipemanana ]
        farshid Farshid Ghods (Inactive) made changes -
        Priority Major [ 3 ] Blocker [ 1 ]
        thuan Thuan Nguyen added a comment - - edited

        Hit this bug when adding 2 nodes in a system test on build 2.0.0-1781 with consistent views enabled. I will get collect_info from all nodes and update this bug.

        Rebalance exited with reason {{{{badmatch,
        {error,
        {error,
        <<"Partition 145 not in active nor passive set">>}}},
        [{capi_set_view_manager,handle_call,3},
        {gen_server,handle_msg,5},
        {gen_server,init_it,6},
        {proc_lib,init_p_do_apply,3}]},
        {gen_server,call,
        ['capi_set_view_manager-saslbucket',
        {wait_index_updated,145},
        infinity]}},
        {gen_server,call,
        [{'janitor_agent-saslbucket','ns_1@10.6.2.38'},
        {if_rebalance,<0.1357.852>,
        {wait_index_updated,145}},
        infinity]}}

        FilipeManana Filipe Manana (Inactive) added a comment -

        Thanks Thuan.

        I'm aware of the problem after ns_server's fix. Different problem (and component) but same final error.
        Started working on it already last week.

        There's no need to keep testing for this or posting new results - the old logs are clear enough to understand the problem.
        Don't bother investing more time here before I finish my change and it gets merged. Thanks.

        farshid Farshid Ghods (Inactive) added a comment -

        Tony,
        can you please rephrase the bug description to reflect the use case better?
        I was confused, as the title says it happens only when consistent views are disabled, and if it does not happen with consistent views then the priority is different. So please be more specific.

        Also, as Filipe mentioned, for this exact error let's not file separate bugs.

        thuan Thuan Nguyen made changes -
        Summary
        Original Value: Rebalance 2 in 1 out with index_aware_rebalance_disabled=false exited with reason Partition 687 not in active nor passive set
        New Value: Rebalance failed with reason "Partition 687 not in active nor passive set" in add in node rebalance
        Environment
        Original Value: 4 cores VMs CentOS
        build #1653
        New Value: 4 cores VMs CentOS, centos 6.2 64bit
        build #1653, build 2.0.0-1781
        thuan Thuan Nguyen made changes -
        Description
        Original Value:
        index_aware_rebalance_disabled set false, 5 ddocs, 500K items
        4 nodes cluster, remove 2 nodes and add 1 node, start rebalance

        Rebalance exited with reason {{error,
                                          <<"Partition 687 not in active nor passive set">>},
                                      {gen_server,call,
                                          [{'janitor_agent-bucket-0',
                                               'ns_1@10.3.121.120'},
                                           {if_rebalance,<0.14888.6>,
                                               {wait_index_updated,953}},
                                           infinity]}}
        New Value:
        Rebalance failed with error

        Rebalance exited with reason {{{{badmatch,
        {error,
        {error,
        <<"Partition 36 not in active nor passive set">>}}},
        [{capi_set_view_manager,handle_call,3},
        {gen_server,handle_msg,5},
        {gen_server,init_it,6},
        {proc_lib,init_p_do_apply,3}]},
        {gen_server,call,
        ['capi_set_view_manager-saslbucket',
        {wait_index_updated,36},
        infinity]}},
        {gen_server,call,
        [{'janitor_agent-saslbucket','ns_1@10.6.2.44'},
        {if_rebalance,<0.32719.854>,
        {wait_index_updated,36}},
        infinity]}}

        with or without consistent view enable.

        ** In orange cluser with build 2.0.0-1781, consistent view is enable by default and rebalance failed when add 2 nodes to cluster.

        ** In Iryna cluster, consistent view is disable. She got rebalance failed with the same error as she mentioned in the following:

         index_aware_rebalance_disabled set false, 5 ddocs, 500K items
        4 nodes cluster, remove 2 nodes and add 1 node, start rebalance

        Rebalance exited with reason {{error,
                                          <<"Partition 687 not in active nor passive set">>},
                                      {gen_server,call,
                                          [{'janitor_agent-bucket-0',
                                               'ns_1@10.3.121.120'},
                                           {if_rebalance,<0.14888.6>,
                                               {wait_index_updated,953}},
                                           infinity]}}
        peter peter made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        FilipeManana Filipe Manana (Inactive) made changes -
        Component/s view-engine [ 10060 ]
        thuan Thuan Nguyen added a comment -

        Integrated in github-couchdb-preview #509 (See http://qa.hq.northscale.net/job/github-couchdb-preview/509/)
        MB-6490 Fix race condition in test 20-debug-params.t (Revision 80cbce15112b2b60f2c3463673c81139c3731f0d)
        MB-6490 Allow unindexable partitions in the pending transition (Revision 63a94ebe8da325c89972dedac0db41c5a7a36aed)
        MB-6490 Don't error when monitoring partitions in pending transition (Revision 780f5c88c84c6c9319c8f12638cc8946b8b842f5)

        Result = SUCCESS
        pwansch :
        Files :

        • src/couch_set_view/test/20-debug-params.t

        pwansch :
        Files :

        • src/couch_set_view/src/couch_set_view_group.erl
        • src/couch_set_view/src/couch_set_view_updater.erl
        • src/couch_set_view/include/couch_set_view.hrl
        • src/couch_set_view/test/16-pending-transition.t
        • src/couch_set_view/src/couch_set_view_util.erl

        pwansch :
        Files :

        • src/couch_set_view/test/16-pending-transition.t
        • src/couch_set_view/src/couch_set_view_group.erl
        • src/couch_set_view/src/couch_set_view_util.erl
        • src/couch_set_view/src/couch_db_set.erl
        thuan Thuan Nguyen added a comment -

        Integrated in github-couchdb-preview #510 (See http://qa.hq.northscale.net/job/github-couchdb-preview/510/)
        MB-6490 Add missing checks to state transition requests (Revision bf5c23b6af2f31656dcd96f9892fc9c2c66b5b48)

        Result = SUCCESS
        Farshid Ghods :
        Files :

        • src/couch_set_view/src/couch_set_view_group.erl
        • src/couch_set_view/test/16-pending-transition.t
        iryna iryna added a comment -

        reproduced in 1820:
        manifest:
        <manifest><remote name="couchbase" fetch="git://10.1.1.210/"/><remote name="membase" fetch="git://10.1.1.210/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="ab70f6d42f46621ec576889e57cb37ac2d64a84b"><copyfile dest="Makefile" src="Makefile.top"/></project><project name="bucket_engine" path="bucket_engine" revision="70b3624abc697b7d18bf3d57f331b7674544e1e7"/><project name="ep-engine" path="ep-engine" revision="3d545832ed84650e480855cf3abae6fef9fccf9d"/><project name="libconflate" path="libconflate" revision="3cf7107eaa5b52b34cc9f887cf0e2edb3465988e"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="00d3763593c116e8e5d97aa0b646c42885727398"/><project name="membase-cli" path="membase-cli" revision="0bc659c78e1f2d822e658778f857c8dacc7a01e5" remote="membase"/><project name="memcached" path="memcached" revision="858731183b08cd6b72fa6e68c1fb4208cb87570d" remote="membase"/><project name="moxi" path="moxi" revision="52a5fa887bfff0bf719c4ee5f29634dd8707500e"/><project name="ns_server" path="ns_server" revision="a4fd05a0fa64f090800baccc887bbd416b9f8f27"/><project name="portsigar" path="portsigar" revision="1bc865e1622fb93a3fe0d1a4cdf18eb97ed9d600"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="21e6161a1d064979b5c6aa99cd34ccc41c9d7aca"/><project name="couchbase-python-client" path="couchbase-python-client" revision="86b398e4fbc1f2e38d356e14df0c1bb4e3d2427b"/><project name="couchdb" path="couchdb" revision="6b9fa5f115e675ba345bf5ffa17e57423efd86ba"/><project name="couchdbx-app" path="couchdbx-app" revision="d196377b5b1ba3ce25f1b92066e2741898b01a1e"/><project name="couchstore" 
path="couchstore" revision="29579bd47f7c916c43116722b8f4962b4ea9fff0"/><project name="geocouch" path="geocouch" revision="7782df1a53104e9c8bb9ef941a9b499bbc7cd61e"/><project name="mccouch" path="mccouch" revision="88701cc326bc3dde4ed072bb8441be83adcfb2a5"/><project name="testrunner" path="testrunner" revision="bc501cfa4c3453f9c2a7b8cf48ac81da3dca053c"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest>

        attaching new logs

        iryna iryna made changes -
        Resolution Fixed [ 1 ]
        Status Resolved [ 5 ] Reopened [ 4 ]
        iryna iryna added a comment -

        logs from build 1820

        iryna iryna added a comment -

        reproduced also on build 1827

        FilipeManana Filipe Manana (Inactive) added a comment -

        There's limited information in the logs, due to rotation.

        But looking at file 4c82d8b6-9739-40f2-885f-e2335ddb0b54-10.3.3.58-diag.txt:

        At the last occurrence of the error (line 2416042), the error seems valid from the view engine's point of view. Going up from that line, none of the indexes has vbucket 624 in the active or passive state.

        Above that line, I also see that ns_server marks vbucket 624 for cleanup in several indexes, but doesn't mark it as active/passive afterwards. Example in line 2410127:

        [views:info,2012-10-09T14:46:46.117,ns_1@10.3.3.58:'capi_set_view_manager-default':capi_set_view_manager:apply_index_states:472]
        couch_set_view:set_partition_states([<<"default">>,

        From what I can see, the error is valid; it might be bad coordination from ns_server.

        It would also help here if ns_server logged the name of the respective index (design doc) when such an error happens. That makes it easier to troubleshoot when there are many indexes.
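
The logging suggestion in this comment could look roughly like the following. This is a minimal Python sketch under stated assumptions, not ns_server code: `wait_index_updated` here is a stand-in function, `index_states` is a hypothetical mapping from design doc name to its active and passive partition sets, and the design doc names are invented for the example.

```python
class PartitionStateError(Exception):
    """Raised when a partition is in neither the active nor the passive set."""


def wait_index_updated(index_states, partition):
    """Check every index; report which design doc rejected the partition.

    index_states maps a design doc name to its (active, passive) sets.
    """
    for ddoc, (active, passive) in index_states.items():
        if partition not in active and partition not in passive:
            # Include the design doc name so the failing index is obvious.
            raise PartitionStateError(
                f"{ddoc}: Partition {partition} not in active nor passive set"
            )
    return "ok"


# Hypothetical state: two design docs, only one of which knows partition 624.
index_states = {
    "_design/ddoc1": ({624}, set()),
    "_design/ddoc2": (set(), set()),
}

try:
    outcome = wait_index_updated(index_states, 624)
except PartitionStateError as err:
    outcome = str(err)
```

With the design doc name in the message, a failure across many indexes points directly at the index whose state is inconsistent.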

        FilipeManana Filipe Manana (Inactive) made changes -
        Assignee Filipe Manana [ filipemanana ] Iryna Mironava [ iryna ]
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Appears to be a problem in waiting for the persisted checkpoint, which is causing us to assume the vbucket is 'ready' too soon.

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Assignee Iryna Mironava [ iryna ] Aleksey Kondratenko [ alkondratenko ]
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        http://review.couchbase.org/#/c/21552/ and http://review.couchbase.org/#/c/21553/

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Status Reopened [ 4 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        iryna iryna added a comment -

        reproduced in 1850
        <manifest><remote name="couchbase" fetch="git://10.1.1.210/"/><remote name="membase" fetch="git://10.1.1.210/"/><remote name="apache" fetch="git://github.com/apache/"/><remote name="erlang" fetch="git://github.com/erlang/"/><default remote="couchbase" revision="master"/><project name="tlm" path="tlm" revision="ab70f6d42f46621ec576889e57cb37ac2d64a84b"><copyfile src="Makefile.top" dest="Makefile"/></project><project name="bucket_engine" path="bucket_engine" revision="70b3624abc697b7d18bf3d57f331b7674544e1e7"/><project name="ep-engine" path="ep-engine" revision="25b403263ccd67ffe3205a474d8f93a21f2936d0"/><project name="libconflate" path="libconflate" revision="2cc8eff8e77d497d9f03a30fafaecb85280535d6"/><project name="libmemcached" path="libmemcached" revision="ca739a890349ac36dc79447e37da7caa9ae819f5" remote="membase"/><project name="libvbucket" path="libvbucket" revision="00d3763593c116e8e5d97aa0b646c42885727398"/><project name="membase-cli" path="membase-cli" revision="c82db287eab652d25116b042d4627a6931722a8e" remote="membase"/><project name="memcached" path="memcached" revision="858731183b08cd6b72fa6e68c1fb4208cb87570d" remote="membase"/><project name="moxi" path="moxi" revision="52a5fa887bfff0bf719c4ee5f29634dd8707500e"/><project name="ns_server" path="ns_server" revision="65e7ebe2d45904e82e1226ddeca257a2cd9d5075"/><project name="portsigar" path="portsigar" revision="1bc865e1622fb93a3fe0d1a4cdf18eb97ed9d600"/><project name="sigar" path="sigar" revision="63a3cd1b316d2d4aa6dd31ce8fc66101b983e0b0"/><project name="couchbase-examples" path="couchbase-examples" revision="21e6161a1d064979b5c6aa99cd34ccc41c9d7aca"/><project name="couchbase-python-client" path="couchbase-python-client" revision="86b398e4fbc1f2e38d356e14df0c1bb4e3d2427b"/><project name="couchdb" path="couchdb" revision="23cec9997b38ac82cab310b7560d01db529c1ae2"/><project name="couchdbx-app" path="couchdbx-app" revision="d196377b5b1ba3ce25f1b92066e2741898b01a1e"/><project name="couchstore" 
path="couchstore" revision="29579bd47f7c916c43116722b8f4962b4ea9fff0"/><project name="geocouch" path="geocouch" revision="b0bd742551639c52030c070e5bf9390edbb536ba"/><project name="mccouch" path="mccouch" revision="88701cc326bc3dde4ed072bb8441be83adcfb2a5"/><project name="testrunner" path="testrunner" revision="48fc95d4e1009d0f40a2c4e2e59448dc3e4fcad3"/><project name="otp" path="otp" revision="b6dc1a844eab061d0a7153d46e7e68296f15a504" remote="erlang"/><project name="icu4c" path="icu4c" revision="26359393672c378f41f2103a8699c4357c894be7" remote="couchbase"/><project name="snappy" path="snappy" revision="5681dde156e9d07adbeeab79666c9a9d7a10ec95" remote="couchbase"/><project name="v8" path="v8" revision="447decb75060a106131ab4de934bcc374648e7f2" remote="couchbase"/><project name="gperftools" path="gperftools" revision="8f60ba949fb8576c530ef4be148bff97106ddc59" remote="couchbase"/><project name="pysqlite" path="pysqlite" revision="0ff6e32ea05037fddef1eb41a648f2a2141009ea" remote="couchbase"/></manifest>

        logs:
        http://qa.hq.northscale.net/job/centos-64-2.0-view-query-tests/516/artifact/logs/testrunner-12-Oct-16_14-53-26/d1387940-7fbd-4e43-91ef-460736b2e37d-10.3.3.114-diag.txt.gz
        http://qa.hq.northscale.net/job/centos-64-2.0-view-query-tests/516/artifact/logs/testrunner-12-Oct-16_14-53-26/d1387940-7fbd-4e43-91ef-460736b2e37d-10.3.3.115-diag.txt.gz
        http://qa.hq.northscale.net/job/centos-64-2.0-view-query-tests/516/artifact/logs/testrunner-12-Oct-16_14-53-26/d1387940-7fbd-4e43-91ef-460736b2e37d-10.3.3.121-diag.txt.gz
        http://qa.hq.northscale.net/job/centos-64-2.0-view-query-tests/516/artifact/logs/testrunner-12-Oct-16_14-53-26/d1387940-7fbd-4e43-91ef-460736b2e37d-10.3.3.122-diag.txt.gz

        2012-10-17 01:31:46.990 ns_orchestrator:2:info:message(ns_1@10.3.3.115) - Rebalance exited with reason {{{{badmatch,
            {error,
             {error, <<"Partition 672 not in active nor passive set">>}}},
           [{capi_set_view_manager,handle_call,3},
            {gen_server,handle_msg,5},
            {gen_server,init_it,6},
            {proc_lib,init_p_do_apply,3}]},
          {gen_server,call,
           ['capi_set_view_manager-default',
            {wait_index_updated,672},
            infinity]}},
         {gen_server,call,
          [{'janitor_agent-default','ns_1@10.3.3.115'},
           {if_rebalance,<0.7393.100>,
            {wait_index_updated,672}},
           infinity]}}
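
        When comparing several diag files, it may help to pull out the partition named in the error and the partition passed to wait_index_updated, since the two do not always agree. The following is only a minimal sketch, assuming the failure text has the form shown in this ticket; summarize_failure is a hypothetical helper name, not part of ns_server or testrunner.

```python
import re

# Both patterns are derived from the message text quoted in this ticket.
PARTITION_RE = re.compile(r'Partition (\d+) not in active nor passive set')
WAIT_RE = re.compile(r'\{wait_index_updated,\s*(\d+)\}')


def summarize_failure(log_text):
    """Return (partitions named in the error, partitions being waited on)."""
    partitions = sorted({int(n) for n in PARTITION_RE.findall(log_text)})
    waited = sorted({int(n) for n in WAIT_RE.findall(log_text)})
    return partitions, waited


# Abridged excerpt of the failure reported in this comment.
sample = '''Rebalance exited with reason {{{{badmatch,
{error, {error, <<"Partition 672 not in active nor passive set">>}}},
['capi_set_view_manager-default', {wait_index_updated,672}, infinity]}},
{if_rebalance,<0.7393.100>, {wait_index_updated,672}}'''

partitions, waited = summarize_failure(sample)
if partitions != waited:
    print("mismatch:", partitions, waited)
else:
    print("error and wait refer to the same partition(s):", partitions)
```

        For this comment both numbers are 672. In the original description of this ticket they differ (Partition 687 versus wait_index_updated 953), which such a check would flag.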

        iryna iryna made changes -
        Resolution: Fixed [ 1 ] -> (none)
        Status: Resolved [ 5 ] -> Reopened [ 4 ]
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Thanks for the report. I was able to work out what happened by looking at MB-6955. A fix is coming soon.

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Status: Reopened [ 4 ] -> Resolved [ 5 ]
        Resolution: (none) -> Duplicate [ 3 ]
        farshid Farshid Ghods (Inactive) made changes -
        Status: Resolved [ 5 ] -> Closed [ 6 ]

          People

          • Assignee:
            alkondratenko Aleksey Kondratenko (Inactive)
            Reporter:
            iryna iryna