  1. Couchbase Server
  2. MB-8677

[system test] [windows] rebalance failed with errors Partition xx not in active nor passive set

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 2.2.0
    • 2.1.1
    • ns_server
    • Security Level: Public
    • windows 2008 R2 64-bit
    • Windows 64-bit

    Description

      Environment
      9 nodes windows 2008 R2 64-bit, 8GB RAM, 4 core cpu, SSD drive
      1:10.3.121.173
      2:10.3.121.169
      3:10.3.3.171
      4:10.3.3.214
      5:10.3.121.47
      6:10.3.3.180
      7:10.3.3.181
      8:10.3.3.182
      9:10.3.121.243

      Couchbase cluster setup:

      • create a 7-node Couchbase Server cluster with build 2.1.1-763
      • create 2 buckets, default (3GB) and sasl (3GB)
      • each bucket has one design doc and one view
      • in the initial phase, load keys into both buckets until the active resident ratio drops to 70%
      • in the access phase, run updates, sets, deletes, and expirations against both buckets

      Test phases:

      • rebalance in, rebalance out, and swap rebalance ==> passed
      • restart Couchbase Server on one node, with and without load ==> passed
      • failover and add back ==> rebalance failed
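
      The failover and add-back phase can be sketched as the sequence of REST requests it sends to the cluster. This is only an illustration: the endpoint paths and form fields (/controller/failOver, /controller/reAddNode, /controller/rebalance) are assumed to match the Couchbase 2.x REST API, the node names are placeholders, and the report does not say which node was failed over. The sketch only builds the requests and does not send them.

```python
from urllib.parse import urlencode


def failover_add_back_requests(failed_node, known_nodes):
    """Build (path, form_body) pairs for failover, add back, then rebalance.

    Paths and form fields are assumptions based on the Couchbase 2.x REST API.
    """
    return [
        # Hard failover of the chosen node.
        ("/controller/failOver", urlencode({"otpNode": failed_node})),
        # Add the failed-over node back into the cluster.
        ("/controller/reAddNode", urlencode({"otpNode": failed_node})),
        # Rebalance across all known nodes, ejecting none.
        ("/controller/rebalance", urlencode({
            "knownNodes": ",".join(known_nodes),
            "ejectedNodes": "",
        })),
    ]


# Placeholder node names; the actual failed-over node is not given in the report.
steps = failover_add_back_requests(
    "ns_1@10.3.121.173",
    ["ns_1@10.3.121.173", "ns_1@10.3.121.243"],
)
for path, body in steps:
    print("POST", path, body)
```

      Each pair would be sent as an authenticated POST to port 8091 on any cluster node. The rebalance request is the step that failed in this test.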

      Error from failover rebalance

      Rebalance exited with reason {unexpected_exit,
      {'EXIT',<0.10446.55>,
      {{{{badmatch,
      {error, {error, <<"Partition 1023 not in active nor passive set">>}}},
      [{capi_set_view_manager,handle_call,3}, {gen_server,handle_msg,5}, {gen_server,init_it,6}, {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['capi_set_view_manager-sasl', {wait_index_updated,1023},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-sasl','ns_1@10.3.121.173'},
      {if_rebalance,<0.25850.54>,
      {wait_index_updated,1023}},
      infinity]}}}}
      ns_orchestrator002
      ns_1@10.3.121.243
      11:07:36 - Sun Jul 21, 2013
      <0.9735.55> exited with {unexpected_exit,
      {'EXIT',<0.10446.55>,
      {{{{badmatch,
      {error, {error, <<"Partition 1023 not in active nor passive set">>}}},
      [{capi_set_view_manager,handle_call,3}, {gen_server,handle_msg,5}, {gen_server,init_it,6}, {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['capi_set_view_manager-sasl', {wait_index_updated,1023},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-sasl','ns_1@10.3.121.173'},
      {if_rebalance,<0.25850.54>,
      {wait_index_updated,1023}},
      infinity]}}}}
      ns_vbucket_mover000
      ns_1@10.3.121.243
      11:07:33 - Sun Jul 21, 2013
      Client-side error-report for user "Administrator" on node 'ns_1@10.3.3.180':
      User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:21.0) Gecko/20100101 Firefox/21.0
      Got unhandled error: TypeError: operation is undefined
      At: http://10.3.3.180:8091/js/cells.js:739
      Backtrace:
      Function: collectBacktraceViaCaller
      Args:

      ---------
      Function: appOnError
      Args:
      "TypeError: operation is undefined"
      "http://10.3.3.180:8091/js/cells.js"
      739
      ---------

      menelaus_web102
      ns_1@10.3.3.180
      11:07:29 - Sun Jul 21, 2013


      The second failover test (without add back) ==> rebalance failed.
      The error from this failover rebalance is almost the same as the one from the failed rebalance above.

      Rebalance exited with reason {unexpected_exit,
      {'EXIT',<0.30880.69>,
      {{{{badmatch,
      {error, {error, <<"Partition 925 not in active nor passive set">>}}},
      [{capi_set_view_manager,handle_call,3}, {gen_server,handle_msg,5}, {gen_server,init_it,6}, {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['capi_set_view_manager-sasl', {wait_index_updated,925},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-sasl','ns_1@10.3.121.47'},
      {if_rebalance,<0.3640.69>,
      {wait_index_updated,925}},
      infinity]}}}}
      ns_orchestrator002
      ns_1@10.3.121.243
      15:49:44 - Sun Jul 21, 2013
      <0.24899.69> exited with {unexpected_exit,
      {'EXIT',<0.30880.69>,
      {{{{badmatch,
      {error, {error, <<"Partition 925 not in active nor passive set">>}}},
      [{capi_set_view_manager,handle_call,3}, {gen_server,handle_msg,5}, {gen_server,init_it,6}, {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      ['capi_set_view_manager-sasl', {wait_index_updated,925},
      infinity]}},
      {gen_server,call,
      [{'janitor_agent-sasl','ns_1@10.3.121.47'},
      {if_rebalance,<0.3640.69>,
      {wait_index_updated,925}},
      infinity]}}}}
      ns_vbucket_mover000
      ns_1@10.3.121.243
      15:49:43 - Sun Jul 21, 2013
      Bucket "sasl" rebalance does not seem to be swap rebalance
      ns_vbucket_mover000
      ns_1@10.3.121.243
      15:35:53 - Sun Jul 21, 2013

      In both cases, a second rebalance started after the failed one passed.
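
      For triage, the failing partition and bucket can be pulled out of the rebalance error text with a small script. The sketch below is not part of the test harness. It assumes the message wording and the 'capi_set_view_manager-<bucket>' process naming seen in the logs quoted in this report, and the function name is hypothetical.

```python
import re

# Patterns based on the error text quoted in this report.
PARTITION_RE = re.compile(r"Partition (\d+) not in active nor passive set")
BUCKET_RE = re.compile(r"'capi_set_view_manager-([^']+)'")


def summarize_failure(log_text):
    """Return the distinct partitions and buckets named in a rebalance error."""
    partitions = sorted({int(n) for n in PARTITION_RE.findall(log_text)})
    buckets = sorted(set(BUCKET_RE.findall(log_text)))
    return {"partitions": partitions, "buckets": buckets}


# Excerpt from the second failure in this report.
sample = """
{error, {error, <<"Partition 925 not in active nor passive set">>}}},
{gen_server,call,
['capi_set_view_manager-sasl', {wait_index_updated,925},
infinity]}},
"""

print(summarize_failure(sample))
```

      Run against the two failures in this report, it would report partition 1023 and partition 925 respectively, both on the sasl bucket.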

      Link to manifest file of this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-enterprise_x86_2.1.1-763-rel.setup.exe.manifest.xml

      The collect_info files will be uploaded soon.

          People

            thuan Thuan Nguyen
            thuan Thuan Nguyen