  1. Couchbase Server
  2. MB-8008

[system test] Rebalance failure due to bad replicators after rebalance

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.1.0
    • Component/s: ns_server
    • Security Level: Public
    • Environment:
      Windows Server 2008 R2 64-bit SSD VMs

      Description

      Install Couchbase Server 2.0.1-185 on 9 Windows SSD VMs.
      Create a cluster with 7 nodes:
      10.3.121.173
      10.3.121.169
      10.3.121.171
      10.3.3.214
      10.3.121.47
      10.3.3.180
      10.3.3.181

      Create 2 buckets:
      "buckets" : {"default" : {"quota": "3500 MB", "replicas": "1", "replica_index": "enable"},
                   "sasl": {"count": "1", "quota": "2000MB", "replicas": "1", "replica_index": "disable"}}

      Create 2 design docs:
      "ddocs" : {"create": [{"ddoc":"ddoc1", "view":"view1", "map":"function(doc,meta){emit(doc.city,null);}", "bucket":"default"},
      {"ddoc":"ddoc1", "view":"view2", "map":"function(doc,meta){emit(doc.city, [doc.st, doc.email]);}", "bucket":"default"},
      {"ddoc":"ddoc2", "view":"view1", "map":"function(doc,meta){emit(doc.city,null);}", "bucket":"saslbucket"},
      {"ddoc":"ddoc2", "view":"view2", "map":"function(doc,meta){emit(doc.city, [doc.st, doc.email]);}", "bucket":"saslbucket"}]}
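To make the view definitions above easier to reason about, here is a rough Python mirror of the two map functions. It is only a sketch: the names `view1_map`, `view2_map`, and the sample document are hypothetical, and a missing field becomes `None` here, whereas the JavaScript originals would pass `undefined`.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, Any]  # one (key, value) pair, as produced by emit()


def view1_map(doc: Dict[str, Any]) -> List[Row]:
    """Python mirror of: function(doc,meta){emit(doc.city,null);}"""
    return [(doc.get("city"), None)]


def view2_map(doc: Dict[str, Any]) -> List[Row]:
    """Python mirror of: function(doc,meta){emit(doc.city, [doc.st, doc.email]);}"""
    return [(doc.get("city"), [doc.get("st"), doc.get("email")])]


# Hypothetical document; the field values are made up for illustration.
sample = {"city": "Austin", "st": "TX", "email": "user@example.com"}

print(view1_map(sample))
print(view2_map(sample))
```

Both views key on `doc.city`; `view2` additionally carries `st` and `email` in the value, so its index rows are larger than those of `view1`.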

      No XDCR is configured.

      Load 20+ million items into both buckets until the resident ratio on each bucket is around 70%.
      Access the cluster for 3 hours with the spec on this page, and run queries at 200 to 500 ops per second on both buckets:
      default: 5% expire, 5% delete, 5% add, 5% update, 80% gets (3-hour access phase)
      saslbucket: 70% expire, 20% delete, 5% add, 5% gets (3-hour access phase)
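The access-phase mix above could be approximated by a weighted random choice of operations. The following is a minimal sketch under that assumption, not the actual system-test harness; `pick_ops`, `DEFAULT_MIX`, and `SASL_MIX` are hypothetical names, and only the percentages come from the spec.

```python
import random
from typing import Dict, List

# Percentages copied from the access-phase spec above.
DEFAULT_MIX: Dict[str, int] = {"expire": 5, "delete": 5, "add": 5, "update": 5, "get": 80}
SASL_MIX: Dict[str, int] = {"expire": 70, "delete": 20, "add": 5, "get": 5}


def pick_ops(mix: Dict[str, int], n: int, seed: int = 0) -> List[str]:
    """Draw n operation names, weighted by their percentage in the mix."""
    if sum(mix.values()) != 100:
        raise ValueError("operation percentages must sum to 100")
    rng = random.Random(seed)  # seeded so a run can be repeated
    return rng.choices(list(mix), weights=list(mix.values()), k=n)


if __name__ == "__main__":
    ops = pick_ops(SASL_MIX, 1000)
    for name in SASL_MIX:
        print(name, ops.count(name))
```

Note how different the two mixes are: default is read-heavy, while saslbucket is dominated by expirations and deletes.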

      The test then goes through a "swap rebalance the orchestrator node" phase and a "rebalance in one node" phase.
      In the "rebalance out one node" phase, rebalance failed with this error:

      Started rebalancing bucket saslbucket ns_rebalancer000 ns_1@10.3.121.171 03:42:54 - Sat Mar 30, 2013
      Starting rebalance, KeepNodes = ['ns_1@10.3.3.181','ns_1@10.3.121.243',
      'ns_1@10.3.121.47','ns_1@10.3.3.182',
      'ns_1@10.3.3.214','ns_1@10.3.121.171',
      'ns_1@10.3.121.169'], EjectNodes = ['ns_1@10.3.121.173']
      ns_orchestrator004 ns_1@10.3.121.171 03:42:54 - Sat Mar 30, 2013

      Bad replicators after rebalance:
      Missing = [{'ns_1@10.3.121.171','ns_1@10.3.3.181',173},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',174},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',175},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',176},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',177},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',178},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',246},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',247},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',248},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',249},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',250},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',251},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',252},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',253},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',254},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',255},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',256},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',257},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',258},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',259},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',260},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',261},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',262},
      {'ns_1@10.3.121.171','ns_1@10.3.3.181',263}]
      Extras = [] ns_rebalancer002 ns_1@10.3.121.171 06:18:36 - Sat Mar 30, 2013
      Rebalance exited with reason bad_replicas ns_orchestrator002 ns_1@10.3.121.171 06:18:36 - Sat Mar 30, 2013
      Shutting down bucket "saslbucket" on 'ns_1@10.3.121.173' for deletion ns_memcached002 ns_1@10.3.121.173 06:18:37 - Sat Mar 30, 2013
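The Missing/Extras report in the log above reads like a set difference between the replications that should exist and those that do. The sketch below illustrates that reading and collapses the missing vbuckets into ranges. It is an assumption about the shape of the check, not ns_server's actual code; `diff_replicators` and `vbucket_ranges` are hypothetical helpers, and the `expected` set holds only the 24 entries from the log (a real cluster would have many more replications that matched).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Replicator = Tuple[str, str, int]  # (source node, destination node, vbucket)


def diff_replicators(
    expected: Set[Replicator], actual: Set[Replicator]
) -> Tuple[List[Replicator], List[Replicator]]:
    """Return (missing, extras): expected-but-absent and present-but-unexpected."""
    return sorted(expected - actual), sorted(actual - expected)


def vbucket_ranges(
    replicators: Iterable[Replicator],
) -> Dict[Tuple[str, str], List[Tuple[int, int]]]:
    """Group vbuckets per (source, destination) and merge consecutive ids."""
    by_pair: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for src, dst, vb in replicators:
        by_pair[(src, dst)].append(vb)
    result: Dict[Tuple[str, str], List[Tuple[int, int]]] = {}
    for pair, vbs in by_pair.items():
        ranges: List[Tuple[int, int]] = []
        for vb in sorted(set(vbs)):
            if ranges and vb == ranges[-1][1] + 1:
                ranges[-1] = (ranges[-1][0], vb)  # extend the current run
            else:
                ranges.append((vb, vb))  # start a new run
        result[pair] = ranges
    return result


SRC, DST = "ns_1@10.3.121.171", "ns_1@10.3.3.181"
# vbucket ids copied from the Missing list in the log.
missing_vbs = list(range(173, 179)) + list(range(246, 264))
expected = {(SRC, DST, vb) for vb in missing_vbs}

missing, extras = diff_replicators(expected, actual=set())
print(len(missing), extras)     # 24 []
print(vbucket_ranges(missing))  # one pair, ranges (173, 178) and (246, 263)
```

Seen this way, every missing replication runs from 10.3.121.171 to 10.3.3.181 and covers two contiguous vbucket ranges, 173-178 and 246-263.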

      Link to manifest file of this build http://builds.hq.northscale.net/latestbuilds/couchbase-server-community_x86_64_2.0.1-185-rel.setup.exe.manifest.xml

      Diags link https://s3.amazonaws.com/bugdb/jira/9ssd-win2.0.1-185-ns-diag-20130330.txt.zip


        Activity

        maria Maria McDuff (Inactive) added a comment -

        moving to 2.0.2.

        maria Maria McDuff (Inactive) added a comment -

        siri will do profiling and may need help from alk.

        kzeller kzeller added a comment -

        Confirmed with Abhinav 4/16 - not a RN candidate

        maria Maria McDuff (Inactive) added a comment -

        per bug triage, bumping up to critical.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Duplicate of MB-7902

        maria Maria McDuff (Inactive) added a comment -

        MB-7902

          People

          • Assignee: alkondratenko Aleksey Kondratenko (Inactive)
          • Reporter: Chisheng Chisheng Hong (Inactive)
          • Votes: 0
          • Watchers: 5

            Dates

            • Created:
              Updated:
              Resolved:
