Couchbase Server / MB-38418

[Magma]: Rebalancing in 3 nodes failed with mover crashed.


Details

    Description

      Steps:
      1. Create a 2-node cluster:

      +--------------+-----------------+--------------+
      | Nodes        | Services        | Status       |
      +--------------+-----------------+--------------+
      | 172.23.106.9 | index, kv, n1ql | Cluster node |
      | 172.23.106.8 | None            | <--- IN ---  |
      +--------------+-----------------+--------------+
      

      2. Create a default bucket and load 10M items.

      Bucket statistics
      +---------+---------+----------+-----+----------+------------+------------+------------+
      | Bucket  | Type    | Replicas | TTL | Items    | RAM Quota  | RAM Used   | Disk Used  |
      +---------+---------+----------+-----+----------+------------+------------+------------+
      | default | membase | 1        | 0   | 10000000 | 2986344448 | 1952474768 | 6289864111 |
      +---------+---------+----------+-----+----------+------------+------------+------------+
      

      3. Rebalance in 3 nodes:

      +----------------+-----------------+--------------+
      | Nodes          | Services        | Status       |
      +----------------+-----------------+--------------+
      | 172.23.106.9   | index, kv, n1ql | Cluster node |
      | 172.23.106.8   | kv              | Cluster node |
      | 172.23.104.201 | None            | <--- IN ---  |
      | 172.23.104.222 | None            | <--- IN ---  |
      | 172.23.104.199 | None            | <--- IN ---  |
      +----------------+-----------------+--------------+
      

      4. While the rebalance from step 3 is running, update 5M docs.
      5. The rebalance fails with mover_crashed after memcached on 172.23.106.8 exits with status 139:

      {u'code': 0, u'module': u'ns_log', u'type': u'info', u'node': u'ns_1@172.23.106.8', u'tstamp': 1585080573616L, u'shortText': u'message', u'serverTime': u'2020-03-24T13:09:33.616Z', u'text': u"Service 'memcached' exited with status 139. Restarting. Messages:\n2020-03-24T13:09:33.402115-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x21c67c]\n2020-03-24T13:09:33.402122-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x858c5]\n2020-03-24T13:09:33.402128-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x88749]\n2020-03-24T13:09:33.402133-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x8aaa0]\n2020-03-24T13:09:33.402137-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x8adfe]\n2020-03-24T13:09:33.402145-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x140c13]\n2020-03-24T13:09:33.402150-07:00 CRITICAL     /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x13990f]\n2020-03-24T13:09:33.402158-07:00 CRITICAL     /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f6821ac5000+0x10397]\n2020-03-24T13:09:33.402166-07:00 CRITICAL     /lib64/libpthread.so.0() [0x7f681f10b000+0x7dd5]\n2020-03-24T13:09:33.402356-07:00 CRITICAL     /lib64/libc.so.6(clone+0x6d) [0x7f681ed3e000+0xfdead]"}
      2020-03-24 13:09:34,250 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2528] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.106.9', u'tstamp': 1585080573593L, u'shortText': u'message', u'serverTime': u'2020-03-24T13:09:33.593Z', u'text': u'Rebalance exited with reason {mover_crashed,\n                              {unexpected_exit,\n                               {\'EXIT\',<0.29055.0>,\n                                {{{{nocatch,{error,closed}},\n                                   [{mc_binary,recv_with_data,4,\n                                     [{file,"src/mc_binary.erl"},{line,45}]},\n                                    {mc_binary,quick_stats_recv,3,\n                                     [{file,"src/mc_binary.erl"},{line,52}]},\n                                    {mc_binary,quick_stats_loop_enter,5,\n                                     [{file,"src/mc_binary.erl"},{line,104}]},\n                                    {mc_binary,quick_stats,5,\n                                     [{file,"src/mc_binary.erl"},{line,89}]},\n                                    {mc_client_binary,get_dcp_docs_estimate,\n                                     3,\n                                     [{file,"src/mc_client_binary.erl"},\n                                      {line,714}]},\n                                    {ns_memcached,do_handle_call,3,\n                                     [{file,"src/ns_memcached.erl"},\n                                      {line,565}]},\n                                    {ns_memcached,worker_loop,3,\n                                     [{file,"src/ns_memcached.erl"},\n                                      {line,247}]},\n                                    {proc_lib,init_p_do_apply,3,\n                                     [{file,"proc_lib.erl"},{line,247}]}]},\n                                  {gen_server,call,\n                                   [\'ns_memcached-default\',\n        
                            {get_dcp_docs_estimate,53,\n                                     "replication:ns_1@172.23.106.8->ns_1@172.23.104.222:default"},\n                                    180000]}},\n                                 {gen_server,call,\n                                  [{\'janitor_agent-default\',\n                                    \'ns_1@172.23.106.8\'},\n                                   {if_rebalance,<0.22748.0>,\n                                    {wait_dcp_data_move,\n                                     [\'ns_1@172.23.104.222\'],\n                                     52}},\n                                   infinity]}}}}}.\nRebalance Operation Id = 46b59b02f1ae924b6bffd2dd9f0682d6'}
      2020-03-24 13:09:34,250 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2528] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.106.9', u'tstamp': 1585080573583L, u'shortText': u'message', u'serverTime': u'2020-03-24T13:09:33.583Z', u'text': u'Worker <0.29029.0> (for action {move,{52,\n                                      [\'ns_1@172.23.106.8\',\n                                       \'ns_1@172.23.106.9\'],\n                                      [\'ns_1@172.23.104.222\',\n                                       \'ns_1@172.23.106.8\'],\n                                      []}}) exited with reason {unexpected_exit,\n                                                                {\'EXIT\',\n                                                                 <0.29055.0>,\n                                                                 {{{{nocatch,\n                                                                     {error,\n                                                                      closed}},\n                                                                    [{mc_binary,\n                                                                      recv_with_data,\n                                                                      4,\n                                                                      [{file,\n                                                                        "src/mc_binary.erl"},\n                                                                       {line,\n                                                                        45}]},\n                                                                     {mc_binary,\n                                                                      quick_stats_recv,\n                                                                      3,\n                                                                      [{file,\n                   
                                                     "src/mc_binary.erl"},\n                                                                       {line,\n                                                                        52}]},\n                                                                     {mc_binary,\n                                                                      quick_stats_loop_enter,\n                                                                      5,\n                                                                      [{file,\n                                                                        "src/mc_binary.erl"},\n                                                                       {line,\n                                                                        104}]},\n                                                                     {mc_binary,\n                                                                      quick_stats,\n                                                                      5,\n                                                                      [{file,\n                                                                        "src/mc_binary.erl"},\n                                                                       {line,\n                                                                        89}]},\n                                                                     {mc_client_binary,\n                                                                      get_dcp_docs_estimate,\n                                                                      3,\n                                                                      [{file,\n                                                                        "src/mc_client_binary.erl"},\n                                                                       {line,\n                                                                        714}]},\n         
                                                            {ns_memcached,\n                                                                      do_handle_call,\n                                                                      3,\n                                                                      [{file,\n                                                                        "src/ns_memcached.erl"},\n                                                                       {line,\n                                                                        565}]},\n                                                                     {ns_memcached,\n                                                                      worker_loop,\n                                                                      3,\n                                                                      [{file,\n                                                                        "src/ns_memcached.erl"},\n                                                                       {line,\n                                                                        247}]},\n                                                                     {proc_lib,\n                                                                      init_p_do_apply,\n                                                                      3,\n                                                                      [{file,\n                                                                        "proc_lib.erl"},\n                                                                       {line,\n                                                                        247}]}]},\n                                                                   {gen_server,\n                                                                    call,\n                                                                    [\'ns_memcached-default\',\n                    
                                                 {get_dcp_docs_estimate,\n                                                                      53,\n                                                                      "replication:ns_1@172.23.106.8->ns_1@172.23.104.222:default"},\n                                                                     180000]}},\n                                                                  {gen_server,\n                                                                   call,\n                                                                   [{\'janitor_agent-default\',\n                                                                     \'ns_1@172.23.106.8\'},\n                                                                    {if_rebalance,\n                                                                     <0.22748.0>,\n                                                                     {wait_dcp_data_move,\n                                                                      [\'ns_1@172.23.104.222\'],\n                                                                      52}},\n                                                                    infinity]}}}}'}
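
      Note on the exit status: the first log entry reports that memcached "exited with status 139". Assuming this status follows the common shell convention of 128 + signal number (an assumption, the log itself does not say so), 139 corresponds to signal 11, i.e. a segmentation fault. This would be consistent with the libep.so backtrace printed at CRITICAL level. A minimal sketch of that decoding, where the helper name describe_exit_status is hypothetical:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status under the 128 + signal-number convention.

    Assumption: statuses above 128 mean "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform; report it numerically.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited normally with code {status}"


# Status taken from the ns_log entry above.
print(describe_exit_status(139))
```

      If that reading is right, the {error,closed} in the ns_server stack trace is a downstream symptom: the stats connection used by get_dcp_docs_estimate was closed when memcached died.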
      

      QE test:

      num_items=10000000,GROUP=P0;magma,bucket_storage=magma,bucket_eviction_policy=fullEviction,randomize_value=True,vbuckets=128 -t rebalance_new.rebalance_in.RebalanceInTests.test_rebalance_in_with_ops,nodes_init=2,nodes_in=3,replicas=1,num_items=50000,doc_ops=update,max_verify=10000,value_size=1024,GROUP=P0;SET1;magma
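
      The parameter string above sets num_items twice: once before "-t" (10000000) and once after the test name (50000). The sketch below only splits the string into its two groups so the duplicate is easy to spot. It does not reproduce the test runner's own parsing, and which value takes precedence is not stated in this report. The helper name parse_params is hypothetical.

```python
def parse_params(text: str) -> dict:
    """Split a comma-separated key=value string into a dict of strings."""
    params = {}
    for item in text.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not key=value
            params[key.strip()] = value.strip()
    return params


line = (
    "num_items=10000000,GROUP=P0;magma,bucket_storage=magma,"
    "bucket_eviction_policy=fullEviction,randomize_value=True,vbuckets=128 "
    "-t rebalance_new.rebalance_in.RebalanceInTests.test_rebalance_in_with_ops,"
    "nodes_init=2,nodes_in=3,replicas=1,num_items=50000,doc_ops=update,"
    "max_verify=10000,value_size=1024,GROUP=P0;SET1;magma"
)

# Everything before " -t " is treated as run-wide parameters (assumption).
global_part, _, test_part = line.partition(" -t ")
# The first comma-separated token after "-t" is the test path.
test_name, _, test_args = test_part.partition(",")

global_params = parse_params(global_part)
test_params = parse_params(test_args)

# Report keys that appear in both groups with different values.
for key in sorted(global_params.keys() & test_params.keys()):
    if global_params[key] != test_params[key]:
        print(f"{key}: global={global_params[key]} test={test_params[key]}")
```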
      

            People

              scott.lashley Scott Lashley (Inactive)
              ritesh.agarwal Ritesh Agarwal