Details
- Bug
- Resolution: Fixed
- Critical
- Cheshire-Cat
- 7.0.0-1622
- Untriaged
- Unknown
Description
Steps:
1. Create a 2-node cluster:
+--------------+-----------------+--------------+
| Nodes        | Services        | Status       |
+--------------+-----------------+--------------+
| 172.23.106.9 | index, kv, n1ql | Cluster node |
| 172.23.106.8 | None            | <--- IN ---  |
+--------------+-----------------+--------------+
2. Create a default bucket and load 10M items.
Bucket statistics
+---------+---------+----------+-----+----------+------------+------------+------------+
| Bucket  | Type    | Replicas | TTL | Items    | RAM Quota  | RAM Used   | Disk Used  |
+---------+---------+----------+-----+----------+------------+------------+------------+
| default | membase | 1        | 0   | 10000000 | 2986344448 | 1952474768 | 6289864111 |
+---------+---------+----------+-----+----------+------------+------------+------------+
3. Rebalance in 3 nodes:
+----------------+-----------------+--------------+
| Nodes          | Services        | Status       |
+----------------+-----------------+--------------+
| 172.23.106.9   | index, kv, n1ql | Cluster node |
| 172.23.106.8   | kv              | Cluster node |
| 172.23.104.201 | None            | <--- IN ---  |
| 172.23.104.222 | None            | <--- IN ---  |
| 172.23.104.199 | None            | <--- IN ---  |
+----------------+-----------------+--------------+
4. While the rebalance from step 3 is running, update 5M docs.
5. Rebalance failed.
{u'code': 0, u'module': u'ns_log', u'type': u'info', u'node': u'ns_1@172.23.106.8', u'tstamp': 1585080573616L, u'shortText': u'message', u'serverTime': u'2020-03-24T13:09:33.616Z', u'text': u"Service 'memcached' exited with status 139. Restarting. Messages:\n2020-03-24T13:09:33.402115-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x21c67c]\n2020-03-24T13:09:33.402122-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x858c5]\n2020-03-24T13:09:33.402128-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x88749]\n2020-03-24T13:09:33.402133-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x8aaa0]\n2020-03-24T13:09:33.402137-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x8adfe]\n2020-03-24T13:09:33.402145-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x140c13]\n2020-03-24T13:09:33.402150-07:00 CRITICAL /opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x13990f]\n2020-03-24T13:09:33.402158-07:00 CRITICAL /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0() [0x7f6821ac5000+0x10397]\n2020-03-24T13:09:33.402166-07:00 CRITICAL /lib64/libpthread.so.0() [0x7f681f10b000+0x7dd5]\n2020-03-24T13:09:33.402356-07:00 CRITICAL /lib64/libc.so.6(clone+0x6d) [0x7f681ed3e000+0xfdead]"}
2020-03-24 13:09:34,250 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2528] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.106.9', u'tstamp': 1585080573593L, u'shortText': u'message', u'serverTime': u'2020-03-24T13:09:33.593Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.29055.0>,\n {{{{nocatch,{error,closed}},\n [{mc_binary,recv_with_data,4,\n [{file,"src/mc_binary.erl"},{line,45}]},\n {mc_binary,quick_stats_recv,3,\n [{file,"src/mc_binary.erl"},{line,52}]},\n {mc_binary,quick_stats_loop_enter,5,\n [{file,"src/mc_binary.erl"},{line,104}]},\n {mc_binary,quick_stats,5,\n [{file,"src/mc_binary.erl"},{line,89}]},\n {mc_client_binary,get_dcp_docs_estimate,\n 3,\n [{file,"src/mc_client_binary.erl"},\n {line,714}]},\n {ns_memcached,do_handle_call,3,\n [{file,"src/ns_memcached.erl"},\n {line,565}]},\n {ns_memcached,worker_loop,3,\n [{file,"src/ns_memcached.erl"},\n {line,247}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,247}]}]},\n {gen_server,call,\n [\'ns_memcached-default\',\n {get_dcp_docs_estimate,53,\n "replication:ns_1@172.23.106.8->ns_1@172.23.104.222:default"},\n 180000]}},\n {gen_server,call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.106.8\'},\n {if_rebalance,<0.22748.0>,\n {wait_dcp_data_move,\n [\'ns_1@172.23.104.222\'],\n 52}},\n infinity]}}}}}.\nRebalance Operation Id = 46b59b02f1ae924b6bffd2dd9f0682d6'}
2020-03-24 13:09:34,250 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2528] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.106.9', u'tstamp': 1585080573583L, u'shortText': u'message', u'serverTime': u'2020-03-24T13:09:33.583Z', u'text': u'Worker <0.29029.0> (for action {move,{52,\n [\'ns_1@172.23.106.8\',\n \'ns_1@172.23.106.9\'],\n [\'ns_1@172.23.104.222\',\n \'ns_1@172.23.106.8\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.29055.0>,\n {{{{nocatch,\n {error,\n closed}},\n [{mc_binary,\n recv_with_data,\n 4,\n [{file,\n "src/mc_binary.erl"},\n {line,\n 45}]},\n {mc_binary,\n quick_stats_recv,\n 3,\n [{file,\n "src/mc_binary.erl"},\n {line,\n 52}]},\n {mc_binary,\n quick_stats_loop_enter,\n 5,\n [{file,\n "src/mc_binary.erl"},\n {line,\n 104}]},\n {mc_binary,\n quick_stats,\n 5,\n [{file,\n "src/mc_binary.erl"},\n {line,\n 89}]},\n {mc_client_binary,\n get_dcp_docs_estimate,\n 3,\n [{file,\n "src/mc_client_binary.erl"},\n {line,\n 714}]},\n {ns_memcached,\n do_handle_call,\n 3,\n [{file,\n "src/ns_memcached.erl"},\n {line,\n 565}]},\n {ns_memcached,\n worker_loop,\n 3,\n [{file,\n "src/ns_memcached.erl"},\n {line,\n 247}]},\n {proc_lib,\n init_p_do_apply,\n 3,\n [{file,\n "proc_lib.erl"},\n {line,\n 247}]}]},\n {gen_server,\n call,\n [\'ns_memcached-default\',\n {get_dcp_docs_estimate,\n 53,\n "replication:ns_1@172.23.106.8->ns_1@172.23.104.222:default"},\n 180000]}},\n {gen_server,\n call,\n [{\'janitor_agent-default\',\n \'ns_1@172.23.106.8\'},\n {if_rebalance,\n <0.22748.0>,\n {wait_dcp_data_move,\n [\'ns_1@172.23.104.222\'],\n 52}},\n infinity]}}}}'}
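For triage, the memcached message in step 5 can be reduced to a signal name and a list of per-library offsets. The following is a minimal sketch, not part of the test framework: the helper names `decode_exit_status` and `parse_frames` are hypothetical, the frame pattern is inferred only from the log lines quoted above, and it assumes status 139 follows the common shell convention of 128 plus the signal number.

```python
import re
import signal

# Frame layout inferred from the memcached log in this ticket:
#   CRITICAL <library path>(<symbol, may be empty>) [<load base>+<offset>]
FRAME_RE = re.compile(
    r"CRITICAL\s+(\S+?)\((.*?)\)\s+\[(0x[0-9a-f]+)\+(0x[0-9a-f]+)\]"
)


def decode_exit_status(status):
    """Map a shell-style exit status above 128 to a signal name, else None.

    Assumption: the status is reported as 128 + signal number.
    """
    if status <= 128:
        return None
    try:
        return signal.Signals(status - 128).name
    except ValueError:
        return None


def parse_frames(text):
    """Return (library file name, symbol, offset) for each backtrace frame."""
    frames = []
    for path, symbol, _base, offset in FRAME_RE.findall(text):
        library = path.rsplit("/", 1)[-1]
        frames.append((library, symbol, int(offset, 16)))
    return frames


if __name__ == "__main__":
    # Two frames copied from the log above.
    sample = (
        "2020-03-24T13:09:33.402115-07:00 CRITICAL "
        "/opt/couchbase/bin/../lib/libep.so() [0x7f6822f67000+0x21c67c]\n"
        "2020-03-24T13:09:33.402356-07:00 CRITICAL "
        "/lib64/libc.so.6(clone+0x6d) [0x7f681ed3e000+0xfdead]"
    )
    print(decode_exit_status(139))
    for library, symbol, offset in parse_frames(sample):
        print(library, symbol or "<no symbol>", hex(offset))
```

With debug symbols matching build 7.0.0-1622, each offset could then be handed to a symbolizer, for example `addr2line -e libep.so 0x21c67c`, to recover the function names missing from the `libep.so` frames.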
QE test:
num_items=10000000,GROUP=P0;magma,bucket_storage=magma,bucket_eviction_policy=fullEviction,randomize_value=True,vbuckets=128 -t rebalance_new.rebalance_in.RebalanceInTests.test_rebalance_in_with_ops,nodes_init=2,nodes_in=3,replicas=1,num_items=50000,doc_ops=update,max_verify=10000,value_size=1024,GROUP=P0;SET1;magma
Issue Links
- relates to MB-38682 Rebalance failed in the second step while updating replica from 0 > 1 > 2. (Closed)