Details
- Bug
- Resolution: Duplicate
- Critical
- 7.1.0
- 7.1.0-1696
- Untriaged
- Centos 64-bit
- 1
- Yes
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops-temp_rebalance_magma_win.ini rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,bucket_storage=magma -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_swap_rebalance,nodes_init=5,nodes_swap=2,bucket_spec=magma_dgm.5_percent_dgm.5_node_2_replica_magma_512,doc_size=512,randomize_value=True,data_load_stage=during,skip_validations=False,GROUP=swap_rebalance_P0_set0'
Steps to Repro
1. Create a 5-node cluster
2021-11-12 01:34:25,729 | test | INFO | MainThread | [table_view:display:72] Cluster statistics
+----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
| Node           | Services | CPU_utilization | Mem_total | Mem_free | Swap_mem_used       | Active / Replica  | Version               |
+----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
| 172.23.136.106 | kv       | 0.156666666667  | 6.00 GiB  | 4.58 GiB | 1.63 GiB / 7.00 GiB | 0 / 0             | 7.1.0-1696-enterprise |
| 172.23.136.103 | kv       | 0.991553433713  | 6.00 GiB  | 4.52 GiB | 1.68 GiB / 7.00 GiB | 0 / 0             | 7.1.0-1696-enterprise |
| 172.23.136.104 | kv       | 0.91475119769   | 6.00 GiB  | 4.54 GiB | 1.67 GiB / 7.00 GiB | 0 / 0             | 7.1.0-1696-enterprise |
| 172.23.136.101 | kv       | 0.910015166919  | 6.00 GiB  | 4.48 GiB | 1.72 GiB / 7.00 GiB | 713544 / 1420352  | 7.1.0-1696-enterprise |
| 172.23.136.102 | kv       | 3.10156328122   | 6.00 GiB  | 4.55 GiB | 1.63 GiB / 7.00 GiB | 0 / 0             | 7.1.0-1696-enterprise |
+----------------+----------+-----------------+-----------+----------+---------------------+-------------------+-----------------------+
2. Create bucket/scopes/collections/data
2021-11-12 01:38:40,154 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
+-----------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+-----+
| Bucket                                                          | Type      | Storage Backend | Replicas | Durability | TTL | Items   | RAM Quota | RAM Used | Disk Used | ARR |
+-----------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+-----+
| L5VT2_xKXtFBcLfd_QevK4bT0zv3QiDW6IaZPG42CevEkRkGoe%jkb-26-12000 | couchbase | magma           | 2        | none       | 0   | 3000000 | 9.77 GiB  | 6.15 GiB | 3.50 GiB  | 100 |
+-----------------------------------------------------------------+-----------+-----------------+----------+------------+-----+---------+-----------+----------+-----------+-----+
3. Add one node (172.23.136.105), remove two nodes (172.23.136.106 and 172.23.136.102), and start the rebalance.
2021-11-12 01:38:52,213 | test | INFO | pool-6-thread-22 | [table_view:display:72] Rebalance Overview
+----------------+----------+-----------------------+---------------+--------------+
| Nodes          | Services | Version               | CPU           | Status       |
+----------------+----------+-----------------------+---------------+--------------+
| 172.23.136.106 | kv       | 7.1.0-1696-enterprise | 10.4666666667 | --- OUT ---> |
| 172.23.136.103 | kv       | 7.1.0-1696-enterprise | 12.5886140348 | Cluster node |
| 172.23.136.104 | kv       | 7.1.0-1696-enterprise | 9.06166666667 | Cluster node |
| 172.23.136.101 | kv       | 7.1.0-1696-enterprise | 10.4714921418 | Cluster node |
| 172.23.136.102 | kv       | 7.1.0-1696-enterprise | 9.06166666667 | --- OUT ---> |
| 172.23.136.105 | kv       | 7.1.0-1696-enterprise | 0             | Cluster node |
+----------------+----------+-----------------------+---------------+--------------+
The rebalance fails, as shown below, because memcached crashes.
172.23.136.101:
2021-11-12 01:39:35,654 | test | ERROR | pool-6-thread-22 | [rest_client:print_UI_logs:2784] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.136.101', u'tstamp': 1636709967895L, u'shortText': u'message', u'serverTime': u'2021-11-12T01:39:27.895Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.1835.1>,\n {{{lost_connection,\n [{ns_memcached,worker_loop,3,\n [{file,"src/ns_memcached.erl"},\n {line,209}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,226}]}]},\n {gen_server,call,\n [\'ns_memcached-L5VT2_xKXtFBcLfd_QevK4bT0zv3QiDW6IaZPG42CevEkRkGoe%jkb-26-12000\',\n {set_vbucket,994,active,\n [[\'ns_1@172.23.136.106\',\n \'ns_1@172.23.136.104\',\n \'ns_1@172.23.136.101\']]},\n 180000]}},\n {gen_server,call,\n [{\'janitor_agent-L5VT2_xKXtFBcLfd_QevK4bT0zv3QiDW6IaZPG42CevEkRkGoe%jkb-26-12000\',\n \'ns_1@172.23.136.106\'},\n {if_rebalance,<0.511.1>,\n {update_vbucket_state,994,active,paused,\n undefined,\n [[\'ns_1@172.23.136.106\',\n \'ns_1@172.23.136.104\',\n \'ns_1@172.23.136.101\']]}},\n infinity]}}}}}.\nRebalance Operation Id = b413500a9e7ef72f63f40ab4f1e36723'}
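When triaging a failure like the one above, it can help to pull the vBucket id, the requested state, and the node names out of the `mover_crashed` text. The following is a minimal sketch, not part of testrunner or ns_server: `summarize_mover_crash` is a hypothetical helper, the regular expressions are inferred only from the message quoted in this report, and `sample` is an abridged copy of that message.

```python
import re

# Hypothetical triage helper; patterns are inferred from the single
# 'mover_crashed' message quoted in this report and may not cover others.
SET_VBUCKET = re.compile(r"\{set_vbucket,(\d+),(\w+),")
NODE = re.compile(r"ns_1@[\d.]+")


def summarize_mover_crash(text):
    """Return (vbucket_id, state, nodes) or None if no set_vbucket call is found."""
    match = SET_VBUCKET.search(text)
    if match is None:
        return None
    # Keep node names in order of first appearance, without duplicates.
    nodes = []
    for name in NODE.findall(text):
        if name not in nodes:
            nodes.append(name)
    return int(match.group(1)), match.group(2), nodes


# Abridged copy of the 'text' field from the UI log entry above.
sample = (
    "Rebalance exited with reason {mover_crashed, "
    "{set_vbucket,994,active,\n [['ns_1@172.23.136.106',\n"
    " 'ns_1@172.23.136.104',\n 'ns_1@172.23.136.101']]},\n 180000]}"
)

if __name__ == "__main__":
    print(summarize_mover_crash(sample))
```

For this report, the first node in the list (172.23.136.106) is the one where the memcached crash is found.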
grep CRITICAL on 172.23.136.106
$ grep CRITICAL memcached.log.0000*
memcached.log.000013.txt:2021-11-12T01:39:27.297618-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1696). Writing crash dump to c:/Program Files/Couchbase/Server/var/lib/couchbase/crash\d754b25b-b14b-4f36-9d18-fa19f722e211.dmp before terminating.
memcached.log.000013.txt:2021-11-12T01:39:27.297632-08:00 CRITICAL Stack backtrace of crashed thread:
memcached.log.000013.txt:2021-11-12T01:39:27.299891-08:00 CRITICAL #0 c:\Program Files\Couchbase\Server\bin\memcached.exe(FileOpsInterface::set_mprotect_enabled+10787324) [0x00007FF61B17D4C0]
memcached.log.000013.txt:2021-11-12T01:39:27.299923-08:00 CRITICAL #1 c:\Program Files\Couchbase\Server\bin\memcached.exe(FileOpsInterface::set_mprotect_enabled+11330637) [0x00007FF61B201F11]
memcached.log.000013.txt:2021-11-12T01:39:27.299947-08:00 CRITICAL #2 C:\Windows\System32\KERNEL32.DLL(BaseThreadInitThunk+20) [0x00007FFF53EB84D4]
memcached.log.000013.txt:2021-11-12T01:39:27.299969-08:00 CRITICAL #3 C:\Windows\SYSTEM32\ntdll.dll(RtlUserThreadStart+33) [0x00007FFF5483E8B1]
Administrator@WIN-1T98IIFH727 /cygdrive/c/Program Files/Couchbase/Server/var/lib/couchbase/logs
We see the minidump d754b25b-b14b-4f36-9d18-fa19f722e211.dmp on 172.23.136.106. We have only just started running Magma tests on Windows, so we don't have a baseline.
cbcollect_info attached.
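The grep step above can also be scripted so that the minidump path and the backtrace frames come out as structured data. This is a sketch under assumptions: `parse_crash` is a hypothetical helper, and both regular expressions are inferred only from the Breakpad lines quoted in this report, so other memcached log formats may not match.

```python
import re

# Hypothetical helper; patterns are inferred from the CRITICAL lines in this report.
DUMP = re.compile(r"Writing crash dump to (.+?\.dmp)")
FRAME = re.compile(
    r"CRITICAL\s+#(\d+)\s+(.+?)\((.+?)\+(\d+)\)\s+\[(0x[0-9A-Fa-f]+)\]"
)


def parse_crash(lines):
    """Return (dump_path, frames) parsed from memcached CRITICAL log lines."""
    dump, frames = None, []
    for line in lines:
        found = DUMP.search(line)
        if found:
            dump = found.group(1)
            continue
        found = FRAME.search(line)
        if found:
            frames.append({
                "index": int(found.group(1)),
                "module": found.group(2),
                "symbol": found.group(3),
                "offset": int(found.group(4)),
                "address": found.group(5),
            })
    return dump, frames


# Three of the lines from the grep output above, as raw strings.
sample_lines = [
    r"memcached.log.000013.txt:2021-11-12T01:39:27.297618-08:00 CRITICAL Breakpad caught a crash (Couchbase version 7.1.0-1696). Writing crash dump to c:/Program Files/Couchbase/Server/var/lib/couchbase/crash\d754b25b-b14b-4f36-9d18-fa19f722e211.dmp before terminating.",
    r"memcached.log.000013.txt:2021-11-12T01:39:27.299891-08:00 CRITICAL #0 c:\Program Files\Couchbase\Server\bin\memcached.exe(FileOpsInterface::set_mprotect_enabled+10787324) [0x00007FF61B17D4C0]",
    r"memcached.log.000013.txt:2021-11-12T01:39:27.299947-08:00 CRITICAL #2 C:\Windows\System32\KERNEL32.DLL(BaseThreadInitThunk+20) [0x00007FFF53EB84D4]",
]

if __name__ == "__main__":
    dump_path, stack = parse_crash(sample_lines)
    print(dump_path)
    for frame in stack:
        print(frame["index"], frame["symbol"], frame["offset"], frame["address"])
```

Keeping the offset as an integer makes it easy to compare frames across runs once a baseline exists.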
Issue Links
- duplicates: MB-49465 Magma implicit compaction crashes when constructing compaction context if VBucket no longer exists in memory (Closed)