Details
- Bug
- Resolution: Fixed
- Critical
- 7.2.0
- 7.2.0-5318-enterprise
- Untriaged
- Linux x86_64
- 0
- Yes
- KV 2023-2
Description
Steps:
- 4 node cluster, 3 buckets
+----------------+-----------------+-----------+-----------+-------------------+
| Node | CPU_utilization | Mem_total | Mem_free | Active / Replica |
+----------------+-----------------+-----------+-----------+-------------------+
| 172.23.122.141 | 8.14710609887 | 11.74 GiB | 10.06 GiB | 4930800 / 4932600 |
| 172.23.122.150 | 8.33134978994 | 11.74 GiB | 9.82 GiB | 4930800 / 4934000 |
| 172.23.122.139 | 7.76516863607 | 11.74 GiB | 10.00 GiB | 4931700 / 4924100 |
| 172.23.122.163 | 7.32479479762 | 11.74 GiB | 9.89 GiB | 4931700 / 4934300 |
+----------------+-----------------+-----------+-----------+-------------------+
+---------+-----------+------------+----------+----------+-----------+------------+------------+-----+
| Bucket | Type | Storage | Replicas | Items | RAM Quota | RAM Used | Disk Used | ARR |
+---------+-----------+------------+----------+----------+-----------+------------+------------+-----+
| bucket1 | couchbase | couchstore | 1 | 100000 | 7.81 GiB | 209.11 MiB | 205.54 MiB | 100 |
| bucket2 | couchbase | magma | 1 | 50000 | 3.91 GiB | 292.70 MiB | 232.96 MiB | 100 |
| default | couchbase | magma | 1 | 19575000 | 2.00 GiB | 1.35 GiB | 15.65 GiB | 5.25 |
+---------+-----------+------------+----------+----------+-----------+------------+------------+-----+
- Load initial data into all buckets
- Start dedupe load and rebalance-in a node - Time: 2023-04-27 22:37:16,607
+----------------+-----------------------+------+--------------+
| Nodes | Version | CPU | Status |
+----------------+-----------------------+------+--------------+
| 172.23.122.141 | 7.2.0-5318-enterprise | 5.86 | Cluster node |
| 172.23.122.150 | 7.2.0-5318-enterprise | 6.47 | Cluster node |
| 172.23.122.139 | 7.2.0-5318-enterprise | 5.82 | Cluster node |
| 172.23.122.163 | 7.2.0-5318-enterprise | 7.32 | Cluster node |
| 172.23.122.161 | | | <--- IN --- |
+----------------+-----------------------+------+--------------+
- Stop rebalance once progress exceeds 22% - Time: 2023-04-27 22:37:27,526
- Perform data load
- Add new node '172.23.122.162' and start rebalance again - Time: 2023-04-27 22:42:47,959
- Stop rebalance once progress exceeds 45% - Time: 2023-04-27 22:43:08,229
- Start data load and perform the rebalance of the cluster - Time: 2023-04-27 22:48:43,766
+----------------+------+--------------+-----------------------+
| Nodes | CPU | Status | Membership / Recovery |
+----------------+------+--------------+-----------------------+
| 172.23.122.162 | 2.05 | Cluster node | active / none |
| 172.23.122.141 | 4.12 | Cluster node | active / none |
| 172.23.122.161 | 1.89 | Cluster node | active / none |
| 172.23.122.150 | 4.10 | Cluster node | active / none |
| 172.23.122.139 | 3.88 | Cluster node | active / none |
| 172.23.122.163 | 4.26 | Cluster node | active / none |
+----------------+------+--------------+-----------------------+
- Stop rebalance when it reaches 69% - Time: 2023-04-27 22:49:04,309
- Rebalance the cluster again (Time: 2023-04-27 22:53:41,253) and stop when it reaches around 81% (Time: 2023-04-27 22:54:22,801)
- Trigger rebalance again (this time it fails with a crash on the .141 node); see full_bt.log
[ns_server:error,2023-04-27T22:59:19.588-07:00,ns_1@172.23.122.141:<0.24762.5>:dcp_proxy:handle_info:111]Socket #Port<0.27408> was closed. Closing myself. State = {state,
    #Port<0.27408>,
    {consumer,
     "replication:ns_1@172.23.122.163->ns_1@172.23.122.141:default",
     'ns_1@172.23.122.141',
     "default"},
    undefined,<<>>,
    dcp_consumer_conn,
    {state,idle,
     [856,859,860,861,
      862,866,867,868,
      869,871,874,876,
      877,878,883,887,
      888,889,890,891,
      893,894,897,899,
      900,903,904,905,
      906,907,911,917,
      919,920,924,926,
      927,929,931,932,
      934,968]},
    #Port<0.27409>,
    <0.24763.5>,false}
[ns_server:error,2023-04-27T22:59:19.588-07:00,ns_1@172.23.122.141:<0.24746.5>:dcp_proxy:handle_info:111]Socket #Port<0.27402> was closed. Closing myself. State = {state,
    #Port<0.27402>,
    {consumer,
     "replication:ns_1@172.23.122.161->ns_1@172.23.122.141:default",
     'ns_1@172.23.122.141',
     "default"},
    undefined,<<>>,
    dcp_consumer_conn,
    {state,idle,
     [6,8,13,19,22,24,
      29,31,34,64,68,
      84,603,605,620,
      630,634,641,643,
      648,649,652,663,
      855,870,875,879,
      880,881,885,886,
      892,896,898,928]},
    #Port<0.27403>,
    <0.24748.5>,true}
[ns_server:error,2023-04-27T22:59:19.589-07:00,ns_1@172.23.122.141:<0.2733.3>:dcp_proxy:handle_info:111]Socket #Port<0.16029> was closed. Closing myself. State = {state,
    #Port<0.16029>,
    {consumer,
     "replication:ns_1@172.23.122.150->ns_1@172.23.122.141:bucket2",
     'ns_1@172.23.122.141',
     "bucket2"},
    undefined,<<>>,
    dcp_consumer_conn,
    {state,idle,
     [597,598,606,611,
      612,615,617,618,
      619,623,625,631,
      632,636,639,640,
      642,645,646,647,
      655,656,659,661,
      662,664,665,666,
      667,671,672,675,
      676,678,681]},
    #Port<0.16030>,
    <0.2734.3>,true}
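The dcp_proxy entries above can be summarised per replication stream when triaging a larger log. The following is a minimal sketch, not part of the original report: the function name `parse_closed_streams` and the returned dictionary keys are hypothetical, and the regular expressions assume the entry layout shown above (a quoted "replication:SRC->DST:BUCKET" connection name followed by one bracketed vBucket list).

```python
import re

# Header of a dcp_proxy error entry: timestamp and the node that logged it.
HEADER = re.compile(
    r"\[ns_server:error,(?P<ts>[^,]+),(?P<node>[^:]+):<[^>]+>:dcp_proxy:handle_info:\d+\]"
)
# Connection name of the form "replication:SRC->DST:BUCKET".
CONN = re.compile(r'"replication:(?P<src>[^">]+)->(?P<dst>[^":]+):(?P<bucket>[^"]+)"')
# First bracketed list made only of digits, commas and whitespace (the vBucket ids).
VB_LIST = re.compile(r"\[([\d,\s]+)\]")


def parse_closed_streams(log_text):
    """Return one summary dict per dcp_proxy 'socket closed' entry in log_text."""
    # Drop lines that contain only a stray '|' (copy/paste residue).
    cleaned = "\n".join(
        line for line in log_text.splitlines() if line.strip() != "|"
    )
    headers = list(HEADER.finditer(cleaned))
    results = []
    for i, header in enumerate(headers):
        # The body of an entry runs up to the next header (or end of text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(cleaned)
        body = cleaned[header.end():end]
        conn = CONN.search(body)
        if conn is None:
            continue  # not a replication consumer entry
        vb_match = VB_LIST.search(body)
        vbuckets = (
            [int(v) for v in re.findall(r"\d+", vb_match.group(1))]
            if vb_match
            else []
        )
        results.append(
            {
                "timestamp": header.group("ts"),
                "logged_on": header.group("node"),
                "source": conn.group("src"),
                "destination": conn.group("dst"),
                "bucket": conn.group("bucket"),
                "vbuckets": vbuckets,
            }
        )
    return results
```

Feeding the three entries above to `parse_closed_streams` should yield three records, all with destination ns_1@172.23.122.141, which matches the node that crashed.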
From 172.23.122.141 dmesg:
Apr 27 22:59:19 sc1503-deb10 kernel: [3714532.939633] mc:worker_01[15525]: segfault at 170 ip 00007f9a4ebdd6c0 sp 00007f9a41fe8e68 error 4 in libpthread-2.28.so[7f9a4ebd9000+f000]
Apr 27 22:59:19 sc1503-deb10 kernel: [3714532.939648] Code: ff 48 8d 0d c2 af 00 00 ba b1 01 00 00 48 8d 35 4b ae 00 00 48 8d 3d ff ae 00 00 e8 1a bb ff ff 8b 03 e9 45 fa ff ff 0f 1f 00 <8b> 47 10 89 c2 81 e2 7f 01 00 00 83 e0 7c 0f 85 9c 00 00 00 48 83
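The kernel segfault line can be decoded into a fault address, an offset inside libpthread, and the access type. The sketch below is illustrative only: `decode_segfault` and its result keys are hypothetical names, and the error-code decoding assumes the standard x86 page-fault bits (bit 0: page present, bit 1: write access, bit 2: user mode).

```python
import re

# Layout of a Linux x86 "segfault at" kernel message; all numbers are hex.
SEGFAULT = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<module>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)

# The dmesg line from this report.
LINE = (
    "Apr 27 22:59:19 sc1503-deb10 kernel: [3714532.939633] mc:worker_01[15525]: "
    "segfault at 170 ip 00007f9a4ebdd6c0 sp 00007f9a41fe8e68 error 4 "
    "in libpthread-2.28.so[7f9a4ebd9000+f000]"
)


def decode_segfault(line):
    """Parse a segfault line; return None if the line does not match."""
    m = SEGFAULT.search(line)
    if m is None:
        return None
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    info = {
        "thread": m.group("comm"),
        "pid": int(m.group("pid")),
        "fault_addr": int(m.group("addr"), 16),
        "ip": ip,
        # x86 page-fault error code bits (assumed standard layout).
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
        "module": m.group("module"),
        "offset": None,
    }
    if m.group("base") is not None:
        # Offset of the faulting instruction within the mapped module.
        info["offset"] = ip - int(m.group("base"), 16)
    return info
```

For the line above, the instruction pointer lies at offset 0x46c0 inside the libpthread-2.28.so mapping, and error 4 indicates a user-mode read of an unmapped page. With matching debug symbols, that offset can be resolved with a tool such as `addr2line -e libpthread-2.28.so 0x46c0`. The marked bytes `<8b> 47 10` in the Code line decode to a 32-bit load from rdi+0x10, so a fault address of 0x170 suggests the pointer in rdi was 0x160, that is, a near-null pointer rather than a valid object.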
Attachments
Issue Links
- is caused by: MB-56275 DCP Consumer StreamEnd when buffering can leak the flow control ack_bytes (was: [System test] [CDC]: System test rebalance hangs) (Closed)