Details
Bug
Status: Closed
Critical
Resolution: Fixed
6.6.5
Enterprise Edition 7.0.0 build 5238
Untriaged
Windows 64-bit
1
Unknown
Description
Build: 7.0.0-5238
Scenario:
Rebalancing out Eventing node from the cluster with multiple services enabled.
(Operation Id = b8f8038679f6dd16dc26c2e7eb755ba3)
+----------------+-------------+-----------------------+---------------+--------------+
| Nodes          | Services    | Version               | CPU           | Status       |
+----------------+-------------+-----------------------+---------------+--------------+
| 172.23.107.142 | eventing    | 7.0.0-5238-enterprise | 15.3928202393 | Cluster node |
| 172.23.106.116 | backup      | 7.0.0-5238-enterprise | 6.95321744638 | Cluster node |
| 172.23.107.127 | cbas        | 7.0.0-5238-enterprise | 2.52666666667 | Cluster node |
| 172.23.107.129 | kv          | 7.0.0-5238-enterprise | 39.6083333333 | Cluster node |
| 172.23.107.126 | cbas        | 7.0.0-5238-enterprise | 7.94460276986 | Cluster node |
| 172.23.104.247 | kv          | 7.0.0-5238-enterprise | 47.6291271521 | Cluster node |
| 172.23.105.137 | kv          | 7.0.0-5238-enterprise | 49.554159236  | Cluster node |
| 172.23.105.1   | index, n1ql | 7.0.0-5238-enterprise | 17.5264594289 | Cluster node |
| 172.23.105.183 | eventing    | 7.0.0-5238-enterprise | 42.136226522  | --- OUT ---> |
| 172.23.107.131 | index, n1ql | 7.0.0-5238-enterprise | 8.54319094682 | Cluster node |
+----------------+-------------+-----------------------+---------------+--------------+
Observation:
Eventing rebalance is stuck at around 79% and has not progressed for 2.5 hours.
Failures and timeouts are also seen in the deployed Eventing function "a3_users_search".
Note: Possible regression due to MB-46543
Steps to reproduce:
1. Set up a cluster with 2 KV nodes (DataNode-A, DataNode-B) and 1 Eventing node (EvtNode-C).
2. Create 3 buckets: source, metadata, and destination.
3. Create a function that listens to the source bucket (src), keeps its metadata in the metadata bucket (meta), and has the destination bucket bound as "dst".
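4. Use the following OnUpdate code (Eventing passes the document first and its metadata second):
function OnUpdate(doc, meta) {
    // Copy each mutated source document into the destination bucket binding
    dst[meta.id] = doc;
}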
5. Deploy the function
6. Push 5-10 documents to the source bucket. The OnUpdate handler will create the same number of documents in the destination bucket.
7. Rebalance out both KV nodes (DataNode-A, DataNode-B) and rebalance in DataNode-D. Make sure no operations are pushed to the source bucket while this topology change is going on.
8. Once the rebalance-in of DataNode-D is complete, push 5-10 documents to the source bucket again.
—
Observation without the fix:
The Eventing function fails continuously with LCB_AUTH_ERR while processing the mutations from Step 8. The cause is a stale cluster map: the libcouchbase instance still assumes that DataNode-A and DataNode-B are part of the cluster.
Observation with the fix:
No errors are observed. On the first LCB_AUTH_ERR, Eventing detects the error and repairs its connections using the latest cluster map.
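The detect-and-repair behaviour described above can be illustrated with a small standalone simulation. This is only a sketch of the general pattern (recreate the connection on the first auth error, then retry once), not the actual Eventing or libcouchbase code. FakeCluster, FakeConnection, writeWithRepair, and the `code` field on the error are all hypothetical names made up for this example; only the LCB_AUTH_ERR label and the node names come from the ticket.

```javascript
// Error label from the ticket; used here as a plain string, not a real SDK constant.
const LCB_AUTH_ERR = "LCB_AUTH_ERR";

// Hypothetical stand-in for the cluster: a node list plus a key-value store.
class FakeCluster {
  constructor(nodes) {
    this.nodes = nodes;
    this.store = new Map();
  }
  currentMap() {
    return [...this.nodes];
  }
}

// Hypothetical stand-in for a client connection that caches the cluster map
// it saw at creation time, so it goes stale after a topology change.
class FakeConnection {
  constructor(cluster) {
    this.cluster = cluster;
    this.cachedMap = cluster.currentMap();
  }
  set(key, value) {
    const target = this.cachedMap[0];
    if (!this.cluster.nodes.includes(target)) {
      // The cached node has left the cluster: simulate the auth failure.
      const err = new Error("auth failed against removed node " + target);
      err.code = LCB_AUTH_ERR;
      throw err;
    }
    this.cluster.store.set(key, value);
  }
}

// On the first auth error, rebuild the connection from the latest cluster map
// and retry the write once. Any other error is rethrown unchanged.
function writeWithRepair(state, key, value) {
  try {
    state.conn.set(key, value);
  } catch (err) {
    if (err.code !== LCB_AUTH_ERR) throw err;
    state.conn = new FakeConnection(state.cluster);
    state.repairs += 1;
    state.conn.set(key, value);
  }
}

const cluster = new FakeCluster(["DataNode-A", "DataNode-B"]);
const state = { cluster, conn: new FakeConnection(cluster), repairs: 0 };

writeWithRepair(state, "doc1", { n: 1 }); // succeeds on the original topology
cluster.nodes = ["DataNode-D"];           // A and B rebalanced out, D rebalanced in
writeWithRepair(state, "doc2", { n: 2 }); // first auth error triggers one repair
writeWithRepair(state, "doc3", { n: 3 }); // repaired connection, no further errors

console.log(state.repairs, cluster.store.size); // prints "1 3"
```

Without the catch branch, every write after the topology change would keep failing, which mirrors the "without the fix" observation.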