Details
Bug
Resolution: Fixed
Critical
7.6.2
7.6.2-3672
Untriaged
0
Unknown
Description
Steps
- Cluster of 3 KV and 6 FTS nodes, each 16C, 64G. FTS RAM quota is set to 50G.
- Load 50M base64-encoded, 1536-dimension vectors into KV.
- Build an FTS index on them.
- Run 1000 upserts per second on KV.
- Start 2 FTS query threads.
- Scale up to 7 nodes while loading docs.
- Scale up to 8 nodes while loading docs.
- Scale down to 7 nodes while loading docs.
- Scale down to 6 nodes while loading docs.
- Scale disk while loading docs. This triggers a swap rebalance for all 6 nodes, one node at a time.
- Scale disk while loading docs. This triggers a swap rebalance for all 6 nodes, one node at a time.
- Scale compute to 32C, 128G while loading docs. This triggers a swap rebalance for all 6 nodes, one node at a time.
- Scale compute back to 16C, 32G while loading docs. This triggers a swap rebalance for all 6 nodes, one node at a time. One of the swap rebalances failed because the FTS service was killed with exit status 137. To bring the cluster back to a healthy state, CP added the node that was being evicted back to the cluster and triggered a rebalance in. That rebalance is hung.
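For a sense of scale, the data volume implied by the steps above can be estimated with a short calculation. This is only a rough sketch: the report does not state the vector component type, so 4-byte float32 components are an assumption here, and the size of the resulting FTS index is not the same thing as the size of the source data.

```python
# Rough sizing sketch for the dataset described in the steps.
# ASSUMPTION: each vector component is a 4-byte float32 (not stated in the report).
NUM_VECTORS = 50_000_000
DIMS = 1536
BYTES_PER_COMPONENT = 4  # assumed float32

# Raw binary size of one vector.
raw_bytes_per_vector = DIMS * BYTES_PER_COMPONENT

# Standard base64 encodes every 3 input bytes as 4 output characters,
# padding the final group, so the encoded length is 4 * ceil(n / 3).
b64_chars_per_vector = 4 * ((raw_bytes_per_vector + 2) // 3)

raw_total_gib = NUM_VECTORS * raw_bytes_per_vector / 2**30
b64_total_gib = NUM_VECTORS * b64_chars_per_vector / 2**30

print(f"raw bytes per vector:    {raw_bytes_per_vector}")
print(f"base64 chars per vector: {b64_chars_per_vector}")
print(f"raw total:    {raw_total_gib:.1f} GiB")
print(f"base64 total: {b64_total_gib:.1f} GiB")
```

Under that assumption the encoded vectors alone come to several hundred GiB across the bucket, which is the same order of magnitude as the combined FTS RAM quota of the 6 FTS nodes (6 x 50G).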
Swap Rebalance
Starting rebalance, KeepNodes = ['ns_1@svc-d-node-001.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-d-node-002.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-d-node-003.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-053.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-054.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-055.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-056.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-057.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-058.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com'], EjectNodes = ['ns_1@svc-s-node-050.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = f6a458ab8b6d862f8607a8a97d4bb17b

Rebalance failed due to OOM on node 058
Service 'fts' exited with status 137. Restarting. Messages:
2024-05-25T09:49:12.931+00:00 [INFO] feed_dcp_gocbcore: newGocbcoreDCPFeed, name: default0_VXXJEVolumeCollection3_fts_idx_1_36cb12525ef39e55_75e1d5b4, indexName: default0_VXXJEVolumeCollection3_fts_idx_1, server: http://127.0.0.1:8091, bucketName: default0_VXXJE, bucketUUID: 0b2837845dbddcd5a6693ecd21a530de
2024-05-25T09:49:12.931+00:00 [INFO] feed_dcp_gocbcore: Start, name: default0_VXXJEVolumeCollection3_fts_idx_1_36cb12525ef39e55_75e1d5b4, num streams: 28, manifestUID: 6, streamOptions: {FilterOptions: &{ScopeID:0 CollectionIDs:[13]}, StreamOptions: &{StreamID:270}}, vbuckets: [492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519]
2024-05-25T09:49:12.932+00:00 [INFO] janitor: awakes, op: kick, msg: feed init kick for pindex: default0_VXXJEVolumeCollection3_fts_idx_1_36cb12525ef39e55_75e1d5b4
2024-05-25T09:49:13.059+00:00 [INFO] janitor: pindexes to remove: 0
2024-05-25T09:49:13.059+00:00 [INFO] janitor: pindexes to add: 0
2024-05-25T09:49:13.059+00:00 [INFO] janitor: pindexes to restart: 0
2024-05-25T09:49:13.059+00:00 [INFO] janitor: pindexes to hibernate: 0
2024-05-25T09:49:13.061+00:00 [INFO] janitor: feeds to remove: 0
2024-05-25T09:49:13.061+00:00 [INFO] janitor: feeds to add: 0

Rebalance exited with reason {service_rebalance_failed,fts,
{agent_died,<37081.3452.0>,
{lost_connection,
{'ns_1@svc-s-node-058.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
shutdown}}}}.
Rebalance Operation Id = f6a458ab8b6d862f8607a8a97d4bb17b
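As a side note on the "exited with status 137" message above: by common shell and container-runtime convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9. The sketch below decodes that; `describe_exit_status` is a hypothetical helper written for illustration, not part of any Couchbase tooling. A SIGKILL is consistent with the OOM kill described in this report, but the status alone does not prove the cause.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status.

    Convention: a status above 128 means the process was terminated
    by signal number (status - 128).
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


# On Linux, signal 9 is SIGKILL, the signal the kernel OOM killer sends.
print(describe_exit_status(137))
```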

CP triggered rebalance in for node 050
Starting rebalance, KeepNodes = ['ns_1@svc-d-node-001.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-d-node-002.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-d-node-003.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-050.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-053.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-054.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-055.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-056.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-057.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-058.fwyc9tdnlqdmx5jy.sandbox.nonprod-project-avengers.com'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 3348564277ee8878d4bd56fd39a500bf

This rebalance is hung.
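To make the difference between the two rebalance requests easier to see, the node lists from the two "Starting rebalance" messages can be compared as sets. This is an illustrative sketch only: the hostnames are shortened by dropping the `ns_1@` prefix and the common domain suffix, and the variable names are hypothetical.

```python
# Node lists copied from the two "Starting rebalance" log messages,
# shortened by dropping the "ns_1@" prefix and the shared domain suffix.
swap_keep = {
    "svc-d-node-001", "svc-d-node-002", "svc-d-node-003",
    "svc-s-node-053", "svc-s-node-054", "svc-s-node-055",
    "svc-s-node-056", "svc-s-node-057", "svc-s-node-058",
}
swap_eject = {"svc-s-node-050"}

rebalance_in_keep = {
    "svc-d-node-001", "svc-d-node-002", "svc-d-node-003",
    "svc-s-node-050",
    "svc-s-node-053", "svc-s-node-054", "svc-s-node-055",
    "svc-s-node-056", "svc-s-node-057", "svc-s-node-058",
}
rebalance_in_eject = set()

# Nodes kept by the second rebalance that the first one did not keep.
added_back = rebalance_in_keep - swap_keep

print("added back:", sorted(added_back))
print("was being ejected by the swap rebalance:", added_back == swap_eject)
print("nodes in swap rebalance target:", len(swap_keep))
print("nodes in rebalance-in target:", len(rebalance_in_keep))
```

The comparison shows that the hung rebalance keeps every node from the failed swap rebalance plus node 050, with nothing ejected, so its target topology has one node more than the swap rebalance was aiming for.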