Details
Type: Bug
Resolution: Fixed
Priority: Critical
7.6.0
Untriaged
0
Unknown
Description
The following test was run on a Capella cluster with AMI couchbase-cloud-server-7.6.0-2149-x86_64-v1.0.2.
The initial configuration was 3 nodes with all services colocated.
The cluster had 10 million vector documents, 1 vector index ('idx1'), and 2 empty text indexes.
At this point, the cluster configuration was changed to:
- 3 data+query nodes with 8 vCPU / 32 GB compute.
- 5 FTS nodes with 16 vCPU / 64 GB compute.
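To check whether the cluster actually reached a target layout like the one above, one option is to count active nodes per service in the cluster's /pools/default REST output, which lists each node's "services" and "clusterMembership" (as in the payload pasted later in this issue). A minimal Python sketch, assuming the payload has already been fetched and parsed into a dict; the function names are hypothetical, and "kv", "n1ql", and "fts" are assumed to be the service names for Data, Query, and Search:

```python
from collections import Counter


def count_active_nodes_by_service(pools_default: dict) -> Counter:
    # Tally how many active nodes run each service; a node running
    # several services (e.g. "kv" and "n1ql") counts once per service.
    counts: Counter = Counter()
    for node in pools_default.get("nodes", []):
        if node.get("clusterMembership") != "active":
            continue  # ignore nodes that are not yet (or no longer) active
        counts.update(node.get("services", []))
    return counts


def service_mismatches(pools_default: dict, expected: dict) -> dict:
    # Map each service whose node count differs from the expected layout
    # to an (expected, actual) pair; an empty dict means the layout matches.
    actual = count_active_nodes_by_service(pools_default)
    services = set(expected) | set(actual)
    return {
        s: (expected.get(s, 0), actual.get(s, 0))
        for s in sorted(services)
        if expected.get(s, 0) != actual.get(s, 0)
    }


if __name__ == "__main__":
    # Hypothetical payload mirroring the target layout described above.
    payload = {
        "nodes": [{"clusterMembership": "active", "services": ["kv", "n1ql"]}] * 3
        + [{"clusterMembership": "active", "services": ["fts"]}] * 5
    }
    print(service_mismatches(payload, {"kv": 3, "n1ql": 3, "fts": 5}))  # {}
```

Because a multi-service node is counted once per service, a 3-node data+query tier should appear as 3 for both "kv" and "n1ql".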
The scaling was triggered and completed after 30 minutes; the new nodes were added and were healthy.
Since then, however, a rebalance starts and ends every minute, without ever being reported as either completed or failed.
Server logs:
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-20T174526-ns_1%40svc-dq-node-012.vv3j1x-sy90isjyb.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-20T174526-ns_1%40svc-dq-node-013.vv3j1x-sy90isjyb.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-20T174526-ns_1%40svc-dq-node-014.vv3j1x-sy90isjyb.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-20T174526-ns_1%40svc-s-node-015.vv3j1x-sy90isjyb.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-20T174526-ns_1%40svc-s-node-016.vv3j1x-sy90isjyb.sandbox.nonprod-project-avengers.com.zip
The config was updated at 10:19:48 IST on Feb 20.
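When lining this timestamp up against the collected server logs, it can help to convert it to UTC first. A minimal Python sketch, assuming IST here means India Standard Time (a fixed UTC+05:30 offset) and taking the year 2024 from the collectinfo file names; the helper name is hypothetical:

```python
from datetime import datetime, timedelta, timezone

# Assumption: "IST" is India Standard Time, a fixed UTC+05:30 offset (no DST).
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def ist_to_utc(year: int, month: int, day: int,
               hour: int, minute: int, second: int) -> datetime:
    # Attach the IST offset to the wall-clock time, then convert to UTC.
    local = datetime(year, month, day, hour, minute, second, tzinfo=IST)
    return local.astimezone(timezone.utc)


change_utc = ist_to_utc(2024, 2, 20, 10, 19, 48)
print(change_utc.isoformat())  # 2024-02-20T04:49:48+00:00
```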
It is not clear whether this is an ns_server issue or another case of FTS returning the wrong rebalance status, as seen in https://issues.couchbase.com/browse/MB-60803.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
EDIT1: with build 2153.
We are still hitting this issue with build 2153. The steps were the same, except that we introduced 5 FTS nodes with the config change, and that again resulted in a rebalance being triggered every minute.
AMI: couchbase-cloud-server-7.6.0-2153-x86_64-v1.0.28
Server logs:
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-d-node-004.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-d-node-005.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-d-node-006.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-s-node-007.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-s-node-008.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-s-node-009.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-s-node-010.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-s-node-011.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-s-node-012.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-s-node-013.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
https://cb-engineering.s3.amazonaws.com/aman/collectinfo-2024-02-21T113431-ns_1%40svc-s-node-014.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com.zip
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{
"rebalanceStatus":"running",
"balanced":false,
"nodes":[
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-d-node-004.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-d-node-004.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"kv"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":32574582784
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-d-node-005.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-d-node-005.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"kv"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":32809459712
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-d-node-006.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-d-node-006.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"kv"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":32574574592
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-s-node-007.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-s-node-007.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"fts"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":65877340160
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-s-node-008.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-s-node-008.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"fts"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":66581975040
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-s-node-009.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-s-node-009.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"fts"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":65877340160
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-s-node-010.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-s-node-010.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"fts"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":66581975040
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-s-node-011.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-s-node-011.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"fts"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":66581966848
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-s-node-012.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-s-node-012.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"fts"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":66581975040
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-s-node-013.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-s-node-013.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"fts"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":66581975040
},
{
"clusterMembership":"active",
"status":"healthy",
"recoveryType":"none",
"hostname":"svc-s-node-014.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com:8091",
"otpNode":"ns_1@svc-s-node-014.s8hzx7urdnl7wg5a.sandbox.nonprod-project-avengers.com",
"version":"7.6.0-2153-enterprise",
"services":[
"fts"
],
"serverGroup":"group:1",
"limits":{
"kv":
},
"utilization":{
"kv":
},
"defragmented":{
"kv":
},
"memoryTotal":66581975040
}
],
"memoryQuota":24852,
"indexMemoryQuota":4172,
"ftsMemoryQuota":50260,
"cbasMemoryQuota":4172,
"eventingMemoryQuota":4172,
"queryMemoryQuota":3337
}
Note: "rebalanceStatus" keeps oscillating between 'none' and 'running'.
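To confirm this oscillation from outside the cluster, one option is to sample the top-level "rebalanceStatus" field at a fixed interval and count how often it flips from 'none' to 'running'. A minimal Python sketch, assuming the field is read from the /pools/default REST endpoint (as in the output above) with HTTP basic auth; the base URL, credentials, function names, and the three-restart threshold are placeholders or assumptions, not part of the original report:

```python
import base64
import json
import time
import urllib.request
from typing import Iterable, List, Tuple

Sample = Tuple[float, str]  # (timestamp in seconds, rebalanceStatus value)


def fetch_rebalance_status(base_url: str, user: str, password: str) -> str:
    # Assumption: GET <base_url>/pools/default returns JSON with a top-level
    # "rebalanceStatus" field, like the payload pasted in this issue.
    req = urllib.request.Request(f"{base_url}/pools/default")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["rebalanceStatus"]


def poll_status(base_url: str, user: str, password: str,
                interval_s: float = 5.0, duration_s: float = 300.0) -> List[Sample]:
    # Sample the status every interval_s seconds for duration_s seconds.
    samples: List[Sample] = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        status = fetch_rebalance_status(base_url, user, password)
        samples.append((time.monotonic(), status))
        time.sleep(interval_s)
    return samples


def rebalance_starts(samples: Iterable[Sample]) -> List[float]:
    # Timestamps at which the status flipped from "none" to "running".
    starts: List[float] = []
    previous = None
    for ts, status in samples:
        if previous == "none" and status == "running":
            starts.append(ts)
        previous = status
    return starts


def start_intervals(samples: Iterable[Sample]) -> List[float]:
    # Gaps between consecutive restarts; values near 60 s would match
    # the "every minute" pattern described in this issue.
    starts = rebalance_starts(samples)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


def looks_like_rebalance_loop(samples: Iterable[Sample], min_starts: int = 3) -> bool:
    # Heuristic (assumed threshold): several separate none -> running flips
    # in one window suggest rebalances keep restarting, rather than one
    # long rebalance that is still in progress.
    return len(rebalance_starts(samples)) >= min_starts
```

For example, looks_like_rebalance_loop(poll_status(base_url, user, password)) checks a five-minute window. The sampling interval has to be shorter than the time each rebalance spends in 'running'; otherwise, some flips can be missed and the loop undercounted.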
Issue Links
- relates to: MB-60803 Cluster stuck in a repetitive Rebalancing cycle. 760-2119 (Closed)