Description
Build: 7.1.0-1300
Test: -test tests/fts/cheshire-cat/test_fts_clusterops_cheshire_cat_coll_crud_freetier.yml -scope tests/fts/cheshire-cat/scope_fts_cheshire_cat_free_tier.yml
- Cluster with 3 nodes, each running the kv, n1ql, search and index services
- Create 1 bucket, 100 scopes and 50 collections in each scope
- Create 500 FTS indexes: one index (1 partition) on each collection
- Create 2500 GSI indexes (5 on each collection)
- Load documents on some of the collections
- Run fts query workload
- Kill fts on 172.23.106.242 and wait for 25 mins
- Rebalance in 172.23.107.89 with fts service
- Rebalance failed due to an OOM kill of fts on node 172.23.106.242
From 172.23.106.253:
[ns_server:error,2021-09-17T16:12:55.608-07:00,ns_1@172.23.106.253:service_rebalancer-fts<0.26279.44>:service_rebalancer:run_rebalance_worker:130]Agent terminated during the rebalance: {'DOWN',
    #Ref<0.2884519677.227278849.177132>,
    process,<31346.30956.27>,
    {lost_connection,
        {'ns_1@172.23.106.242',shutdown}}}
[ns_server:info,2021-09-17T16:12:55.613-07:00,ns_1@172.23.106.253:rebalance_agent<0.8298.0>:rebalance_agent:handle_down:290]Rebalancer process <0.26322.44> died (reason {service_rebalance_failed,fts,
    {agent_died,<31346.30956.27>,
        {lost_connection,
            {'ns_1@172.23.106.242',
                shutdown}}}}).
[ns_server:error,2021-09-17T16:12:55.613-07:00,ns_1@172.23.106.253:service_agent-fts<0.9178.0>:service_agent:handle_info:281]Rebalancer <0.26279.44> died unexpectedly: {agent_died,<31346.30956.27>,
    {lost_connection,
        {'ns_1@172.23.106.242',
            shutdown}}}
[user:error,2021-09-17T16:12:55.617-07:00,ns_1@172.23.106.253:<0.9322.0>:ns_orchestrator:log_rebalance_completion:1412]Rebalance exited with reason {service_rebalance_failed,fts,
    {agent_died,<31346.30956.27>,
        {lost_connection,
            {'ns_1@172.23.106.242',shutdown}}}}.
Rebalance Operation Id = 54061800f5f01c3d63359735e10b16d0
From 172.23.106.242:
2021-09-17T16:12:46.954-07:00 [INFO] app_herder: indexing over indexQuota: 4592640000, memUsed: 8862700440, preIndexingMemory: 426816, indexes: 145, waiting: 243
2021-09-17T16:12:52.080-07:00 [INFO] app_herder: query ended, indexes: 145, waiting: 243
2021-09-17T16:12:52.181-07:00 [INFO] app_herder: indexing over indexQuota: 4592640000, memUsed: 9077805976, preIndexingMemory: 426816, indexes: 145, waiting: 243
2021-09-17T16:12:52.184-07:00 [INFO] app_herder: query ended, indexes: 145, waiting: 243
2021-09-17T16:12:56.813-07:00 [INFO] main: /opt/couchbase/bin/cbft started (v0.6.0/5.5.0)
2021-09-17T16:12:56.830-07:00 [INFO] main: file descriptor limit current: 200000 max: 200000
2021-09-17T16:12:56.830-07:00 [INFO] -authType="cbauth"
2021-09-17T16:12:56.830-07:00 [INFO] -bindGrpc="172.23.106.242:9130,0.0.0.0:9130"
2021-09-17T16:12:56.830-07:00 [INFO] -bindGrpcSsl="172.23.106.242:19130,0.0.0.0:19130"
2021-09-17T16:12:56.830-07:00 [INFO] -bindHttp="172.23.106.242:8094,0.0.0.0:8094"
2021-09-17T16:12:56.830-07:00 [INFO] -bindHttps=":18094"
2021-09-17T16:12:56.830-07:00 [INFO] -cfgConnect="metakv"
2021-09-17T16:12:56.830-07:00 [INFO] -container=""
Is there a reason why 172.23.106.242 is overloaded while the other nodes are fine?
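To put a number on the overload, the app_herder lines from the fts log on 172.23.106.242 can be scanned for memUsed relative to indexQuota. The following is a minimal sketch, not an existing tool: the regex is inferred only from the two "indexing over indexQuota" lines quoted in this ticket, the helper name parse_over_quota is hypothetical, and other cbft builds may format the line differently.

```python
import re

# Pattern inferred from the app_herder lines quoted in this ticket (assumption:
# the field order and separators are stable across the whole log).
HERDER_RE = re.compile(
    r"^(?P<ts>\S+) \[INFO\] app_herder: indexing over indexQuota: (?P<quota>\d+), "
    r"memUsed: (?P<used>\d+), preIndexingMemory: (?P<pre>\d+), "
    r"indexes: (?P<indexes>\d+), waiting: (?P<waiting>\d+)"
)


def parse_over_quota(lines):
    """Return one dict per 'indexing over indexQuota' line, skipping all others."""
    events = []
    for line in lines:
        m = HERDER_RE.match(line.strip())
        if m is None:
            continue  # not an over-quota line (e.g. 'query ended')
        quota = int(m["quota"])
        used = int(m["used"])
        events.append(
            {
                "ts": m["ts"],
                "quota": quota,
                "used": used,
                "ratio": used / quota,  # memUsed as a multiple of indexQuota
                "waiting": int(m["waiting"]),
            }
        )
    return events


# The four app_herder lines from this ticket, used as sample input.
SAMPLE = [
    "2021-09-17T16:12:46.954-07:00 [INFO] app_herder: indexing over indexQuota: 4592640000, memUsed: 8862700440, preIndexingMemory: 426816, indexes: 145, waiting: 243",
    "2021-09-17T16:12:52.080-07:00 [INFO] app_herder: query ended, indexes: 145, waiting: 243",
    "2021-09-17T16:12:52.181-07:00 [INFO] app_herder: indexing over indexQuota: 4592640000, memUsed: 9077805976, preIndexingMemory: 426816, indexes: 145, waiting: 243",
    "2021-09-17T16:12:52.184-07:00 [INFO] app_herder: query ended, indexes: 145, waiting: 243",
]

if __name__ == "__main__":
    for ev in parse_over_quota(SAMPLE):
        print(f"{ev['ts']} memUsed={ev['used']} quota={ev['quota']} "
              f"ratio={ev['ratio']:.2f} waiting={ev['waiting']}")
```

On the lines quoted here, memUsed is close to twice indexQuota in the seconds before cbft restarts at 16:12:56. Running the same scan over the fts logs in the other collectinfo bundles would show whether those nodes also exceeded the quota.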
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1631924630/collectinfo-2021-09-18T002351-ns_1%40172.23.106.242.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1631924630/collectinfo-2021-09-18T002351-ns_1%40172.23.106.243.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1631924630/collectinfo-2021-09-18T002351-ns_1%40172.23.106.253.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1631924630/collectinfo-2021-09-18T002351-ns_1%40172.23.107.89.zip