Details
Bug
Resolution: Cannot Reproduce
Major
7.6.2
None
Untriaged
0
Yes
Description
Build - 7.6.2-3674
Steps
- Cluster config - kv:n1ql-kv:n1ql-index-index-index-index-index
- Create a bucket and one named keyspace, load docs, and create indexes in all the keyspaces
- Keep running index scans in the background
- Load docs until indexer resident ratio reaches 20%
- Fill the disk up to 80% capacity on all indexer nodes. The command below is used:
dd if=/dev/mapper/tmpl--deb10--vg-root of=/opt/couchbase/var/lib/couchbase/data/DUMMY_FILE_DELETE_IF_STILL_PRESENT bs=1M
- Rebalance out 2 indexer nodes
- During the rebalance, CPU and memory stress is induced on all the nodes in the cluster using the command below:
stress --cpu 1 --vm-bytes 365M --vm 1 --timeout 1800 -d 1 & > /dev/null && echo 1 || echo 0
- The ongoing rebalance fails with the error below:
{'status': 'none', 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed |
[2024-05-25 13:16:27,573] - [on_prem_rest_client:4324] INFO - Latest logs from UI on 172.23.123.48: |
[2024-05-25 13:16:27,573] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.122.61', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668185410, 'shortText': 'message', 'text': "The time on node 'ns_1@172.23.122.61' is not synchronized. Please ensure that NTP is set up correctly on all nodes and that clocks are synchronized.", 'serverTime': '2024-05-25T13:16:25.410Z'} |
[2024-05-25 13:16:27,573] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.120.101', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1716668180054, 'shortText': 'message', 'text': 'Rebalance exited with reason {{badmatch,\n {leader_activities_error,\n {default,rebalance},\n {quorum_lost,\n {lease_lost,\'ns_1@172.23.121.135\'}}}},\n [{ns_rebalancer,rebalance,7,\n [{file,"src/ns_rebalancer.erl"},{line,456}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,240}]}]}.\nRebalance Operation Id = 4c14e220ff46693203c2da33c8b8697d', 'serverTime': '2024-05-25T13:16:20.054Z'} |
[2024-05-25 13:16:27,574] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.121.160', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668175242, 'shortText': 'message', 'text': 'Warning: approaching low index resident percentage. Indexer RAM percentage on node "172.23.121.160" is 9%, which is under the threshold of 10%.', 'serverTime': '2024-05-25T13:16:15.242Z'} |
[2024-05-25 13:16:27,574] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.121.160', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668175241, 'shortText': 'message', 'text': "The time on node 'ns_1@172.23.121.160' is not synchronized. Please ensure that NTP is set up correctly on all nodes and that clocks are synchronized.", 'serverTime': '2024-05-25T13:16:15.241Z'} |
[2024-05-25 13:16:27,574] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.120.101', 'type': 'info', 'code': 0, 'module': 'ns_vbucket_mover', 'tstamp': 1716668174184, 'shortText': 'message', 'text': 'Bucket "test_bucket" rebalance appears to be swap rebalance', 'serverTime': '2024-05-25T13:16:14.184Z'} |
[2024-05-25 13:16:27,574] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.120.101', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668173811, 'shortText': 'message', 'text': 'Warning: approaching low index resident percentage. Indexer RAM percentage on node "172.23.120.101" is 0%, which is under the threshold of 10%.', 'serverTime': '2024-05-25T13:16:13.811Z'} |
[2024-05-25 13:16:27,574] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.122.123', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668160746, 'shortText': 'message', 'text': "The time on node 'ns_1@172.23.122.123' is not synchronized. Please ensure that NTP is set up correctly on all nodes and that clocks are synchronized.", 'serverTime': '2024-05-25T13:16:00.746Z'} |
[2024-05-25 13:16:27,574] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.121.66', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668150283, 'shortText': 'message', 'text': 'Warning: approaching low index resident percentage. Indexer RAM percentage on node "172.23.121.66" is 4%, which is under the threshold of 10%.', 'serverTime': '2024-05-25T13:15:50.283Z'} |
[2024-05-25 13:16:27,574] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.121.66', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668150282, 'shortText': 'message', 'text': "The time on node 'ns_1@172.23.121.66' is not synchronized. Please ensure that NTP is set up correctly on all nodes and that clocks are synchronized.", 'serverTime': '2024-05-25T13:15:50.282Z'} |
[2024-05-25 13:16:27,574] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.120.101', 'type': 'info', 'code': 0, 'module': 'ns_rebalancer', 'tstamp': 1716668149423, 'shortText': 'message', 'text': 'Started rebalancing bucket test_bucket', 'serverTime': '2024-05-25T13:15:49.423Z'} |
[2024-05-25 13:16:27,576] - [remote_util:306] INFO - SSH Connecting to 172.23.120.101 with username:root, attempt#1 of 5 |
[2024-05-25 13:16:27,848] - [remote_util:344] INFO - SSH Connected to 172.23.120.101 as root |
[2024-05-25 13:16:27,988] - [remote_util:3520] INFO - os_distro: Ubuntu, os_version: debian 10, is_linux_distro: True |
[2024-05-25 13:16:28,284] - [remote_util:3690] INFO - extract_remote_info-->distribution_type: Ubuntu, distribution_version: debian 10 |
[2024-05-25 13:16:28,285] - [remote_util:3356] INFO - running command.raw on 172.23.120.101: rm -f /opt/couchbase/var/lib/couchbase/data/DUMMY_FILE_DELETE_IF_STILL_PRESENT |
[2024-05-25 13:16:31,179] - [on_prem_rest_client:2078] ERROR - {'status': 'none', 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed |
[2024-05-25 13:16:31,192] - [on_prem_rest_client:4324] INFO - Latest logs from UI on 172.23.123.48: |
[2024-05-25 13:16:31,192] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.122.61', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668185410, 'shortText': 'message', 'text': "The time on node 'ns_1@172.23.122.61' is not synchronized. Please ensure that NTP is set up correctly on all nodes and that clocks are synchronized.", 'serverTime': '2024-05-25T13:16:25.410Z'} |
[2024-05-25 13:16:31,192] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.120.101', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1716668180054, 'shortText': 'message', 'text': 'Rebalance exited with reason {{badmatch,\n {leader_activities_error,\n {default,rebalance},\n {quorum_lost,\n {lease_lost,\'ns_1@172.23.121.135\'}}}},\n [{ns_rebalancer,rebalance,7,\n [{file,"src/ns_rebalancer.erl"},{line,456}]},\n {proc_lib,init_p_do_apply,3,\n [{file,"proc_lib.erl"},{line,240}]}]}.\nRebalance Operation Id = 4c14e220ff46693203c2da33c8b8697d', 'serverTime': '2024-05-25T13:16:20.054Z'} |
[2024-05-25 13:16:31,192] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.121.160', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668175242, 'shortText': 'message', 'text': 'Warning: approaching low index resident percentage. Indexer RAM percentage on node "172.23.121.160" is 9%, which is under the threshold of 10%.', 'serverTime': '2024-05-25T13:16:15.242Z'} |
[2024-05-25 13:16:31,192] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.121.160', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668175241, 'shortText': 'message', 'text': "The time on node 'ns_1@172.23.121.160' is not synchronized. Please ensure that NTP is set up correctly on all nodes and that clocks are synchronized.", 'serverTime': '2024-05-25T13:16:15.241Z'} |
[2024-05-25 13:16:31,193] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.120.101', 'type': 'info', 'code': 0, 'module': 'ns_vbucket_mover', 'tstamp': 1716668174184, 'shortText': 'message', 'text': 'Bucket "test_bucket" rebalance appears to be swap rebalance', 'serverTime': '2024-05-25T13:16:14.184Z'} |
[2024-05-25 13:16:31,193] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.120.101', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668173811, 'shortText': 'message', 'text': 'Warning: approaching low index resident percentage. Indexer RAM percentage on node "172.23.120.101" is 0%, which is under the threshold of 10%.', 'serverTime': '2024-05-25T13:16:13.811Z'} |
[2024-05-25 13:16:31,193] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.122.123', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668160746, 'shortText': 'message', 'text': "The time on node 'ns_1@172.23.122.123' is not synchronized. Please ensure that NTP is set up correctly on all nodes and that clocks are synchronized.", 'serverTime': '2024-05-25T13:16:00.746Z'} |
[2024-05-25 13:16:31,193] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.121.66', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668150283, 'shortText': 'message', 'text': 'Warning: approaching low index resident percentage. Indexer RAM percentage on node "172.23.121.66" is 4%, which is under the threshold of 10%.', 'serverTime': '2024-05-25T13:15:50.283Z'} |
[2024-05-25 13:16:31,193] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.121.66', 'type': 'info', 'code': 0, 'module': 'menelaus_web_alerts_srv', 'tstamp': 1716668150282, 'shortText': 'message', 'text': "The time on node 'ns_1@172.23.121.66' is not synchronized. Please ensure that NTP is set up correctly on all nodes and that clocks are synchronized.", 'serverTime': '2024-05-25T13:15:50.282Z'} |
[2024-05-25 13:16:31,193] - [on_prem_rest_client:4325] ERROR - {'node': 'ns_1@172.23.120.101', 'type': 'info', 'code': 0, 'module': 'ns_rebalancer', 'tstamp': 1716668149423, 'shortText': 'message', 'text': 'Started rebalancing bucket test_bucket', 'serverTime': '2024-05-25T13:15:49.423Z'} |
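To pull the failure reason out of the UI log dump above, the dict payload on each line can be parsed and filtered by its 'type' field. This is a minimal Python sketch, assuming each entry is a Python dict literal embedded in the log line (as it appears here); `parse_payload` and `critical_entries` are hypothetical helper names, not part of the test framework.

```python
import ast


def parse_payload(line):
    """Return the dict embedded in a log line, or None if there is none."""
    # Assumption: the payload spans from the first '{' to the last '}' on the line.
    start, end = line.find("{"), line.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        payload = ast.literal_eval(line[start:end + 1])
    except (ValueError, SyntaxError):
        return None
    return payload if isinstance(payload, dict) else None


def critical_entries(lines):
    """Yield (node, text) for every entry whose 'type' is 'critical'."""
    for line in lines:
        payload = parse_payload(line)
        if payload and payload.get("type") == "critical":
            yield payload.get("node"), payload.get("text")
```

Run against the dump above, the only critical entries are the ns_orchestrator message from ns_1@172.23.120.101, which reports quorum_lost with lease_lost for ns_1@172.23.121.135. The remaining entries are NTP and low-resident-ratio warnings of type 'info'.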
Logs
Let me know if I can tweak the load for disk filling.
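One possible way to tune the disk-fill step is to compute how many 1 MiB blocks `dd` would need to write to bring a filesystem to the target usage. This is only a sketch under stated assumptions: the `dd` command in the steps shows no `count`, so how the test currently bounds the write is not known from this ticket. `blocks_to_reach` and `dd_count_for` are hypothetical helpers, and usage is taken as used/total, which can differ slightly from what `df` reports because of reserved blocks.

```python
import shutil

MIB = 1024 * 1024


def blocks_to_reach(total, used, target_pct, block_size=MIB):
    """Whole blocks to write so used/total approaches target_pct (0 if already there)."""
    target_bytes = total * target_pct // 100
    missing = target_bytes - used
    # Round down so the write never overshoots the target.
    return max(0, missing // block_size)


def dd_count_for(path, target_pct=80):
    """Block count for the filesystem that holds `path` (path is an assumption)."""
    usage = shutil.disk_usage(path)
    return blocks_to_reach(usage.total, usage.used, target_pct)
```

The result could then be passed to the existing command as `count=<n>` alongside `bs=1M`.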