Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: 7.0.0
Affects Version/s: Cheshire-Cat
Component/s: qe
Labels:
Triage: Untriaged
Story Points: 1
Is this a Regression?: No

Description

Build: 7.0.0-4960

Test machines: 6 GB RAM, 6 cores each

Test:
./testrunner -i /tmp/win10-gsi.ini -p get-cbcollect-info=True -t clitest.collectinfotest.CollectinfoTests.collectinfo_test,sasl_buckets=1,standard_buckets=1,GROUP=P0

[2021-04-22 19:32:07,487] - [rest_client:1873] ERROR - {'status': 'none', 'errorMessage': 'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed
[2021-04-22 19:32:10,762] - [rest_client:3804] INFO - Latest logs from UI on 172.23.106.249:
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'critical', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1619145120056, 'shortText': 'message', 'text': 'Rebalance exited with reason {buckets_shutdown_wait_failed,\n [{\'ns_1@172.23.106.249\',\n {\'EXIT\',\n {old_buckets_shutdown_wait_failed,\n ["standard_bucket0"]}}}]}.\nRebalance Operation Id = 9bc9227c38dc7da499ddf0916205a12e', 'serverTime': '2021-04-22T19:32:00.056Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'critical', 'code': 0, 'module': 'ns_rebalancer', 'tstamp': 1619145120054, 'shortText': 'message', 'text': 'Failed to wait deletion of some buckets on some nodes: [{\'ns_1@172.23.106.249\',\n {\'EXIT\',\n {old_buckets_shutdown_wait_failed,\n ["standard_bucket0"]}}}]\n', 'serverTime': '2021-04-22T19:32:00.054Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'info', 'code': 0, 'module': 'ns_orchestrator', 'tstamp': 1619145060052, 'shortText': 'message', 'text': "Starting rebalance, KeepNodes = ['ns_1@172.23.106.249'], EjectNodes = ['ns_1@172.23.136.127',\n 'ns_1@172.23.136.129',\n 'ns_1@172.23.136.252',\n 'ns_1@172.23.136.253'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 9bc9227c38dc7da499ddf0916205a12e", 'serverTime': '2021-04-22T19:31:00.052Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'warning', 'code': 102, 'module': 'menelaus_web', 'tstamp': 1619145060049, 'shortText': 'client-side error report', 'text': 'Client-side error-report for user "Administrator" on node \'ns_1@172.23.106.249\':\nUser-Agent:Python-httplib2/0.13.1 (gzip)\nStarting rebalance from test, ejected nodes [\'ns_1@172.23.136.127\', \'ns_1@172.23.136.129\', \'ns_1@172.23.136.252\', \'ns_1@172.23.136.253\']', 'serverTime': '2021-04-22T19:31:00.049Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'info', 'code': 0, 'module': 'ns_memcached', 'tstamp': 1619145024855, 'shortText': 'message', 'text': 'Shutting down bucket "standard_bucket0" on \'ns_1@172.23.106.249\' for deletion', 'serverTime': '2021-04-22T19:30:24.855Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'info', 'code': 11, 'module': 'menelaus_web', 'tstamp': 1619145018262, 'shortText': 'message', 'text': 'Deleted bucket "default"\n', 'serverTime': '2021-04-22T19:30:18.262Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'info', 'code': 0, 'module': 'auto_failover', 'tstamp': 1619144999705, 'shortText': 'message', 'text': 'Enabled auto-failover with timeout 120 and max count 1 (repeated 1 times, last seen 13.904235 secs ago)', 'serverTime': '2021-04-22T19:29:59.705Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'info', 'code': 0, 'module': 'ns_memcached', 'tstamp': 1619144989195, 'shortText': 'message', 'text': 'Shutting down bucket "default" on \'ns_1@172.23.106.249\' for deletion', 'serverTime': '2021-04-22T19:29:49.195Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'info', 'code': 0, 'module': 'auto_failover', 'tstamp': 1619144982686, 'shortText': 'message', 'text': 'Enabled auto-failover with timeout 120 and max count 1', 'serverTime': '2021-04-22T19:29:42.686Z'}
[2021-04-22 19:32:10,763] - [rest_client:3805] ERROR - {'node': 'ns_1@172.23.106.249', 'type': 'info', 'code': 0, 'module': 'ns_memcached', 'tstamp': 1619144940096, 'shortText': 'message', 'text': 'Shutting down bucket "bucket0" on \'ns_1@172.23.106.249\' for deletion', 'serverTime': '2021-04-22T19:29:00.096Z'}

Cluster instance shutdown with force

Attaching logs

Attachments

172.23.106.249-20210422-1932-diag.zip (12.34 MB, 22/Apr/21 9:18 PM)
172.23.136.127-20210422-1932-diag.zip (1.75 MB, 22/Apr/21 9:18 PM)
172.23.136.129-20210422-1932-diag.zip (1.86 MB, 22/Apr/21 9:18 PM)
172.23.136.250-20210422-1932-diag.zip (1.32 MB, 22/Apr/21 9:18 PM)
172.23.136.252-20210422-1932-diag.zip (1.99 MB, 22/Apr/21 9:18 PM)
172.23.136.253-20210422-1932-diag.zip (2.05 MB, 22/Apr/21 9:18 PM)
screenshot-1.png (48 kB, 26/Apr/21 10:32 PM)
screenshot-2.png (32 kB, 26/Apr/21 10:32 PM)
screenshot-3.png (25 kB, 26/Apr/21 10:34 PM)
screenshot-4.png (33 kB, 26/Apr/21 10:43 PM)
screenshot-5.png (128 kB, 26/Apr/21 10:44 PM)
sys_cpu_utilization_rate.png (30 kB, 23/Apr/21 10:11 AM)

Issue Links

relates to:
MB-44452 [couchstore]:Graceful Failover -> Full Recovery -> Rebalance failed due to buckets_shutdown_wait_failed (Closed)
MB-45618 [Windows] - Dismantling a cluster on windows fails with buckets_shutdown_wait_failed (Closed)

People

Assignee: Arunkumar Senthilnathan (Inactive)

Reporter: Arunkumar Senthilnathan (Inactive)

Votes: 0

Watchers: 9

Dates

Created: 22/Apr/21 9:21 PM

Updated: 17/Jun/21 3:57 PM

Resolved: 27/Apr/21 10:00 AM
