Details
Bug
Resolution: Cannot Reproduce
Major
None
7.2.0
7.2.0-5242
Untriaged
0
Unknown
Description
Build: 7.2.0-5242
Steps:
- Cluster setup
+----------------+----------+-----------------+-----------+-----------+
| Node | Services | CPU_utilization | Mem_total | Mem_free |
+----------------+----------+-----------------+-----------+-----------+
| 172.23.106.94 | kv | 1.55735873766 | 11.74 GiB | 10.93 GiB |
| 172.23.106.87 | kv | 2.19224316187 | 11.74 GiB | 10.77 GiB |
| 172.23.106.92 | kv | 0 | 11.74 GiB | 10.84 GiB |
| 172.23.107.147 | kv | 0.575418225985 | 11.74 GiB | 10.84 GiB |
+----------------+----------+-----------------+-----------+-----------+
+---------+-----------+-----------------+----------+----------+-----------+------------+------------+---------------+
| Bucket | Type | Storage Backend | Replicas | Items | RAM Quota | RAM Used | Disk Used | ARR |
+---------+-----------+-----------------+----------+----------+-----------+------------+------------+---------------+
| bucket1 | couchbase | couchstore | 1 | 99000 | 7.81 GiB | 122.72 MiB | 130.06 MiB | 100 |
| bucket2 | couchbase | magma | 1 | 49500 | 3.91 GiB | 253.15 MiB | 197.57 MiB | 100 |
| default | couchbase | magma | 1 | 19496700 | 2.00 GiB | 1.35 GiB | 4.81 GiB | 19.6319479707 |
+---------+-----------+-----------------+----------+----------+-----------+------------+------------+---------------+
- Load initial data and trigger disk failover (disk full) on node 172.23.106.94
{u'code': 0, u'module': u'menelaus_web_alerts_srv', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678566553989L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:29:13.989Z', u'text': u'Approaching full disk warning. Usage of disk "/root" on node "172.23.106.94" is around 100%.'}
- Recover the node, add it back (delta recovery), and trigger a rebalance
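For anyone reproducing the last step by hand, the add-back and rebalance can be sketched as two REST calls. This is a minimal sketch, assuming the documented Couchbase Server endpoints `/controller/setRecoveryType` and `/controller/rebalance`; the helper name `delta_recovery_requests` is hypothetical, and the TAF test may drive these steps differently. It only builds the requests and sends nothing.

```python
from urllib.parse import urlencode


def delta_recovery_requests(recover_node, known_nodes):
    """Build (path, form body) pairs for a delta add-back followed by a rebalance.

    Hypothetical helper: nothing is sent over the network here.
    """
    return [
        # Mark the failed-over node for delta recovery.
        ("/controller/setRecoveryType",
         urlencode({"otpNode": recover_node, "recoveryType": "delta"})),
        # Start the rebalance with every node kept and none ejected.
        ("/controller/rebalance",
         urlencode({"knownNodes": ",".join(known_nodes), "ejectedNodes": ""})),
    ]


# Node names as they appear in the "Starting rebalance" log entry of this ticket.
nodes = ["ns_1@172.23.106.94", "ns_1@172.23.106.87",
         "ns_1@172.23.106.92", "ns_1@172.23.107.147"]
for path, body in delta_recovery_requests("ns_1@172.23.106.94", nodes):
    print("POST", path, body)
```

Each pair would be POSTed to the cluster manager (port 8091 by default) with admin credentials.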
Observation:
Rebalance failed with reason `timeout`
2023-03-11 12:35:56,904 :: Adding back node 172.23.106.94
{u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'34a518175ed901f99862a525551600b5', u'subtype': u'rebalance', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=58cf9617f036f87676781eda12cddb53', u'status': u'notRunning'} - rebalance failed
Latest logs from UI on 172.23.106.87:
{u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678567257577L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:40:57.577Z', u'text': u'Shutting down bucket "bucket1" on \'ns_1@172.23.106.94\' for deletion'}
{u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.106.87', u'tstamp': 1678567257572L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:40:57.572Z', u'text': u'Rebalance exited with reason
{prepare_delta_recovery_failed,"bucket1",{error,{failed_nodes,[{\'ns_1@172.23.106.94\',{error,timeout}}]}}}.
Rebalance Operation Id = 0ee524e95ab99868a11b518dfc4fe7d3'}
{u'code': 0, u'module': u'menelaus_web_alerts_srv', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678566973995L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:36:13.995Z', u'text': u'Write Commit Failure. Disk write failed for item in Bucket "bucket2" on node 172.23.106.94.'}
{u'code': 0, u'module': u'menelaus_web_alerts_srv', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678566973995L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:36:13.995Z', u'text': u'Write Commit Failure. Disk write failed for item in Bucket "bucket1" on node 172.23.106.94.'}
{u'code': 0, u'module': u'menelaus_web_alerts_srv', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678566973995L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:36:13.995Z', u'text': u'Write Commit Failure. Disk write failed for item in Bucket "default" on node 172.23.106.94.'}
{u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678566957880L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:35:57.880Z', u'text': u'Bucket "default" loaded on node \'ns_1@172.23.106.94\' in 0 seconds.'}
{u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678566957602L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:35:57.602Z', u'text': u'Bucket "bucket2" loaded on node \'ns_1@172.23.106.94\' in 0 seconds.'}
{u'code': 0, u'module': u'ns_memcached', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678566957570L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:35:57.570Z', u'text': u'Bucket "bucket1" loaded on node \'ns_1@172.23.106.94\' in 0 seconds.'}
{u'code': 0, u'module': u'ns_orchestrator', u'type': u'info', u'node': u'ns_1@172.23.106.87', u'tstamp': 1678566957054L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:35:57.054Z', u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.106.94','ns_1@172.23.106.87',\n 'ns_1@172.23.106.92','ns_1@172.23.107.147'], EjectNodes = [], Failed over and being ejected nodes = [], Delta recovery nodes = ['ns_1@172.23.106.94'], Delta recovery buckets = all; Operation Id = 0ee524e95ab99868a11b518dfc4fe7d3"}
{u'code': 0, u'module': u'menelaus_web_alerts_srv', u'type': u'info', u'node': u'ns_1@172.23.106.94', u'tstamp': 1678566913995L, u'shortText': u'message',
u'serverTime': u'2023-03-11T12:35:13.995Z', u'text': u'Write Commit Failure. Disk write failed for item in Bucket "default" on node 172.23.106.94.'}
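The UI log entries are listed newest first, so the timing is easier to see by subtracting the `tstamp` values directly. The sketch below uses only millisecond timestamps copied from the entries in this ticket; the constant and function names are illustrative. The gap between rebalance start and exit is about 300.5 seconds, which is consistent with (but does not prove) a five-minute timeout in the delta recovery preparation step.

```python
# Millisecond timestamps copied from the log entries in this ticket.
REBALANCE_START_MS = 1678566957054  # ns_orchestrator: "Starting rebalance, ..."
WRITE_FAILURE_MS = 1678566973995    # menelaus_web_alerts_srv: "Write Commit Failure ..."
REBALANCE_EXIT_MS = 1678567257572   # ns_orchestrator: "Rebalance exited with reason ..."


def elapsed_seconds(start_ms, end_ms):
    """Return the gap between two millisecond timestamps, in seconds."""
    return (end_ms - start_ms) / 1000.0


# Time from rebalance start to the next batch of write commit failures.
print(elapsed_seconds(REBALANCE_START_MS, WRITE_FAILURE_MS))   # 16.941
# Time from rebalance start to the timeout failure.
print(elapsed_seconds(REBALANCE_START_MS, REBALANCE_EXIT_MS))  # 300.518
```

Write commit failures on 172.23.106.94 were still being reported about 17 seconds after the rebalance started, so the disk may not have been fully recovered when the add-back began.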
TAF test:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.84460.ini GROUP=disk_fo,rerun=False,disk_optimized_thread_settings=True,get-cbcollect-info=True,autoCompactionDefined=true,dedupe_update_itrs=10000,upgrade_version=7.2.0-5242 -t failover.DiskFailoverTests.DiskAutofailoverTests.test_disk_autofailover_and_addback_of_node,timeout=10,num_node_failures=1,recovery_strategy=delta,failover_action=disk_full,nodes_init=4,disk_timeout=15,bucket_spec=magma_dgm.10_percent_dgm.4_node_1_replica_magma_512,doc_size=512,randomize_value=True,data_load_spec=volume_test_load_with_CRUD_on_collections,data_location=/root,crash_warning=True,default_history_retention_for_collections=false,bucket_history_retention_seconds=86400,bucket_history_retention_bytes=20000000000,GROUP=P0_set1;disk_fo'
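The `-t` argument packs the test path and its parameters into one comma-separated string, which is hard to read at a glance. A minimal sketch for splitting it into a name and a parameter dictionary follows; `parse_test_spec` is a hypothetical helper, not part of TAF, and the spec shown is shortened from the command above.

```python
def parse_test_spec(spec):
    """Split 'test.path,key=value,...' into (test path, dict of string params)."""
    name, *params = spec.split(",")
    # Split each parameter on the first '=' only, so values may contain '='.
    return name, dict(p.split("=", 1) for p in params)


# Shortened copy of the -t argument from this ticket.
spec = ("failover.DiskFailoverTests.DiskAutofailoverTests."
        "test_disk_autofailover_and_addback_of_node,"
        "timeout=10,num_node_failures=1,recovery_strategy=delta,"
        "failover_action=disk_full,nodes_init=4,disk_timeout=15")

name, params = parse_test_spec(spec)
print(name)
for key, value in sorted(params.items()):
    print(key, "=", value)
```

All values come back as strings; a repeated key would keep only its last value.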