Details
Bug
Resolution: Fixed
Critical
7.1.0
Triaged
Centos 64-bit
1
No
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops-temp_rebalance_even2-magma.ini rerun=False,get-cbcollect-info=True,quota_percent=99,crash_warning=True,retry_get_process_num=600,bucket_storage=magma,enable_dp=True -t bucket_collections.collections_drop_recreate_rebalance.CollectionsDropRecreateRebalance.test_data_load_collections_with_rebalance_out,nodes_init=5,nodes_out=2,bucket_spec=multi_bucket.buckets_1000_collections'
Steps to Repro
1. Create a 5-node cluster.
2021-07-20 22:05:38,463 | test | INFO | pool-3-thread-7 | [table_view:display:72] Rebalance Overview
-----------------------------------------------------------------------
Nodes | Services | Version | CPU | Status |
-----------------------------------------------------------------------
172.23.121.135 | kv | 7.1.0-1083-enterprise | 0.713749060856 | Cluster node |
172.23.121.136 | None | | | <--- IN --- |
172.23.121.139 | None | | | <--- IN --- |
172.23.121.140 | None | | | <--- IN --- |
172.23.121.141 | None | | | <--- IN --- |
-----------------------------------------------------------------------
2. Create bucket/scope/collections/data
2021-07-20 22:07:15,842 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bucket | Type | Storage Backend | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used | ARR |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
nm3B1VRZsUKP8%uMu-txjkDFBi-NGJ4oIQx1jai4hU91hdLnYwwoOp4TKoathcSVXpoqeboBdQncke-48-878000 | couchbase | magma | 2 | none | 0 | 3000000 | 9.77 GiB | 2.78 GiB | 3.10 GiB | 100 |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3. Start CRUD on collections; this continues until the end of the rebalance started in the next step.
4. Start a multi-node rebalance-out (172.23.121.136 and 172.23.121.135).
2021-07-20 22:07:16,509 | test | INFO | pool-3-thread-4 | [table_view:display:72] Rebalance Overview
----------------------------------------------------------------------
Nodes | Services | Version | CPU | Status |
----------------------------------------------------------------------
172.23.121.140 | kv | 7.1.0-1083-enterprise | 4.8921224285 | Cluster node |
172.23.121.136 | kv | 7.1.0-1083-enterprise | 4.82654600302 | --- OUT ---> |
172.23.121.139 | kv | 7.1.0-1083-enterprise | 4.95037064958 | Cluster node |
172.23.121.135 | kv | 7.1.0-1083-enterprise | 4.73737119879 | --- OUT ---> |
172.23.121.141 | kv | 7.1.0-1083-enterprise | 4.77506911284 | Cluster node |
----------------------------------------------------------------------
This rebalance fails as shown below.
2021-07-20 22:28:11,601 | test | ERROR | pool-3-thread-4 | [rest_client:_rebalance_status_and_progress:1547] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'ef4929b52095d0e1b7f1ce7c96d01b84', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=a43d262e6e2c00c53e6aa53b0a22d187', u'status': u'notRunning'} - rebalance failed
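For anyone triaging similar runs, the status payload in the log line above can be checked mechanically. The following is a minimal sketch, not the testrunner's actual logic: `rebalance_failed` is a hypothetical helper name, and the rule it applies (a rebalance task that is `notRunning` and carries an `errorMessage`) is an assumption based only on the fields visible in this log.

```python
def rebalance_failed(task):
    """Return True when a rebalance task dict looks like a failed rebalance.

    Assumption: a task of type 'rebalance' that is 'notRunning' and
    carries an 'errorMessage' key is treated as a failure.
    """
    return (
        task.get("type") == "rebalance"
        and task.get("status") == "notRunning"
        and "errorMessage" in task
    )


# Payload copied from the log line above (Python 2 u'' prefixes dropped).
task = {
    "errorMessage": "Rebalance failed. See logs for detailed reason. You can try again.",
    "type": "rebalance",
    "masterRequestTimedOut": False,
    "statusId": "ef4929b52095d0e1b7f1ce7c96d01b84",
    "statusIsStale": False,
    "lastReportURI": "/logs/rebalanceReport?reportID=a43d262e6e2c00c53e6aa53b0a22d187",
    "status": "notRunning",
}

print(rebalance_failed(task))
```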
2021-07-20 22:28:11,624 | test | INFO | pool-3-thread-4 | [rest_client:print_UI_logs:2693] Latest logs from UI on 172.23.121.135:
2021-07-20 22:28:11,624 | test | ERROR | pool-3-thread-4 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.121.135', u'tstamp': 1626845288009L, u'shortText': u'message', u'serverTime': u'2021-07-20T22:28:08.009Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.10706.1>,\n {{dcp_wait_for_data_move_failed,\n "nm3B1VRZsUKP8%uMu-txjkDFBi-NGJ4oIQx1jai4hU91hdLnYwwoOp4TKoathcSVXpoqeboBdQncke-48-878000",\n 203,\'ns_1@172.23.121.135\',\n [\'ns_1@172.23.121.141\',\n \'ns_1@172.23.121.140\',\n \'ns_1@172.23.121.139\'],\n {error,no_stats_for_this_vbucket}},\n [{ns_single_vbucket_mover,\n \'-wait_dcp_data_move/5-fun-0-\',5,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,459}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,234}]}]}}}}.\nRebalance Operation Id = 49de3cd996b5984e6a69b46336b253fa'}
2021-07-20 22:28:11,625 | test | ERROR | pool-3-thread-4 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.121.135', u'tstamp': 1626845287930L, u'shortText': u'message', u'serverTime': u'2021-07-20T22:28:07.930Z', u'text': u'Worker <0.10565.1> (for action {move,{203,\n [\'ns_1@172.23.121.135\',\n \'ns_1@172.23.121.141\',\n \'ns_1@172.23.121.140\'],\n [\'ns_1@172.23.121.141\',\n \'ns_1@172.23.121.140\',\n \'ns_1@172.23.121.139\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.10706.1>,\n {{dcp_wait_for_data_move_failed,\n "nm3B1VRZsUKP8%uMu-txjkDFBi-NGJ4oIQx1jai4hU91hdLnYwwoOp4TKoathcSVXpoqeboBdQncke-48-878000",\n 203,\n \'ns_1@172.23.121.135\',\n [\'ns_1@172.23.121.141\',\n \'ns_1@172.23.121.140\',\n \'ns_1@172.23.121.139\'],\n {error,\n no_stats_for_this_vbucket}},\n [{ns_single_vbucket_mover,\n \'-wait_dcp_data_move/5-fun-0-\',\n 5,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 459}]},\n {proc_lib,\n init_p,3,\n [{file,\n "proc_lib.erl"},\n {line,\n 234}]}]}}}'}
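Both UI log entries carry the same inner failure tuple. As a rough aid for comparing this failure with the one in other tickets, the sketch below pulls the key fields out of the reason text with a regular expression. The field names (`failure`, `bucket`, `vbucket`, `node`, `nodes`, `error`) are labels chosen here for illustration, not names used by ns_server, and the pattern is only known to fit the two entries shown above.

```python
import re

# Excerpt of the 'text' field from the ns_orchestrator entry above,
# with whitespace collapsed and the stack trace cut off at "...".
REASON = (
    "Rebalance exited with reason {mover_crashed, {unexpected_exit, "
    "{'EXIT',<0.10706.1>, {{dcp_wait_for_data_move_failed, "
    '"nm3B1VRZsUKP8%uMu-txjkDFBi-NGJ4oIQx1jai4hU91hdLnYwwoOp4TKoathcSVXpoqeboBdQncke-48-878000", '
    "203,'ns_1@172.23.121.135', "
    "['ns_1@172.23.121.141', 'ns_1@172.23.121.140', 'ns_1@172.23.121.139'], "
    "{error,no_stats_for_this_vbucket}}, ..."
)

# Matches: {{Failure, "Bucket", VBucket, 'Node', [Nodes], {error, Error}
PATTERN = re.compile(
    r"\{\{(?P<failure>\w+),\s*"
    r'"(?P<bucket>[^"]+)",\s*'
    r"(?P<vbucket>\d+),\s*"
    r"'(?P<node>[^']+)',\s*"
    r"\[(?P<nodes>[^\]]*)\],\s*"
    r"\{error,\s*(?P<error>\w+)\}"
)


def parse_mover_failure(text):
    """Return the failure fields as a dict, or None if the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    info = match.groupdict()
    info["vbucket"] = int(info["vbucket"])
    # Split the quoted Erlang atoms in the node list into plain strings.
    info["nodes"] = re.findall(r"'([^']+)'", info["nodes"])
    return info


info = parse_mover_failure(REASON)
print(info["failure"], info["vbucket"], info["error"])
```

Because `\s*` also matches newlines, the same pattern should apply to the more heavily wrapped ns_vbucket_mover entry once its `\n` escapes are decoded.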
Note that the same test hit a similar failure in MB-47390, which was marked as a duplicate of MB-42652. The fix for MB-42652 is already present in the build this test ran on.
cbcollect_info attached.