Details
- Bug
- Resolution: Fixed
- Critical
- 7.1.0
- 7.1.0-1083-enterprise
- Triaged
- Centos 64-bit
- 1
- No
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops-temp_rebalance_even2-magma.ini rerun=False,get-cbcollect-info=True,quota_percent=99,crash_warning=True,retry_get_process_num=600,bucket_storage=magma,enable_dp=True -t bucket_collections.collections_drop_recreate_rebalance.CollectionsDropRecreateRebalance.test_data_load_collections_with_rebalance_out,nodes_init=5,nodes_out=2,bucket_spec=multi_bucket.buckets_1000_collections'
Steps to Repro
1. Create a 5 node cluster
2021-07-20 22:39:31,190 | test | INFO | pool-3-thread-7 | [table_view:display:72] Rebalance Overview
-----------------------------------------------------------------------
Nodes | Services | Version | CPU | Status |
-----------------------------------------------------------------------
172.23.121.135 | kv | 7.1.0-1083-enterprise | 0.150281778334 | Cluster node |
172.23.121.136 | None | | | <--- IN --- |
172.23.121.139 | None | | | <--- IN --- |
172.23.121.140 | None | | | <--- IN --- |
172.23.121.141 | None | | | <--- IN --- |
-----------------------------------------------------------------------
2. Create Bucket/scopes/collections/data
2021-07-20 22:41:09,727 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
--------------------------------------------------------------------------------------------------------------------------------------------------------
Bucket | Type | Storage Backend | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used | ARR |
--------------------------------------------------------------------------------------------------------------------------------------------------------
56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000 | couchbase | magma | 2 | none | 0 | 3000000 | 9.77 GiB | 2.77 GiB | 3.12 GiB | 100 |
--------------------------------------------------------------------------------------------------------------------------------------------------------
3. Start CRUD on collections; this continues until the rebalance started in the next step ends.
4. Start a multi-node rebalance out (172.23.121.136 and 172.23.121.135).
2021-07-20 22:41:10,466 | test | INFO | pool-3-thread-23 | [table_view:display:72] Rebalance Overview
----------------------------------------------------------------------
Nodes | Services | Version | CPU | Status |
----------------------------------------------------------------------
172.23.121.140 | kv | 7.1.0-1083-enterprise | 4.18966382338 | Cluster node |
172.23.121.136 | kv | 7.1.0-1083-enterprise | 4.37405731523 | --- OUT ---> |
172.23.121.139 | kv | 7.1.0-1083-enterprise | 4.3047188755 | Cluster node |
172.23.121.135 | kv | 7.1.0-1083-enterprise | 3.75864236329 | --- OUT ---> |
172.23.121.141 | kv | 7.1.0-1083-enterprise | 4.00703429217 | Cluster node |
----------------------------------------------------------------------
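For anyone reproducing step 4 by hand rather than through testrunner, the rebalance out can probably be triggered with a POST to the documented /controller/rebalance REST endpoint. The sketch below only builds the form body; the helper name rebalance_out_payload and the "ns_1@" prefix default are illustrative assumptions (the prefix matches the node names in the logs of this ticket), and this is not necessarily how the test framework issues the call.

```python
from urllib.parse import urlencode, parse_qs


def rebalance_out_payload(cluster_ips, eject_ips, otp_prefix="ns_1@"):
    """Build the form body for POST /controller/rebalance.

    knownNodes must list every node currently in the cluster,
    ejectedNodes the subset being rebalanced out.
    (Hypothetical helper, not part of testrunner.)
    """
    unknown = set(eject_ips) - set(cluster_ips)
    if unknown:
        raise ValueError("cannot eject nodes outside the cluster: %s" % sorted(unknown))
    known = [otp_prefix + ip for ip in cluster_ips]
    ejected = [otp_prefix + ip for ip in eject_ips]
    return urlencode({
        "knownNodes": ",".join(known),
        "ejectedNodes": ",".join(ejected),
    })


# The five nodes and the two ejected nodes from this ticket.
cluster = ["172.23.121.135", "172.23.121.136", "172.23.121.139",
           "172.23.121.140", "172.23.121.141"]
body = rebalance_out_payload(cluster, ["172.23.121.136", "172.23.121.135"])
print(parse_qs(body)["ejectedNodes"][0])
```

The resulting string would be sent as the request body to port 8091 on any cluster node with administrator credentials.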
This rebalance fails as shown below.
Latest logs from UI on 172.23.121.135:
2021-07-20 23:35:20,617 | test | ERROR | pool-3-thread-23 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.121.135', u'tstamp': 1626849319109L, u'shortText': u'message', u'serverTime': u'2021-07-20T23:35:19.109Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.6756.8>,\n {{{{timeout,\n {gen_server,call,\n [memcached_refresh,\n {apply_to_file,\n "/opt/couchbase/var/lib/couchbase/config/memcached.rbac.tmp",\n "/opt/couchbase/var/lib/couchbase/config/memcached.rbac"}]}},\n {gen_server,call,\n [memcached_permissions,sync,infinity]}},\n {gen_server,call,\n [\'ns_memcached-56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000\',\n {get_dcp_docs_estimate,189,\n "replication:ns_1@172.23.121.135->ns_1@172.23.121.141:56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000"},\n 180000]}},\n {gen_server,call,\n [{\'janitor_agent-56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000\',\n \'ns_1@172.23.121.135\'},\n {if_rebalance,<0.15174.0>,\n {wait_dcp_data_move,\n [\'ns_1@172.23.121.141\',\n \'ns_1@172.23.121.139\',\n \'ns_1@172.23.121.140\'],\n 189}},\n infinity]}}}}}.\nRebalance Operation Id = 5646fb1a94153b57ce60511c2e21f24f'}
2021-07-20 23:35:20,621 | test | ERROR | pool-3-thread-23 | [rest_client:print_UI_logs:2695] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.121.135', u'tstamp': 1626849319028L, u'shortText': u'message', u'serverTime': u'2021-07-20T23:35:19.028Z', u'text': u'Worker <0.3080.1> (for action {move,{189,\n [\'ns_1@172.23.121.135\',\n \'ns_1@172.23.121.141\',\n \'ns_1@172.23.121.139\'],\n [\'ns_1@172.23.121.141\',\n \'ns_1@172.23.121.139\',\n \'ns_1@172.23.121.140\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.6756.8>,\n {{{{timeout,\n {gen_server,\n call,\n [memcached_refresh,\n {apply_to_file,\n "/opt/couchbase/var/lib/couchbase/config/memcached.rbac.tmp",\n "/opt/couchbase/var/lib/couchbase/config/memcached.rbac"}]}},\n {gen_server,\n call,\n [memcached_permissions,\n sync,\n infinity]}},\n {gen_server,\n call,\n [\'ns_memcached-56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000\',\n {get_dcp_docs_estimate,\n 189,\n "replication:ns_1@172.23.121.135->ns_1@172.23.121.141:56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000"},\n 180000]}},\n {gen_server,\n call,\n [{\'janitor_agent-56O1a_jkB3cCAwqd29-QSy4EmFIRMjnPi8ZnbSzRDR-YDMH-oFpTTgCT_ur-41-657000\',\n \'ns_1@172.23.121.135\'},\n {if_rebalance,\n <0.15174.0>,\n {wait_dcp_data_move,\n [\'ns_1@172.23.121.141\',\n \'ns_1@172.23.121.139\',\n \'ns_1@172.23.121.140\'],\n 189}},\n infinity]}}}}'}
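The nested exit reason in both entries can be read from the inside out: each failed gen_server call wraps the reason of the call it was waiting on. A minimal sketch for pulling that chain out of the decoded `text` field follows. The function name call_chain and the regular expression are illustrative assumptions, the sample is an abbreviated excerpt of the reason above (with `<bucket>` and `...` standing in for the elided parts), and the raw log line would first need its escaped `\n` and `\'` sequences decoded.

```python
import re

# Matches "{gen_server, call, [Name" where Name may be a bare atom,
# a quoted atom, or the first element of a {Name, Node} tuple.
GEN_SERVER_CALL = re.compile(r"\{gen_server,\s*call,\s*\[\s*\{?\s*'?([a-z_]+)")


def call_chain(reason_text):
    """Return the called server names, innermost (original failure) first."""
    return GEN_SERVER_CALL.findall(reason_text)


# Abbreviated excerpt of the exit reason reported in this ticket.
sample = (
    "{{{{timeout,{gen_server,call,[memcached_refresh,{apply_to_file,...}]}},"
    "{gen_server,call,[memcached_permissions,sync,infinity]}},"
    "{gen_server,call,['ns_memcached-<bucket>',"
    "{get_dcp_docs_estimate,189,...},180000]}},"
    "{gen_server,call,[{'janitor_agent-<bucket>','ns_1@172.23.121.135'},"
    "{if_rebalance,...},infinity]}}"
)

for depth, name in enumerate(call_chain(sample)):
    print(depth, name)
```

Read this way, the timeout originates in the memcached_refresh apply_to_file call on memcached.rbac, and it propagates through memcached_permissions sync and the ns_memcached get_dcp_docs_estimate call up to the janitor_agent wait_dcp_data_move call for vbucket 189.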
cbcollect_info attached.