Details
- Bug
- Resolution: Duplicate
- Critical
- None
- Cheshire-Cat
- 7.0.0-4486-enterprise
- Triaged
- Centos 64-bit
- 1
- Yes
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops111111111111124244224.ini rerun=False,quota_percent=95,crash_warning=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_rebalance_in_out,nodes_init=4,nodes_in=2,nodes_out=1,bucket_spec=dgm.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,dgm=55,GROUP=rebalance_with_collection_crud_dgm'
Note to self: Although the test above is rebalance_in_out, the .ini file had only 5 nodes, so it ran as a swap rebalance.
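For illustration, here is a minimal sketch of why these parameters end up as a swap. It assumes the framework caps the number of nodes added at the spare servers left in the .ini file after the initial cluster is built; that cap is an assumption, not confirmed testrunner behaviour, and `effective_rebalance` and its parameters are hypothetical names.

```python
def effective_rebalance(total_servers, nodes_init, nodes_in, nodes_out):
    """Return (nodes added, nodes removed, rebalance kind).

    Hypothetical helper, not part of testrunner. Assumes nodes_in is
    capped by the servers left over once nodes_init are in the cluster.
    """
    spare = max(total_servers - nodes_init, 0)
    added = min(nodes_in, spare)
    removed = min(nodes_out, nodes_init)
    if added and removed:
        # Equal counts in and out is what this ticket calls a swap rebalance.
        kind = "swap" if added == removed else "in_out"
    elif added:
        kind = "in"
    elif removed:
        kind = "out"
    else:
        kind = "none"
    return added, removed, kind


# Values from the repro command: 5 servers in the .ini,
# nodes_init=4, nodes_in=2, nodes_out=1.
print(effective_rebalance(5, 4, 2, 1))
```

With a sixth server in the .ini file the same arguments would give two nodes in and one out, i.e. a real rebalance_in_out.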
Steps to Repro
1) Create a 4 node cluster
Nodes | Services | Version | CPU | Status |
172.23.106.209 | kv | 7.0.0-4486-enterprise | 1.16468378209 | Cluster node |
172.23.106.225 | None | | | <--- IN --- |
172.23.106.232 | None | | | <--- IN --- |
172.23.106.239 | None | | | <--- IN --- |
2) Create bucket/scope/collections/data
2021-02-18 00:56:15,555 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
Bucket | Type | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used |
default | couchbase | 2 | none | 0 | 0 | 3355443200 | 231591872 | 0 |
3) Push the bucket to DGM
2021-02-18 01:00:20,690 | test | INFO | pool-5-thread-11 | [task:_load_bucket_into_dgm:2073] Active_resident_items_ratio for default is 100
2021-02-18 01:00:20,690 | test | INFO | pool-5-thread-11 | [task:_load_bucket_into_dgm:2075] Replica_resident_items_ratio for default is 56.9882555556
2021-02-18 01:00:22,506 | test | INFO | pool-5-thread-11 | [task:_load_bucket_into_dgm:2079] Active DGM 100% Replica DGM 45.4014552547% achieved for 'default'. Loaded docs: 4240000
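When comparing runs, the resident ratios can be pulled out of the test log with a small script. This is a rough sketch that assumes the log keeps the exact `..._resident_items_ratio for <bucket> is <value>` wording shown above; `resident_ratios` is a hypothetical helper, not part of the test framework.

```python
import re

# Matches the two ratio lines printed by _load_bucket_into_dgm above.
RATIO_RE = re.compile(
    r"(Active|Replica)_resident_items_ratio for (\S+) is ([0-9.]+)"
)


def resident_ratios(log_lines):
    """Return {bucket: {"active": float, "replica": float}} from log lines."""
    ratios = {}
    for line in log_lines:
        match = RATIO_RE.search(line)
        if match:
            kind, bucket, value = match.groups()
            ratios.setdefault(bucket, {})[kind.lower()] = float(value)
    return ratios


sample = [
    "2021-02-18 01:00:20,690 | test | INFO | pool-5-thread-11 | "
    "[task:_load_bucket_into_dgm:2073] Active_resident_items_ratio for default is 100",
    "2021-02-18 01:00:20,690 | test | INFO | pool-5-thread-11 | "
    "[task:_load_bucket_into_dgm:2075] Replica_resident_items_ratio for default is 56.9882555556",
]
ratios = resident_ratios(sample)
print(ratios)
```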
4) Add a node (172.23.106.246), remove a node (172.23.106.239), and start a swap rebalance.
Swap rebalance fails with the following error.
2021-02-18 01:00:43,309 | test | INFO | pool-5-thread-2 | [rest_client:print_UI_logs:2599] Latest logs from UI on 172.23.106.209:
2021-02-18 01:00:43,309 | test | ERROR | pool-5-thread-2 | [rest_client:print_UI_logs:2601] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.106.209', u'tstamp': 1613638839746L, u'shortText': u'message', u'serverTime': u'2021-02-18T01:00:39.746Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.24171.1>,\n {{dcp_wait_for_data_move_failed,"default",\n 481,\'ns_1@172.23.106.225\',\n [\'ns_1@172.23.106.246\',\n \'ns_1@172.23.106.232\'],\n {error,no_stats_for_this_vbucket}},\n [{ns_single_vbucket_mover,\n \'-wait_dcp_data_move/5-fun-0-\',5,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,465}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,234}]}]}}}}.\nRebalance Operation Id = 375f8fa5c330e3fc0660b209a82b1784'}
2021-02-18 01:00:43,311 | test | ERROR | pool-5-thread-2 | [rest_client:print_UI_logs:2601] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.106.209', u'tstamp': 1613638839653L, u'shortText': u'message', u'serverTime': u'2021-02-18T01:00:39.653Z', u'text': u'Worker <0.24154.1> (for action {move,{481,\n [\'ns_1@172.23.106.225\',\n \'ns_1@172.23.106.239\',\n \'ns_1@172.23.106.232\'],\n [\'ns_1@172.23.106.225\',\n \'ns_1@172.23.106.246\',\n \'ns_1@172.23.106.232\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.24171.1>,\n {{dcp_wait_for_data_move_failed,\n "default",\n 481,\n \'ns_1@172.23.106.225\',\n [\'ns_1@172.23.106.246\',\n \'ns_1@172.23.106.232\'],\n {error,\n no_stats_for_this_vbucket}},\n [{ns_single_vbucket_mover,\n \'-wait_dcp_data_move/5-fun-0-\',\n 5,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 465}]},\n {proc_lib,\n init_p,3,\n [{file,\n "proc_lib.erl"},\n {line,\n 234}]}]}}}'}
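For triage, the key fields of the failure can be extracted from the `text` value of these log entries. The sketch below works on the decoded text (real newlines instead of `\n`) and assumes the tuple layout seen above: bucket, vbucket, one node, a node list, then `{error, Reason}`. The field names `source_node` and `target_nodes` are my reading of the move action, not confirmed from ns_server, and `parse_data_move_failure` is a hypothetical helper.

```python
import re

# Tuple layout as it appears in the logs above; whitespace between
# elements varies between the two entries, hence the \s* everywhere.
FAIL_RE = re.compile(
    r'dcp_wait_for_data_move_failed,\s*"(?P<bucket>[^"]+)",\s*'
    r"(?P<vbucket>\d+),\s*'(?P<source>[^']+)',\s*"
    r"\[(?P<targets>[^\]]*)\],\s*"
    r"\{error,\s*(?P<reason>\w+)\}"
)


def parse_data_move_failure(text):
    """Return the failure fields as a dict, or None if the text does not match."""
    match = FAIL_RE.search(text)
    if match is None:
        return None
    return {
        "bucket": match.group("bucket"),
        "vbucket": int(match.group("vbucket")),
        "source_node": match.group("source"),      # assumed role
        "target_nodes": re.findall(r"'([^']+)'", match.group("targets")),
        "reason": match.group("reason"),
    }


# Decoded excerpt of the ns_orchestrator message above.
sample_text = """{{dcp_wait_for_data_move_failed,"default",
 481,'ns_1@172.23.106.225',
 ['ns_1@172.23.106.246',
 'ns_1@172.23.106.232'],
 {error,no_stats_for_this_vbucket}},"""

info = parse_data_move_failure(sample_text)
print(info)
```

For this ticket it yields bucket `default`, vbucket 481 and reason `no_stats_for_this_vbucket`, which makes it easier to compare against the linked issue.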
cbcollect_info attached. This test passed on 7.0.0-4454.
Issue Links
- duplicates
- MB-44417 [Collections] - Collection CRUD + Multi node graceful failover + dgm + rebalance out fails with wait_seqno_persisted_failed
- Closed