Details
- Bug
- Resolution: Duplicate
- Major
- Cheshire-Cat
- 7.0.0-4356-enterprise
- Untriaged
- Centos 64-bit
- 1
- Unknown
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,quota_percent=95,crash_warning=True -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=5,nodes_failover=1,recovery_type=full,override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,quota_percent=80,GROUP=failover_with_collection_crud_durability_MAJORITY'
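For readers skimming the command, the `-t` argument is a test path followed by comma-separated key=value parameters. The sketch below only illustrates how that string breaks down; `parse_test_spec` is a hypothetical helper written for this ticket, not the parser testrunner itself uses, and splitting `override_spec_params` on `;` is an assumption based on how the value appears above.

```python
def parse_test_spec(spec):
    """Split 'module.Class.test,key=value,...' into (test_path, params).

    Hypothetical helper for reading the command line above; it is not
    taken from the test framework.
    """
    test_path, _, raw_params = spec.partition(",")
    params = {}
    for item in raw_params.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")
        params[key] = value
    return test_path, params


# The -t value copied from the repro command.
SPEC = (
    "bucket_collections.collections_rebalance.CollectionsRebalance."
    "test_data_load_collections_with_hard_failover_recovery,"
    "nodes_init=5,nodes_failover=1,recovery_type=full,"
    "override_spec_params=durability;replicas,durability=MAJORITY,replicas=2,"
    "bucket_spec=multi_bucket.buckets_for_rebalance_tests_more_collections,"
    "data_load_spec=volume_test_load_with_CRUD_on_collections,"
    "data_load_stage=during,quota_percent=80,"
    "GROUP=failover_with_collection_crud_durability_MAJORITY"
)

test_path, params = parse_test_spec(SPEC)
# Assumption: ';' separates the list of overridden spec parameters.
overridden = params["override_spec_params"].split(";")

print(test_path)
for key in sorted(params):
    print(key, "=", params[key])
```

Read this way, the run uses 5 initial nodes, fails over 1 node with full recovery, and overrides durability (MAJORITY) and replicas (2) in the bucket spec.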
Steps to Repro
1) Create a 5 node cluster
2021-02-02 23:40:40,683 | test | INFO | pool-1-thread-6 | [table_view:display:72] Rebalance Overview
-----------------------------------
Nodes | Services | Status |
-----------------------------------
172.23.107.94 | kv | Cluster node |
172.23.107.95 | None | <--- IN --- |
172.23.107.97 | None | <--- IN --- |
172.23.107.98 | None | <--- IN --- |
172.23.107.99 | None | <--- IN --- |
-----------------------------------
2) Create buckets/scopes/collections/data
2021-02-02 23:46:08,055 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
-----------------------------------------------------------------------
Bucket | Type | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used |
-----------------------------------------------------------------------
bucket1 | couchbase | 2 | none | 0 | 6000 | 524288000 | 19645517 | 579736826 |
bucket2 | ephemeral | 2 | none | 0 | 6000 | 524288000 | 391920918 | 170 |
default | couchbase | 2 | none | 0 | 10000 | 524288000 | 21748195 | 587694028 |
-----------------------------------------------------------------------
3) Hard failover 1 node (172.23.107.99), do a full recovery of that node, then rebalance. The rebalance fails as shown below.
2021-02-02 23:48:30,255 | test | INFO | MainThread | [collections_rebalance:wait_for_failover_or_assert:213] 1 nodes failed over as expected in 0.0390000343323 seconds
2021-02-02 23:48:55,203 | test | WARNING | MainThread | [rest_client:get_nodes:1710] 172.23.107.99 - Node not part of cluster inactiveFailed
2021-02-02 23:51:10,203 | test | ERROR | pool-1-thread-30 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.107.94', u'tstamp': 1612338666604L, u'shortText': u'message', u'serverTime': u'2021-02-02T23:51:06.604Z', u'text': u'Rebalance exited with reason {mover_crashed,\n {unexpected_exit,\n {\'EXIT\',<0.3043.9>,\n {{bulk_set_vbucket_state_failed,\n [{\'ns_1@172.23.107.99\',\n {\'EXIT\',\n {socket_closed,\n {gen_server,call,\n [{\'janitor_agent-bucket2\',\n \'ns_1@172.23.107.99\'},\n {if_rebalance,<0.25501.7>,\n {update_vbucket_state,149,replica,\n undefined,\'ns_1@172.23.107.94\'}},\n infinity]}}}}]},\n [{janitor_agent,bulk_set_vbucket_state,4,\n [{file,"src/janitor_agent.erl"},\n {line,403}]},\n {ns_single_vbucket_mover,\n update_replication_post_move,5,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,530}]},\n {ns_single_vbucket_mover,on_move_done_body,\n 6,\n [{file,"src/ns_single_vbucket_mover.erl"},\n {line,556}]},\n {proc_lib,init_p,3,\n [{file,"proc_lib.erl"},{line,234}]}]}}}}.\nRebalance Operation Id = 56e71afdf44a06f680a450379f24d556'}
2021-02-02 23:51:10,203 | test | ERROR | pool-1-thread-30 | [rest_client:print_UI_logs:2595] {u'code': 0, u'module': u'ns_vbucket_mover', u'type': u'critical', u'node': u'ns_1@172.23.107.94', u'tstamp': 1612338666574L, u'shortText': u'message', u'serverTime': u'2021-02-02T23:51:06.574Z', u'text': u'Worker <0.2741.9> (for action {move,{149,\n [\'ns_1@172.23.107.94\',\n \'ns_1@172.23.107.98\',undefined],\n [\'ns_1@172.23.107.94\',\n \'ns_1@172.23.107.98\',\n \'ns_1@172.23.107.99\'],\n []}}) exited with reason {unexpected_exit,\n {\'EXIT\',\n <0.3043.9>,\n {{bulk_set_vbucket_state_failed,\n [{\'ns_1@172.23.107.99\',\n {\'EXIT\',\n {socket_closed,\n {gen_server,\n call,\n [{\'janitor_agent-bucket2\',\n \'ns_1@172.23.107.99\'},\n {if_rebalance,\n <0.25501.7>,\n {update_vbucket_state,\n 149,\n replica,\n undefined,\n \'ns_1@172.23.107.94\'}},\n infinity]}}}}]},\n [{janitor_agent,\n bulk_set_vbucket_state,\n 4,\n [{file,\n "src/janitor_agent.erl"},\n {line,\n 403}]},\n {ns_single_vbucket_mover,\n update_replication_post_move,\n 5,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 530}]},\n {ns_single_vbucket_mover,\n on_move_done_body,\n 6,\n [{file,\n "src/ns_single_vbucket_mover.erl"},\n {line,\n 556}]},\n {proc_lib,\n init_p,3,\n [{file,\n "proc_lib.erl"},\n {line,\n 234}]}]}}}'}
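The two entries above carry the same failure, and the key facts are buried in the Erlang term. A minimal sketch for pulling them out of the `text` field follows; `summarize_rebalance_failure` and its regular expressions are assumptions made for this ticket and only match the `bulk_set_vbucket_state_failed` shape seen here. The sample is an excerpt of the first entry with the `\n` escapes expanded.

```python
import re

# Excerpt of the 'text' field from the ns_orchestrator entry above.
SAMPLE = (
    "Rebalance exited with reason {mover_crashed,\n"
    " {unexpected_exit,\n"
    " {'EXIT',<0.3043.9>,\n"
    " {{bulk_set_vbucket_state_failed,\n"
    " [{'ns_1@172.23.107.99',\n"
    " {'EXIT',\n"
    " {socket_closed,\n"
    " {gen_server,call,\n"
    " [{'janitor_agent-bucket2',\n"
    " 'ns_1@172.23.107.99'},\n"
    " {if_rebalance,<0.25501.7>,\n"
    " {update_vbucket_state,149,replica,\n"
    " undefined,'ns_1@172.23.107.94'}},\n"
    " infinity]}}}}]},\n"
)


def summarize_rebalance_failure(text):
    """Return failed node, exit reason, bucket, vbucket and target state.

    Hypothetical helper: fields that are not found are returned as None.
    """
    def first(pattern, group=1):
        match = re.search(pattern, text)
        return match.group(group) if match else None

    vbucket = first(r"update_vbucket_state,\s*(\d+)")
    return {
        # Node named right after bulk_set_vbucket_state_failed.
        "failed_node": first(r"bulk_set_vbucket_state_failed,\s*\[\{'([^']+)'"),
        # Atom inside the inner {'EXIT', {...}} tuple.
        "reason": first(r"\{'EXIT',\s*\{(\w+),"),
        # The janitor_agent process name embeds the bucket name.
        "bucket": first(r"'janitor_agent-([^']+)'"),
        "vbucket": int(vbucket) if vbucket is not None else None,
        "target_state": first(r"update_vbucket_state,\s*\d+,\s*(\w+)"),
    }


summary = summarize_rebalance_failure(SAMPLE)
print(summary)
```

For this ticket the summary points at the recovered node 172.23.107.99, bucket2 (the ephemeral bucket), and vbucket 149 being set to replica when the socket closed.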
Logs attached.
Attachments
Issue Links
- is duplicated by
- MB-44079 Ephemeral out of order purging can cause prepares to be recommitted and DurabilityMonitor montonicity exceptions to throw
- Closed