Details
- Bug
- Resolution: Fixed
- Critical
- Cheshire-Cat
- 7.0.0-4889-enterprise
- Untriaged
- Centos 64-bit
- 1
- Yes
Description
Scripts to Repro
{'GROUP': 'failover_with_collection_crud_durability_MAJORITY_AND_PERSIST_TO_ACTIVE_dgm',
 'cluster_name': 'testexec.62576',
 'conf_file': 'conf/collections/collections_failover_crud_on_collections_dgm.conf',
 'ini': '/tmp/testexec.62576.ini',
 'num_nodes': 5,
 'rerun': 'False',
 'spec': 'collections_failover_crud_on_collections_dgm',
 'upgrade_version': '7.0.0-4889'}
It's not one particular test that's failing: our conf file has 6 tests, the first 2 pass fine, and from the 3rd test onwards every subsequent test fails while setting up the cluster.
Attaching the last failure, from test_6, so that the logs/failures from the earlier tests can be looked at as well.
Script
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.62576.ini GROUP=failover_with_collection_crud_durability_MAJORITY_AND_PERSIST_TO_ACTIVE_dgm,rerun=False,upgrade_version=7.0.0-4889 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=5,nodes_failover=1,recovery_type=delta,override_spec_params=durability;replicas,durability=MAJORITY_AND_PERSIST_TO_ACTIVE,step_count=1,replicas=2,bucket_spec=dgm.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,num_items=32000,dgm=60,skip_validations=False,GROUP=failover_with_collection_crud_durability_MAJORITY_AND_PERSIST_TO_ACTIVE_dgm'
The following rebalance fails:
2021-04-08 01:15:23,651 | test | INFO | pool-8-thread-6 | [table_view:display:72] Rebalance Overview
+---------------+----------+-----------------------+---------------+--------------+
| Nodes         | Services | Version               | CPU           | Status       |
+---------------+----------+-----------------------+---------------+--------------+
| 172.23.122.58 | kv       | 7.0.0-4889-enterprise | 3.68129097327 | Cluster node |
| 172.23.97.218 | None     |                       |               | <--- IN ---  |
| 172.23.96.197 | None     |                       |               | <--- IN ---  |
| 172.23.96.220 | None     |                       |               | <--- IN ---  |
| 172.23.96.196 | None     |                       |               | <--- IN ---  |
+---------------+----------+-----------------------+---------------+--------------+
2021-04-08 01:15:28,684 | test | ERROR | pool-8-thread-6 | [rest_client:_rebalance_status_and_progress:1510] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'f905a8a731f10858dc7fe0a8aecaa6c6', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=b0296a1a146d458d46f1e229d387714e', u'status': u'notRunning'} - rebalance failed
2021-04-08 01:15:28,710 | test | INFO | pool-8-thread-6 | [rest_client:print_UI_logs:2611] Latest logs from UI on 172.23.122.58:
2021-04-08 01:15:28,711 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.122.58', u'tstamp': 1617869723778L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.778Z', u'text': u"Rebalance exited with reason {buckets_cleanup_failed,['ns_1@172.23.96.196']}.\nRebalance Operation Id = d623f07b93a35d71d94d86a93dab5dbf"}
2021-04-08 01:15:28,711 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_rebalancer', u'type': u'critical', u'node': u'ns_1@172.23.122.58', u'tstamp': 1617869723777L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.777Z', u'text': u"Failed to cleanup old buckets on node 'ns_1@172.23.96.196': {error,ebusy}"}
2021-04-08 01:15:28,711 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_couchdb_api', u'type': u'critical', u'node': u'ns_1@172.23.96.196', u'tstamp': 1617869723776L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.776Z', u'text': u'Unable to delete bucket database directory overlay\n{error,ebusy}'}
2021-04-08 01:15:28,711 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_storage_conf', u'type': u'info', u'node': u'ns_1@172.23.96.196', u'tstamp': 1617869723769L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.769Z', u'text': u'Deleting old data files of bucket "overlay"'}
2021-04-08 01:15:28,711 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'memcached_config_mgr', u'type': u'info', u'node': u'ns_1@172.23.96.196', u'tstamp': 1617869723710L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.710Z', u'text': u'Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]'}
2021-04-08 01:15:28,713 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'info', u'node': u'ns_1@172.23.122.58', u'tstamp': 1617869723623L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.623Z', u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.96.197','ns_1@172.23.97.218',\n 'ns_1@172.23.96.220','ns_1@172.23.96.196',\n 'ns_1@172.23.122.58'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = d623f07b93a35d71d94d86a93dab5dbf"}
2021-04-08 01:15:28,713 | test | ERROR | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 3, u'module': u'ns_cluster', u'type': u'info', u'node': u'ns_1@172.23.96.196', u'tstamp': 1617869723595L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.595Z', u'text': u'Node ns_1@172.23.96.196 joined cluster'}
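For reference, the status above is read by rest_client from ns_server's REST interface; the same rebalance state can be checked directly against the orchestrator node with a sketch like the one below (port 8091 and the Administrator:password credentials are assumed defaults, not taken from this run):

# Current/last task status, including the rebalance task and its errorMessage.
curl -s -u Administrator:password http://172.23.122.58:8091/pools/default/tasks | python -m json.tool

# The detailed report referenced by lastReportURI in the failure above.
curl -s -u Administrator:password 'http://172.23.122.58:8091/logs/rebalanceReport?reportID=b0296a1a146d458d46f1e229d387714e'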
This was not seen on 7.0.0-4854. cbcollect_info attached.
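Since the immediate error is {error,ebusy} from ns_couchdb_api while deleting the "overlay" bucket's data directory on 172.23.96.196, one quick triage step would be to check for open handles on that directory on the affected node; the path below is the default Couchbase data path and is an assumption, not confirmed from the logs:

# On 172.23.96.196: list anything still holding files under the bucket's data directory.
lsof +D /opt/couchbase/var/lib/couchbase/data/overlay
# Show processes using the directory/mount (helps spot a lingering process or mount).
fuser -vm /opt/couchbase/var/lib/couchbase/data/overlay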