Couchbase Server - MB-45545

[Collections] - Setting up the cluster fails with "Rebalance exited with reason {buckets_cleanup_failed,['ns_1@172.23.96.196']}" in DGM jobs


Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Scripts to Repro

      {'GROUP': 'failover_with_collection_crud_durability_MAJORITY_AND_PERSIST_TO_ACTIVE_dgm',
       'cluster_name': 'testexec.62576',
       'conf_file': 'conf/collections/collections_failover_crud_on_collections_dgm.conf',
       'ini': '/tmp/testexec.62576.ini',
       'num_nodes': 5,
       'rerun': 'False',
       'spec': 'collections_failover_crud_on_collections_dgm',
       'upgrade_version': '7.0.0-4889'}
      

      It's not one particular test that's failing. Our conf file has 6 tests; the first 2 pass fine, and from the 3rd test onwards every subsequent test fails while setting up the cluster.

      Attaching the last failure (from test_6) so that the logs from all the earlier failures can be looked at as well.

      Script

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/testexec.62576.ini GROUP=failover_with_collection_crud_durability_MAJORITY_AND_PERSIST_TO_ACTIVE_dgm,rerun=False,upgrade_version=7.0.0-4889 -t bucket_collections.collections_rebalance.CollectionsRebalance.test_data_load_collections_with_hard_failover_recovery,nodes_init=5,nodes_failover=1,recovery_type=delta,override_spec_params=durability;replicas,durability=MAJORITY_AND_PERSIST_TO_ACTIVE,step_count=1,replicas=2,bucket_spec=dgm.buckets_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,data_load_stage=during,num_items=32000,dgm=60,skip_validations=False,GROUP=failover_with_collection_crud_durability_MAJORITY_AND_PERSIST_TO_ACTIVE_dgm'
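      The cluster setup that keeps failing boils down to adding the four extra nodes to 172.23.122.58 and rebalancing. A minimal sketch of just that step, assuming the stock ns_server REST endpoints, the requests library, and Administrator/password credentials (the credentials are an assumption, not taken from the job config):

      import time
      import requests

      MASTER = "172.23.122.58"
      NEW_NODES = ["172.23.97.218", "172.23.96.197", "172.23.96.220", "172.23.96.196"]
      AUTH = ("Administrator", "password")  # assumed credentials
      base = "http://%s:8091" % MASTER

      # Add the four extra nodes as KV nodes.
      for node in NEW_NODES:
          requests.post(base + "/controller/addNode", auth=AUTH,
                        data={"hostname": node, "user": AUTH[0],
                              "password": AUTH[1],
                              "services": "kv"}).raise_for_status()

      # knownNodes must be the otpNode names (ns_1@<ip>) of all cluster nodes.
      pool = requests.get(base + "/pools/default", auth=AUTH).json()
      known = ",".join(n["otpNode"] for n in pool["nodes"])

      # Start the rebalance and poll the task list until it stops running;
      # on failure the task carries the same errorMessage seen in the log below.
      requests.post(base + "/controller/rebalance", auth=AUTH,
                    data={"knownNodes": known, "ejectedNodes": ""}).raise_for_status()
      time.sleep(2)
      while True:
          tasks = requests.get(base + "/pools/default/tasks", auth=AUTH).json()
          reb = next(t for t in tasks if t["type"] == "rebalance")
          if reb.get("status") != "running":
              print(reb)
              break
          time.sleep(5)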
      

      The following rebalance fails:
      2021-04-08 01:15:23,651 | test | INFO | pool-8-thread-6 | [table_view:display:72] Rebalance Overview
      +---------------+----------+-----------------------+---------------+--------------+
      | Nodes         | Services | Version               | CPU           | Status       |
      +---------------+----------+-----------------------+---------------+--------------+
      | 172.23.122.58 | kv       | 7.0.0-4889-enterprise | 3.68129097327 | Cluster node |
      | 172.23.97.218 | None     |                       |               | <--- IN ---  |
      | 172.23.96.197 | None     |                       |               | <--- IN ---  |
      | 172.23.96.220 | None     |                       |               | <--- IN ---  |
      | 172.23.96.196 | None     |                       |               | <--- IN ---  |
      +---------------+----------+-----------------------+---------------+--------------+

      2021-04-08 01:15:28,684 | test  | ERROR   | pool-8-thread-6 | [rest_client:_rebalance_status_and_progress:1510] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'f905a8a731f10858dc7fe0a8aecaa6c6', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=b0296a1a146d458d46f1e229d387714e', u'status': u'notRunning'} - rebalance failed
      2021-04-08 01:15:28,710 | test  | INFO    | pool-8-thread-6 | [rest_client:print_UI_logs:2611] Latest logs from UI on 172.23.122.58:
      2021-04-08 01:15:28,711 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.122.58', u'tstamp': 1617869723778L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.778Z', u'text': u"Rebalance exited with reason {buckets_cleanup_failed,['ns_1@172.23.96.196']}.\nRebalance Operation Id = d623f07b93a35d71d94d86a93dab5dbf"}
      2021-04-08 01:15:28,711 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_rebalancer', u'type': u'critical', u'node': u'ns_1@172.23.122.58', u'tstamp': 1617869723777L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.777Z', u'text': u"Failed to cleanup old buckets on node 'ns_1@172.23.96.196': {error,ebusy}"}
      2021-04-08 01:15:28,711 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_couchdb_api', u'type': u'critical', u'node': u'ns_1@172.23.96.196', u'tstamp': 1617869723776L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.776Z', u'text': u'Unable to delete bucket database directory overlay\n{error,ebusy}'}
      2021-04-08 01:15:28,711 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_storage_conf', u'type': u'info', u'node': u'ns_1@172.23.96.196', u'tstamp': 1617869723769L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.769Z', u'text': u'Deleting old data files of bucket "overlay"'}
      2021-04-08 01:15:28,711 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'memcached_config_mgr', u'type': u'info', u'node': u'ns_1@172.23.96.196', u'tstamp': 1617869723710L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.710Z', u'text': u'Hot-reloaded memcached.json for config change of the following keys: [<<"scramsha_fallback_salt">>]'}
      2021-04-08 01:15:28,713 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'info', u'node': u'ns_1@172.23.122.58', u'tstamp': 1617869723623L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.623Z', u'text': u"Starting rebalance, KeepNodes = ['ns_1@172.23.96.197','ns_1@172.23.97.218',\n                                 'ns_1@172.23.96.220','ns_1@172.23.96.196',\n                                 'ns_1@172.23.122.58'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = d623f07b93a35d71d94d86a93dab5dbf"}
      2021-04-08 01:15:28,713 | test  | ERROR   | pool-8-thread-6 | [rest_client:print_UI_logs:2613] {u'code': 3, u'module': u'ns_cluster', u'type': u'info', u'node': u'ns_1@172.23.96.196', u'tstamp': 1617869723595L, u'shortText': u'message', u'serverTime': u'2021-04-08T01:15:23.595Z', u'text': u'Node ns_1@172.23.96.196 joined cluster'}
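      The chain above is: deleting the data directory of the old bucket "overlay" on 172.23.96.196 returns {error,ebusy}, ns_rebalancer reports buckets_cleanup_failed for that node, and the rebalance exits. On Linux, EBUSY on a directory removal usually means the directory is a mount point, or on some filesystems that a process still holds files under it open. A hedged diagnostic sketch to run on the failing node, assuming the default data path /opt/couchbase/var/lib/couchbase/data and that lsof is installed:

      import subprocess

      # Assumed default data path; adjust if the node uses a custom path.
      BUCKET_DIR = "/opt/couchbase/var/lib/couchbase/data/overlay"

      # 1. EBUSY is what the kernel returns when removing a mounted directory,
      #    so check whether anything is mounted at or under the bucket dir.
      with open("/proc/mounts") as f:
          mounts = [line.split()[1] for line in f]
      print("mounts under bucket dir:",
            [m for m in mounts if m.startswith(BUCKET_DIR)])

      # 2. List processes that still hold files open under it (memcached,
      #    a lingering compactor, cbcollect_info, etc.).
      out = subprocess.run(["lsof", "+D", BUCKET_DIR],
                           capture_output=True, text=True)
      print(out.stdout or "no open files reported")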
      

      This was not seen on 7.0.0-4854. cbcollect_info attached.
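      For deeper detail, the failed task output above also includes lastReportURI (/logs/rebalanceReport?reportID=b0296a1a146d458d46f1e229d387714e); fetching that report from the orchestrator gives the per-stage breakdown behind the same buckets_cleanup_failed exit. A small sketch, again with assumed Administrator/password credentials:

      import json
      import requests

      AUTH = ("Administrator", "password")  # assumed credentials
      url = ("http://172.23.122.58:8091"
             "/logs/rebalanceReport?reportID=b0296a1a146d458d46f1e229d387714e")
      print(json.dumps(requests.get(url, auth=AUTH).json(), indent=2))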

      Attachments


        Activity

          People

            Assignee: Balakumaran Gopal
            Reporter: Balakumaran Gopal
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes
