Details
- Bug
- Resolution: Duplicate
- Critical
- Cheshire-Cat
- 7.0.0-5085-enterprise
- Untriaged
- Centos 64-bit
- 1
- Yes
Description
Script to Repro
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/win10-bucket-ops.ini rerun=False,get-cbcollect-info=True,quota_percent=95,crash_warning=True,GROUP=rebalance_with_collection_crud_durability_MAJORITY_AND_PERSIST_TO_ACTIVE,rerun=False -t bucket_collections.collections_rebalance.CollectionsRebalance.test_rebalance_cycles,nodes_init=4,nodes_in=2,durability=MAJORITY_AND_PERSIST_TO_ACTIVE,replicas=2,bucket_spec=single_bucket.default,num_items=10000,bulk_api_crud=True,GROUP=rebalance_with_collection_crud_durability_MAJORITY_AND_PERSIST_TO_ACTIVE'
Steps to Repro
1. Create a 4-node cluster (a hedged REST sketch follows the rebalance overview below).
2021-05-02 23:56:37,022 | test | INFO | pool-7-thread-6 | [table_view:display:72] Rebalance Overview
----------------------------------------------------------------------
Nodes | Services | Version | CPU | Status |
----------------------------------------------------------------------
172.23.98.196 | kv | 7.0.0-5085-enterprise | 6.98260650366 | Cluster node |
172.23.98.195 | None | | | <--- IN --- |
172.23.121.10 | None | | | <--- IN --- |
172.23.104.186 | None | | | <--- IN --- |
----------------------------------------------------------------------
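The cluster build above is driven by the testrunner framework. For reference only, the node additions can be approximated directly against the ns_server REST API; this is a minimal sketch, assuming default Administrator/password credentials and port 8091, not the framework's own code path:

# Hedged sketch: add the three incoming nodes from the overview above to the
# cluster at 172.23.98.196 as KV nodes via the ns_server REST API.
# The Administrator/password credentials are assumptions, not from this report.
import requests

MASTER = "172.23.98.196"
AUTH = ("Administrator", "password")
NEW_NODES = ["172.23.98.195", "172.23.121.10", "172.23.104.186"]

for node in NEW_NODES:
    resp = requests.post(
        f"http://{MASTER}:8091/controller/addNode",
        auth=AUTH,
        data={"hostname": node, "user": AUTH[0], "password": AUTH[1],
              "services": "kv"},
    )
    resp.raise_for_status()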
2. Create bucket/scope/collections/data (a hedged sketch follows the bucket statistics below).
2021-05-02 23:57:55,855 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
------------------------------------------------------------------------------
Bucket | Type | Replicas | Durability | TTL | Items | RAM Quota | RAM Used | Disk Used |
------------------------------------------------------------------------------
VG-52-682000 | couchbase | 2 | none | 0 | 10000 | 10825498624 | 103923744 | 178484619 |
------------------------------------------------------------------------------
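The bucket/scope/collection layout in step 2 comes from the single_bucket.default bucket_spec; a rough REST equivalent is sketched below. The scope/collection names and the MB quota are illustrative assumptions, not the values the test actually generates:

# Hedged sketch: create a couchbase bucket with 2 replicas plus one scope and
# one collection over the REST API. scope_1/collection_1 and the quota are
# illustrative only; the test derives the real layout from its bucket_spec.
import requests

MASTER = "172.23.98.196"
AUTH = ("Administrator", "password")   # assumed credentials
BASE = f"http://{MASTER}:8091"
BUCKET = "VG-52-682000"                # bucket name taken from the statistics above

requests.post(f"{BASE}/pools/default/buckets", auth=AUTH,
              data={"name": BUCKET, "bucketType": "couchbase",
                    "ramQuotaMB": 10324, "replicaNumber": 2}).raise_for_status()

requests.post(f"{BASE}/pools/default/buckets/{BUCKET}/scopes",
              auth=AUTH, data={"name": "scope_1"}).raise_for_status()
requests.post(f"{BASE}/pools/default/buckets/{BUCKET}/scopes/scope_1/collections",
              auth=AUTH, data={"name": "collection_1"}).raise_for_status()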
3. Start CRUD on the collections together with a durability data load (see the sketch below).
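The durability level under test (MAJORITY_AND_PERSIST_TO_ACTIVE) can be exercised with a plain Python SDK loop; this is a minimal sketch assuming Python SDK 4.x, the assumed credentials above, and the illustrative scope_1/collection_1 from the previous sketch. The test itself uses its own bulk document generators rather than this loop.

# Hedged sketch: upserts with MAJORITY_AND_PERSIST_TO_ACTIVE durability against
# one collection. Credentials and scope/collection names are assumptions.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.durability import Durability, ServerDurability
from couchbase.options import ClusterOptions, UpsertOptions

cluster = Cluster("couchbase://172.23.98.196",
                  ClusterOptions(PasswordAuthenticator("Administrator", "password")))
coll = cluster.bucket("VG-52-682000").scope("scope_1").collection("collection_1")

opts = UpsertOptions(
    durability=ServerDurability(level=Durability.MAJORITY_AND_PERSIST_TO_ACTIVE))

for i in range(10000):
    coll.upsert(f"test_doc-{i}", {"mutated": 0, "index": i}, opts)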
4. Start a rebalance-in (a hedged REST sketch follows the overview below).
2021-05-02 23:58:10,523 | test | INFO | pool-7-thread-17 | [table_view:display:72] Rebalance Overview
----------------------------------------------------------------------
Nodes | Services | Version | CPU | Status |
----------------------------------------------------------------------
172.23.98.196 | kv | 7.0.0-5085-enterprise | 9.64402928553 | Cluster node |
172.23.98.195 | kv | 7.0.0-5085-enterprise | 17.5447441391 | Cluster node |
172.23.104.186 | kv | 7.0.0-5085-enterprise | 10.2467270896 | Cluster node |
172.23.121.10 | kv | 7.0.0-5085-enterprise | 11.4379913771 | Cluster node |
172.23.120.206 | None | | | <--- IN --- |
----------------------------------------------------------------------
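For reference, the rebalance-in can likewise be triggered and polled over REST; a hedged sketch with the same assumed credentials (the otpNode names are read back from the cluster rather than hard-coded):

# Hedged sketch: start a rebalance over all known nodes and poll its progress.
# Credentials are assumptions.
import time
import requests

MASTER = "172.23.98.196"
AUTH = ("Administrator", "password")
BASE = f"http://{MASTER}:8091"

known = [n["otpNode"] for n in
         requests.get(f"{BASE}/pools/default", auth=AUTH).json()["nodes"]]
requests.post(f"{BASE}/controller/rebalance", auth=AUTH,
              data={"knownNodes": ",".join(known),
                    "ejectedNodes": ""}).raise_for_status()

while True:
    progress = requests.get(f"{BASE}/pools/default/rebalanceProgress",
                            auth=AUTH).json()
    if progress.get("status") != "running":
        break
    time.sleep(5)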
Rebalance in fails as shown below.
2021-05-02 23:58:30,848 | test | ERROR | pool-7-thread-17 | [rest_client:_rebalance_status_and_progress:1510] {u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.', u'type': u'rebalance', u'masterRequestTimedOut': False, u'statusId': u'072936ba1c3c193c7d8610c56757bedb', u'statusIsStale': False, u'lastReportURI': u'/logs/rebalanceReport?reportID=1ae5320e1248720433ca8b05e521ff96', u'status': u'notRunning'} - rebalance failed
2021-05-02 23:58:31,163 | test | INFO | pool-7-thread-17 | [rest_client:print_UI_logs:2611] Latest logs from UI on 172.23.98.196:
2021-05-02 23:58:31,164 | test | ERROR | pool-7-thread-17 | [rest_client:print_UI_logs:2613] {u'code': 0, u'module': u'ns_orchestrator', u'type': u'critical', u'node': u'ns_1@172.23.98.196', u'tstamp': 1620025105878L, u'shortText': u'message', u'serverTime': u'2021-05-02T23:58:25.878Z', u'text': u"Rebalance exited with reason {buckets_cleanup_failed,['ns_1@172.23.104.186']}.\nRebalance Operation Id = 0c7edaeac725974d21c90107c8baf059"}
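For triage, the same rebalance status, UI log entries, and the rebalance report referenced by lastReportURI above can be fetched directly from ns_server; a hedged sketch, again assuming Administrator/password credentials:

# Hedged sketch: fetch rebalance task status, recent UI log entries, and the
# rebalance report referenced by lastReportURI above. Credentials are assumptions.
import requests

MASTER = "172.23.98.196"
AUTH = ("Administrator", "password")
BASE = f"http://{MASTER}:8091"

# Rebalance task status (the same data _rebalance_status_and_progress polls)
for task in requests.get(f"{BASE}/pools/default/tasks", auth=AUTH).json():
    if task.get("type") == "rebalance":
        print(task.get("status"), task.get("errorMessage", ""))

# Latest UI log entries (the same data print_UI_logs shows), including the
# ns_orchestrator "Rebalance exited with reason {buckets_cleanup_failed, ...}" entry
for entry in requests.get(f"{BASE}/logs", auth=AUTH).json()["list"][-10:]:
    print(entry["module"], entry["text"])

# Full rebalance report; the reportID comes from the error message above
report = requests.get(f"{BASE}/logs/rebalanceReport", auth=AUTH,
                      params={"reportID": "1ae5320e1248720433ca8b05e521ff96"}).json()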
This is not consistently reproducible; I have tried running it many times and have not been able to reproduce it so far.
This was not seen on the last weekly run we had on 7.0.0-5017.
cbcollect_info attached.