Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.0.0
Affects Version/s: Cheshire-Cat
Component/s: secondary-index
Labels:
Environment: 6.6.2-9588 to 7.0.0-5033

Triage: Untriaged
Operating System: Centos 64-bit
Story Points: 1
Is this a Regression?: No

Description

Steps to repro
1. Run the longevity system test on 6.6.2 for 3 days. The cluster initially had 27 nodes.

 ./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true

2. Swap rebalance 6 nodes, one of each service type.
3. Fail over 6 nodes, one of each service type.
4. Run systemctl stop couchbase-server on the 6 nodes.
5. Install 7.0.0 on all of them.
6. Do a recovery + rebalance.
7. Fail over 4 nodes, one of each service type (172.23.106.117, 172.23.105.25, 172.23.105.210, 172.23.105.206).
8. Do a delta recovery (for kv) + rebalance.

This rebalance hangs. I waited for almost 3.5 hours.
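
A hang like this can be flagged automatically instead of by watching the UI. The following is a minimal sketch and not part of the sequoia framework: it assumes some external poller has already collected (timestamp, progress) samples for the rebalance, and the names `stalled_for` and `is_hung`, as well as the one-hour default threshold, are hypothetical.

```python
from typing import Sequence, Tuple

# One sample: (time in seconds, rebalance progress as a fraction in [0, 1]).
Sample = Tuple[float, float]


def stalled_for(samples: Sequence[Sample]) -> float:
    """Seconds the progress value has been unchanged, measured at the newest sample."""
    if not samples:
        return 0.0
    last_time, last_progress = samples[-1]
    unchanged_since = last_time
    # Walk backwards until the progress value differs from the newest one.
    for sample_time, progress in reversed(samples):
        if progress != last_progress:
            break
        unchanged_since = sample_time
    return last_time - unchanged_since


def is_hung(samples: Sequence[Sample], threshold_seconds: float = 3600.0) -> bool:
    """True if progress has not moved for at least threshold_seconds (assumed cutoff)."""
    return stalled_for(samples) >= threshold_seconds


# Made-up samples: progress stops changing at t=600 and is still flat 3.5 hours later.
samples = [(0.0, 0.10), (600.0, 0.40), (1200.0, 0.40), (13200.0, 0.40)]
print(stalled_for(samples), is_hung(samples))
```

The threshold is a judgment call: a large index rebalance can legitimately sit at one progress value for a while, so a cutoff that is too short would raise false alarms.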

Rebalance Details
ns_1@172.23.105.102 2:11:41 AM 26 Apr, 2021

Starting rebalance, KeepNodes = ['ns_1@172.23.104.15','ns_1@172.23.104.214',
'ns_1@172.23.104.232','ns_1@172.23.104.244',
'ns_1@172.23.104.245','ns_1@172.23.105.102',
'ns_1@172.23.105.109','ns_1@172.23.105.112',
'ns_1@172.23.105.118','ns_1@172.23.105.164',
'ns_1@172.23.105.206','ns_1@172.23.105.210',
'ns_1@172.23.105.25','ns_1@172.23.105.29',
'ns_1@172.23.105.62','ns_1@172.23.105.86',
'ns_1@172.23.105.90','ns_1@172.23.105.93',
'ns_1@172.23.106.117','ns_1@172.23.106.191',
'ns_1@172.23.106.207','ns_1@172.23.106.225',
'ns_1@172.23.106.232','ns_1@172.23.106.239',
'ns_1@172.23.106.246','ns_1@172.23.106.37'], EjectNodes = [], Failed over and being ejected nodes = [], Delta recovery nodes = ['ns_1@172.23.105.206'], Delta recovery buckets = all; Operation Id = 311b47dd1673729487fe143259e68cce
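
For triage it can help to pull the node lists and the operation id out of a rebalance message like the one above. The sketch below is illustrative only: the helper names `parse_node_list` and `parse_operation_id` are hypothetical, and the regular expressions assume the message layout shown in this ticket.

```python
import re
from typing import List, Optional


def parse_node_list(message: str, label: str) -> List[str]:
    """Return the quoted node names in the `label = [...]` list, or [] if absent."""
    match = re.search(re.escape(label) + r"\s*=\s*\[(.*?)\]", message, re.DOTALL)
    if match is None:
        return []
    return re.findall(r"'([^']+)'", match.group(1))


def parse_operation_id(message: str) -> Optional[str]:
    """Return the hex operation id, or None if the message has none."""
    match = re.search(r"Operation Id\s*=\s*([0-9a-f]+)", message)
    return match.group(1) if match else None


# Shortened copy of the message from this ticket (only two KeepNodes entries kept).
SAMPLE = (
    "Starting rebalance, KeepNodes = ['ns_1@172.23.104.15',\n"
    "'ns_1@172.23.106.37'], EjectNodes = [], "
    "Failed over and being ejected nodes = [], "
    "Delta recovery nodes = ['ns_1@172.23.105.206'], "
    "Delta recovery buckets = all; "
    "Operation Id = 311b47dd1673729487fe143259e68cce"
)

keep_nodes = parse_node_list(SAMPLE, "KeepNodes")
delta_nodes = parse_node_list(SAMPLE, "Delta recovery nodes")
print(keep_nodes, delta_nodes, parse_operation_id(SAMPLE))
```

Comparing the parsed `Delta recovery nodes` list against the nodes that were failed over is a quick way to check that the rebalance was started with the intended recovery set.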

It would be highly appreciated if you could provide a workaround so that we can move forward and complete the upgrade of the entire cluster.

cbcollect_info is attached. I am assigning this to secondary-index because the rebalance completed for kv and fts but hung at the 2i rebalance.

People

Assignee: Balakumaran Gopal

Reporter: Balakumaran Gopal

Votes: 0

Watchers: 6

Dates

Due: 05/May/21

Created: 26/Apr/21 5:49 AM

Updated: 17/Jun/21 3:10 PM

Resolved: 04/May/21 1:29 PM

Gerrit Reviews

There are no open Gerrit changes

There are 2 closed Gerrit changes

MB-45920 Restart Inactive MAINT_STREAM for Catchup index: Gerrit Review:

MB-45920 add config for index reset on rollback to 0: Gerrit Review:

[System Test Upgrade] - Online upgrade using graceful failover + recovery + rebalance hangs
