Details
Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version: Cheshire-Cat
Affects Versions: 6.6.2-9588 -> 7.0.0-5226
Triage: Untriaged
Operating System: Centos 64-bit
1
Is this a Regression?: No
Description
Steps to Repro
1. Run the following longevity test on 6.6.2 for 3-4 days:
./sequoia -client 172.23.96.162:2375 -provider file:centos_third_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.2-9588 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true
2. We have a 27-node cluster on 6.6.2.
3. Add 6 nodes (1 of each service, 7.0.0-5226), remove 6 nodes (6.6.2), and do a swap rebalance to upgrade the cluster.
4. Fail over 6 nodes (1 of each service, 6.6.2), upgrade them, then do a recovery and rebalance (see the CLI sketch after the log excerpt below). Noticed messages like the following.
ns_1@172.23.106.70 9:22:46 AM 25 May, 2021
Starting rebalance, KeepNodes = ['ns_1@172.23.104.15','ns_1@172.23.104.214',
'ns_1@172.23.104.232','ns_1@172.23.104.244',
'ns_1@172.23.104.245','ns_1@172.23.105.102',
'ns_1@172.23.105.109','ns_1@172.23.105.112',
'ns_1@172.23.105.118','ns_1@172.23.105.206',
'ns_1@172.23.105.210','ns_1@172.23.105.25',
'ns_1@172.23.105.29','ns_1@172.23.105.61',
'ns_1@172.23.105.86','ns_1@172.23.105.90',
'ns_1@172.23.106.117','ns_1@172.23.106.191',
'ns_1@172.23.106.207','ns_1@172.23.106.225',
'ns_1@172.23.106.232','ns_1@172.23.106.239',
'ns_1@172.23.106.246','ns_1@172.23.106.37',
'ns_1@172.23.106.54','ns_1@172.23.106.70',
'ns_1@172.23.110.75'], EjectNodes = [], Failed over and being ejected nodes = [], Delta recovery nodes = ['ns_1@172.23.105.90'], Delta recovery buckets = all; Operation Id = ce74d9abdb4cfff2d661b7ff0a20a220
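For reference, a minimal sketch of how steps 3 and 4 map to couchbase-cli. The host placeholders (NEW_NODE, OLD_NODE), the service list, and the credentials are illustrative, not the actual values from this run:

# Step 3: swap rebalance - add a 7.0.0-5226 node, then rebalance
# a 6.6.2 node out. Repeat per service/node pair.
/opt/couchbase/bin/couchbase-cli server-add -c localhost:8091 \
  --username Administrator --password password \
  --server-add NEW_NODE:8091 \
  --server-add-username Administrator --server-add-password password \
  --services data
/opt/couchbase/bin/couchbase-cli rebalance -c localhost:8091 \
  --username Administrator --password password \
  --server-remove OLD_NODE:8091

# Step 4: fail over a 6.6.2 node, upgrade it, then delta-recover it
# and rebalance it back in.
/opt/couchbase/bin/couchbase-cli failover -c localhost:8091 \
  --username Administrator --password password \
  --server-failover OLD_NODE:8091
# (upgrade the Couchbase Server package on OLD_NODE here)
/opt/couchbase/bin/couchbase-cli recovery -c localhost:8091 \
  --username Administrator --password password \
  --server-recovery OLD_NODE:8091 --recovery-type delta
/opt/couchbase/bin/couchbase-cli rebalance -c localhost:8091 \
  --username Administrator --password password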
This has been hung in the indexing rebalance for over 11 hours.
I think it could possibly be a duplicate of MB-45939. I don't want to pollute the timeline of that bug, hence I am raising a new one in case it turns out to be a different issue.
However, I do have a few questions that would help us make progress.
1. Is this actually a hang, or is the indexing rebalance just progressing slowly? There was a hypothesis in MB-45939 that the latter could be the case.
2. How do you differentiate between a hang and slow progress? Since the UI has no discernible way to tell whether a rebalance is hung or progressing slowly, I have relied on the CLI command below: if the percentage does not change for, say, 5 hours, I assume it is hung (a polling sketch of this heuristic follows the output). Is it OK to rely on this, or is there a better way?
[root@localhost ~]# date; /opt/couchbase/bin/couchbase-cli rebalance-status -c localhost:8091 --username Administrator --password password
Tue May 25 20:34:47 PDT 2021
{
  "status": "running",
  "msg": "Rebalance is running",
  "details": {
    "progress": 55.55555555555556,
    "refresh": 0.25,
    "totalBuckets": 10,
    "curBucket": 10,
    "curBucketName": "default",
    "docsRemaining": 0
  }
}
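To make that heuristic concrete, a minimal polling sketch is below. It assumes jq is available to parse the JSON output; the 10-minute interval and 5-hour threshold are arbitrary choices, and it only detects "no change in reported progress", not a confirmed hang:

#!/usr/bin/env bash
# Poll rebalance progress; flag a possible hang after STALL_LIMIT
# consecutive polls with no change in the reported percentage.
# Assumes jq is installed for JSON parsing.
INTERVAL=600        # seconds between polls (10 minutes)
STALL_LIMIT=30      # 30 polls x 10 min = 5 hours with no change
prev=""; stalled=0
while true; do
  cur=$(/opt/couchbase/bin/couchbase-cli rebalance-status -c localhost:8091 \
          --username Administrator --password password | jq -r '.details.progress')
  if [ "$cur" = "$prev" ]; then
    stalled=$((stalled + 1))
  else
    stalled=0
  fi
  prev="$cur"
  echo "$(date): progress=$cur stalled_polls=$stalled"
  if [ "$stalled" -ge "$STALL_LIMIT" ]; then
    echo "No change in progress for ~5 hours; treating as a possible hang."
    break
  fi
  sleep "$INTERVAL"
done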
3. If this is a hang, I would stop and restart the rebalance as a workaround to continue upgrading the cluster. Would that help in this case? If not, is there any other workaround?
I don't want to stop/start the rebalance until we get some kind of confirmation that it is a hang.
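For completeness, the stop/start workaround from question 3 would presumably look like the following sketch (rebalance-stop and rebalance are standard couchbase-cli subcommands; I have not verified that this recovers a hung indexing rebalance):

# Stop the current (possibly hung) rebalance, then start a fresh one.
/opt/couchbase/bin/couchbase-cli rebalance-stop -c localhost:8091 \
  --username Administrator --password password
/opt/couchbase/bin/couchbase-cli rebalance -c localhost:8091 \
  --username Administrator --password password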
cbcollect_info attached.