Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.1.0
Affects Version/s: 7.0.2
Component/s: ns_server
Labels:
- system_test_upgrade
- upgrade
Environment:
6.6.3-9808 -> 7.0.2-6668

Triage:
Untriaged
Operating System:
Centos 64-bit
Story Points:
1
Is this a Regression?:
No

Description

Steps to Repro
1. Run the following longevity script on 6.6.3 for 5 days.

./sequoia -client 172.23.104.254:2375 -provider file:centos_second_cluster.yml -test tests/integration/test_allFeatures_madhatter_durability.yml -scope tests/integration/scope_Xattrs_Madhatter.yml -scale 3 -repeat 0 -log_level 0 -version 6.6.3-9808 -skip_setup=true -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true

At this point the cluster should have 27 nodes (9 KV, 6 index, 3 analytics, 3 FTS, 3 eventing, and 3 N1QL).
2. Create 10k metakv tombstones. This has been part of our testing since ~~MB-44838~~ was fixed. We used to create around 25k in total for CC; here we have reduced it to around 12k.

#!/bin/bash

for i in {0..10000}
do
    # Create the key, then delete it so that a tombstone is left behind.
    curl -X PUT -u Administrator:password "http://localhost:8091/_metakv/key${i}" -d 'value=foo1'
    curl -X DELETE -v -u Administrator:password "http://localhost:8091/_metakv/key${i}"
done
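The loop in step 2 can also be wrapped so that it stops on the first failed request instead of continuing silently. The following is only a sketch: the helper names (`metakv_url`, `create_tombstone`, `create_tombstones`) and the `METAKV_HOST`, `METAKV_AUTH`, and `DRY_RUN` variables are hypothetical and not part of the test framework; the endpoint and credentials are copied from the script above. By default it runs in dry-run mode and only prints the requests it would send.

```shell
#!/bin/sh
# Hypothetical wrapper around the PUT/DELETE loop from step 2.
# Endpoint and credentials are taken from the original script; everything
# else (names, DRY_RUN switch) is an assumption made for illustration.
METAKV_HOST="${METAKV_HOST:-localhost:8091}"
METAKV_AUTH="${METAKV_AUTH:-Administrator:password}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the requests, 0 = send them with curl

# Print the metakv URL for key number $1.
metakv_url() {
    printf 'http://%s/_metakv/key%s\n' "$METAKV_HOST" "$1"
}

# Create key number $1 and delete it again.
create_tombstone() {
    ct_url="$(metakv_url "$1")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "PUT $ct_url"
        echo "DELETE $ct_url"
        return 0
    fi
    # -f makes curl return non-zero on an HTTP error status.
    curl -sS -f -o /dev/null -X PUT -u "$METAKV_AUTH" "$ct_url" -d 'value=foo1' &&
        curl -sS -f -o /dev/null -X DELETE -u "$METAKV_AUTH" "$ct_url"
}

# Process keys $1..$2 inclusive; stop at the first failure.
create_tombstones() {
    cts_i="$1"
    while [ "$cts_i" -le "$2" ]; do
        create_tombstone "$cts_i" || { echo "failed at key$cts_i" >&2; return 1; }
        cts_i=$((cts_i + 1))
    done
}
```

Calling `create_tombstones 0 10000` with `DRY_RUN=0` would issue the same requests as the original loop.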

3. Swap rebalance 6 nodes, 1 of each service, with 7.0.2 nodes. The rebalance goes through successfully.
4. Fail over 6 of the 6.6.3 nodes, 1 of each service (graceful failover for KV), upgrade these nodes to 7.0.2, recover all 6 nodes (delta recovery for KV), and rebalance.

ns_1@172.23.106.136 1:12:01 AM 13 Sep, 2021

Starting rebalance, KeepNodes = ['ns_1@172.23.106.134','ns_1@172.23.106.136',
'ns_1@172.23.106.137','ns_1@172.23.106.138',
'ns_1@172.23.120.58','ns_1@172.23.120.73',
'ns_1@172.23.120.74','ns_1@172.23.120.75',
'ns_1@172.23.120.77','ns_1@172.23.120.81',
'ns_1@172.23.120.86','ns_1@172.23.121.118',
'ns_1@172.23.121.77','ns_1@172.23.123.24',
'ns_1@172.23.123.25','ns_1@172.23.123.26',
'ns_1@172.23.123.31','ns_1@172.23.123.32',
'ns_1@172.23.123.33','ns_1@172.23.96.122',
'ns_1@172.23.96.14','ns_1@172.23.96.243',
'ns_1@172.23.97.105','ns_1@172.23.97.148',
'ns_1@172.23.97.149','ns_1@172.23.97.150',
'ns_1@172.23.97.151'], EjectNodes = [], Failed over and being ejected nodes = [], Delta recovery nodes = ['ns_1@172.23.96.14'], Delta recovery buckets = all; Operation Id = 8fa9cee395483fda91678362bea50af3
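To confirm that a "Starting rebalance" message like the one above still lists the expected 27 nodes, the KeepNodes list can be counted mechanically. This is a minimal sketch, assuming the message has been saved as plain text; the function name `count_keep_nodes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads a "Starting rebalance" log message on stdin,
# joins wrapped lines, and counts the quoted node names that appear inside
# the KeepNodes list only (other lists in the message are ignored).
count_keep_nodes() {
    tr -d '\n' |
        sed -n 's/.*KeepNodes = \[\([^]]*\)\].*/\1/p' |
        grep -o "'[^']*'" |
        wc -l |
        tr -d ' '
}
```

For example, `count_keep_nodes < rebalance_message.txt` prints the node count, where `rebalance_message.txt` is an assumed file holding the message text.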

The above rebalance fails, as shown in rebalance_report_20210913T082158.json. The rebalance failure is humongous, which I believe is a duplicate of ~~MB-46805~~. If it is not, we should file a new one.

cbcollect_info is attached. This is the first time we are running this system test upgrade on 7.0.2, so there is no baseline and no last working build.

Attachments

rebalance_report_20210913T082158.json
25.72 MB
13/Sep/21 1:40 AM

People

Assignee: Balakumaran Gopal

Reporter: Balakumaran Gopal

Votes: 0

Watchers: 5

Dates

Created: 13/Sep/21 1:46 AM

Updated: 25/Jan/22 1:14 AM

Resolved: 19/Nov/21 3:37 AM

Gerrit Reviews

There are no open Gerrit changes

[System Test] - Online upgrade with graceful failover fails with "Rebalance exited with reason {mover_crashed, {unexpected_exit, {'EXIT',<0.18147.69>, {failed_to_update_vbucket_map,"HISTORY",641, {error, [{'ns_1@172.23.120.81', {exit, {{{timeout"
