Details

Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: 7.6.0
Affects Version/s: 7.6.0
Component/s: secondary-index
Labels:
Environment: 7.2.3-6705 --> 7.6.0-1767
Triage: Untriaged
Operating System: Linux x86_64
Story Points: 0
Is this a Regression?: No

Description

Steps to Repro
1. Run the below longevity test on 7.2.3 for 4-5 days.

./sequoia -client 172.23.104.254:2375 -provider file:centos_third_cluster.yml -test tests/integration/7.2/test_7.2.yml -scope tests/integration/7.2/scope_7.2_magma.yml -scale 3 -repeat 0 -log_level 0 -version 7.2.3-6705 -skip_setup=false -skip_test=false -skip_teardown=true -skip_cleanup=false -continue=false -collect_on_error=false -stop_on_error=false -duration=604800 -show_topology=true

2. Upgrade all KV nodes using the graceful failover/recovery strategy.
3. Create 5 nodes on 7.6 with the provisioned profile (172.23.108.144, 172.23.97.179, 172.23.104.176, 172.23.97.183, 172.23.121.118), add them to the cluster, remove the 4 indexing nodes that were part of the cluster on 7.2.3, and rebalance. This simulates the upgrade we would have on cloud, so that file-based rebalance is used even during the upgrade.

172.23.108.144 7:32:56 AM 9 Nov, 2023

Starting rebalance, KeepNodes = ['ns_1@172.23.104.176','ns_1@172.23.104.216',
'ns_1@172.23.104.249','ns_1@172.23.105.134',
'ns_1@172.23.105.210','ns_1@172.23.105.38',
'ns_1@172.23.105.39','ns_1@172.23.105.91',
'ns_1@172.23.106.37','ns_1@172.23.107.142',
'ns_1@172.23.107.236','ns_1@172.23.107.25',
'ns_1@172.23.108.129','ns_1@172.23.108.134',
'ns_1@172.23.108.136','ns_1@172.23.108.138',
'ns_1@172.23.108.139','ns_1@172.23.108.140',
'ns_1@172.23.108.141','ns_1@172.23.108.143',
'ns_1@172.23.108.144','ns_1@172.23.108.145',
'ns_1@172.23.108.146','ns_1@172.23.108.148',
'ns_1@172.23.121.118','ns_1@172.23.97.179',
'ns_1@172.23.97.183'], EjectNodes = ['ns_1@172.23.108.61',
'ns_1@172.23.108.34',
'ns_1@172.23.108.132',
'ns_1@172.23.106.54'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 9c7f04f312337cd95f93ff81cd2f539b

172.23.108.144 7:33:11 AM 9 Nov, 2023

Rebalance exited with reason {service_rebalance_failed,index,
  {worker_died,
    {'EXIT',<0.18034.2>,
      {{badmatch,
         {error,
           {unknown_error,
             <<"Post \"http://172.23.104.176:9102/registerRebalanceToken\": EOF">>}}},
       [{service_manager,rebalance_op,5,
          [{file,"src/service_manager.erl"},
           {line,341}]},
        {service_manager,do_run_op,1,
          [{file,"src/service_manager.erl"},
           {line,257}]},
        {proc_lib,init_p,3,
          [{file,"proc_lib.erl"},{line,225}]}]}}}}.

Rebalance Operation Id = 9c7f04f312337cd95f93ff81cd2f539b
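
For triage across repeated retries, it can help to pull the failing endpoint out of the exit reason rather than reading the nested tuple by eye. The sketch below is illustrative only: the function name `failing_endpoint` and the regular expression are assumptions written against the exact message shape logged above, not part of any Couchbase tooling.

```python
import re
from urllib.parse import urlsplit

# Matches the escaped form seen in the exit reason:
#   <<"Post \"http://host:port/path\": EOF">>
# (hypothetical pattern, written against the log line above)
_POST_ERR = re.compile(r'Post \\"([^"\\]+)\\": ([^"]+)"')


def failing_endpoint(reason: str):
    """Return host, port, path and error text of the failed POST, or None."""
    match = _POST_ERR.search(reason)
    if match is None:
        return None
    url = urlsplit(match.group(1))
    return {
        "host": url.hostname,
        "port": url.port,
        "path": url.path,
        "error": match.group(2),
    }


reason = r'{unknown_error,<<"Post \"http://172.23.104.176:9102/registerRebalanceToken\": EOF">>}'
print(failing_endpoint(reason))
```

Applied to the failure above, this identifies 172.23.104.176 (one of the newly added 7.6 nodes listed in step 3) as the node whose `/registerRebalanceToken` call returned EOF.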

Rebalance continues to fail on repeated retries. I am going to try a few more times. We would have to mark this a blocker if this doesn't progress. cbcollect_info attached.

Any workarounds are highly appreciated.

Attachments

Issue Links

relates to

MB-60435 Add retry mechanism for EOF errors during HTTP calls in Rebalance

Open
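
The linked issue asks for a retry mechanism for EOF errors during rebalance HTTP calls. The following is a minimal sketch of that general idea, not the actual fix: the helper name `call_with_eof_retry`, the attempt count, and the backoff values are all assumptions, and Python's `ConnectionError` is used as the nearest analogue of the EOF returned by the HTTP client in the log above. Retrying a POST is only safe if the endpoint tolerates duplicate requests, which this report does not establish for `/registerRebalanceToken`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_eof_retry(
    call: Callable[[], T],
    attempts: int = 3,          # assumed default, not taken from the issue
    base_delay: float = 0.5,    # assumed default, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); retry with exponential backoff if the peer drops the connection."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ConnectionError:
            # ConnectionError covers ConnectionResetError and
            # http.client.RemoteDisconnected (server closed without replying).
            if attempt == attempts:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

A caller would wrap the single HTTP request in a zero-argument function and pass it as `call`; any other exception type propagates immediately without a retry.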

People

Assignee: Balakumaran Gopal

Reporter: Balakumaran Gopal

Votes: 0

Watchers: 9

Dates

Created: 09/Nov/23 7:41 AM

Updated: 18/Jan/24 1:27 AM

Resolved: 17/Nov/23 2:00 AM

[System Test Upgrade] :- Online upgrade using swap rebalance for 2i fails with "service_rebalance_failed,index, {worker_died, {'EXIT',<0.18034.2>, {{badmatch, {error, {unknown_error, <<"Post \"http://172.23.104.176:9102/registerRebalanceToken\": EOF">>"