Details
- Bug
- Resolution: Cannot Reproduce
- Critical
- None
- 7.2.4
- Operating System: Debian GNU/Linux 10 (buster); Couchbase Enterprise Edition build 7.2.4-7059
- Untriaged
- Linux x86_64
- 0
- Unknown
Description
Steps to reproduce
- Created a 4-node KV cluster: 172.23.121.194, 172.23.121.203, 172.23.121.160, 172.23.121.199
- Created an ephemeral bucket named 'default' with replicas=3 and loaded some docs into it
- Disabled auto-failover
- Enabled auto-reprovision
- Induced a "restart_machine" failure on node 172.23.121.194
- A few minutes later, added node 172.23.121.198
- Started a rebalance (a sketch of these steps against the REST API follows this list)
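For reference, the steps above can also be driven manually against the cluster management REST API. The sketch below is only an approximation under stated assumptions: the credentials, RAM quota, and orchestrator address are placeholders rather than values from this run, the machine restart happens out of band, and this is not the TAF test's own implementation.

import requests

ADMIN = ("Administrator", "password")   # placeholder credentials, not from this run
ORCH = "http://172.23.121.199:8091"     # any cluster node can serve the REST API

# Create the ephemeral bucket 'default' with 3 replicas (RAM quota is a placeholder)
requests.post(f"{ORCH}/pools/default/buckets", auth=ADMIN, data={
    "name": "default", "bucketType": "ephemeral",
    "ramQuotaMB": 1024, "replicaNumber": 3,
}).raise_for_status()

# Disable auto-failover, enable auto-reprovision
requests.post(f"{ORCH}/settings/autoFailover", auth=ADMIN,
              data={"enabled": "false"}).raise_for_status()
requests.post(f"{ORCH}/settings/autoReprovision", auth=ADMIN,
              data={"enabled": "true", "maxNodes": 1}).raise_for_status()

# ... restart the machine hosting 172.23.121.194 out of band and wait a few minutes ...

# Add the new node, then rebalance over all known nodes
requests.post(f"{ORCH}/controller/addNode", auth=ADMIN, data={
    "hostname": "172.23.121.198", "user": ADMIN[0],
    "password": ADMIN[1], "services": "kv",
}).raise_for_status()
known = ",".join(n["otpNode"] for n in
                 requests.get(f"{ORCH}/pools/default", auth=ADMIN).json()["nodes"])
requests.post(f"{ORCH}/controller/rebalance", auth=ADMIN,
              data={"knownNodes": known, "ejectedNodes": ""}).raise_for_status()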
Rebalance fails with:
2023-12-10T22:09:08.502-08:00, ns_orchestrator:0:critical:message(ns_1@172.23.121.199) - Rebalance exited with reason {pre_rebalance_janitor_run_failed,"default", {error,wait_for_memcached_failed, ['ns_1@172.23.121.194']}}. Rebalance Operation Id = dab11fbacca77867586d2306c382740f
The issue is not very consistent, so we cannot say whether this is a regression.
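The wait_for_memcached_failed part of the error indicates that the pre-rebalance janitor run could not confirm the data service on ns_1@172.23.121.194 was serving the 'default' bucket again after the restart. When retrying by hand, one way to rule out a plain warmup/readiness race is to poll the bucket's per-node status before starting the rebalance. This is only a sketch, reusing the placeholder credentials from the snippet above; it is not part of the original test.

import time
import requests

ADMIN = ("Administrator", "password")   # placeholder credentials
NODE = "http://172.23.121.199:8091"

def wait_for_bucket_ready(bucket="default", timeout_s=300):
    # Poll the bucket details until every node reports status "healthy"
    statuses = {}
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        info = requests.get(f"{NODE}/pools/default/buckets/{bucket}", auth=ADMIN).json()
        statuses = {n["hostname"]: n.get("status") for n in info.get("nodes", [])}
        if statuses and all(s == "healthy" for s in statuses.values()):
            return statuses
        time.sleep(5)
    raise TimeoutError(f"bucket '{bucket}' not healthy on all nodes: {statuses}")

wait_for_bucket_ready()   # call this before POSTing /controller/rebalance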
TAF script to reproduce
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /data/workspace/debian-p0-collections-vset00-00-auto_reprovision_7.0_P1/testexec.8970.ini GROUP=auto_reprovision,rerun=False,get-cbcollect-info=True,log_level=info,upgrade_version=7.2.4-7059,sirius_url=http://172.23.120.103:4000 -t failover.AutoFailoverTests.AutoFailoverTests.test_rebalance_after_autofailover,timeout=5,num_node_failures=1,nodes_in=1,nodes_out=0,auto_reprovision=True,failover_action=restart_machine,nodes_init=4,can_abort_rebalance=False,bucket_spec=single_bucket.buckets_all_ephemeral_for_rebalance_tests_more_collections,data_load_spec=volume_test_load_with_CRUD_on_collections,GROUP=auto_reprovision'
Job name: debian-collections-auto_reprovision_7.0_P1