Details

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: None
Affects Version/s: 7.1.4
Component/s: analytics
Labels:
Environment: Enterprise Edition 7.1.4 build 3638
Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump: http://supportal.couchbase.com/snapshot/1427b736ada0a402107e8666f3a01ff9::0
Story Points: 0
Is this a Regression?: Unknown

Description

Test steps

Deploy a GCP cluster with 3 nodes each for KV, GSI, Query and FTS, each service deployed separately.
Create a Magma bucket having single replica and 1 scope + 2 collections in addition to _default._default keyspace.
Load 5M docs in each of the 2 collections.
Create GSI indexes, wait for the indexes to come online, and run queries against them.
Start a KV workload at 10k/s.
Increase disk size by 5G for all service groups.
The online scaling operation completes without any issues.

Decrease the disk size by 5G for all the service groups. This triggers a swap rebalance for all the nodes, one at a time. The swap rebalance for cbas fails:

Rebalance Activity
Starting rebalance, KeepNodes = ['ns_1@svc-a-node-011.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-a-node-012.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-a-node-018.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-d-node-013.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-d-node-014.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-d-node-015.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-qi-node-004.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-qi-node-005.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-qi-node-006.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-009.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-016.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com',
'ns_1@svc-s-node-017.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com'], EjectNodes = ['ns_1@svc-a-node-010.np7cmh-hxy7-vgln.sandbox.nonprod-project-avengers.com'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 327a570d2e5aa3e0647921f14d434456

First failure
Rebalance exited with reason {service_rebalance_failed,cbas,
{worker_died,
{'EXIT',<0.23349.29>,
{rebalance_failed,
{service_error,
<<"Rebalance 85b674cc8e804b9c806b6ae210376a3b failed: see analytics_info.log for details">>}}}}}.
Rebalance Operation Id = fdf6a79b02b963dd3be3c30a111c506a

Analytics Service unable to successfully rebalance e933bc4742d4866e9f78de073471adc8 due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [130fd128ae000f61de18d44341223852], state: UNUSABLE)'; see analytics_info.log for details
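
When scanning logs for repeated occurrences of this failure, it can help to pull the missing node IDs and the cluster state out of the message. The following is a minimal sketch that assumes the message keeps the "missing nodes: [...], state: ..." layout quoted above; the function name and regular expression are illustrative and not part of any Couchbase tooling.

```python
import re

# Assumption: the failure text keeps the "missing nodes: [...], state: ..." layout
# seen in the message quoted in this ticket.
FAILURE_RE = re.compile(
    r"missing nodes: \[(?P<nodes>[^\]]*)\], state: (?P<state>\w+)"
)


def parse_rebalance_failure(message: str):
    """Return (missing_node_ids, cluster_state), or None if the text does not match."""
    match = FAILURE_RE.search(message)
    if match is None:
        return None
    # Node IDs are assumed to be comma-separated inside the brackets.
    nodes = [n.strip() for n in match.group("nodes").split(",") if n.strip()]
    return nodes, match.group("state")


message = (
    "java.lang.IllegalStateException: timed out waiting for all nodes to join "
    "& cluster active (missing nodes: [130fd128ae000f61de18d44341223852], "
    "state: UNUSABLE)"
)
print(parse_rebalance_failure(message))
```

Returning None for non-matching text lets a caller filter unrelated log lines without raising an exception.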

QE Test

sudo guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/capella.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.hostedHospital.Murphy.test_rebalance,graceful=True,skip_cleanup=True,num_buckets=1,bucket_names=GleamBook,bucket_type=membase,eviction_policy=fullEviction,iterations=10,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=24,randomize_doc_size=False,randomize_value=True,maxttl=10,pc=20,gsi_nodes=3,cbas_nodes=3,fts_nodes=3,kv_nodes=3,n1ql_nodes=3,mutation_perc=100,key_type=RandomKey,capella_run=true,services=data-query:index-search-analytics,max_rebl_nodes=27,kv_compute=n2-standard-16,gsi_compute=n2-standard-16,n1ql_compute=n2-standard-16,fts_compute=n2-standard-16,cbas_compute=n2-standard-16,kv_disk=500,n1ql_disk=50,gsi_disk=500,cbas_disk=500,provider=GCP,region=us-central1,type=PD-SSD,skip_teardown_cleanup=true,wait_timeout=14400,index_timeout=28800,runtype=dedicated,track_failures=True,skip_init=False,key_type=CircularKey,rebalance_type=disk,clients_per_db=1 -m rest'
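
The -t argument above sets skip_cleanup twice and sets key_type twice with different values (RandomKey, then CircularKey). Which value the test runner actually applies is not stated in this ticket. A small sketch like the one below can flag such repeated keys in a comma-separated parameter string before a run; the helper name is hypothetical and the input is a shortened excerpt of the command, not the full argument.

```python
from collections import defaultdict


def find_repeated_params(param_string: str):
    """Map each key that appears more than once to its values, in order of appearance."""
    values = defaultdict(list)
    for item in param_string.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip entries without "=", such as the test path
            values[key].append(value)
    return {k: v for k, v in values.items() if len(v) > 1}


# Shortened excerpt of the -t argument (illustrative subset, not the full command).
excerpt = (
    "aGoodDoctor.hostedHospital.Murphy.test_rebalance,graceful=True,"
    "skip_cleanup=True,rerun=False,skip_cleanup=True,key_type=RandomKey,"
    "key_type=CircularKey"
)
print(find_repeated_params(excerpt))
```

The sketch only reports the repetition; it makes no assumption about whether the first or the last value takes effect.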

People

Assignee: Ritesh Agarwal

Reporter: Ritesh Agarwal

Votes: 0

Watchers: 6

Dates

Created: 12/Jun/23 4:10 PM

Updated: 25/Jun/23 3:11 PM

Resolved: 23/Jun/23 8:19 PM

[Provisioned/GCP]: Analytics Swap rebalance triggered by CP due to disk size reduction is failing forever in a loop. Rebalance timing out waiting for all nodes to join every 5 mins.
