Details
- Bug
- Resolution: Fixed
- Critical
- 7.2.1
- couchbase-cloud-server-7.2.1-5819-v1.0.20
- Untriaged
- 0
- Unknown
- Analytics Sprint 22
Description
- Create a 3 nodes colocated services cluster on AWS.
- Create a bucket, 2 collections and load 75M items in each collection.
- Create CBAS datasets and indexes. Wait for them to build/ingest data.
- While this was in progress, node 001 failed over. CP tried to add the node back and rebalance it in.
Analytics Service unable to successfully rebalance 943d20d4e3c0fe8bcb36e3b25842a9a0 due to 'java.lang.Exception: replica com.couchbase.analytics.control.rebalance.TopologyCoordinator$TimedReplicaStatus@582ced3e inactivity timeout; 300 seconds passed with no progress'; see analytics_info.log for details
Failed over ['ns_1@svc-dqisa-node-001.b8ea3-02qejf6ihx.sandbox.nonprod-project-avengers.com']. Failover couldn't complete on some nodes:
['ns_1@svc-dqisa-node-001.b8ea3-02qejf6ihx.sandbox.nonprod-project-avengers.com']
- Rebalance failed:
Starting rebalance, KeepNodes = ['ns_1@svc-dqisa-node-001.b8ea3-02qejf6ihx.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-002.b8ea3-02qejf6ihx.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-003.b8ea3-02qejf6ihx.sandbox.nonprod-project-avengers.com'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 99b90a3489aee7df57aa0cf81626aeaa
Analytics Service unable to successfully rebalance 9c47373917a92825e779ee891df88715 due to 'java.lang.Exception: replica com.couchbase.analytics.control.rebalance.TopologyCoordinator$TimedReplicaStatus@747ed24d inactivity timeout; 300 seconds passed with no progress'; see analytics_info.log for details
- Next Rebalance attempt:
Starting rebalance, KeepNodes = ['ns_1@svc-dqisa-node-001.b8ea3-02qejf6ihx.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-002.b8ea3-02qejf6ihx.sandbox.nonprod-project-avengers.com',
'ns_1@svc-dqisa-node-003.b8ea3-02qejf6ihx.sandbox.nonprod-project-avengers.com'], EjectNodes = [], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = d6acbc11307f34ea9381b627b3bd3850
Analytics Service unable to successfully rebalance 4b3462081972c9fbf923e89a12a2259b due to 'java.lang.IllegalStateException: timed out waiting for all nodes to join & cluster active (missing nodes: [f42c94909e21cfa5316f5317c7f34e78], state: ACTIVE)'; see analytics_info.log for details
- Links seem to be broken and data ingestion is stuck.
Ali Alsuliman, this run was on AWS and it could be related to MB-57636. Please have a look.
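For triage, the rebalance failures quoted above can be bucketed by cause with a small script. The following is only a sketch: the regular expression is inferred from the three messages pasted in this ticket (other server versions may word them differently), and the function name and category labels are hypothetical, not part of any Couchbase tooling.

```python
import re

# Pattern inferred from the messages quoted in this ticket (an assumption,
# not a documented log format).
REBALANCE_FAILURE = re.compile(
    r"Analytics Service unable to successfully rebalance "
    r"(?P<rebalance_id>[0-9a-f]{32}) due to "
    r"'(?P<exception>[\w.$]+): (?P<reason>.*)'; see analytics_info\.log"
)


def parse_rebalance_failure(line):
    """Return (rebalance_id, exception, category, reason), or None if the
    line is not an Analytics rebalance failure message."""
    m = REBALANCE_FAILURE.search(line)
    if m is None:
        return None
    reason = m.group("reason")
    # Hypothetical category labels for the two failure modes seen here.
    if "inactivity timeout" in reason:
        category = "replica_inactivity_timeout"
    elif "timed out waiting for all nodes to join" in reason:
        category = "cluster_join_timeout"
    else:
        category = "other"
    return m.group("rebalance_id"), m.group("exception"), category, reason
```

Applied to the messages in this description, the first two failures fall under the replica inactivity timeout and the third under the cluster join timeout, which makes the change in failure mode between attempts easier to spot.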
QE Test:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/couchbase_capella_volume_2_new.ini bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.hostedHospital.Murphy.test_rebalance,num_items=75000000,num_buckets=1,bucket_names=GleamBook,bucket_type=membase,iterations=5,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,maxttl=10,pc=20,gsi_nodes=3,cbas_nodes=3,fts_nodes=3,kv_nodes=3,n1ql_nodes=3,kv_disk=500,n1ql_disk=50,gsi_disk=500,fts_disk=500,cbas_disk=500,kv_compute=m5.4xlarge,gsi_compute=m5.4xlarge,n1ql_compute=m5.4xlarge,fts_compute=m5.4xlarge,cbas_compute=m5.4xlarge,mutation_perc=20,key_type=CircularKey,capella_run=true,services=data:query:index:analytics:search,max_rebl_nodes=27,provider=AWS,region=us-east-1,type=GP3,size=500,skip_teardown_cleanup=false,wait_timeout=14400,index_timeout=28800,runtype=dedicated,sanity=True'