Details
- Bug
- Resolution: Duplicate
- Critical
- 7.6.0
- Enterprise Edition 7.6.0 build 2144
- Untriaged
- 0
- Unknown
Description
Test Config:
- 6 FTS nodes
- 1 FTS index
- 108 partitions per index (18 partitions per index per node)
- 0 Replicas
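For reference, the per-node figure follows from spreading 108 partitions over 6 nodes. A minimal sketch, assuming partitions are spread as evenly as possible (the helper `partitions_per_node` is hypothetical and is not the FTS planner's actual placement logic):

```python
def partitions_per_node(total_partitions, nodes):
    """Spread partitions as evenly as possible across nodes (illustrative only)."""
    base, extra = divmod(total_partitions, nodes)
    # The first `extra` nodes carry one additional partition.
    return [base + 1 if i < extra else base for i in range(nodes)]


print(partitions_per_node(108, 6))  # [18, 18, 18, 18, 18, 18]
print(partitions_per_node(108, 5))  # [22, 22, 22, 21, 21]
```

With one node fewer, an even spread would put 21 or 22 partitions on each remaining node.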
Dataset
- 5 million KV documents in 1 collection
- 5 million docs in FTS
- Vector dimension: 1536
- 1 query thread issuing FTS vector queries against FTS indexes at random
- 2000 ops, pure upserts
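A rough size estimate for the vector data in this dataset, as a sketch only. It assumes each vector component is stored as a 4-byte float (an assumption, the ticket does not state the encoding) and counts raw vector data only, not index overhead or what is actually resident in memory:

```python
DOCS = 5_000_000
DIM = 1536
BYTES_PER_COMPONENT = 4  # assumption: float32 components
FTS_NODES = 6

raw_bytes = DOCS * DIM * BYTES_PER_COMPONENT
per_node_bytes = raw_bytes / FTS_NODES  # assumes an even spread across nodes

print(f"raw vectors, total:    {raw_bytes / 1e9:.2f} GB")       # 30.72 GB
print(f"raw vectors, per node: {per_node_bytes / 1e9:.2f} GB")  # 5.12 GB
```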
Steps:
- Start 1 query thread issuing FTS vector queries against FTS indexes at random
- Start a KV load of 2000 ops, pure upserts
- Add 1 FTS node and rebalance - passed
- Remove 1 FTS node and rebalance - passed
- Remove 1 FTS node, add 2 FTS nodes, and rebalance - passed
- Swap rebalance 1 FTS node - passed
- Fail over 1 node and rebalance out - failed
Starting rebalance, KeepNodes = ['ns_1@172.23.107.76','ns_1@172.23.107.97',
'ns_1@172.23.107.232','ns_1@172.23.107.240',
'ns_1@172.23.121.78','ns_1@172.23.107.220',
'ns_1@172.23.107.237'], EjectNodes = [], Failed over and being ejected nodes = ['ns_1@172.23.107.221']; no delta recovery nodes; Operation Id = d9f411071e4c0a2ac3466588b9c36668
Rebalance exited with reason {service_rebalance_failed,fts,
{agent_died,<35628.18378.91>,
{linked_process_died,<35628.10185.95>,
{'ns_1@172.23.107.232',
{{badmatch,
{false,
{topology,[],
[<<"075fa765b644a9c0d5438e7fc737d4a9">>,
<<"5a8daa275cf24b72c6f0c480d6284c76">>,
<<"62a8ed8bfe6b8fcc7e96ac47bc1baec0">>,
<<"aa221da8e3d2a480c22dc6ab7aa90b1b">>,
<<"c4bc1b5a5f5e9c99b6f52a0a980a1ed8">>,
<<"ea1d3f6a7acebe39b838cfd45631a2fc">>],
false,[]},
{topology,[],
[<<"075fa765b644a9c0d5438e7fc737d4a9">>,
<<"5a8daa275cf24b72c6f0c480d6284c76">>,
<<"62a8ed8bfe6b8fcc7e96ac47bc1baec0">>,
<<"aa221da8e3d2a480c22dc6ab7aa90b1b">>,
<<"c4bc1b5a5f5e9c99b6f52a0a980a1ed8">>,
<<"ea1d3f6a7acebe39b838cfd45631a2fc">>],
true,[]}}},
[{service_agent,long_poll_worker_loop,5,
[{file,"src/service_agent.erl"},
{line,750}]},
{proc_lib,init_p,3,
[{file,"proc_lib.erl"},{line,225}]}]}}}}}.
Rebalance Operation Id = d9f411071e4c0a2ac3466588b9c36668
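Reading the badmatch term above: the two topology tuples carry the same six node IDs and differ in a single boolean. A minimal sketch to confirm that, with both tuples transcribed by hand from the log into Python tuples (the transcription and the helper `differing_positions` are illustrative and are not part of ns_server; what the boolean represents is not stated in the log):

```python
NODE_IDS = (
    "075fa765b644a9c0d5438e7fc737d4a9",
    "5a8daa275cf24b72c6f0c480d6284c76",
    "62a8ed8bfe6b8fcc7e96ac47bc1baec0",
    "aa221da8e3d2a480c22dc6ab7aa90b1b",
    "c4bc1b5a5f5e9c99b6f52a0a980a1ed8",
    "ea1d3f6a7acebe39b838cfd45631a2fc",
)

# The two {topology, ...} terms from the badmatch, in log order.
first = ("topology", (), NODE_IDS, False, ())
second = ("topology", (), NODE_IDS, True, ())


def differing_positions(a, b):
    """Return the 0-based positions at which two equal-length tuples differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differing_positions(first, second))  # [3]
```

Position 3 (the fourth element) is the only difference: `false` in the first term, `true` in the second.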
As a side note:
[16 Feb, 2024, 9:48:41 AM] - Warning: On node 172.23.107.232 system memory use is 85.46% of total available memory, above the warning threshold of 85%.
[16 Feb, 2024, 9:48:49 AM] - CRITICAL: On node 172.23.107.97 system memory use is 90.30% of total available memory, above the critical threshold of 90%.
Hi Abhi Dangeti, could you please help us understand why memory usage keeps climbing even after the test has finished or failed and there is no workload? Are merge processes still running?
QE Test
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P args=-i /tmp/magma_temp_job.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.vectorSearch.VectorVolume.Murphy.ClusterOpsVolume,nodes_init=1,graceful=True,skip_cleanup=True,num_items=5000000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,bucket_type=membase,eviction_policy=fullEviction,iterations=2,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,collections=1,maxttl=10,num_indexes=1,pc=20,index_nodes=0,xdcr_collections=10,xdcr_remote_nodes=0,cbas_nodes=0,fts_nodes=6,ops_rate=10000,doc_ops=update,rebl_ops_rate=2000,key_type=RandomKey,mutation_perc=30,replicas=1,clients_per_db=10,skip_cluster_reset=false,skip_setup_cleanup=false,use_https=False,track_failures=False,model=sentence-transformers/paraphrase-MiniLM-L3-v2,fts_index_partition=108,fts_replicas=0,mockVector=true,dim=1536
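To cross-check the test parameters against the Test Config section, the comma-separated `key=value` list passed with `-t` can be parsed into a dictionary. A minimal sketch using a hand-picked excerpt of that list (the excerpt selection and the variable names are illustrative, not part of testrunner):

```python
from collections import Counter

# Excerpt of the key=value parameters from the -t argument above, in original order.
params = (
    "nodes_init=1,graceful=True,skip_cleanup=True,num_items=5000000,"
    "rerun=False,skip_cleanup=True,fts_nodes=6,ops_rate=10000,"
    "rebl_ops_rate=2000,replicas=1,fts_index_partition=108,"
    "fts_replicas=0,dim=1536"
)

pairs = [item.split("=", 1) for item in params.split(",")]
cfg = dict(pairs)  # on repeated keys the last value wins

# Keys that appear more than once in the excerpt.
dupes = [key for key, count in Counter(k for k, _ in pairs).items() if count > 1]

print(cfg["fts_nodes"], cfg["fts_index_partition"], cfg["fts_replicas"], cfg["dim"])
print(dupes)  # ['skip_cleanup']
```

The parsed values line up with the description: 6 FTS nodes, 108 partitions, 0 FTS replicas, and 1536 dimensions.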
Issue Links
- relates to MB-60803 Cluster stuck in a repetitive Rebalancing cycle. 760-2119 (Closed)