Couchbase Server / MB-60836

[1536dim]: FTS rebalance failed: badmatch

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • 7.6.0
    • 7.6.0
    • Component: fts
    • Environment: Enterprise Edition 7.6.0 build 2144
    • Triage: Untriaged
    • 0
    • Unknown

    Description

      Test Config:

      • 6 FTS nodes
      • 1 FTS index
      • 108 partitions per index (18 partitions per node)
      • 0 Replicas
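
The per-node figure above follows from dividing the 108 partitions evenly across the FTS nodes. A minimal sketch of that arithmetic is below; `partitions_per_node` is a hypothetical helper, and the even spread is an assumption for illustration, since the actual placement chosen by the FTS planner may differ.

```python
def partitions_per_node(total_partitions: int, nodes: int) -> list[int]:
    """Spread partitions as evenly as possible; the first nodes absorb any remainder."""
    base, extra = divmod(total_partitions, nodes)
    return [base + (1 if i < extra else 0) for i in range(nodes)]


# Initial layout: 108 partitions over 6 FTS nodes.
print(partitions_per_node(108, 6))

# Layout an even spread would give once a 7th FTS node is added.
print(partitions_per_node(108, 7))
```

With no replicas, every partition has a single copy, so a failed-over node takes its share of the index with it until the rebalance rebuilds those partitions elsewhere.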

      Dataset

      • 5 million KV documents in 1 collection
      • 5 million docs in FTS
      • Vector dimension: 1536
      • 1 query thread issuing FTS vector queries against the FTS indexes at random
      • 2000 ops, pure upserts

      Steps:

      1. Start 1 query thread issuing FTS vector queries against the FTS indexes at random
      2. Start a KV load of 2000 ops, pure upserts
      3. Add 1 FTS node and rebalance - passed
      4. Remove 1 FTS node and rebalance - passed
      5. Remove 1 and add 2 FTS nodes and rebalance - passed
      6. Swap-rebalance 1 FTS node - passed
      7. Fail over 1 node and rebalance it out - failed

        Starting rebalance, KeepNodes = ['ns_1@172.23.107.76','ns_1@172.23.107.97',
        'ns_1@172.23.107.232','ns_1@172.23.107.240',
        'ns_1@172.23.121.78','ns_1@172.23.107.220',
        'ns_1@172.23.107.237'], EjectNodes = [], Failed over and being ejected nodes = ['ns_1@172.23.107.221']; no delta recovery nodes; Operation Id = d9f411071e4c0a2ac3466588b9c36668
         
        Rebalance exited with reason {service_rebalance_failed,fts,
        {agent_died,<35628.18378.91>,
        {linked_process_died,<35628.10185.95>,
        {'ns_1@172.23.107.232',
        {{badmatch,
        {false,
        {topology,[],
        [<<"075fa765b644a9c0d5438e7fc737d4a9">>,
        <<"5a8daa275cf24b72c6f0c480d6284c76">>,
        <<"62a8ed8bfe6b8fcc7e96ac47bc1baec0">>,
        <<"aa221da8e3d2a480c22dc6ab7aa90b1b">>,
        <<"c4bc1b5a5f5e9c99b6f52a0a980a1ed8">>,
        <<"ea1d3f6a7acebe39b838cfd45631a2fc">>],
        false,[]},
        {topology,[],
        [<<"075fa765b644a9c0d5438e7fc737d4a9">>,
        <<"5a8daa275cf24b72c6f0c480d6284c76">>,
        <<"62a8ed8bfe6b8fcc7e96ac47bc1baec0">>,
        <<"aa221da8e3d2a480c22dc6ab7aa90b1b">>,
        <<"c4bc1b5a5f5e9c99b6f52a0a980a1ed8">>,
        <<"ea1d3f6a7acebe39b838cfd45631a2fc">>],
        true,[]}}},
        [{service_agent,long_poll_worker_loop,5,
        [{file,"src/service_agent.erl"},
        {line,750}]},
        {proc_lib,init_p,3,
        [{file,"proc_lib.erl"},{line,225}]}]}}}}}.
        Rebalance Operation Id = d9f411071e4c0a2ac3466588b9c36668
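
In Erlang, a `badmatch` error carries the value that failed to match a pattern. Here that value is a 3-tuple: `false` followed by two `topology` tuples. The sketch below copies both tuples positionally from the log and reports where they differ. The names `first`, `second`, and `differing_positions` are illustrative only; the log does not show the record's field names.

```python
# Node IDs exactly as printed in the badmatch term (identical in both tuples).
NODE_IDS = [
    "075fa765b644a9c0d5438e7fc737d4a9",
    "5a8daa275cf24b72c6f0c480d6284c76",
    "62a8ed8bfe6b8fcc7e96ac47bc1baec0",
    "aa221da8e3d2a480c22dc6ab7aa90b1b",
    "c4bc1b5a5f5e9c99b6f52a0a980a1ed8",
    "ea1d3f6a7acebe39b838cfd45631a2fc",
]

# Positional copies of the two topology tuples from the log.
first = ("topology", [], NODE_IDS, False, [])
second = ("topology", [], NODE_IDS, True, [])


def differing_positions(a: tuple, b: tuple) -> list[int]:
    """Return the 0-based positions at which two equal-length tuples differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


print(differing_positions(first, second))
```

The two topologies list the same six node IDs and differ only in the boolean that follows the node list. Reading that boolean as a "balanced" flag is plausible but remains an assumption here.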
        

      As a side note:

      [16 Feb, 2024, 9:48:41 AM] - Warning: On node 172.23.107.232 system memory use is 85.46% of total available memory, above the warning threshold of 85%.
      [16 Feb, 2024, 9:48:49 AM] - CRITICAL: On node 172.23.107.97 system memory use is 90.30% of total available memory, above the critical threshold of 90%.
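
The two alerts above correspond to the 85% warning and 90% critical thresholds quoted in the messages. A minimal sketch of that classification follows. `classify_memory_use` is a hypothetical helper, not the server's alerting code, and treating "above" as a strict comparison is an assumption.

```python
WARNING_PCT = 85.0   # warning threshold quoted in the alert text
CRITICAL_PCT = 90.0  # critical threshold quoted in the alert text


def classify_memory_use(used_pct: float) -> str:
    """Map a memory-use percentage to the alert level it would trigger."""
    if used_pct > CRITICAL_PCT:
        return "critical"
    if used_pct > WARNING_PCT:
        return "warning"
    return "ok"


# Values reported for the two nodes in this ticket.
for node, pct in [("172.23.107.232", 85.46), ("172.23.107.97", 90.30)]:
    print(node, classify_memory_use(pct))
```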
      

      Hi Abhi Dangeti, could you please help us understand why memory usage keeps climbing even after the test has finished (failed) and no workload is running? Are merge processes still running?

      QE Test

      guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P args=-i /tmp/magma_temp_job.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.vectorSearch.VectorVolume.Murphy.ClusterOpsVolume,nodes_init=1,graceful=True,skip_cleanup=True,num_items=5000000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,bucket_type=membase,eviction_policy=fullEviction,iterations=2,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,collections=1,maxttl=10,num_indexes=1,pc=20,index_nodes=0,xdcr_collections=10,xdcr_remote_nodes=0,cbas_nodes=0,fts_nodes=6,ops_rate=10000,doc_ops=update,rebl_ops_rate=2000,key_type=RandomKey,mutation_perc=30,replicas=1,clients_per_db=10,skip_cluster_reset=false,skip_setup_cleanup=false,use_https=False,track_failures=False,model=sentence-transformers/paraphrase-MiniLM-L3-v2,fts_index_partition=108,fts_replicas=0,mockVector=true,dim=1536
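
The `-t` argument above packs the test name and its parameters into one comma-separated string, which is hard to scan. The sketch below splits an excerpt of it into a name and a key/value map as a reading aid. `parse_test_spec` is a hypothetical helper and is not how testrunner itself parses the argument; the `spec` string is a shortened excerpt of the command, not the full parameter list.

```python
def parse_test_spec(spec: str) -> tuple[str, dict[str, str]]:
    """Split 'test.name,key=value,...' into the test name and a parameter dict."""
    name, *pairs = spec.split(",")
    params: dict[str, str] = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value  # a repeated key keeps its last value
    return name, params


# Excerpt of the -t argument from the command above.
spec = (
    "aGoodDoctor.vectorSearch.VectorVolume.Murphy.ClusterOpsVolume,"
    "nodes_init=1,fts_nodes=6,fts_index_partition=108,fts_replicas=0,"
    "ops_rate=10000,rebl_ops_rate=2000,mockVector=true,dim=1536"
)

name, params = parse_test_spec(spec)
print(name)
for key in ("fts_nodes", "fts_index_partition", "fts_replicas", "rebl_ops_rate", "dim"):
    print(f"{key} = {params[key]}")
```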
      

      People

        Assignee: Abhi Dangeti (abhinav)
        Reporter: Ritesh Agarwal (ritesh.agarwal)