Couchbase Server / MB-60657

Search partitions seem poorly balanced; this impacts throughput


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Affects Version/s: 7.6.0
    • Fix Version/s: 7.6.0
    • Component/s: fts
    • Triage: Untriaged

    Description

      I was doing sizing exercises for vector search and noticed that one of the two partitions was getting the lion's share of the data (about 1.7M of a 4M-doc load); this is being done on a single node.

      I have a script that repeatedly creates and deletes an index of the same name while changing the number of partitions.

      Note that after each run the search index is deleted and a memory quota change is made to the cbft (Search) service to force a new process and bootstrap (to hopefully clean up any left-over cruft in the file system). Then an index with the same name is created, but with a different partition count.
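
      For reference, here is a minimal sketch of that drop / bump-quota / recreate cycle against the cluster manager and Search (cbft) REST APIs. The hosts, credentials, bucket name, quota values and index params below are placeholders for illustration, not my actual script; the real index definition includes the vector mapping, which is omitted here.

      #!/usr/bin/env python3
      # Hedged sketch of the create/delete cycle described above. Hosts, credentials,
      # bucket name, quota values and index params are placeholders.
      import time
      import requests

      MGR = "http://couch01:8091"     # cluster manager REST (placeholder host)
      FTS = "http://couch01:8094"     # Search service REST (placeholder host)
      AUTH = ("Administrator", "password")
      INDEX = "vindex"

      def drop_index():
          requests.delete(f"{FTS}/api/index/{INDEX}", auth=AUTH)

      def bump_fts_quota(mb):
          # Changing the Search memory quota forces cbft to restart and bootstrap.
          requests.post(f"{MGR}/pools/default", auth=AUTH, data={"ftsMemoryQuota": mb})

      def create_index(partitions):
          body = {
              "type": "fulltext-index",
              "name": INDEX,
              "sourceType": "gocbcore",
              "sourceName": "target",
              "planParams": {"indexPartitions": partitions},
              "params": {},   # real vector mapping omitted; illustrative only
          }
          requests.put(f"{FTS}/api/index/{INDEX}", auth=AUTH, json=body).raise_for_status()

      for partitions, quota_mb in [(1, 4096), (2, 4100)]:
          drop_index()
          bump_fts_quota(quota_mb)
          time.sleep(30)              # give cbft time to come back up
          create_index(partitions)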

      On the second pass, going from partition count == 1 to partition count == 2, I noticed unexpectedly poor performance, as if it were still running with a partition count of 1.

      Looking at the file system I see two partitions, as expected, in /mnt_xfs/install/var/lib/couchbase/index/@fts/

      1. target._default.vindex_78f5e52ead50d51a_527bf675.pindex
      2. target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex 

      But the store subdirectories are unbalanced; in fact one of them has 13 files while the other has 313 files, and so far I have only processed 1.7M vectors!

      1. target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store
      2. target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store

       

      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ cd /mnt_xfs/install/var/lib/couchbase/index/@fts/
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1 target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store/* | wc -l
      13
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ du -sk target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store/
      142480    target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store/
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1 target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store/* | wc -l
      313
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ du -sk target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store/
      4584424    target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store/
       
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ bc -q
      scale=2
      313/13
      24.07
      4584424/142480
      32.17
      

      So we see from the above that one partition has 24X the files and 32X the data.

      Several minutes later, at about 2.5M docs loaded into the index, there is some improvement.

      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ cd /mnt_xfs/install/var/lib/couchbase/index/@fts/
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ du -sk target._default.vindex_78f5e52ead50d51a_*/store
      2486812    target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store
      4255596    target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1 target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store | wc -l
      291
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1 target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store | wc -l
      86
       
       
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1
      cbft.uuid
      dumps
      planPIndexes
      target._default.vindex_78f5e52ead50d51a_527bf675.pindex
      target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ date
      Sat 03 Feb 2024 02:20:43 PM PST
      linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ 

      I cannot understand how the initial index was so unbalanced after 1.7M docs with keys like K0000000001 to K0001700000, nor why things start to get a bit more balanced as we load more data with keys up to K0002500000.
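
      As a sanity check that the keys themselves should be spread evenly, here is a small sketch that counts how keys K0000000001 through K0001700000 hash across the source vBuckets. It assumes the CRC32-based key-to-vBucket mapping documented for the Couchbase SDKs and assumes each of the two pindexes covers a contiguous half of the 1024 vBuckets; both are assumptions for illustration, not taken from the plan in this cluster.

      #!/usr/bin/env python3
      # Distribution of sequential keys across vBuckets (assumed SDK-style CRC32 mapping,
      # and an assumed even split of the 1024 vBuckets across the 2 pindexes).
      import zlib

      NUM_VBUCKETS = 1024

      def vbucket_for_key(key: str) -> int:
          crc = zlib.crc32(key.encode()) & 0xFFFFFFFF
          return ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS

      counts = [0, 0]
      for i in range(1, 1_700_001):
          vb = vbucket_for_key(f"K{i:010d}")
          counts[0 if vb < NUM_VBUCKETS // 2 else 1] += 1

      print(counts)   # expect a roughly even ~850k / ~850k split

      If the key routing really is that even, the on-disk imbalance presumably comes from how each pindex builds and merges its store files rather than from how documents are routed, but that is speculation on my part.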

      From some of my collected stats.

      e 1706996790, d 20240203134631, docs 1699926, docs/sec. 168.000, s1 112.500, s2 106.000, ram 22751151672
      e 1706996891, d 20240203134811, docs 1711526, docs/sec. 116.000, s1 122.500, s2 114.500, ram 14128416440
      e 1706996992, d 20240203134952, docs 1737467, docs/sec. 259.410, s1 160.352, s2 133.676, ram 22590235731
      e 1706997092, d 20240203135132, docs 1748062, docs/sec. 105.950, s1 162.340, s2 130.170, ram 25536062788
       
      HERE WE HAD A LARGE FILE COUNT IMBALANCE (13 vs 313)
      AND A DATA-ON-DISK IMBALANCE (32X); DOCS/SEC IS LOW
       
      e 1706997193, d 20240203135313, docs 1757262, docs/sec. 92.000, s1 143.340, s2 127.920, ram 21376892388
      e 1706997293, d 20240203135453, docs 1767862, docs/sec. 106.000, s1 140.840, s2 131.670, ram 20980636067
      e 1706997394, d 20240203135634, docs 1776662, docs/sec. 88.000, s1 97.987, s2 129.170, ram 20841044387
      e 1706997495, d 20240203135815, docs 1790262, docs/sec. 136.000, s1 105.500, s2 133.920, ram 20044913059
      e 1706997595, d 20240203135955, docs 1802062, docs/sec. 118.000, s1 112.000, s2 127.670, ram 19599861013
      e 1706997696, d 20240203140136, docs 1809662, docs/sec. 76.000, s1 104.500, s2 122.670, ram 21651498261
      e 1706997796, d 20240203140316, docs 1821462, docs/sec. 118.000, s1 112.000, s2 104.993, ram 22134710804
      e 1706997897, d 20240203140457, docs 1830262, docs/sec. 88.000, s1 100.000, s2 102.750, ram 13453054484
      e 1706997998, d 20240203140638, docs 1843862, docs/sec. 136.000, s1 104.500, s2 108.250, ram 1035247159
      e 1706998098, d 20240203140818, docs 1927398, docs/sec. 835.360, s1 294.340, s2 199.420, ram 413807700
      e 1706998198, d 20240203140958, docs 2023798, docs/sec. 964.000, s1 505.840, s2 308.920, ram 3307673551
      e 1706998299, d 20240203141139, docs 2090198, docs/sec. 664.000, s1 649.840, s2 374.920, ram 667006985
      e 1706998399, d 20240203141319, docs 2194798, docs/sec. 1046.000, s1 877.340, s2 490.920, ram 353070821
      e 1706998499, d 20240203141459, docs 2264398, docs/sec. 696.000, s1 842.500, s2 568.420, ram 394138605
      e 1706998600, d 20240203141640, docs 2368456, docs/sec. 1040.580, s1 861.645, s2 683.742, ram 404238814
      e 1706998700, d 20240203141820, docs 2455604, docs/sec. 871.480, s1 913.515, s2 781.677, ram 401114909
      e 1706998800, d 20240203142000, docs 2513846, docs/sec. 582.420, s1 797.620, s2 837.480, ram 765946136
       
      HERE WE STILL HAD A FILE COUNT IMBALANCE, BUT LESS (86 vs 291), AND
      DATA ON DISK IS ONLY IMBALANCED BY 1.7X; NOTE DOCS/SEC IMPROVES A LOT
       
      e 1706998900, d 20240203142140, docs 2619846, docs/sec. 1060.000, s1 888.620, s2 865.560, ram 435238939
      e 1706999001, d 20240203142321, docs 2697846, docs/sec. 780.000, s1 823.475, s2 842.560, ram 443060108
      e 1706999101, d 20240203142501, docs 2789446, docs/sec. 916.000, s1 834.605, s2 874.060, ram 2838588377
      e 1706999201, d 20240203142641, docs 2872632, docs/sec. 831.860, s1 896.965, s2 847.292, ram 527574903
      e 1706999302, d 20240203142822, docs 2955182, docs/sec. 825.500, s1 838.340, s2 863.480, ram 699138391
      e 1706999402, d 20240203143002, docs 3039187, docs/sec. 840.050, s1 853.352, s2 838.413, ram 3361899931
      e 1706999502, d 20240203143142, docs 3115398, docs/sec. 762.110, s1 814.880, s2 824.742, ram 437645612
      e 1706999603, d 20240203143323, docs 3206072, docs/sec. 906.740, s1 833.600, s2 865.282, ram 1920513359
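
      To put a number on the throughput change, here is a quick sketch that parses the stat lines above (field meanings inferred from the prefixes: e = epoch seconds, docs = cumulative docs indexed) and compares the overall rate before and after the ~1.84M-doc point:

      #!/usr/bin/env python3
      # Compare the overall ingest rate before/after the imbalance starts to resolve,
      # using the first and last stat line of each window quoted above.
      import re

      def overall_rate(lines):
          rows = []
          for line in lines:
              m = re.search(r"\be (\d+),.*\bdocs (\d+),", line)
              if m:
                  rows.append((int(m.group(1)), int(m.group(2))))
          (e0, d0), (e1, d1) = rows[0], rows[-1]
          return (d1 - d0) / (e1 - e0)

      before = [
          "e 1706996790, d 20240203134631, docs 1699926, docs/sec. 168.000",
          "e 1706997998, d 20240203140638, docs 1843862, docs/sec. 136.000",
      ]
      after = [
          "e 1706998098, d 20240203140818, docs 1927398, docs/sec. 835.360",
          "e 1706999603, d 20240203143323, docs 3206072, docs/sec. 906.740",
      ]
      print(f"before: {overall_rate(before):.0f} docs/sec")   # ~119 docs/sec
      print(f"after:  {overall_rate(after):.0f} docs/sec")    # ~850 docs/sec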
      

      This seems to explain the odd performance variations I saw previously when trying to go from one (1) partition to an index built with two (2) partitions.

      A "cbcollect_info_issue_mb_unbalanced.zip" at about 1.8M docs loaded is attached.

      Attachments

        1. other.txt
          45 kB
          Jon Strabala


          People

            jon.strabala Jon Strabala
            jon.strabala Jon Strabala
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue


              Gerrit Reviews

                There are no open Gerrit changes
