Details
-
Bug
-
Resolution: Won't Fix
-
Major
-
7.6.0
-
Source build from Feb 1, 2024
repo init -u ssh://github.com/couchbase/manifest -g all -m couchbase-server/trinity.xml
repo sync --force-sync -j24
make -j28 EXTRA_CMAKE_OPTIONS="-D PRODUCT_VERSION=7.6.0-20240131"
Couchbase Server 7.6.0-20240131
"memoryTotal": 67442626560,
"memoryFree": 60818214912,
"mcdMemoryReserved": 51454,
"mcdMemoryAllocated": 51454,
"memoryQuota": 18000,
"queryMemoryQuota": 0,
"indexMemoryQuota": 3000,
"ftsMemoryQuota": 7000,
"cbasMemoryQuota": 6255,
"eventingMemoryQuota": 256
uname -a
Linux couch01 4.19.0-25-amd64 #1 SMP Debian 4.19.289-2 (2023-08-08) x86_64 GNU/Linux
nproc
24
cat /proc/meminfo | grep Mem
MemTotal: 65861940 kB
MemFree: 33887504 kB
MemAvailable: 59410952 kB
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 86
Model name: Intel(R) Xeon(R) CPU D-1567 @ 2.10GHz
Stepping: 4
CPU MHz: 803.495
CPU max MHz: 2700.0000
CPU min MHz: 800.0000
BogoMIPS: 4200.06
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 18432K
NUMA node0 CPU(s): 0-23
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d
-
Untriaged
-
0
-
Unknown
Description
I was doing sizing exercises for vector search and noticed that one of two partitions was getting the lion's share of the data (about 1.7M docs into a 4M doc load). This is being done on a single node.
I have a script that creates and deletes indexes of the same name as it changes the number of partitions.
Note that after each run the search index is deleted and a memory quota change is made to the cbft (Search) service to force a new process and bootstrap (to hopefully clean up any leftover cruft in the file system). Then the same index name is created, but with a different partition count.
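The delete, quota change, and recreate cycle described above could be sketched roughly as follows. This only builds the list of REST requests and sends nothing. The ports (8091, 8094), the endpoint paths, the `indexPartitions` plan parameter, and the index and bucket names (`vindex`, `target`, guessed from the pindex directory names) are assumptions to check against the Couchbase REST documentation for the build in use; the vector mapping (`params`) is omitted.

```python
import json

# Assumed defaults: 8091 for the cluster manager and 8094 for the Search
# (cbft) REST port. Adjust both for your install.
CLUSTER = "http://localhost:8091"
SEARCH = "http://localhost:8094"


def rebuild_plan(index_name, bucket, partitions, fts_quota_mb):
    """Return the (method, url, body) steps for one pass; nothing is sent."""
    index_def = {
        "name": index_name,
        "type": "fulltext-index",
        "sourceType": "gocbcore",
        "sourceName": bucket,
        # Only the partition count changes from pass to pass.
        "planParams": {"indexPartitions": partitions},
    }
    return [
        # 1. Drop the index left over from the previous pass.
        ("DELETE", f"{SEARCH}/api/index/{index_name}", None),
        # 2. Change the Search quota, which the report uses to force a new
        #    cbft process and bootstrap.
        ("POST", f"{CLUSTER}/pools/default", f"ftsMemoryQuota={fts_quota_mb}"),
        # 3. Recreate the same index name with the new partition count.
        ("PUT", f"{SEARCH}/api/index/{index_name}", json.dumps(index_def)),
    ]


if __name__ == "__main__":
    # Hypothetical names and quota, for illustration only.
    for method, url, body in rebuild_plan("vindex", "target", 2, 7000):
        print(method, url, body or "")
```

Keeping the plan as plain data makes it easy to log exactly what each pass did before comparing partition layouts between passes.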
On the second pass, going from partition count == 1 to partition count == 2, I noticed unexpectedly poor performance, as if it were still at a partition count of 1.
Looking at the file system, I see two partitions, as expected, in /mnt_xfs/install/var/lib/couchbase/index/@fts/
- target._default.vindex_78f5e52ead50d51a_527bf675.pindex
- target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex
But the store subdirectories are unbalanced: one of them has 13 files while the other has 313 files, and so far I have processed 1.7M vectors!
- target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store
- target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ cd /mnt_xfs/install/var/lib/couchbase/index/@fts/
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1 target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store/* | wc -l
13
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ du -sk target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store/
142480 target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store/
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1 target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store/* | wc -l
313
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ du -sk target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store/
4584424 target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store/
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ bc -q
scale=2
313/13
24.07
4584424/142480
32.17
So we see from the above that one partition has 24X the files and 32X the data.
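The same comparison can be scripted so it is easy to repeat as the load progresses. The following is a minimal sketch, not part of the original test script: `store_stats` and `imbalance` are hypothetical helper names, and `store_stats` sums apparent file sizes, which can differ somewhat from the block usage that `du -sk` reports.

```python
import os


def store_stats(store_dir):
    """Return (file_count, total_bytes) for all regular files under store_dir."""
    count, size = 0, 0
    for root, _dirs, files in os.walk(store_dir):
        for name in files:
            count += 1
            size += os.path.getsize(os.path.join(root, name))
    return count, size


def imbalance(larger, smaller):
    """Ratio between two measurements, rounded to two decimals."""
    return round(larger / smaller, 2)


if __name__ == "__main__":
    # Values copied from the ls and du output above (file counts, then KiB).
    print(imbalance(313, 13))
    print(imbalance(4584424, 142480))
```

Note that `bc` with `scale=2` truncates while `round` rounds, so the last digit can differ by one from the figures in the shell session.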
Several minutes later, at about 2.5M docs loaded into the index, there is some improvement.
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ cd /mnt_xfs/install/var/lib/couchbase/index/@fts/
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ du -sk target._default.vindex_78f5e52ead50d51a_*/store
2486812 target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store
4255596 target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1 target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex/store | wc -l
291
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1 target._default.vindex_78f5e52ead50d51a_527bf675.pindex/store | wc -l
86
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ ls -1
cbft.uuid
dumps
planPIndexes
target._default.vindex_78f5e52ead50d51a_527bf675.pindex
target._default.vindex_78f5e52ead50d51a_b6d0c5f9.pindex
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$ date
Sat 03 Feb 2024 02:20:43 PM PST
linuxbrew@couch01:/mnt_xfs/install/var/lib/couchbase/index/@fts$
I cannot understand how the initial index was so unbalanced after 1.7M docs with keys like K0000000001 to K0001700000, nor why things start to get a bit more balanced as we load more data with keys up to K0002500000.
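To illustrate why an imbalance is surprising for sequential keys, here is a rough sketch of how such keys would spread over two partitions under a hashed layout. It assumes documents are routed by a CRC32-based hash of the key over 1024 vBuckets and that each partition owns one contiguous half of the vBucket range. Both the exact hash and the partition-to-vBucket assignment are assumptions here, not taken from the cbft source.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # assumed vBucket count
NUM_PARTITIONS = 2   # partition count used in this run


def partition_for(key):
    """Map a key to a partition via a CRC32-based vBucket id (assumed scheme)."""
    crc = zlib.crc32(key.encode("utf-8"))
    vbucket = ((crc >> 16) & 0x7FFF) % NUM_VBUCKETS
    # Assume each partition owns one contiguous block of vBuckets.
    return vbucket * NUM_PARTITIONS // NUM_VBUCKETS


def distribution(first, last):
    """Count keys of the form K0000000001 in [first, last] per partition."""
    return Counter(partition_for(f"K{i:010d}") for i in range(first, last + 1))


if __name__ == "__main__":
    counts = distribution(1, 100_000)
    for part in sorted(counts):
        print(part, counts[part])
```

Under those assumptions the split should come out close to even, so a 32X difference on disk would have to come from something other than key placement, for example how far each partition has progressed with persisting or merging its segments.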
Here are some of my collected stats:
e 1706996790, d 20240203134631, docs 1699926, docs/sec. 168.000, s1 112.500, s2 106.000, ram 22751151672
e 1706996891, d 20240203134811, docs 1711526, docs/sec. 116.000, s1 122.500, s2 114.500, ram 14128416440
e 1706996992, d 20240203134952, docs 1737467, docs/sec. 259.410, s1 160.352, s2 133.676, ram 22590235731
e 1706997092, d 20240203135132, docs 1748062, docs/sec. 105.950, s1 162.340, s2 130.170, ram 25536062788

HERE WE HAD A LARGE FILE COUNT IMBALANCE (13 to 313)
AND A LARGE DATA ON DISK IMBALANCE (32X). NOTE THE LOW DOCS/SEC

e 1706997193, d 20240203135313, docs 1757262, docs/sec. 92.000, s1 143.340, s2 127.920, ram 21376892388
e 1706997293, d 20240203135453, docs 1767862, docs/sec. 106.000, s1 140.840, s2 131.670, ram 20980636067
e 1706997394, d 20240203135634, docs 1776662, docs/sec. 88.000, s1 97.987, s2 129.170, ram 20841044387
e 1706997495, d 20240203135815, docs 1790262, docs/sec. 136.000, s1 105.500, s2 133.920, ram 20044913059
e 1706997595, d 20240203135955, docs 1802062, docs/sec. 118.000, s1 112.000, s2 127.670, ram 19599861013
e 1706997696, d 20240203140136, docs 1809662, docs/sec. 76.000, s1 104.500, s2 122.670, ram 21651498261
e 1706997796, d 20240203140316, docs 1821462, docs/sec. 118.000, s1 112.000, s2 104.993, ram 22134710804
e 1706997897, d 20240203140457, docs 1830262, docs/sec. 88.000, s1 100.000, s2 102.750, ram 13453054484
e 1706997998, d 20240203140638, docs 1843862, docs/sec. 136.000, s1 104.500, s2 108.250, ram 1035247159
e 1706998098, d 20240203140818, docs 1927398, docs/sec. 835.360, s1 294.340, s2 199.420, ram 413807700
e 1706998198, d 20240203140958, docs 2023798, docs/sec. 964.000, s1 505.840, s2 308.920, ram 3307673551
e 1706998299, d 20240203141139, docs 2090198, docs/sec. 664.000, s1 649.840, s2 374.920, ram 667006985
e 1706998399, d 20240203141319, docs 2194798, docs/sec. 1046.000, s1 877.340, s2 490.920, ram 353070821
e 1706998499, d 20240203141459, docs 2264398, docs/sec. 696.000, s1 842.500, s2 568.420, ram 394138605
e 1706998600, d 20240203141640, docs 2368456, docs/sec. 1040.580, s1 861.645, s2 683.742, ram 404238814
e 1706998700, d 20240203141820, docs 2455604, docs/sec. 871.480, s1 913.515, s2 781.677, ram 401114909
e 1706998800, d 20240203142000, docs 2513846, docs/sec. 582.420, s1 797.620, s2 837.480, ram 765946136

HERE WE STILL HAD A LARGE FILE COUNT IMBALANCE, BUT A SMALLER ONE (86 to 291), AND
DATA ON DISK IS ONLY IMBALANCED BY 1.7X. NOTE DOCS/SEC IMPROVES A LOT

e 1706998900, d 20240203142140, docs 2619846, docs/sec. 1060.000, s1 888.620, s2 865.560, ram 435238939
e 1706999001, d 20240203142321, docs 2697846, docs/sec. 780.000, s1 823.475, s2 842.560, ram 443060108
e 1706999101, d 20240203142501, docs 2789446, docs/sec. 916.000, s1 834.605, s2 874.060, ram 2838588377
e 1706999201, d 20240203142641, docs 2872632, docs/sec. 831.860, s1 896.965, s2 847.292, ram 527574903
e 1706999302, d 20240203142822, docs 2955182, docs/sec. 825.500, s1 838.340, s2 863.480, ram 699138391
e 1706999402, d 20240203143002, docs 3039187, docs/sec. 840.050, s1 853.352, s2 838.413, ram 3361899931
e 1706999502, d 20240203143142, docs 3115398, docs/sec. 762.110, s1 814.880, s2 824.742, ram 437645612
e 1706999603, d 20240203143323, docs 3206072, docs/sec. 906.740, s1 833.600, s2 865.282, ram 1920513359
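For anyone who wants to chart these samples, the lines can be parsed with a small script along these lines. It is a sketch based only on the line format shown above: the report does not say what `s1` and `s2` mean, so they are parsed but not interpreted, and `parse` and `rates` are hypothetical helper names.

```python
import re

# Field names follow the stats lines above; s1 and s2 are kept as opaque floats.
LINE_RE = re.compile(
    r"e (?P<epoch>\d+), d (?P<stamp>\d+), docs (?P<docs>\d+), "
    r"docs/sec\. (?P<rate>[\d.]+), s1 (?P<s1>[\d.]+), s2 (?P<s2>[\d.]+), "
    r"ram (?P<ram>\d+)"
)


def parse(line):
    """Parse one stats line into a dict of numbers, or return None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    ints = ("epoch", "docs", "ram")
    return {k: int(v) if k in ints else float(v)
            for k, v in m.groupdict().items() if k != "stamp"}


def rates(lines):
    """Recompute docs/sec between consecutive samples from docs and epoch."""
    rows = [r for r in map(parse, lines) if r is not None]
    return [(b["docs"] - a["docs"]) / (b["epoch"] - a["epoch"])
            for a, b in zip(rows, rows[1:])]


if __name__ == "__main__":
    sample = [
        "e 1706996790, d 20240203134631, docs 1699926, docs/sec. 168.000, s1 112.500, s2 106.000, ram 22751151672",
        "e 1706996891, d 20240203134811, docs 1711526, docs/sec. 116.000, s1 122.500, s2 114.500, ram 14128416440",
    ]
    print(rates(sample))
```

Recomputing the rate from the `docs` and `e` columns gives a cross-check on the logged `docs/sec.` value, which appears to assume a fixed 100-second interval while the epochs are sometimes 101 seconds apart.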
This seems to explain the odd performance variations I saw previously when going from one (1) partition to building an index with two (2) partitions.
A "cbcollect_info_issue_mb_unbalanced.zip" at about 1.8M docs loaded is attached.