Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: Morpheus, Cypher
Affects Version/s: Cypher
Component/s: secondary-index
Labels:
- vector-search-gsi
Environment:
8.0.0-1711

Triage:
Untriaged
Story Points:
0
Is this a Regression?:
Unknown

Description

Setup cluster with 1 node and loaded siftsmall dataset.

Create index:

CREATE INDEX vector_index_L2_SQUARED IF NOT EXISTS ON _default(size, brand, vec VECTOR) WITH {'dimension': 128, 'train_list': 10000, 'description': 'IVF,PQ8x8', 'similarity': 'L2_SQUARED', 'scan_nprobes': 3}

and run 10 queries but getting less than 1% recall. Note that recall is caluclated based on the expected ground truth for given vector query from sift

Running 1 threads X 10 queries

Thread 0: Running query#0 with vector [44.0, 11.0, 0.0, 0.0, 6.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.023998

Recall rate: 0.0%

Thread 0: Running query#1 with vector [38.0, 17.0, 6.0, 0.0, 0.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.032215

Recall rate: 1.0%

Thread 0: Running query#2 with vector [0.0, 2.0, 2.0, 0.0, 0.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.022537

Recall rate: 0.0%

Thread 0: Running query#3 with vector [38.0, 26.0, 0.0, 0.0, 0.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.032465

Recall rate: 0.0%

Thread 0: Running query#4 with vector [36.0, 12.0, 0.0, 0.0, 0.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.032238

Recall rate: 1.0%

Thread 0: Running query#5 with vector [148.0, 1.0, 0.0, 0.0, 0.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.034496

Recall rate: 0.0%

Thread 0: Running query#6 with vector [115.0, 13.0, 0.0, 0.0, 0.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.028614

Recall rate: 0.0%

Thread 0: Running query#7 with vector [3.0, 0.0, 0.0, 0.0, 1.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.030435

Recall rate: 1.0%

Thread 0: Running query#8 with vector [82.0, 24.0, 0.0, 3.0, 2.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.017351

Recall rate: 0.0%

Thread 0: Running query#9 with vector [3.0, 0.0, 0.0, 0.0, 0.0] ...

Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100

Execution time: 0:00:00.031369

Recall rate: 0.0%

Ran # 10 (1x10) queries in 0.8 seconds

Query per seconds: 12.54

with KNN and i can get 100% recall rate. Let me know if you need more details or state of index scan at this time.

Here is script i use to run test above: https://github.com/couchbase/testrunner/blob/master/scripts/vector_thread.py you would need to change based on your cluster and also change the query to use ANN (right now it uses KNN)

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

collectinfo-2024-07-26T172112-ns_1@cb.local.zip
22.72 MB
26/Jul/24 10:27 AM
image-2024-07-22-15-49-40-104.png
84 kB
22/Jul/24 3:49 PM
image-2024-07-26-09-18-41-373.png
118 kB
26/Jul/24 9:18 AM

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews
- Show All Issues
- Show Open Issues

For Gerrit Dashboard: MB-62779
#	Subject	Branch	Project	Status	CR	V
212964,2	MB-62779: Send elements in ascending order from heap	unstable	indexing	Status: MERGED	+2	+1

Activity

People

Assignee:: Pierre Regazzoni

Reporter:: Pierre Regazzoni

Votes:: 0 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 17/Jul/24 9:19 PM

Updated:: 28/Aug/24 10:07 AM

Resolved:: 28/Aug/24 10:05 AM

Gerrit Reviews

There are no open Gerrit changes

Show There is 1 closed Gerrit change

Hide There is 1 closed Gerrit change

MB-62779: Send elements in ascending order from heap: Gerrit Review:

ANN query returning with low recall

Details

Description

Attachments

Attachments

Gerrit Reviews

Activity

People

Dates

Gerrit Reviews

PagerDuty