Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-62779

ANN query returning with low recall

    XMLWordPrintable

Details

    • Untriaged
    • 0
    • Unknown

    Description

      Setup cluster with 1 node and loaded siftsmall dataset.

      Create index: 

      CREATE INDEX vector_index_L2_SQUARED IF NOT EXISTS ON _default(size, brand, vec VECTOR) WITH {'dimension': 128, 'train_list': 10000, 'description': 'IVF,PQ8x8', 'similarity': 'L2_SQUARED', 'scan_nprobes': 3} 

      and run 10 queries but getting less than 1% recall. Note that recall is caluclated based on the expected ground truth for given vector query from sift

      Running 1 threads X 10 queries
       
       
      Thread 0: Running query#0 with vector [44.0, 11.0, 0.0, 0.0, 6.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.023998
      Recall rate: 0.0%
      Thread 0: Running query#1 with vector [38.0, 17.0, 6.0, 0.0, 0.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.032215
      Recall rate: 1.0%
      Thread 0: Running query#2 with vector [0.0, 2.0, 2.0, 0.0, 0.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.022537
      Recall rate: 0.0%
      Thread 0: Running query#3 with vector [38.0, 26.0, 0.0, 0.0, 0.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.032465
      Recall rate: 0.0%
      Thread 0: Running query#4 with vector [36.0, 12.0, 0.0, 0.0, 0.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.032238
      Recall rate: 1.0%
      Thread 0: Running query#5 with vector [148.0, 1.0, 0.0, 0.0, 0.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.034496
      Recall rate: 0.0%
      Thread 0: Running query#6 with vector [115.0, 13.0, 0.0, 0.0, 0.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.028614
      Recall rate: 0.0%
      Thread 0: Running query#7 with vector [3.0, 0.0, 0.0, 0.0, 1.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.030435
      Recall rate: 1.0%
      Thread 0: Running query#8 with vector [82.0, 24.0, 0.0, 3.0, 2.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.017351
      Recall rate: 0.0%
      Thread 0: Running query#9 with vector [3.0, 0.0, 0.0, 0.0, 0.0] ...
      Query: SELECT RAW id FROM _default WHERE size IN $size AND brand IN $brand ORDER BY ANN(vec, $qvec, "L2_SQUARED") ASC LIMIT 100
      Execution time: 0:00:00.031369
      Recall rate: 0.0%
       
       
      Ran # 10 (1x10) queries in 0.8 seconds
      Query per seconds: 12.54 

       

      with KNN and i can get 100% recall rate. Let me know if you need more details or state of index scan at this time.

       

      Here is script i use to run test above: https://github.com/couchbase/testrunner/blob/master/scripts/vector_thread.py you would need to change based on your cluster and also change the query to use ANN (right now it uses KNN)

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            pierre.regazzoni Pierre Regazzoni
            pierre.regazzoni Pierre Regazzoni
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty