  Couchbase Lite / CBL-5551

Warning about not enough training points when using PQ with a high number of subquantizers


Details

    • Task
    • Resolution: Cannot Reproduce
    • Critical
    • Beryllium
    • Beryllium
    • Vector Search
    • Security Level: Public
    • None
    • 2

    Description

      Test:

          /// 13. TestSubquantizersValidation
          /// Description
          ///     Test that the PQ’s subquantizers value is validated with dimensions correctly.
          ///     An invalid argument exception should be thrown when the vector index is created
          ///     with an invalid subquantizers value, i.e. zero or a value that is not a divisor of the dimensions.
          /// Steps
          ///     1. Copy database words_db.
          ///     2. Create a vector index named "words_index" in _default.words collection.
          ///         - expression: "vector"
          ///         - dimensions: 300
          ///         - centroids: 8
          ///         - PQ(subquantizers: 2, bits: 8)
          ///     3. Check that the index is created without an error returned.
          ///     4. Delete the "words_index".
          ///     5. Repeat steps 2 to 4 by changing the subquantizers to
          ///       3, 4, 5, 6, 10, 12, 15, 20, 25, 30, 50, 60, 75, 100, 150, and 300.
          ///     6. Repeat steps 2 to 4 by changing the subquantizers to 0 and 7.
          ///     7. Check that an invalid argument exception is thrown.
          func testSubquantizersValidation() throws {
              let collection = try db.collection(name: "words")!
              var config = VectorIndexConfiguration(expression: "vector", dimensions: 300, centroids: 8)
              config.encoding = .productQuantizer(subquantizers: 2, bits: 8)
              try collection.createIndex(withName: "words_index", config: config)
              
              let names = try collection.indexes()
              XCTAssert(names.contains("words_index"))
              
              // Step 5: Use valid subquantizer values
              for numberOfSubq in [3, 4, 5, 6, 10, 12, 15, 20, 25, 30, 50, 60, 75, 100, 150, 300] {
                  try collection.deleteIndex(forName: "words_index")
                  config.encoding = .productQuantizer(subquantizers: UInt32(numberOfSubq), bits: 8)
                  try collection.createIndex(withName: "words_index", config: config)
                  
                  // Query:
                  let sql = "select meta().id, word from _default.words where vector_match(words_index, $vector, 20)"
                  let parameters = Parameters()
                  parameters.setValue(dinnerVector, forName: "vector")
                  
                  let q = try self.db.createQuery(sql)
                  q.parameters = parameters
                  
                  let explain = try q.explain() as NSString
                  XCTAssertNotEqual(explain.range(of: "SCAN kv_.words:vector:words_index").location, NSNotFound)
                  
                  let rs = try q.execute()
                  XCTAssertEqual(rs.allResults().count, 20)
                  XCTAssert(checkIndexWasTrained())
              }
              
              // Steps 6-7: Check that an invalid argument exception is thrown for invalid subquantizers.
              // Delete the last valid index once, before trying the invalid values.
              try collection.deleteIndex(forName: "words_index")
              for numberOfSubq in [0, 7] {
                  config.encoding = .productQuantizer(subquantizers: UInt32(numberOfSubq), bits: 8)
                  expectExcepion(exception: .invalidArgumentException) {
                      try! collection.createIndex(withName: "words_index", config: config)
                  }
              }
          }
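The valid values in step 5, together with the initial 2, are exactly the divisors of 300 other than 1. The Swift sketch below reproduces that rule, assuming validation is exactly "non-zero divisor of the dimensions" as the test description states. The helpers `validSubquantizers` and `isValidSubquantizers` are hypothetical and not part of the Couchbase Lite API. The divisor requirement follows from product quantization splitting each vector into `subquantizers` equal-length sub-vectors, so 300 dimensions cannot be split evenly into 7 parts.

```swift
/// Hypothetical helper (not a Couchbase Lite API): every subquantizer count
/// that evenly divides `dimensions`, assuming "non-zero divisor" is the whole rule.
func validSubquantizers(dimensions: UInt32) -> [UInt32] {
    guard dimensions > 0 else { return [] }
    return (1...dimensions).filter { dimensions % $0 == 0 }
}

/// Hypothetical check mirroring the rule the test exercises.
/// The `> 0` check short-circuits, so there is no division by zero.
func isValidSubquantizers(_ subquantizers: UInt32, dimensions: UInt32) -> Bool {
    return subquantizers > 0 && dimensions % subquantizers == 0
}

print(validSubquantizers(dimensions: 300))
// [1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 25, 30, 50, 60, 75, 100, 150, 300]
print(isValidSubquantizers(0, dimensions: 300), isValidSubquantizers(7, dimensions: 300))
// false false
```

Note that 1 also passes this rule but is not covered by the test.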
      

      Warning message:

      WARNING clustering 300 points to 256 centroids: please provide at least 9984 training points
      

      However, the strange part is that the index was still trained despite the warning.

      We need to check whether this PR changes the behavior:
      https://github.com/couchbaselabs/mobile-vector-search/pull/40
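The numbers in the warning can be reproduced with a short sketch. With `bits: 8`, each PQ codebook has 2^8 = 256 centroids. Assuming Couchbase Lite uses FAISS's default `min_points_per_centroid` of 39, FAISS asks for 256 * 39 = 9984 training points. FAISS's k-means appears to throw only when there are fewer training points than centroids. Between that bound and the recommended count, it prints this warning and trains anyway, which would be consistent with the index still being trained on 300 points. All names below are hypothetical helpers, not FAISS or Couchbase Lite APIs.

```swift
/// Assumption: FAISS's default `min_points_per_centroid` (39) applies.
let minPointsPerCentroid = 39

/// Product quantization trains one codebook of 2^bits centroids per subquantizer.
func centroidsPerCodebook(bits: Int) -> Int {
    return 1 << bits
}

/// Training-set size below which FAISS prints the "please provide at least" warning.
func recommendedTrainingPoints(centroids: Int) -> Int {
    return centroids * minPointsPerCentroid
}

/// Hypothetical model of FAISS k-means behavior: error if points < centroids,
/// warning (but still trains) if points < centroids * 39, otherwise fine.
enum TrainingOutcome { case error, warning, ok }

func trainingOutcome(points: Int, centroids: Int) -> TrainingOutcome {
    if points < centroids { return .error }
    if points < recommendedTrainingPoints(centroids: centroids) { return .warning }
    return .ok
}

let k = centroidsPerCodebook(bits: 8)                 // 256
print(recommendedTrainingPoints(centroids: k))        // 9984
print(trainingOutcome(points: 300, centroids: k))     // warning
```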

    People

      Assignee: Pasin Suriyentrakorn
      Reporter: Pasin Suriyentrakorn