  1. Couchbase Server
  2. MB-63187

[Sift 1B Dataset]: Indexer::initiateTraining error - Error: 'nx >= k' failed: Number of training points (6627) should be at least as large as number of clusters (15263)

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • Cypher
    • Cypher
    • secondary-index
    • Enterprise Edition 7.7.0 build 1079

    Description

      Dataset:

      t1M = {"vector": None, "size": [5, 6, 7, 8, 9, 10], "color": "green", "brand": "Nike", "country": "USA", "category": "Shoes", "type": "Apparel", "avg_review": 1}
      t2M = {"vector": None, "size": [6, 7, 8, 9, 10], "color": "green", "brand": "Nike", "country": "USA", "category": "Shoes", "type": "Apparel", "avg_review": 1.5}
      t5M = {"vector": None, "size": [7, 8, 9, 10], "color": "red", "brand": "Nike", "country": "USA", "category": "Shoes", "type": "Apparel", "avg_review": 2}
      t10M = {"vector": None, "size": [8, 9, 10], "color": "red", "brand": "Adidas", "country": "USA", "category": "Shoes", "type": "Apparel", "avg_review": 2.5}
      t20M = {"vector": None, "size": [9, 10], "color": "red", "brand": "Adidas", "country": "Canada", "category": "Shoes", "type": "Apparel", "avg_review": 3}
      t50M = {"vector": None, "size": [10], "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Apparel", "avg_review": 3.5}
      t100M = {"vector": None, "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Denim", "avg_review": 4}
      t200M = {"vector": None, "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Denim", "avg_review": 4.5}
      t500M = {"vector": None, "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Denim", "avg_review": 5}
      t1000M = {"vector": None, "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Denim", "avg_review": 10}
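Both index definitions below carry `where color="green"`, so only documents generated from templates whose `color` is "green" can enter the index. The sketch copies just the `color` field of each template above and lists which templates that predicate matches. It does not estimate how many documents each template produced: the report does not say, so the template names are not treated as counts here. `template_colors` and `templates_matching` are illustrative names, not part of the test harness.

```python
# The `color` value of each template, copied from the templates above.
template_colors = {
    "t1M": "green", "t2M": "green", "t5M": "red", "t10M": "red", "t20M": "red",
    "t50M": "red", "t100M": "red", "t200M": "red", "t500M": "red", "t1000M": "red",
}


def templates_matching(predicate_color: str) -> list[str]:
    """Return the template names whose documents satisfy color = predicate_color."""
    return [name for name, color in template_colors.items() if color == predicate_color]


print(templates_matching("green"))  # ['t1M', 't2M']
```

Only the first two templates feed the partial index, however large the overall load grows.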
      

      Steps:

      1. Started loading data into the bucket using the templates above.
      2. When the bucket held around 100M items, created the following index:

        CREATE INDEX `bigann2M` ON `_default`(`color`,`embedding` VECTOR) PARTITION BY HASH((META().`id`)) where color="green" WITH { "defer_build":TRUE, "num_partition":8, "dimension":128, "similarity":"L2_SQUARED", "description":"IVF,PQ32x8", "scan_nprobes":3};
        

      3. The index was created successfully.
      4. When the load reached around 250M items, created the following index:

        CREATE INDEX `bigann2MSQ8` ON `_default`(`color`,`embedding` VECTOR) PARTITION BY HASH((META().`id`)) where color="green" WITH { "defer_build":TRUE, "num_partition":8, "dimension":128, "similarity":"L2_SQUARED", "description":"IVF,SQ8", "scan_nprobes":3};
        

      5. Index creation failed with the following log output:

        2024-08-16T16:10:39.856-07:00 [Info] Indexer::initateTraining Starting training for vector index with instId: 10405409891669675141, partnId: 4
        2024-08-16T16:10:39.861-07:00 [Info] NewCodebookIVFSQ: Initialized codebook with dimension: 128, range: SQ8, nlist: 15263, metric: L2, useCosine: false
        2024-08-16T16:10:39.862-07:00 [Error] Indexer::initiateTraining error observed during training phase of codebook for instId: 10405409891669675141, partnId: 4, err: Error in void faiss::Clustering::train_encoded(faiss::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /home/couchbase/jenkins/workspace/cbdeps-platform-build/faiss/faiss/Clustering.cpp:276: Error: 'nx >= k' failed: Number of training points (6627) should be at least as large as number of clusters (15263)
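The failing check is the faiss k-means precondition in `Clustering::train_encoded`: the coarse quantizer of an IVF index needs at least as many training vectors as clusters (`nlist`), and here partition 4 had 6627 training vectors for 15263 clusters. The sketch restates that check in plain Python so the arithmetic is easy to reproduce. `check_training_set` and `clamp_nlist` are hypothetical helpers, not Couchbase or faiss APIs. The clamp targets faiss's default of 39 points per centroid (below that faiss only logs a warning) and is one possible mitigation shown for illustration, not the project's fix.

```python
# Hypothetical helpers (not Couchbase or faiss APIs) restating the faiss
# k-means precondition from Clustering::train_encoded.

# faiss ClusteringParameters default: with fewer points per centroid than
# this, faiss logs a warning but still trains.
MIN_POINTS_PER_CENTROID = 39


def check_training_set(n_train: int, nlist: int) -> None:
    """Raise if k-means cannot run: it needs at least one point per cluster."""
    if n_train < nlist:
        raise ValueError(
            f"Number of training points ({n_train}) should be at least "
            f"as large as number of clusters ({nlist})"
        )


def clamp_nlist(n_train: int, requested_nlist: int) -> int:
    """Illustrative mitigation: cap nlist so each centroid gets about
    MIN_POINTS_PER_CENTROID training points, never going below 1."""
    return max(1, min(requested_nlist, n_train // MIN_POINTS_PER_CENTROID))


# Values from the partnId 4 log lines above.
n_train, nlist = 6627, 15263
try:
    check_training_set(n_train, nlist)
except ValueError as err:
    print(f"training would fail: {err}")

safe_nlist = clamp_nlist(n_train, nlist)
check_training_set(n_train, safe_nlist)  # passes: 6627 >= safe_nlist
print(safe_nlist)  # 169
```

The report does not state how the indexer arrived at nlist 15263, so the sketch takes both numbers directly from the log.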
        


            People

              ritesh.agarwal Ritesh Agarwal
              ritesh.agarwal Ritesh Agarwal
              Votes: 0
              Watchers: 7

              Dates

                Created:
                Updated:

                Gerrit Reviews

                  There are no open Gerrit changes
