Details
- Type: Bug
- Resolution: Unresolved
- Priority: Critical
- Component: Cypher
- Affects Version: Enterprise Edition 7.7.0 build 1079
- Triage: Untriaged
Description
Dataset:
t1M = {"vector": None, "size": [5, 6, 7, 8, 9, 10], "color": "green", "brand": "Nike", "country": "USA", "category": "Shoes", "type": "Apparel", "avg_review": 1}
t2M = {"vector": None, "size": [6, 7, 8, 9, 10], "color": "green", "brand": "Nike", "country": "USA", "category": "Shoes", "type": "Apparel", "avg_review": 1.5}
t5M = {"vector": None, "size": [7, 8, 9, 10], "color": "red", "brand": "Nike", "country": "USA", "category": "Shoes", "type": "Apparel", "avg_review": 2}
t10M = {"vector": None, "size": [8, 9, 10], "color": "red", "brand": "Adidas", "country": "USA", "category": "Shoes", "type": "Apparel", "avg_review": 2.5}
t20M = {"vector": None, "size": [9, 10], "color": "red", "brand": "Adidas", "country": "Canada", "category": "Shoes", "type": "Apparel", "avg_review": 3}
t50M = {"vector": None, "size": [10], "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Apparel", "avg_review": 3.5}
t100M = {"vector": None, "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Denim", "avg_review": 4}
t200M = {"vector": None, "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Denim", "avg_review": 4.5}
t500M = {"vector": None, "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Denim", "avg_review": 5}
t1000M = {"vector": None, "color": "red", "brand": "Adidas", "country": "Canada", "category": "Jeans", "type": "Denim", "avg_review": 10}
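The templates above can be instantiated as shown in the following minimal sketch. It assumes the `None` vector field is filled in at load time with a 128-dimensional embedding (matching the `dimension` setting in the CREATE INDEX statements below); only the first template is spelled out, and the random vector is a stand-in for a real dataset vector.

```python
import random

# Document templates from the dataset above (vector filled in at load time).
# Only t1M is shown; t2M .. t1000M follow the same shape.
TEMPLATES = {
    "t1M": {"vector": None, "size": [5, 6, 7, 8, 9, 10], "color": "green",
            "brand": "Nike", "country": "USA", "category": "Shoes",
            "type": "Apparel", "avg_review": 1},
}

DIMENSION = 128  # matches the "dimension" setting in the CREATE INDEX statements

def make_doc(template_name: str, rng: random.Random) -> dict:
    """Instantiate a template, replacing the None vector with a random
    128-dim embedding (a stand-in for a real dataset vector)."""
    doc = dict(TEMPLATES[template_name])
    doc["vector"] = [rng.random() for _ in range(DIMENSION)]
    return doc

doc = make_doc("t1M", random.Random(42))
print(len(doc["vector"]), doc["color"])  # 128 green
```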
Steps:
- Based on the above templates, started loading the data into the bucket.
- When the load reached around 100M items, created the index below:
CREATE INDEX `bigann2M` ON `_default`(`color`,`embedding` VECTOR) PARTITION BY HASH((META().`id`)) where color="green" WITH { "defer_build":TRUE, "num_partition":8, "dimension":128, "similarity":"L2_SQUARED", "description":"IVF,PQ32x8", "scan_nprobes":3};
- The index was created successfully.
- When the load reached around 250M items, created the index below:
CREATE INDEX `bigann2MSQ8` ON `_default`(`color`,`embedding` VECTOR) PARTITION BY HASH((META().`id`)) where color="green" WITH { "defer_build":TRUE, "num_partition":8, "dimension":128, "similarity":"L2_SQUARED", "description":"IVF,SQ8", "scan_nprobes":3};
- Index creation failed with the following error:
2024-08-16T16:10:39.856-07:00 [Info] Indexer::initateTraining Starting training for vector index with instId: 10405409891669675141, partnId: 4
2024-08-16T16:10:39.861-07:00 [Info] NewCodebookIVFSQ: Initialized codebook with dimension: 128, range: SQ8, nlist: 15263, metric: L2, useCosine: false
2024-08-16T16:10:39.862-07:00 [Error] Indexer::initiateTraining error observed during training phase of codebook for instId: 10405409891669675141, partnId: 4, err: Error in void faiss::Clustering::train_encoded(faiss::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /home/couchbase/jenkins/workspace/cbdeps-platform-build/faiss/faiss/Clustering.cpp:276: Error: 'nx >= k' failed: Number of training points (6627) should be at least as large as number of clusters (15263)
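The error is FAISS's k-means training precondition: the number of training vectors (nx) must be at least the number of centroids (k, here nlist). The indexer chose nlist = 15263 for this partition, but only 6627 qualifying vectors were available for training, so `nx >= k` failed. A minimal stdlib sketch of that check (the values are taken from the error log; the exact way the indexer samples training vectors and picks nlist is not shown here):

```python
# Sketch of the FAISS clustering precondition that failed
# (faiss/Clustering.cpp: "'nx >= k' failed"): k-means cannot produce
# more centroids than it has training points.

def check_kmeans_training(nx: int, nlist: int) -> None:
    """Raise if there are fewer training points than requested clusters."""
    if nx < nlist:
        raise ValueError(
            f"Number of training points ({nx}) should be at least as "
            f"large as number of clusters ({nlist})"
        )

# Partition 4 in the failing build: 6627 training vectors, nlist = 15263.
try:
    check_kmeans_training(nx=6627, nlist=15263)
except ValueError as e:
    print("training would fail:", e)

# A smaller nlist (or more qualifying documents) satisfies the check.
check_kmeans_training(nx=6627, nlist=4096)
```

Note that satisfying the hard check is only the floor: FAISS also emits a quality warning when there are fewer than a few dozen training points per centroid, so in practice nx should exceed nlist by a large factor.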