Couchbase Server / MB-60739

FTS service crashes with merging error: merging failed: Error in faiss::InvertedLists: 'ret == (n * ails->code_size)' failed: read error in : 304544 != 1994752 (Success)


Details

    • Untriaged
    • Linux x86_64
    • 0
    • Unknown

    Description

      Test Steps

      1. Create a Magma bucket with 1 scope and 1 collection, in addition to _default._default.
      2. Load 100 million documents containing 2048-dimensional vector data into the bucket.
      3. Create a pair of vector indexes with 0 replicas and 9 and 12 partitions, respectively. The cluster has 3 FTS nodes, so the partition counts were chosen as multiples of the FTS node count.
      4. Wait for the index build to complete.
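      The partition arithmetic in step 3 can be checked with a small Go sketch. partitionsPerNode is a hypothetical helper, and the sketch assumes partitions are spread evenly across FTS nodes, which simplifies how FTS actually plans partition placement. Under that assumption each node would host 3 + 4 = 7 partitions, which may line up with the "indexes: 7" count in the app_herder log lines, though that is an inference rather than something the report states.

```go
package main

import "fmt"

// partitionsPerNode returns the number of partitions each node hosts under an
// even spread (floor division) and whether the spread is exactly even, i.e.
// whether partitions is a multiple of nodes. When even is false, some nodes
// would host one extra partition. Hypothetical helper, not FTS planner code.
func partitionsPerNode(partitions, nodes int) (perNode int, even bool) {
	return partitions / nodes, partitions%nodes == 0
}

func main() {
	const ftsNodes = 3
	// Partition counts of the two vector indexes from step 3 (0 replicas).
	indexPartitions := []int{9, 12}

	total := 0
	for _, p := range indexPartitions {
		perNode, even := partitionsPerNode(p, ftsNodes)
		fmt.Printf("%d partitions over %d nodes: %d per node, even=%v\n", p, ftsNodes, perNode, even)
		total += perNode
	}
	// Assuming even spreads, partitions hosted per node across both indexes.
	fmt.Printf("partitions per node across both indexes: %d\n", total)
}
```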

      Crash

      While the index build is ongoing, the FTS service crashes with the following error:

      Service 'fts' exited with status 1. Restarting. Messages:
      2024-02-08T21:26:37.369-08:00 [INFO] app_herder: indexing proceeding, indexes: 7, waiting: 3, usage: 105181485712
      2024-02-08T21:26:37.369-08:00 [INFO] app_herder: indexing proceeding, indexes: 7, waiting: 2, usage: 105181485712
      2024-02-08T21:26:37.370-08:00 [INFO] app_herder: indexing proceeding, indexes: 7, waiting: 1, usage: 105181485712
      2024-02-08T21:26:37.370-08:00 [INFO] app_herder: indexing proceeding, indexes: 7, waiting: 0, usage: 105181485712
      2024-02-08T21:55:37.235-08:00 [FATA] scorch AsyncError, path: /opt/couchbase/var/lib/couchbase/data/@fts/sift_bucket.sift_scope.sift_index2_5e483d4339589d6c_b024670b.pindex/store, treating this as fatal, err: merging err: merging failed: Error in faiss::InvertedLists* faiss::read_InvertedLists(IOReader*, int) at /home/couchbase/jenkins/workspace/couchbase-server-unix/faiss/faiss/impl/index_read.cpp:209: Error: 'ret == (n * ails->code_size)' failed: read error in : 304544 != 1994752 (Success), stack dump: /opt/couchbase/var/lib/couchbase/data/@fts/dumps/1707458137.fts.stack.dump.txt -- main.initBleveOptions.func2() at init_bleve.go:113
      

      Ran 1 sample query after this crash. The FTS service then crashed again in the same code path, this time with a "Cannot allocate memory" error:

      Service 'fts' exited with status 1. Restarting. Messages:
      2024-02-08T21:33:40.938-08:00 [WARN] slow-query: index: sift_bucket.sift_scope.sift_index1, username: <ud>Administrator</ud>, query: <ud>{"query":{"match_none":{}},"explain":true,"knn":[{"field":"vector_data","k":200,"vector":[114,31,0,0,0,0,0,10,129,48,5,13,13,0,0,7,0,23,62,41,51,8,2,0,0,44,78,2,1,3,3,0,103,45,14,1,3,8,5,43,129,28,2,8...</ud>, resultset bytes: 5762, duration: 27.087629484s, err: <nil> -- rest.(*QueryHandler).ServeHTTP() at rest_index.go:393
      2024-02-08T22:24:00.614-08:00 [FATA] scorch AsyncError, path: /opt/couchbase/var/lib/couchbase/data/@fts/sift_bucket.sift_scope.sift_index2_5e483d4339589d6c_65cd6ab5.pindex/store, treating this as fatal, err: merging err: merging failed: Error in faiss::InvertedLists* faiss::read_InvertedLists(IOReader*, int) at /home/couchbase/jenkins/workspace/couchbase-server-unix/faiss/faiss/impl/index_read.cpp:209: Error: 'ret == (n * ails->code_size)' failed: read error in : 501920 != 737280 (Cannot allocate memory), stack dump: /opt/couchbase/var/lib/couchbase/data/@fts/dumps/1707459840.fts.stack.dump.txt -- main.initBleveOptions.func2() at init_bleve.go:113
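      The slow-query line above shows the shape of the kNN request that was run: a match_none query plus one knn clause on vector_data with k = 200 and explain enabled. A Go sketch that builds a request body of the same shape follows. buildKNNRequest and the struct names are hypothetical, and because the logged vector is truncated, the sketch uses a zero placeholder vector with the 2048 dimensions from step 2 rather than the actual query vector.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// knnClause mirrors one entry of the "knn" array in the logged query.
type knnClause struct {
	Field  string    `json:"field"`
	K      int       `json:"k"`
	Vector []float32 `json:"vector"`
}

// searchRequest mirrors the top-level shape of the logged query:
// a match_none query combined with a kNN clause, with explain enabled.
type searchRequest struct {
	Query   map[string]any `json:"query"`
	Explain bool           `json:"explain"`
	KNN     []knnClause    `json:"knn"`
}

// buildKNNRequest returns the JSON body for a kNN-only search on field.
// Hypothetical helper for illustration.
func buildKNNRequest(field string, k int, vec []float32) ([]byte, error) {
	req := searchRequest{
		Query:   map[string]any{"match_none": map[string]any{}},
		Explain: true,
		KNN:     []knnClause{{Field: field, K: k, Vector: vec}},
	}
	return json.Marshal(req)
}

func main() {
	// Placeholder vector: NOT the vector used in the report (it is truncated
	// in the log). The dimension follows step 2.
	vec := make([]float32, 2048)
	body, err := buildKNNRequest("vector_data", 200, vec)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s...\n", body[:70])
}
```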
      

      Cluster Configuration

      1 KV node
      3 FTS nodes
      KV memory quota: 200 GiB
      FTS memory quota: 200 GiB
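      For a rough sense of scale against these quotas, the Go sketch below estimates the raw vector payload from step 2, assuming each component is stored as a 4-byte float32 and ignoring faiss index structures, any quantization, and per-document overhead. rawVectorBytes is a hypothetical helper. Under those assumptions, 100 million 2048-dimensional vectors come to about 762.9 GiB, compared with 3 × 200 GiB = 600 GiB of FTS quota across the cluster (Couchbase service quotas are set per node). This is only a sizing aid; the report does not identify memory pressure as the cause of the crash.

```go
package main

import "fmt"

const gib = 1 << 30

// rawVectorBytes estimates the raw size of docs vectors with dims components
// of bytesPerComponent bytes each. Index structures, quantization, and
// metadata are ignored. Hypothetical helper for illustration.
func rawVectorBytes(docs, dims, bytesPerComponent int64) int64 {
	return docs * dims * bytesPerComponent
}

func main() {
	const (
		docs     = 100_000_000 // step 2
		dims     = 2048        // step 2
		f32      = 4           // assumption: float32 components
		ftsNodes = 3           // cluster configuration
		quotaGiB = 200         // FTS memory quota per node
	)
	raw := rawVectorBytes(docs, dims, f32)
	fmt.Printf("raw vector payload: %.1f GiB\n", float64(raw)/gib)
	fmt.Printf("combined FTS quota: %d GiB\n", ftsNodes*quotaGiB)
}
```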
      


          People

            sujay.gad Sujay Gad