  1. Couchbase Server
  2. MB-60520

FTS service is crashing with cannot allocate memory error while running knn queries

Details

    • Bug
    • Resolution: Duplicate
    • Blocker
    • 7.6.0
    • 7.6.0
    • fts
    • Enterprise Edition 7.6.0 build 2044
    • Untriaged
    • Linux x86_64
    • 0
    • Unknown

    Description

      Steps

      1. Load 100 million documents containing vector data into a bucket.
      2. Create a pair of vector indexes on that bucket.
      3. Start a KNN query workload while indexing is still in progress.
      4. Swap-rebalance a pair of FTS nodes.
      5. The rebalance completes successfully.
      6. Continue the KNN query workload.
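
For reference, the request shape used by the KNN workload in steps 3 and 6 can be sketched from the slow-query log entry further down in this report. This is a minimal sketch only: `build_knn_request` is a hypothetical helper, the vector is a placeholder, and how the body is sent to the FTS node is left out because the ticket does not show it.

```python
import json


def build_knn_request(vector, field="vector_data", k=2, size=10):
    """Build a search request body shaped like the logged slow query.

    Hypothetical helper: field name, k, size and the other keys mirror
    the slow-query log line in this ticket; nothing else is assumed.
    """
    if not vector:
        raise ValueError("vector must be non-empty")
    return {
        "query": {"match_none": {}},  # no text clause, KNN only
        "explain": True,
        "knn": [{"field": field, "k": k, "vector": list(vector)}],
        "size": size,
        "from": 0,
    }


if __name__ == "__main__":
    # Placeholder vector; the logged query carries 128 components.
    body = build_knn_request([0] * 128)
    print(json.dumps(body)[:100])
```
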

      Observation

      KNN queries fail with a "context deadline exceeded" error.

      2024-01-24T00:41:47.936-08:00 [WARN] grpc_client: Query() returned error from host: 172.23.110.128:9130, err: grpc_client: query got status code: 504, resp: &bleve.SearchResult{Status:(*bleve.SearchStatus)(0xc20c3e76c0), Request:(*bleve.SearchRequest)(0xc2b914fb30), Hits:search.DocumentMatchCollection(nil), Total:0x0, Cost:0x0, MaxScore:0, Took:0, Facets:search.FacetResults(nil)}, err: rpc error: code = DeadlineExceeded desc = context deadline exceeded -- cbft.(*GrpcClient).SearchInContext.func1() at grpc_client.go:153
      

      The FTS logs contain multiple instances of the "cannot allocate memory" error.
      Memory utilisation on all 3 FTS nodes in the cluster stays below 50% for the entire duration of the test.

      grep "cannot allocate memory" fts-172.23.110.66.log
      2024/01/24 00:37:18 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:37:22 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:37:51 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:38:01 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:38:38 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:38:41 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:39:28 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:39:38 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:40:02 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024/01/24 00:40:05 unable to load snapshot, failed to load segment: error opening bolt segment: cannot allocate memory, continuing
      2024-01-24T01:08:49.704-08:00 [FATA] scorch AsyncError, path: /opt/couchbase/var/lib/couchbase/data/@fts/sift_bucket.sift_scope.sift_index1_69ba33fcc047e88f_37b65936.pindex/store, treating this as fatal, err: got err persisting snapshot: error opening new segment at /opt/couchbase/var/lib/couchbase/data/@fts/sift_bucket.sift_scope.sift_index1_69ba33fcc047e88f_37b65936.pindex/store/000000001c84.zap, cannot allocate memory, stack dump: /opt/couchbase/var/lib/couchbase/data/@fts/dumps/1706087329.fts.stack.dump.txt -- main.initBleveOptions.func2() at init_bleve.go:113
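
One possible explanation, not confirmed anywhere in this ticket, for "cannot allocate memory" while memory utilisation stays under 50%: on Linux, mmap fails with ENOMEM when a process would exceed its maximum number of mappings (`vm.max_map_count`), regardless of free RAM. Assuming the segments are opened via mmap, the sketch below compares a process's current mapping count against that limit. The helper names are hypothetical, and `pid` must be set to the process under inspection.

```python
from pathlib import Path


def count_mappings(maps_text):
    """Count mappings: /proc/<pid>/maps has one non-empty line per mapping."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_headroom(maps_text, max_map_count):
    """Return (mappings in use, mappings left before the limit)."""
    used = count_mappings(maps_text)
    return used, max_map_count - used


if __name__ == "__main__":
    pid = "self"  # assumption: replace with the PID of the process to inspect
    maps_path = Path(f"/proc/{pid}/maps")
    limit_path = Path("/proc/sys/vm/max_map_count")
    if maps_path.exists() and limit_path.exists():
        used, left = mapping_headroom(maps_path.read_text(),
                                      int(limit_path.read_text()))
        print(f"mappings in use: {used}, headroom: {left}")
    else:
        print("procfs not available on this system")
```

A headroom near zero on a node that logs the error would support this hypothesis; a large headroom would rule it out.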
      

      The following service crash is also seen while KNN queries are running in the background.

      Service 'fts' exited with status 1. Restarting. Messages:
      2024-01-24T00:44:22.327-08:00 [WARN] slow-query: index: sift_bucket.sift_scope.sift_index, username: <ud>Administrator</ud>, query: <ud>{"query":{"match_none":{}},"explain":true,"knn":[{"field":"vector_data","k":2,"vector":[16,34,46,3,1,6,27,44,0,9,77,32,37,83,41,3,19,14,15,17,18,34,22,22,12,23,52,12,0,0,0,1,8,22,15,16,16,35,92,94,1,3,9,13,22,120,120,12,120,15,5,0,0,39,74,120,115,24,14,7,2,7,4,34,1,18,36,120,41,10,2,0,1,8,93,120,35,21,7,0,120,104,57,7,0,1,2,20,120,51,2,13,18,12,3,20,7,6,9,18,18,24,19,4,5,12,114,71,13,2,0,2,60,30,81,54,0,0,0,13,44,10,1,3,7,15,20,38]}],"size":10,"from":0}</ud>, resultset bytes: 1306, duration: 10.001109774s, err: <nil> -- rest.(*QueryHandler).ServeHTTP() at rest_index.go:388
       
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp:
      libgomp: Thread creation failed: Resource temporarily unavailable
       
      libgomp: Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
      Thread creation failed: Resource temporarily unavailable
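
"Resource temporarily unavailable" is the message for EAGAIN, which thread creation returns when a resource or thread limit is reached. libgomp is GCC's OpenMP runtime, so these lines suggest worker threads could not be spawned. As a hedged diagnostic sketch (the cause is not established in this ticket), the snippet below reads a process's thread count from procfs and prints the limits that commonly bound it. `thread_count` is a hypothetical helper and `pid` is a placeholder.

```python
import resource
from pathlib import Path


def thread_count(status_text):
    """Extract the 'Threads:' value from /proc/<pid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    raise ValueError("no 'Threads:' field found")


if __name__ == "__main__":
    pid = "self"  # assumption: replace with the PID of the process to inspect
    status_path = Path(f"/proc/{pid}/status")
    threads_max_path = Path("/proc/sys/kernel/threads-max")
    if status_path.exists():
        print("threads in process:", thread_count(status_path.read_text()))
    if threads_max_path.exists():
        print("kernel threads-max:", threads_max_path.read_text().strip())
    # Per-user process/thread limit for the current user (soft, hard).
    print("RLIMIT_NPROC:", resource.getrlimit(resource.RLIMIT_NPROC))
```
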
      

      Attachments

        1. fts-172.23.110.127.log
          21.53 MB
        2. fts-172.23.110.128.log
          29.48 MB
        3. fts-172.23.110.66.log
          29.12 MB
        4. fts-172.23.110.69.log
          12.90 MB

          People

            sujay.gad Sujay Gad