  1. Couchbase Server
  2. MB-62278

FTS exited with status 2 with no load and on first 4 queries, k=10 || All queries are marked slow || Stat Discrepancy

Details

    Description

      Test environment:

      • 5 dedicated FTS nodes, each with 8 CPUs and 16 GB RAM
      • 60 million documents in KV, each with a vector of dimension 1536

      Test Steps:

      • Loaded ~60 million documents into KV
      • Created an index with 18 partitions per node, so 90 partitions overall
      • "planParams": { "maxPartitionsPerPIndex": 12, "indexPartitions": 90 },
      • There is only one index, and it uses l2_norm
      • Waited for indexing to complete and for the cluster to become quiet
      • Ran 5 kNN queries with k=10, and FTS exited with status 2
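
To make the setup above easier to reproduce, here is a minimal Python sketch of the partition arithmetic and of a kNN request body with k=10. It rests on assumptions that are not in the report: that the bucket has 1024 vBuckets, that `maxPartitionsPerPIndex` is the ceiling of vBuckets divided by index partitions (which happens to give the reported value of 12), that the vector field is called `embedding` (a hypothetical name), and that the body follows the usual Search kNN request shape. The query vector is a placeholder, and nothing is sent to a cluster.

```python
import json
import math

NUM_NODES = 5              # from the report
PARTITIONS_PER_NODE = 18   # from the report
ASSUMED_VBUCKETS = 1024    # assumption: not stated in the report
VECTOR_DIM = 1536          # from the report
K = 10                     # from the report


def plan_params(num_nodes, partitions_per_node, vbuckets):
    """Derive planParams from the node count and partitions per node."""
    index_partitions = num_nodes * partitions_per_node
    return {
        # Assumed relationship: vBuckets are spread evenly over partitions.
        "maxPartitionsPerPIndex": math.ceil(vbuckets / index_partitions),
        "indexPartitions": index_partitions,
    }


def knn_request(vector, k, field="embedding"):
    """Build a kNN-only search body; `field` is a hypothetical field name."""
    return {
        "query": {"match_none": {}},
        "knn": [{"field": field, "vector": list(vector), "k": k}],
        "size": k,
    }


params = plan_params(NUM_NODES, PARTITIONS_PER_NODE, ASSUMED_VBUCKETS)
body = knn_request([0.0] * VECTOR_DIM, K)  # placeholder query vector

print(json.dumps(params))
print(len(body["knn"][0]["vector"]), body["knn"][0]["k"])
```

Under these assumptions the derived values agree with the `planParams` quoted in the steps, which suggests the index definition is internally consistent.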

      ns_1@172.23.97.108
      3:00:07 AM 11 Jun, 2024
       
      Service 'fts' exited with status 2. Restarting. Messages:
      goroutine 1577357 gp=0xc04e42ba40 m=nil [runnable]:
      github.com/blevesearch/bleve/v2/index/scorch.(*OptimizeVR).Finish.func1.gowrap2()
      /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/blevesearch/bleve/v2@v2.4.1-0.20240606135638-72887d93ad69/index/scorch/optimize_knn.go:110 fp=0xc03d45b7e0 sp=0xc03d45b7d8 pc=0x78be40
      runtime.goexit({})
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.22.2/go/src/runtime/asm_amd64.s:1695 +0x1 fp=0xc03d45b7e8 sp=0xc03d45b7e0 pc=0x47cd01
      created by github.com/blevesearch/bleve/v2/index/scorch.(*OptimizeVR).Finish.func1 in goroutine 1576227
      /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/blevesearch/bleve/v2@v2.4.1-0.20240606135638-72887d93ad69/index/scorch/optimize_knn.go:110 +0x115
      rax 0x0
      rbx 0x6
      rcx 0x7fa6ff0367bb
      rdx 0x0
      rdi 0x2
      rsi 0x7fa678bc1e80
      rbp 0x2008952
      rsp 0x7fa678bc1e80
      r8 0x0
      r9 0x7fa678bc1e80
      r10 0x8
      r11 0x246
      r12 0x7fa678bc2290
      r13 0x599010
      r14 0xc000006c40
      r15 0xfffffffffffff
      rip 0x7fa6ff0367bb
      rflags 0x246
      cs 0x33
      fs 0x0
      gs 0x0 
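
When comparing several such crash messages, it can help to reduce each one to a few fields. The following Python sketch is a hypothetical helper, not part of any Couchbase tooling. It pulls the goroutine ID, its state, the function that created it, and the register values out of text shaped like the message above. The sample input is abbreviated from this report, and the regular expressions assume the same line layout.

```python
import re

GOROUTINE_RE = re.compile(r"^goroutine (\d+) .*\[([^\]]+)\]:$")
CREATED_BY_RE = re.compile(r"^created by (\S+) in goroutine (\d+)$")
REGISTER_RE = re.compile(r"^([a-z][a-z0-9]*)\s+(0x[0-9a-f]+)$")

SAMPLE = """\
goroutine 1577357 gp=0xc04e42ba40 m=nil [runnable]:
github.com/blevesearch/bleve/v2/index/scorch.(*OptimizeVR).Finish.func1.gowrap2()
runtime.goexit({})
created by github.com/blevesearch/bleve/v2/index/scorch.(*OptimizeVR).Finish.func1 in goroutine 1576227
rax 0x0
rbx 0x6
rdi 0x2
rip 0x7fa6ff0367bb
gs 0x0 
"""


def summarize_dump(text):
    """Collect goroutine header, creator, and registers from a crash message."""
    summary = {
        "goroutine": None,
        "state": None,
        "created_by": None,
        "parent_goroutine": None,
        "registers": {},
    }
    for raw in text.splitlines():
        line = raw.strip()  # tolerate indentation and trailing spaces
        header = GOROUTINE_RE.match(line)
        if header and summary["goroutine"] is None:
            summary["goroutine"] = int(header.group(1))
            summary["state"] = header.group(2)
            continue
        creator = CREATED_BY_RE.match(line)
        if creator and summary["created_by"] is None:
            summary["created_by"] = creator.group(1)
            summary["parent_goroutine"] = int(creator.group(2))
            continue
        register = REGISTER_RE.match(line)
        if register:
            summary["registers"][register.group(1)] = int(register.group(2), 16)
    return summary


summary = summarize_dump(SAMPLE)
print(summary["goroutine"], summary["state"], summary["parent_goroutine"])
```

The helper only records the first goroutine header and the first `created by` line it sees, which is enough for a single-goroutine excerpt like this one. A full dump with many goroutines would need per-goroutine grouping.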

      Image indicating just ~4 queries:


      Also, the total_knn_queries stat in promtimer shows 305, which contradicts the handful of queries actually run. There seems to be a discrepancy in this stat, and all 305 queries are marked as slow.

          People

            sarthak.dua Sarthak Dua