  Couchbase Server / MB-62292

FTS service exits with status 1 with multiple FTS indices


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: 7.6.2
    • Component/s: fts
    • Environment: Couchbase Server Enterprise Edition 7.6.2 build 3710
    • Triage: Untriaged
    • Story Points: 0
    • Is this a Regression?: Unknown

    Description

      Created a 6-node cluster (3 fts nodes and 3 kv nodes).
      Installed Couchbase Server Enterprise Edition 7.6.2 build 3710.

      Running a script that constantly pushes 1M documents with vector embeddings of 4096 dimensions.
      Note: The push type is overwrite. That is, the first 1M docs are pushed, ranging from doc1 to doc1000000. Then the load is immediately triggered again and the same docs, doc1 to doc1000000, are upserted; this loops indefinitely.
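The load pattern described above can be sketched as follows. This is a minimal stand-in, not the actual test script: the document shape, the `upsert` callback, and the helper names are assumptions; only the key range (doc1..doc1000000) and the 4096-dimension embedding come from the report.

```python
import random

DIMS = 4096            # embedding width from the report
NUM_DOCS = 1_000_000   # doc1 .. doc1000000

def make_doc(i, dims=DIMS):
    """Build one synthetic document with a random vector embedding.

    Hypothetical document shape; only the key pattern and the vector
    dimensionality are taken from the report.
    """
    return f"doc{i}", {"id": i, "vec": [random.random() for _ in range(dims)]}

def load_forever(upsert, num_docs=NUM_DOCS):
    """Overwrite-load: push the full key range, then loop indefinitely.

    `upsert(key, doc)` stands in for a real KV write, e.g.
    Collection.upsert() in the Couchbase SDKs.
    """
    while True:
        for i in range(1, num_docs + 1):
            upsert(*make_doc(i))
```

Because every pass reuses the same keys, the logical data set does not grow; the "no space left on device" errors later in the report come from files under the @fts directory, not the KV data itself.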

      Created 6 search indices of the following combinations

      • Dot product + recall
      • Dot product + memory
      • Dot product + perf
      • L2norm + recall
      • L2norm + memory
      • L2norm + perf

        Additional index details: 1 replica and 12 partitions
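The six index definitions differ only in the vector field's similarity metric and optimization hint. A sketch of what one such definition might look like via the FTS REST API — the index name, the collection name `c1`, the field name `vec`, and the exact property spellings are assumptions; the bucket/scope (`b1`/`s1`), replica count, and partition count come from the report and logs:

```json
{
  "name": "dot_memory",
  "type": "fulltext-index",
  "sourceType": "gocbcore",
  "sourceName": "b1",
  "planParams": {
    "numReplicas": 1,
    "indexPartitions": 12
  },
  "params": {
    "doc_config": { "mode": "scope.collection.type_field" },
    "mapping": {
      "types": {
        "s1.c1": {
          "enabled": true,
          "properties": {
            "vec": {
              "fields": [{
                "name": "vec",
                "type": "vector",
                "dims": 4096,
                "similarity": "dot_product",
                "vector_index_optimized_for": "memory-efficient"
              }]
            }
          }
        }
      }
    }
  }
}
```

The other five indexes would swap the `similarity` value (`dot_product` vs `l2_norm`) and the optimization hint (recall / latency / memory-efficient).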

      After a few minutes, got multiple instances of "Service 'fts' exited with status 1" with different reasons. Eventually one of the search nodes went down after these instances.

      Following this, all three search nodes went down. The kv nodes were still up and the docs were still being upserted at a constant ops rate.

      Error:

      Service 'fts' exited with status 1. Restarting. Messages: 2024-06-12T12:54:41.801+05:30 [INFO] feed_dcp_gocbcore: Start, name: b1.s1.dot_memory_3acd38391505a897_b024670b, num streams: 85, manifestUID: 8, streamOptions: {FilterOptions: &{ScopeID:0 CollectionIDs:[12]}, StreamOptions: &{StreamID:5}}, vbuckets: [684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768] 2024-06-12T12:54:41.778+05:30 [FATA] scorch AsyncError, path: /opt/couchbase/var/lib/couchbase/data/@fts/b1.s1.dot_performance_202bcdb171dec266_364edd56.pindex/store, treating this as fatal, err: merging err: merging failed: write /opt/couchbase/var/lib/couchbase/data/@fts/b1.s1.dot_performance_202bcdb171dec266 

      Service 'fts' exited with status 1. Restarting. Messages: 2024-06-12T12:54:35.188+05:30 [INFO] (GOCBCORE) Creating new dcp agent: &{UserAgent:fts:b1:4638b3a0529ff9c4150f3962ac894c03-5ae7cd03 BucketName:b1 SeedConfig:{HTTPAddrs:[127.0.0.1:8091] MemdAddrs:[] SRVRecord:<nil>} SecurityConfig:{UseTLS:false TLSRootCAProvider:0xc47b40 NoTLSSeedNode:true Auth:0x2e0eba0 AuthMechanisms:[]} CompressionConfig:{Enabled:false DisableDecompression:false MinSize:0 MinRatio:0} ConfigPollerConfig:{HTTPRedialPeriod:0s HTTPRetryDelay:0s HTTPMaxWait:0s CccpMaxWait:0s CccpPollPeriod:0s} IoConfig:{NetworkType:default UseMutationTokens:false UseDurations:false UseOutOfOrderResponses:false DisableXErrorHello:false DisableJSONHello:false DisableSyncReplicationHello:false EnablePITRHello:false UseCollections:true} KVConfig:{ConnectTimeout:7s ServerWaitBackoff:0s PoolSize:0 MaxQueueSize:0 ConnectionBufferSize:20971520} HTTPConfig:{MaxIdleConns:300 MaxIdleConnsPerHost:100 ConnectTimeout:1m0s IdleConnectionTimeout:0s} DCPConfig:{AgentPriority:1 UseExpiryOpcode:false UseStreamID:true UseOSOBackfill:true BackfillOrder:1 BufferSize:20971520 DisableBufferAcknowledgement:false}} 2024-06-12T12:54:35.435+05:30 [FATA] scorch AsyncError, path: /opt/couchbase/var/lib/couchbase/data/@fts/b1.s1.dot_performance_202bcdb171dec266_364edd56.pindex/store, treating this as fatal, err: merging err: merging failed: write /opt/couchbase/var/lib/couchbase/data/@fts/b1.s1.dot_performance_202bcdb171dec266_364edd56.pindex/store/000000000012.zap: no space left on device, stack dump: /opt/couchbase/var/lib/couchbase/data/@fts/dumps/1718177074.fts.stack.dump.txt -- main.initBleveOptions.func2() at init_bleve.go:113
      

      Service 'fts' exited with status 1. Restarting. Messages: 2024-06-12T12:54:27.801+05:30 [INFO] pindex_bleve: started runBatchWorker: 1 for pindex: b1.s1.dot_performance_202bcdb171dec266_b024670b 2024-06-12T12:54:27.801+05:30 [INFO] pindex_bleve: started runBatchWorker: 0 for pindex: b1.s1.dot_performance_202bcdb171dec266_b024670b 2024-06-12T12:54:27.801+05:30 [INFO] pindex_bleve: started runBatchWorker: 2 for pindex: b1.s1.dot_performance_202bcdb171dec266_b024670b 2024-06-12T12:54:28.427+05:30 DEBU REGU.impl.init.0.func1() at aggrecorder.go:57 [id 34] will report aggregate recorder stats every 5m0s  2024-06-12T12:54:28.484+05:30 [FATA] scorch AsyncError, path: /opt/couchbase/var/lib/couchbase/data/@fts/b1.s1.dot_performance_202bcdb171dec266_364edd56.pindex/store, treating this as fatal, err: merging err: merging failed: write /opt/couchbase/var/lib/couchbase/data/@fts/b1.s1.dot_performance_202bcdb171dec266_364edd56.pindex/store/000000000012.zap: no space left on device, stack dump: /opt/couchbase/var/lib/couchbase/data/@fts/dumps/1718177068.fts.stack.dump.txt -- main.initBleveOptions.func2() at init_bleve.go:113 

      There were many more instances of these errors; in each case the fatal write failed with "no space left on device" under the FTS data path.

      Will soon attach the logs of the fts node that went down.
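Every fatal error above is the same ENOSPC write failure under the @fts data directory, so a disk-usage check on the affected node confirms the proximate cause. A stdlib-only sketch — the path comes from the error messages and only exists on a Couchbase node:

```python
import os
import shutil

# Path taken from the fatal error messages; present only on a Couchbase node.
FTS_DATA = "/opt/couchbase/var/lib/couchbase/data/@fts"

def disk_pct_used(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

if __name__ == "__main__":
    # Fall back to "/" when run off-node, e.g. for a quick sanity check.
    path = FTS_DATA if os.path.isdir(FTS_DATA) else "/"
    print(f"{path}: {disk_pct_used(path):.1f}% used")
```

A reading at or near 100% on the filesystem backing @fts would match the `write ... no space left on device` failures in the logs.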


          People

            Assignee: abhinav Abhi Dangeti
            Reporter: nishanth.vm Nishanth VM
            Votes: 0
            Watchers: 5
