Couchbase Server / MB-19234

[FTS] DGM: Indexing a bucket with a 10% active resident ratio (value eviction) causes cbft to be killed by the OOM killer

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.5.0
    • 4.5.0
    • cbft
    • None
    • Untriaged
    • No

    Description

      Build
      4.5.0-2133

      Testcase
      ./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,cluster=D+F:F:F,GROUP=DGM -t fts.stable_topology_fts.StableTopFTS.create_simple_default_index,cluster=D+F,F,F,dgm_run=1,active_resident_ratio=10,GROUP=DGM

      With value eviction, a total of 1,351,000 keys were loaded. cbft was killed after indexing 1,142,556 keys. All nodes have SSDs, 8 GB of RAM, and 4 CPU cores. The OOM kill happens on a node that runs only fts.

      The test log shows:

      [2016-04-14 17:19:39,769] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 11886
      [2016-04-14 17:19:47,709] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 33014
      [2016-04-14 17:19:54,655] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 43568
      [2016-04-14 17:20:01,295] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 58069
      [2016-04-14 17:20:09,267] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 69933
      :
      :
      [2016-04-14 17:36:19,964] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250
      [2016-04-14 17:36:28,741] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250
      [2016-04-14 17:36:38,771] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1114876
      [2016-04-14 17:36:45,064] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1116192
      [2016-04-14 17:36:52,170] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1117508
      [2016-04-14 17:36:59,217] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1118825
      [2016-04-14 17:37:10,723] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1118825
      [2016-04-14 17:37:16,656] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1122797
      [2016-04-14 17:37:30,554] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1124112
      [2016-04-14 17:37:40,686] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1124112
      [2016-04-14 17:37:48,165] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1125445
      [2016-04-14 17:37:57,456] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1126772
      [2016-04-14 17:38:17,903] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1129385
      [2016-04-14 17:38:30,432] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1132024
      [2016-04-14 17:38:36,790] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1134649
      [2016-04-14 17:38:44,066] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1137283
      [2016-04-14 17:38:51,664] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1139910
      [2016-04-14 17:39:00,438] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1142556
      [2016-04-14 17:39:06,631] - [rest_client:757] ERROR - http://172.23.105.224:8094/api/index/default_index_1/count error 500 reason: status: 500, content: rest_index: Count, indexName: default_index_1, err: bleve: CountBlevePIndexImpl indexAlias error, indexName: default_index_1, indexUUID: , err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
       rest_index: Count, indexName: default_index_1, err: bleve: CountBlevePIndexImpl indexAlias error, indexName: default_index_1, indexUUID: , err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
      ERROR
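To quantify how far indexing got before the failure, the progress lines above can be parsed mechanically. The following is a minimal sketch, not part of testrunner: the function names `parse_progress` and `average_rate` are hypothetical, and the regular expression assumes the exact line format quoted in this report.

```python
import re
from datetime import datetime

# Matches the testrunner progress lines quoted in this report, e.g.
# [2016-04-14 17:19:39,769] - [fts_base:2391] INFO - Docs in bucket = 1351000,
#   docs in FTS index 'default_index_1': 11886
PROGRESS_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\].*"
    r"Docs in bucket = (\d+), docs in FTS index '([^']+)': (\d+)"
)


def parse_progress(lines):
    """Return (timestamp, docs_in_bucket, index_name, docs_in_index) tuples."""
    samples = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            samples.append(
                (ts, int(match.group(2)), match.group(3), int(match.group(4)))
            )
    return samples


def average_rate(samples):
    """Average docs indexed per second between the first and last sample."""
    if len(samples) < 2:
        return 0.0
    elapsed = (samples[-1][0] - samples[0][0]).total_seconds()
    if elapsed <= 0:
        return 0.0
    return (samples[-1][3] - samples[0][3]) / elapsed
```

Applied to the first and last progress lines quoted above (17:19:39 and 17:39:00), this gives roughly 974 docs per second on average, with 208,444 documents still unindexed when the count request failed.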
      

      In the cbcollect from .224, syslog/messages shows that the OOM killer killed cbft twice:

      Apr 14 17:30:00 localhost kernel: Out of memory: Kill process 15250 (cbft) score 941 or sacrifice child
      :
      Apr 14 17:39:04 localhost kernel: Out of memory: Kill process 16950 (cbft) score 943 or sacrifice child
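When checking other nodes or later runs for the same symptom, the kernel lines can be extracted from syslog/messages with a short script. This is a sketch under stated assumptions: `find_oom_kills` is a hypothetical helper, and the pattern assumes the kernel message wording quoted in this report; other kernel versions phrase OOM kills differently.

```python
import re

# Matches kernel OOM lines in the form quoted in this report, e.g.
# Apr 14 17:30:00 localhost kernel: Out of memory: Kill process 15250 (cbft)
#   score 941 or sacrifice child
OOM_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"Out of memory: Kill process (\d+) \(([^)]+)\) score (\d+)"
)


def find_oom_kills(lines, process_name=None):
    """Return (timestamp, pid, process, score) for each OOM kill line.

    If process_name is given, only kills of that process are returned.
    """
    kills = []
    for line in lines:
        match = OOM_RE.search(line.strip())
        if not match:
            continue
        timestamp, pid, name, score = match.groups()
        if process_name is None or name == process_name:
            kills.append((timestamp, int(pid), name, int(score)))
    return kills
```

Calling `find_oom_kills(open("messages"), process_name="cbft")` on the extracted syslog file would list each cbft kill with its PID and OOM score; the file name `messages` here is an assumption about how the cbcollect archive is unpacked.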
      

      Attaching cbcollect from all 3 nodes.
      .224 -> fts
      .216 -> fts
      .120 -> kv,fts

      Attachments

        1. 172.23.105.216-20160414-1742-diag.zip
          4.82 MB
        2. 172.23.105.224-20160414-1744-diag.zip
          4.64 MB
        3. 172.23.106.120-20160414-1739-diag.zip
          9.37 MB
        4. collectinfo-2016-04-18T204843-n_0@192.168.1.111.zip
          11.09 MB
        5. collectinfo-2016-04-18T204843-n_1@127.0.0.1.zip
          6.00 MB
        6. collectinfo-2016-04-18T204843-n_2@127.0.0.1.zip
          5.79 MB
        7. pprof001.svg
          96 kB
        8. pprof002.svg
          100 kB
        9. pprof003.svg
          96 kB
        10. pprof004.svg
          86 kB
        11. pprof005.svg
          76 kB
        12. pprof006.svg
          59 kB
        13. pprof007.svg
          76 kB
        14. pprof008.svg
          61 kB
        15. pprof009.svg
          55 kB
        16. pprof010.svg
          54 kB
        17. pprof011.svg
          70 kB
        18. s2.png
          337 kB
        19. s3.png
          341 kB
        20. s4.png
          301 kB
          People

            apiravi Aruna Piravi (Inactive)