Couchbase Server / MB-19234

[FTS] DGM: Indexing on a bucket with 10% active resident ratio (value eviction) causes cbft to be killed by the OOM killer


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version/s: 4.5.0
    • Fix Version/s: 4.5.0
    • Component/s: cbft
    • Labels: None
    • Triage: Untriaged
    • No

    Description

      Build
      4.5.0-2133

      Testcase
      ./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,cluster=D+F:F:F,GROUP=DGM -t fts.stable_topology_fts.StableTopFTS.create_simple_default_index,cluster=D+F,F,F,dgm_run=1,active_resident_ratio=10,GROUP=DGM

      With value eviction enabled, a total of 1,351,000 keys were loaded. cbft was killed after indexing 1,142,556 keys. All nodes have SSDs, 8 GB of RAM, and 4 CPU cores. The OOM kill happens on a node that runs only the fts service.

      The test log shows:

      [2016-04-14 17:19:39,769] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 11886
      [2016-04-14 17:19:47,709] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 33014
      [2016-04-14 17:19:54,655] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 43568
      [2016-04-14 17:20:01,295] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 58069
      [2016-04-14 17:20:09,267] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 69933
      :
      :
      [2016-04-14 17:36:19,964] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250
      [2016-04-14 17:36:28,741] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250
      [2016-04-14 17:36:38,771] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1114876
      [2016-04-14 17:36:45,064] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1116192
      [2016-04-14 17:36:52,170] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1117508
      [2016-04-14 17:36:59,217] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1118825
      [2016-04-14 17:37:10,723] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1118825
      [2016-04-14 17:37:16,656] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1122797
      [2016-04-14 17:37:30,554] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1124112
      [2016-04-14 17:37:40,686] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1124112
      [2016-04-14 17:37:48,165] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1125445
      [2016-04-14 17:37:57,456] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1126772
      [2016-04-14 17:38:17,903] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1129385
      [2016-04-14 17:38:30,432] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1132024
      [2016-04-14 17:38:36,790] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1134649
      [2016-04-14 17:38:44,066] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1137283
      [2016-04-14 17:38:51,664] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1139910
      [2016-04-14 17:39:00,438] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1142556
      [2016-04-14 17:39:06,631] - [rest_client:757] ERROR - http://172.23.105.224:8094/api/index/default_index_1/count error 500 reason: status: 500, content: rest_index: Count, indexName: default_index_1, err: bleve: CountBlevePIndexImpl indexAlias error, indexName: default_index_1, indexUUID: , err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
       rest_index: Count, indexName: default_index_1, err: bleve: CountBlevePIndexImpl indexAlias error, indexName: default_index_1, indexUUID: , err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
      ERROR
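
To put the progress lines above in perspective, the following is a minimal sketch that parses them and estimates how far indexing got and at what average rate. The function names (`parse_progress`, `summarize`) are hypothetical helpers written for this illustration and are not part of testrunner; the two sample lines are the first and last progress lines from the log above.

```python
import re
from datetime import datetime

# Matches the fts_base progress lines shown in the test log.
PROGRESS_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\].*"
    r"Docs in bucket = (?P<bucket>\d+), "
    r"docs in FTS index '(?P<index>[^']+)': (?P<indexed>\d+)"
)


def parse_progress(lines):
    """Return (timestamp, bucket_docs, indexed_docs) for each progress line."""
    samples = []
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group("bucket")), int(m.group("indexed"))))
    return samples


def summarize(samples):
    """Percent indexed at the last sample and mean docs/sec over the window."""
    if not samples:
        raise ValueError("no progress lines found")
    (t0, _, n0), (t1, total, n1) = samples[0], samples[-1]
    elapsed = (t1 - t0).total_seconds()
    rate = (n1 - n0) / elapsed if elapsed > 0 else 0.0
    return {"percent_indexed": 100.0 * n1 / total, "docs_per_sec": rate}


# First and last progress lines copied from the log above.
sample_log = [
    "[2016-04-14 17:19:39,769] - [fts_base:2391] INFO - Docs in bucket = 1351000, "
    "docs in FTS index 'default_index_1': 11886",
    "[2016-04-14 17:39:00,438] - [fts_base:2391] INFO - Docs in bucket = 1351000, "
    "docs in FTS index 'default_index_1': 1142556",
]
summary = summarize(parse_progress(sample_log))
print(summary)
```

The rate is only an average between the first and last sample, so it hides the stalls visible in the log (for example the repeated 1112250 count) and the restart after the first OOM kill.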
      

      In the cbcollect from .224, syslog/messages shows the kernel OOM killer selecting cbft twice:

      Apr 14 17:30:00 localhost kernel: Out of memory: Kill process 15250 (cbft) score 941 or sacrifice child
      :
      Apr 14 17:39:04 localhost kernel: Out of memory: Kill process 16950 (cbft) score 943 or sacrifice child
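
When scanning a collected syslog/messages file for entries like the two above, a small filter can help. This is a rough sketch that assumes the kernel message format is exactly the one shown in this report; `find_oom_kills` is a hypothetical helper name, and other kernel versions may word the message differently.

```python
import re

# Pattern for the kernel OOM-killer lines quoted in this report.
OOM_RE = re.compile(
    r"(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"score (?P<score>\d+)"
)


def find_oom_kills(lines, process=None):
    """Return one dict per OOM-kill line, optionally filtered by process name."""
    kills = []
    for line in lines:
        m = OOM_RE.match(line.strip())
        if not m:
            continue
        if process is not None and m.group("name") != process:
            continue
        kills.append({
            "when": m.group("when"),
            "pid": int(m.group("pid")),
            "name": m.group("name"),
            "score": int(m.group("score")),
        })
    return kills


# The two lines from .224's syslog/messages, as quoted above.
syslog_lines = [
    "Apr 14 17:30:00 localhost kernel: Out of memory: Kill process 15250 (cbft) score 941 or sacrifice child",
    "Apr 14 17:39:04 localhost kernel: Out of memory: Kill process 16950 (cbft) score 943 or sacrifice child",
]
kills = find_oom_kills(syslog_lines, process="cbft")
for k in kills:
    print(k["when"], k["pid"], k["score"])
```

The differing PIDs indicate that cbft was restarted after the first kill and then killed again.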
      

      Attaching cbcollect from all 3 nodes.
      .224 -> fts
      .216 -> fts
      .120 -> kv,fts

      Attachments

        1. pprof001.svg (96 kB, Marty Schoch)
        2. pprof002.svg (100 kB, Marty Schoch)
        3. pprof003.svg (96 kB, Marty Schoch)
        4. pprof004.svg (86 kB, Marty Schoch)
        5. pprof005.svg (76 kB, Marty Schoch)
        6. pprof006.svg (59 kB, Marty Schoch)
        7. pprof007.svg (76 kB, Marty Schoch)
        8. pprof008.svg (61 kB, Marty Schoch)
        9. pprof009.svg (55 kB, Marty Schoch)
        10. pprof010.svg (54 kB, Marty Schoch)
        11. pprof011.svg (70 kB, Marty Schoch)
        12. s2.png (337 kB, Marty Schoch)
        13. s3.png (341 kB, Marty Schoch)
        14. s4.png (301 kB, Marty Schoch)
          People

            apiravi Aruna Piravi (Inactive)