  1. Couchbase Server
  2. MB-19234

[FTS] DGM: Indexing on a bucket with 10% active resident ratio (value eviction) causes cbft to be killed by the OOM killer

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.5.0
    • 4.5.0
    • cbft
    • None
    • Untriaged
    • No

    Description

      Build
      4.5.0-2133

      Testcase
      ./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,cluster=D+F:F:F,GROUP=DGM -t fts.stable_topology_fts.StableTopFTS.create_simple_default_index,cluster=D+F,F,F,dgm_run=1,active_resident_ratio=10,GROUP=DGM

      With value eviction, a total of 1,351,000 keys were loaded. cbft was killed after indexing 1,142,556 keys. All nodes have SSDs, 8 GB RAM, and 4 CPU cores. The OOM occurs on a node that runs only fts.

      Test log shows:

      [2016-04-14 17:19:39,769] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 11886
      [2016-04-14 17:19:47,709] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 33014
      [2016-04-14 17:19:54,655] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 43568
      [2016-04-14 17:20:01,295] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 58069
      [2016-04-14 17:20:09,267] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 69933
      :
      :
      [2016-04-14 17:36:19,964] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250
      [2016-04-14 17:36:28,741] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250
      [2016-04-14 17:36:38,771] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1114876
      [2016-04-14 17:36:45,064] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1116192
      [2016-04-14 17:36:52,170] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1117508
      [2016-04-14 17:36:59,217] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1118825
      [2016-04-14 17:37:10,723] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1118825
      [2016-04-14 17:37:16,656] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1122797
      [2016-04-14 17:37:30,554] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1124112
      [2016-04-14 17:37:40,686] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1124112
      [2016-04-14 17:37:48,165] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1125445
      [2016-04-14 17:37:57,456] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1126772
      [2016-04-14 17:38:17,903] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1129385
      [2016-04-14 17:38:30,432] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1132024
      [2016-04-14 17:38:36,790] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1134649
      [2016-04-14 17:38:44,066] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1137283
      [2016-04-14 17:38:51,664] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1139910
      [2016-04-14 17:39:00,438] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1142556
      [2016-04-14 17:39:06,631] - [rest_client:757] ERROR - http://172.23.105.224:8094/api/index/default_index_1/count error 500 reason: status: 500, content: rest_index: Count, indexName: default_index_1, err: bleve: CountBlevePIndexImpl indexAlias error, indexName: default_index_1, indexUUID: , err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
       rest_index: Count, indexName: default_index_1, err: bleve: CountBlevePIndexImpl indexAlias error, indexName: default_index_1, indexUUID: , err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
      ERROR
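
The progress lines above can be summarized mechanically. The following is a minimal sketch, assuming the log line format is exactly as quoted; `parse_progress` and `summarize` are hypothetical helper names, not part of testrunner. It reports the fraction of documents indexed, the average indexing rate, and how many consecutive samples showed no progress.

```python
import re
from datetime import datetime

# Matches the fts_base progress lines quoted in this report (format assumed from the excerpt).
PROGRESS_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\] - \[fts_base:\d+\] INFO - "
    r"Docs in bucket = (?P<bucket>\d+), docs in FTS index '(?P<index>[^']+)': (?P<indexed>\d+)"
)

def parse_progress(lines):
    """Return a list of (timestamp, docs_in_bucket, docs_indexed) tuples."""
    samples = []
    for line in lines:
        m = PROGRESS_RE.search(line)
        if m:
            # Milliseconds are dropped; second resolution is enough for a rate estimate.
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group("bucket")), int(m.group("indexed"))))
    return samples

def summarize(samples):
    """Summarize indexing progress between the first and last sample."""
    first, last = samples[0], samples[-1]
    elapsed = (last[0] - first[0]).total_seconds()
    return {
        "percent_indexed": 100.0 * last[2] / last[1],
        "docs_per_sec": (last[2] - first[2]) / elapsed if elapsed else 0.0,
        # Count adjacent samples where the indexed count did not change.
        "stalled_samples": sum(1 for a, b in zip(samples, samples[1:]) if a[2] == b[2]),
    }

# A few lines copied from the test log above.
LOG = [
    "[2016-04-14 17:19:39,769] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 11886",
    "[2016-04-14 17:36:19,964] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250",
    "[2016-04-14 17:36:28,741] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250",
    "[2016-04-14 17:39:00,438] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1142556",
]

if __name__ == "__main__":
    print(summarize(parse_progress(LOG)))
```

Run against the full test log, the same helpers would show where indexing slows or stalls before each cbft kill.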
      

      In the cbcollect from .224, syslog/messages shows cbft being killed by the OOM killer twice:

      Apr 14 17:30:00 localhost kernel: Out of memory: Kill process 15250 (cbft) score 941 or sacrifice child
      :
      Apr 14 17:39:04 localhost kernel: Out of memory: Kill process 16950 (cbft) score 943 or sacrifice child
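
To locate such kills in a collected syslog/messages file, a small filter is enough. This is a sketch only, assuming the kernel message format shown in the two lines above; `find_oom_kills` is a hypothetical helper name and the pattern may need adjusting for other kernel versions.

```python
import re

# Matches kernel OOM-killer lines in the format quoted in this report.
OOM_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) kernel: "
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) score (?P<score>\d+)"
)

def find_oom_kills(lines, process=None):
    """Return one dict per OOM kill; optionally keep only kills of `process`."""
    kills = []
    for line in lines:
        m = OOM_RE.match(line.strip())
        if m and (process is None or m.group("name") == process):
            kills.append({
                "when": m.group("when"),
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "score": int(m.group("score")),
            })
    return kills

# The two syslog lines from the .224 cbcollect, copied from above.
SYSLOG = [
    "Apr 14 17:30:00 localhost kernel: Out of memory: Kill process 15250 (cbft) score 941 or sacrifice child",
    "Apr 14 17:39:04 localhost kernel: Out of memory: Kill process 16950 (cbft) score 943 or sacrifice child",
]

if __name__ == "__main__":
    for kill in find_oom_kills(SYSLOG, process="cbft"):
        print(kill)
```

The second kill at 17:39:04 falls between the last successful count at 17:39:00 and the error 500 at 17:39:06 in the test log.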
      

      Attaching cbcollect from all 3 nodes.
      .224 -> fts
      .216 -> fts
      .120 -> kv,fts

      Attachments

        1. s4.png
          301 kB
        2. s3.png
          341 kB
        3. s2.png
          337 kB
        4. pprof011.svg
          70 kB
        5. pprof010.svg
          54 kB
        6. pprof009.svg
          55 kB
        7. pprof008.svg
          61 kB
        8. pprof007.svg
          76 kB
        9. pprof006.svg
          59 kB
        10. pprof005.svg
          76 kB
        11. pprof004.svg
          86 kB
        12. pprof003.svg
          96 kB
        13. pprof002.svg
          100 kB
        14. pprof001.svg
          96 kB
        15. collectinfo-2016-04-18T204843-n_2@127.0.0.1.zip
          5.79 MB
        16. collectinfo-2016-04-18T204843-n_1@127.0.0.1.zip
          6.00 MB
        17. collectinfo-2016-04-18T204843-n_0@192.168.1.111.zip
          11.09 MB
        18. 172.23.106.120-20160414-1739-diag.zip
          9.37 MB
        19. 172.23.105.224-20160414-1744-diag.zip
          4.64 MB
        20. 172.23.105.216-20160414-1742-diag.zip
          4.82 MB

          People

            apiravi Aruna Piravi (Inactive)