  1. Couchbase Server
  2. MB-19234

[FTS] DGM: Indexing on a bucket with 10% active resident ratio (value eviction) causes cbft to be killed by the OOM killer

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.5.0
    • 4.5.0
    • cbft
    • None
    • Untriaged
    • No

    Description

      Build
      4.5.0-2133

      Testcase
      ./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,cluster=D+F:F:F,GROUP=DGM -t fts.stable_topology_fts.StableTopFTS.create_simple_default_index,cluster=D+F,F,F,dgm_run=1,active_resident_ratio=10,GROUP=DGM

      The bucket uses value eviction, and a total of 1,351,000 keys were loaded. cbft is killed after indexing 1,142,556 keys. All nodes have SSDs, 8 GB of RAM, and 4 CPU cores. The OOM kill happens on a node that runs only fts.

      The test log shows:

      [2016-04-14 17:19:39,769] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 11886
      [2016-04-14 17:19:47,709] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 33014
      [2016-04-14 17:19:54,655] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 43568
      [2016-04-14 17:20:01,295] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 58069
      [2016-04-14 17:20:09,267] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 69933
      :
      :
      [2016-04-14 17:36:19,964] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250
      [2016-04-14 17:36:28,741] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1112250
      [2016-04-14 17:36:38,771] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1114876
      [2016-04-14 17:36:45,064] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1116192
      [2016-04-14 17:36:52,170] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1117508
      [2016-04-14 17:36:59,217] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1118825
      [2016-04-14 17:37:10,723] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1118825
      [2016-04-14 17:37:16,656] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1122797
      [2016-04-14 17:37:30,554] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1124112
      [2016-04-14 17:37:40,686] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1124112
      [2016-04-14 17:37:48,165] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1125445
      [2016-04-14 17:37:57,456] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1126772
      [2016-04-14 17:38:17,903] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1129385
      [2016-04-14 17:38:30,432] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1132024
      [2016-04-14 17:38:36,790] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1134649
      [2016-04-14 17:38:44,066] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1137283
      [2016-04-14 17:38:51,664] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1139910
      [2016-04-14 17:39:00,438] - [fts_base:2391] INFO - Docs in bucket = 1351000, docs in FTS index 'default_index_1': 1142556
      [2016-04-14 17:39:06,631] - [rest_client:757] ERROR - http://172.23.105.224:8094/api/index/default_index_1/count error 500 reason: status: 500, content: rest_index: Count, indexName: default_index_1, err: bleve: CountBlevePIndexImpl indexAlias error, indexName: default_index_1, indexUUID: , err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
       rest_index: Count, indexName: default_index_1, err: bleve: CountBlevePIndexImpl indexAlias error, indexName: default_index_1, indexUUID: , err: bleve: bleveIndexTargets, err: pindex: queries may have been disabled; no nodes are enabled/allocated to serve queries for the index partition(s)
      ERROR
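
      The progress lines above can be summarized with a short script. This is a minimal sketch, not part of testrunner: the names `parse_progress` and `summarize` are hypothetical, and the regular expression assumes the exact `fts_base` log format shown in this report. It reports how far indexing got before the count request failed and the average indexing rate between the first and last sample.

```python
import re
from datetime import datetime

# Assumed format: the fts_base INFO lines quoted in this report.
LINE_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\].*"
    r"Docs in bucket = (?P<total>\d+), "
    r"docs in FTS index '(?P<index>[^']+)': (?P<indexed>\d+)"
)


def parse_progress(lines):
    """Return (timestamp, docs_in_bucket, docs_in_index) for each progress line."""
    samples = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(m.group("total")), int(m.group("indexed"))))
    return samples


def summarize(samples):
    """Compute percent indexed at the last sample and the average docs/second."""
    first_ts, _, first_n = samples[0]
    last_ts, total, last_n = samples[-1]
    elapsed = (last_ts - first_ts).total_seconds()
    return {
        "percent_indexed": 100.0 * last_n / total,
        "docs_per_second": (last_n - first_n) / elapsed if elapsed else 0.0,
    }


# First and last progress lines from the test log in this report.
SAMPLE = [
    "[2016-04-14 17:19:39,769] - [fts_base:2391] INFO - Docs in bucket = 1351000, "
    "docs in FTS index 'default_index_1': 11886",
    "[2016-04-14 17:39:00,438] - [fts_base:2391] INFO - Docs in bucket = 1351000, "
    "docs in FTS index 'default_index_1': 1142556",
]

if __name__ == "__main__":
    print(summarize(parse_progress(SAMPLE)))
```

      For the two sample lines, the index had reached roughly 84.6% of the bucket when the last count succeeded.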
      

      In the cbcollect from .224, syslog/messages shows cbft being killed by the OOM killer twice:

      Apr 14 17:30:00 localhost kernel: Out of memory: Kill process 15250 (cbft) score 941 or sacrifice child
      :
      Apr 14 17:39:04 localhost kernel: Out of memory: Kill process 16950 (cbft) score 943 or sacrifice child
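
      To find such kills in a collected syslog/messages file, the kernel lines can be filtered with a small script. This is a sketch under the assumption that the kernel message format matches the two lines quoted above; the function name `find_oom_kills` is hypothetical.

```python
import re

# Assumed format: the "Out of memory: Kill process ..." kernel lines quoted above.
OOM_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"score (?P<score>\d+)"
)


def find_oom_kills(lines, process=None):
    """Return one dict per OOM-kill line, optionally filtered by process name."""
    kills = []
    for line in lines:
        m = OOM_RE.search(line)
        if m and (process is None or m.group("name") == process):
            kills.append({
                "time": m.group("ts"),
                "pid": int(m.group("pid")),
                "name": m.group("name"),
                "score": int(m.group("score")),
            })
    return kills


# The two kernel lines from .224's syslog/messages in this report.
SYSLOG = [
    "Apr 14 17:30:00 localhost kernel: Out of memory: Kill process 15250 (cbft) "
    "score 941 or sacrifice child",
    "Apr 14 17:39:04 localhost kernel: Out of memory: Kill process 16950 (cbft) "
    "score 943 or sacrifice child",
]

if __name__ == "__main__":
    for kill in find_oom_kills(SYSLOG, process="cbft"):
        print(kill)
```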
      

      Attaching cbcollect from all 3 nodes.
      .224 -> fts
      .216 -> fts
      .120 -> kv,fts

      Attachments

        1. 172.23.106.120-20160414-1739-diag.zip
          9.37 MB
        2. 172.23.105.224-20160414-1744-diag.zip
          4.64 MB
        3. 172.23.105.216-20160414-1742-diag.zip
          4.82 MB
        4. pprof002.svg
          100 kB
        5. pprof001.svg
          96 kB
        6. pprof003.svg
          96 kB
        7. pprof006.svg
          59 kB
        8. pprof004.svg
          86 kB
        9. pprof005.svg
          76 kB
        10. pprof009.svg
          55 kB
        11. pprof008.svg
          61 kB
        12. pprof007.svg
          76 kB
        13. pprof011.svg
          70 kB
        14. pprof010.svg
          54 kB
        15. s2.png
          337 kB
        16. s4.png
          301 kB
        17. s3.png
          341 kB
        18. collectinfo-2016-04-18T204843-n_2@127.0.0.1.zip
          5.79 MB
        19. collectinfo-2016-04-18T204843-n_1@127.0.0.1.zip
          6.00 MB
        20. collectinfo-2016-04-18T204843-n_0@192.168.1.111.zip
          11.09 MB
          People

            apiravi Aruna Piravi (Inactive)