Couchbase Server / MB-61559

Index sg_channels_x1 fails to process all docs in SGW Sync test due to running out of memory

Details

    • Untriaged
    • 0
    • Unknown

    Description

      This ticket is split out from [CBG-3889] and focuses only on the index memory issue.

      Running an SGW sync test with MOI indexes and 500M docs results in not all docs being processed by the sg_channels_x1 index, as can be seen in these runs: https://perf.jenkins.couchbase.com/job/rhea-dev-sgw/48/console and https://perf.jenkins.couchbase.com/job/rhea-dev-sgw/49/console. However, a run with 350M docs loads all the docs: https://perf.jenkins.couchbase.com/job/rhea-dev-sgw/51/console.

      This happens because the index nodes run out of memory. The nodes used in these runs each have a 100GB memory quota for the Index service. The only two large indexes are sg_allDocs_x1 and sg_channels_x1. However, sg_allDocs_x1 managed to process all 500M docs, while sg_channels_x1 got stuck at 433M docs after running out of memory and entered the "Paused" state, which causes all SGW ops that use this index to fail. Extrapolating linearly from these numbers (433M docs at a 100GB quota), a larger machine with 256GB would run out of memory at around 1.1B docs.
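
      The 1.1B figure can be reproduced with a simple linear scaling. This is only a rough sketch: it assumes index memory grows linearly with the number of docs and that the full 256GB would be available as Index service quota, neither of which is confirmed by the runs above. The function name `docs_at_quota` is made up for illustration.

```python
def docs_at_quota(observed_docs: float, observed_quota_gb: float,
                  target_quota_gb: float) -> float:
    """Scale the doc count at which the index stalled to a different quota.

    Assumption (not verified in the ticket): memory use is linear in doc count.
    """
    return observed_docs * target_quota_gb / observed_quota_gb


# Numbers from the runs above: sg_channels_x1 stalled at 433M docs
# with a 100GB Index service quota.
estimate = docs_at_quota(433e6, 100, 256)
print(f"{estimate / 1e9:.2f}B docs")  # prints "1.11B docs"
```

      If memory use per doc is not constant (for example, if channel assignments per doc vary), the real limit could be noticeably different.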

      It's worth noting that we didn't see any actual error messages on the index side. The only indications that we ran out of memory were that the sg_channels_x1 index was "Paused", that it still had a lot of changes left to process (67M changes in our tests), and that index memory was close to the quota limit. The only errors we saw were on the SGW side and mentioned the indexer being paused. Such errors can be found in the sg_warn.log file:

      LIMIT 5000","errors":[{"code":5000,"message":" Indexer Cannot Service SESSION_CONSISTENCY Scan In Paused State from [172.23.97.131:9101] - cause:  Indexer Cannot Service SESSION_CONSISTENCY Scan In Paused State from [172.23.97.131:9101]"}],"http_status_code":200}
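
      Since the paused state only shows up in SGW-side errors, one way to spot it is to scan sg_warn.log for that message. The sketch below is a hypothetical helper, not part of SGW or the perf tooling; the regular expression is based solely on the single error line quoted above, so other variants of the message may not match.

```python
import re

# Pattern derived from the one error line quoted in this ticket (assumption:
# other consistency levels produce the same message shape).
PAUSED_RE = re.compile(
    r"Indexer Cannot Service \S+ Scan In Paused State from \[([^\]]+)\]"
)


def paused_indexers(lines):
    """Return the set of indexer addresses reported as paused."""
    hosts = set()
    for line in lines:
        hosts.update(PAUSED_RE.findall(line))
    return hosts


sample = (
    'LIMIT 5000","errors":[{"code":5000,"message":" Indexer Cannot Service '
    'SESSION_CONSISTENCY Scan In Paused State from [172.23.97.131:9101] - cause:  '
    'Indexer Cannot Service SESSION_CONSISTENCY Scan In Paused State from '
    '[172.23.97.131:9101]"}],"http_status_code":200}'
)
print(paused_indexers([sample]))  # prints {'172.23.97.131:9101'}
```

      To run it against a real log, pass an open file object, e.g. `paused_indexers(open("sg_warn.log"))`; the path depends on where the SGW logs were collected.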
      

      In a couple of runs with Index storage set to plasma, we managed to load and process all 500M docs for both sg_allDocs_x1 and sg_channels_x1 without running out of memory: https://perf.jenkins.couchbase.com/job/rhea-dev-sgw/55/console

          People

            dragos.taraban Dragos Taraban