  1. Couchbase Server
  2. MB-28047

Ejection makes no progress for several minutes causing TMP OOM failures during data ingestion

Details

    • Untriaged
    • Centos 64-bit
    • Yes

    Description

      Several tests indicate problems with the initial data load. Clients keep receiving ep_tmp_oom_errors even in the absence of large persistence and replication queues. It looks like kv-engine cannot evict items promptly: the ep_num_value_ejects counter freezes for several minutes.

      Let's use the following test case as an example:

      • 2 nodes
      • 1 bucket (full ejection)
      • 100M items

      I stopped the clients after the first TMP OOM error at 15:06:50 and left the system running.

      I can see that one of the three non-IO threads is constantly busy (100% CPU) and the ep_num_eject_failures counter keeps increasing. Items get ejected only once in a while.
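
      The pattern described above (ep_num_value_ejects flat while ep_num_eject_failures grows) can be flagged automatically from periodically sampled stats. The following is only a minimal sketch: the Sample type, the find_stalls helper, and the sample values are hypothetical and made up for illustration; they are not taken from the attached logs and are not part of any Couchbase tool.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    ts: float            # seconds since the start of the observation
    value_ejects: int    # sampled ep_num_value_ejects
    eject_failures: int  # sampled ep_num_eject_failures


def find_stalls(samples: List[Sample], min_duration: float) -> List[Tuple[float, float]]:
    """Return (start, end) windows of at least min_duration seconds in which
    value_ejects did not change while eject_failures kept growing."""
    stalls = []
    start = None
    for prev, cur in zip(samples, samples[1:]):
        stalled = (cur.value_ejects == prev.value_ejects
                   and cur.eject_failures > prev.eject_failures)
        if stalled:
            if start is None:
                start = prev.ts  # the stall begins at the earlier sample
        else:
            if start is not None and prev.ts - start >= min_duration:
                stalls.append((start, prev.ts))
            start = None
    # Close a stall that is still open at the last sample.
    if start is not None and samples[-1].ts - start >= min_duration:
        stalls.append((start, samples[-1].ts))
    return stalls


# Hypothetical samples, one per minute (illustrative values only).
samples = [
    Sample(0, 1000, 10),
    Sample(60, 1500, 10),
    Sample(120, 1500, 400),
    Sample(180, 1500, 900),
    Sample(240, 1500, 1500),
    Sample(300, 1520, 1600),
]
print(find_stalls(samples, min_duration=120))  # [(60, 240)]
```

      With these made-up samples the helper reports a single three-minute window in which no values were ejected although eviction attempts kept failing.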

      Logs, perf profile for non-IO thread, and some graphs from mortimer are attached.

      Attachments

        1. ep_num_value_ejects.png (176 kB, Pavel Paulau)
        2. ep_num_eject_failures.png (217 kB, Pavel Paulau)
        3. mem_used.png (193 kB, Pavel Paulau)
        4. non_io.txt (132 kB, Pavel Paulau)
        5. htop.png (567 kB, Pavel Paulau)
        6. not_ejecting.png (663 kB, Pavel Paulau)

            People

              pavelpaulau Pavel Paulau (Inactive)