Couchbase Server / MB-28047

Ejection makes no progress for several minutes causing TMP OOM failures during data ingestion

Details

    • Untriaged
    • Centos 64-bit
    • Yes

    Description

      Several tests indicate problems with the initial data load. Clients keep receiving ep_tmp_oom_errors even in the absence of large persistence and replication queues. It looks like kv-engine cannot evict items promptly: the ep_num_value_ejects counter literally freezes for several minutes.

      Let's use the following test case as an example:

      • 2 nodes
      • 1 bucket (full ejection)
      • 100M items

      I stopped the clients after the first TMP OOM error at 15:06:50 and left the system running.

      I can see that one of the three non-IO threads is constantly busy (100% CPU) and the ep_num_eject_failures counter keeps increasing. Items get ejected only once in a while.

      Logs, perf profile for non-IO thread, and some graphs from mortimer are attached.
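      The stall described above can be quantified from periodic samples of a cumulative counter such as ep_num_value_ejects. The following is a minimal sketch, not part of the original report: the function name `longest_stall` and the sample values are hypothetical, and it assumes the counter has already been collected as (timestamp, value) pairs by some external means.

```python
from typing import List, Tuple

# (seconds since start of observation, cumulative counter value)
Sample = Tuple[float, int]


def longest_stall(samples: List[Sample]) -> Tuple[float, float]:
    """Return (start, end) of the longest interval in which the counter
    did not increase. Returns (0.0, 0.0) if there is no such interval."""
    best = (0.0, 0.0)
    if not samples:
        return best
    # Timestamp and value of the most recent increase seen so far.
    stall_start, last_value = samples[0]
    for ts, value in samples[1:]:
        if value > last_value:
            # Counter made progress: a new potential stall starts here.
            stall_start, last_value = ts, value
        elif ts - stall_start > best[1] - best[0]:
            # Counter is flat: extend the current stall if it is the longest.
            best = (stall_start, ts)
    return best


# Illustrative data only (hypothetical values, not taken from the attached logs).
samples = [(0, 100), (10, 150), (20, 150), (30, 150), (40, 150), (50, 160)]
start, end = longest_stall(samples)
print(f"longest stall: {end - start:.0f}s (from t={start:.0f}s to t={end:.0f}s)")
```

      Applying the same check to ep_num_eject_failures in the opposite sense (the counter rising while ep_num_value_ejects is flat) would match the pattern reported here.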

      Attachments

        1. ep_num_eject_failures.png
          217 kB
        2. ep_num_value_ejects.png
          176 kB
        3. htop.png
          567 kB
        4. mem_used.png
          193 kB
        5. non_io.txt
          132 kB
        6. not_ejecting.png
          663 kB
        7. triton-srv-01-ip6.perf.couchbase.com.zip
          21.25 MB
        8. triton-srv-02-ip6.perf.couchbase.com.zip
          20.27 MB

            People

              pavelpaulau Pavel Paulau (Inactive)