  1. Couchbase Server
  2. MB-35889

Replication can get stuck when checkpoint memory overhead is very high

Details

    • Untriaged
    • No
    • KV-Engine MH 2nd Beta, KV Sprint 2020-April

    Description

      Build 6.5.0-4218

      Replication gets stuck when the data service drops to a low resident ratio (RR).
      We hit this issue while running HiDD tests on a Couchbase bucket.
      The test uses 2 data nodes and loads 250M docs, and RR drops to 0.43%. After the load phase we wait for "ep_dcp_replica_items_remaining" to reach zero, but it stays at ~19K and never reaches zero.
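
      The wait step described above can be sketched as a polling loop that also detects a stall, so a test fails fast instead of waiting forever. This is only an illustrative sketch, not the actual perf-test code: `wait_for_drain`, `read_stat`, and the `stall_polls` threshold are hypothetical names, and `read_stat` stands in for whatever mechanism returns the current value of "ep_dcp_replica_items_remaining".

```python
import time
from typing import Callable, Optional


def wait_for_drain(
    read_stat: Callable[[], int],
    poll_interval_s: float = 1.0,
    stall_polls: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a counter until it reaches zero.

    Returns True once the counter is zero, or False if the counter has not
    reached a new low for `stall_polls` consecutive polls (treated as stuck).
    `read_stat` is a hypothetical callable supplied by the caller.
    """
    lowest_seen: Optional[int] = None
    polls_without_progress = 0

    while True:
        value = read_stat()
        if value == 0:
            return True  # replication has fully drained

        if lowest_seen is None or value < lowest_seen:
            # Progress: the backlog is smaller than anything seen so far.
            lowest_seen = value
            polls_without_progress = 0
        else:
            # No new low; small fluctuations around a plateau count as no progress.
            polls_without_progress += 1
            if polls_without_progress >= stall_polls:
                return False  # backlog is not shrinking, report a stall

        sleep(poll_interval_s)
```

      With a counter that plateaus near 19K, as reported here, such a loop would return False after `stall_polls` polls without a new low rather than blocking indefinitely.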

      Job: http://perf.jenkins.couchbase.com/job/magma-hidd/441
      Logs:
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/replica_issue_couchbase/collectinfo-2019-09-10T055022-ns_1%40172.23.97.38.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/replica_issue_couchbase/collectinfo-2019-09-10T055022-ns_1%40172.23.97.39.zip
       

            People

              bo-chun.wang Bo-Chun Wang
              mahesh.mandhare Mahesh Mandhare (Inactive)