Couchbase Server / MB-35889

Replication can get stuck when checkpoint memory overhead is very high


Details

    • Untriaged
    • No
    • KV-Engine MH 2nd Beta, KV Sprint 2020-April

    Description

      Build 6.5.0-4218

      Replication gets stuck when the data service drops to a very low resident ratio (RR).
      We hit this issue while running HiDD tests on a Couchbase bucket.
      The test uses 2 data nodes and loads 250M docs, which brings RR down to 0.43%. After the load phase we wait for "ep_dcp_replica_items_remaining" to reach zero, but it stays at ~19K and never reaches zero.
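The wait described above can hang forever when the stat plateaus, so a check that tells "drained" apart from "stuck" may be useful. The following is a minimal sketch, not the perf harness's actual code: `wait_for_drain`, its parameters, and the `fetch_stat` callable are hypothetical names, and how `ep_dcp_replica_items_remaining` is actually read from the cluster is left to the caller. Progress is defined as reaching a new minimum, so a value that hovers around ~19K still counts as stalled.

```python
import time
from typing import Callable


def wait_for_drain(fetch_stat: Callable[[], int],
                   timeout_s: float = 600.0,
                   stall_s: float = 120.0,
                   poll_s: float = 1.0,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a queue-depth stat until it reaches 0.

    fetch_stat is a hypothetical callable returning the current value of a
    stat such as ep_dcp_replica_items_remaining.
    Returns "drained", "stalled" (no new minimum for stall_s seconds),
    or "timeout" (overall deadline exceeded while still making progress).
    """
    start = clock()
    value = fetch_stat()
    best = value              # lowest value observed so far
    last_progress = start     # time at which `best` last improved
    while value != 0:
        now = clock()
        if now - last_progress >= stall_s:
            return "stalled"
        if now - start >= timeout_s:
            return "timeout"
        sleep(poll_s)
        value = fetch_stat()
        if value < best:
            best = value
            last_progress = clock()
    return "drained"
```

The `clock` and `sleep` parameters exist only so the function can be exercised with a fake clock; in normal use the defaults apply and only `fetch_stat` needs to be supplied.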

      Job- http://perf.jenkins.couchbase.com/job/magma-hidd/441
      Logs-
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/replica_issue_couchbase/collectinfo-2019-09-10T055022-ns_1%40172.23.97.38.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/replica_issue_couchbase/collectinfo-2019-09-10T055022-ns_1%40172.23.97.39.zip
       


          Activity

            bo-chun.wang Bo-Chun Wang added a comment -

            I ran the test on build 6.6.0-7883. In this run, ep_dcp_replica_items_remaining reached 0, so I am closing this ticket.

            http://perf.jenkins.couchbase.com/job/rhea-5node1/120/

             

            2020-07-21T13:43:40 [INFO] Monitoring DCP queues: bucket-1
            2020-07-21T13:43:40 [INFO] ep_dcp_replica_items_remaining reached 0
            2020-07-21T13:43:40 [INFO] ep_dcp_other_items_remaining reached 0
            2020-07-21T13:43:40 [INFO] Monitoring replica count match: bucket-1
            2020-07-21T13:43:40 [INFO] curr_items: 250000000, replica_curr_items: 250000000
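The last log line is the replica count check: the replica item count should equal the active item count. As a rough sketch, a line in that format could be checked as follows. The function name and the `num_replicas` parameter are hypothetical; the default of one replica is an assumption inferred from the two counts being equal in the log above.

```python
import re

# Matches the item-count line format shown in the log above.
ITEM_COUNTS = re.compile(r"curr_items: (\d+), replica_curr_items: (\d+)")


def replica_counts_match(log_line: str, num_replicas: int = 1) -> bool:
    """Return True if the replica item count equals active items * replicas.

    num_replicas is an assumed parameter; the log above is consistent
    with a single replica.
    """
    m = ITEM_COUNTS.search(log_line)
    if m is None:
        raise ValueError("no item counts found in log line")
    curr_items, replica_items = int(m.group(1)), int(m.group(2))
    return replica_items == curr_items * num_replicas
```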


            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2130 contains kv_engine commit 2bd86cd with commit message:
            MB-35889: Don't add keys to Checkpoint indexes for Disk checkpoints

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2130 contains kv_engine commit 7579822 with commit message:
            MB-35889: Don't invalidate index entry for Disk checkpoint for expel

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.0-7654 contains kv_engine commit 7579822 with commit message:
            MB-35889: Don't invalidate index entry for Disk checkpoint for expel

            ben.huddleston Ben Huddleston added a comment -

            Jepsen tests caught a crash with this change. We need to loosen up some assertions and add a test or two (MB-39435).

            People

              bo-chun.wang Bo-Chun Wang
              mahesh.mandhare Mahesh Mandhare (Inactive)