Couchbase Server / MB-35889

Replication can get stuck when checkpoint memory overhead is very high

    Details

    • Triage:
      Untriaged
    • Epic Link:
    • Is this a Regression?:
      No
    • Sprint:
      KV-Engine MH 2nd Beta, KV Sprint 2020-April

      Description

      Build 6.5.0-4218

      Replication gets stuck when the data service reaches a low resident ratio (RR).
      We hit this issue while running HiDD tests on a Couchbase bucket.
      The test uses 2 data nodes and loads 250M docs, at which point RR drops to 0.43%. After the load phase we wait for "ep_dcp_replica_items_remaining" to reach zero, but it stays at ~19K and never reaches zero.
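The wait described above can be sketched as a polling loop with a timeout, so that a stuck replication queue is reported instead of blocking the test forever. This is a minimal illustration, not the test harness's actual code: `wait_for_stat_zero` and its `read_stat` callable are hypothetical names, and how the stat value is actually fetched from the cluster is left to the caller.

```python
import time
from typing import Callable


def wait_for_stat_zero(
    read_stat: Callable[[], int],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll read_stat() until it returns 0 or timeout_s elapses.

    read_stat is a hypothetical callable supplied by the caller, e.g. one
    that returns the current value of ep_dcp_replica_items_remaining.
    Returns True if the stat reached 0, False if the timeout expired first.
    """
    deadline = clock() + timeout_s
    while True:
        if read_stat() == 0:
            return True
        if clock() >= deadline:
            return False  # stat never drained: replication may be stuck
        sleep(interval_s)
```

With a reader that keeps returning about 19K, as reported here, the function returns False once the timeout expires.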

      Job- http://perf.jenkins.couchbase.com/job/magma-hidd/441
      Logs-
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/replica_issue_couchbase/collectinfo-2019-09-10T055022-ns_1%40172.23.97.38.zip
      https://cb-jira.s3.us-east-2.amazonaws.com/logs/replica_issue_couchbase/collectinfo-2019-09-10T055022-ns_1%40172.23.97.39.zip
       


            Activity

            ben.huddleston Ben Huddleston added a comment -

            Jepsen tests caught a crash with this. We need to loosen up some assertions and add a test or two (see MB-39435).

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.0-7654 contains kv_engine commit 7579822 with commit message:
            MB-35889: Don't invalidate index entry for Disk checkpoint for expel

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2130 contains kv_engine commit 7579822 with commit message:
            MB-35889: Don't invalidate index entry for Disk checkpoint for expel

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.0.0-2130 contains kv_engine commit 2bd86cd with commit message:
            MB-35889: Don't add keys to Checkpoint indexes for Disk checkpoints

            bo-chun.wang Bo-Chun Wang added a comment -

            I did a run on build 6.6.0-7883. In this run, ep_dcp_replica_items_remaining reached 0, so I am closing this ticket.

            http://perf.jenkins.couchbase.com/job/rhea-5node1/120/

            2020-07-21T13:43:40 [INFO] Monitoring DCP queues: bucket-1
            2020-07-21T13:43:40 [INFO] ep_dcp_replica_items_remaining reached 0
            2020-07-21T13:43:40 [INFO] ep_dcp_other_items_remaining reached 0
            2020-07-21T13:43:40 [INFO] Monitoring replica count match: bucket-1
            2020-07-21T13:43:40 [INFO] curr_items: 250000000, replica_curr_items: 250000000
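The two checks visible in this log (DCP queues drained, active and replica item counts equal) can be verified mechanically. The sketch below assumes the log format is exactly as quoted in this comment; the `summarize` helper and both regular expressions are illustrative and are not part of any Couchbase tool.

```python
import re

LOG = """\
2020-07-21T13:43:40 [INFO] Monitoring DCP queues: bucket-1
2020-07-21T13:43:40 [INFO] ep_dcp_replica_items_remaining reached 0
2020-07-21T13:43:40 [INFO] ep_dcp_other_items_remaining reached 0
2020-07-21T13:43:40 [INFO] Monitoring replica count match: bucket-1
2020-07-21T13:43:40 [INFO] curr_items: 250000000, replica_curr_items: 250000000
"""

# Patterns assumed from the log lines quoted above.
DRAINED_RE = re.compile(r"(\w+_items_remaining) reached 0\b")
COUNTS_RE = re.compile(r"curr_items: (\d+), replica_curr_items: (\d+)")


def summarize(log: str):
    """Return (names of stats that reached 0, (curr_items, replica_curr_items))."""
    drained = DRAINED_RE.findall(log)
    match = COUNTS_RE.search(log)
    counts = (int(match.group(1)), int(match.group(2))) if match else None
    return drained, counts


drained, counts = summarize(LOG)
print(drained)  # ['ep_dcp_replica_items_remaining', 'ep_dcp_other_items_remaining']
print(counts)   # (250000000, 250000000)
```

Equal counts together with both queues at 0 is the condition that the original run on build 6.5.0-4218 never reached.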


              People

              Assignee:
              bo-chun.wang Bo-Chun Wang
              Reporter:
              mahesh.mandhare Mahesh Mandhare (Inactive)
              Votes:
              0
              Watchers:
              11

