Couchbase Server / MB-58463

Replication stuck caused by Checkpoint OOM


Details

    • Triaged
    • 0
    • Unknown

    Description

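      In our weekly runs on build 7.6.0-1419, this issue occurred intermittently: ep_dcp_replica_items_remaining stayed above 0 for hours and never drained. In the run below, the value did not change for about 7.5 hours, until the job timed out.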

      2023-08-29T08:14:11 [INFO] ep_dcp_replica_items_remaining = 516936

      ...

      2023-08-29T15:44:25 [INFO] ep_dcp_replica_items_remaining = 516936

      Build timed out (after 480 minutes). Marking the build as aborted.
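
A minimal sketch of how the stall above could be flagged automatically from the job output. It assumes the log line format shown in this ticket; the function names `parse_samples` and `stalled_duration` are hypothetical helpers, not part of any Couchbase or perf-framework tooling.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2023-08-29T08:14:11 [INFO] ep_dcp_replica_items_remaining = 516936
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) \[INFO\] "
    r"ep_dcp_replica_items_remaining = (\d+)$"
)


def parse_samples(log_text):
    """Extract (timestamp, value) pairs; non-matching lines are ignored."""
    samples = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")
            samples.append((ts, int(match.group(2))))
    return samples


def stalled_duration(samples):
    """Return how long the latest nonzero value has stayed unchanged.

    Returns None if there are no samples or the latest value is 0.
    """
    if not samples:
        return None
    last_ts, last_val = samples[-1]
    if last_val == 0:
        return None
    start = last_ts
    # Walk backwards while the value is identical to the latest one.
    for ts, val in reversed(samples):
        if val != last_val:
            break
        start = ts
    return last_ts - start


log = """
2023-08-29T08:14:11 [INFO] ep_dcp_replica_items_remaining = 516936
...
2023-08-29T15:44:25 [INFO] ep_dcp_replica_items_remaining = 516936
"""
print(stalled_duration(parse_samples(log)))
```

For the two samples quoted in this ticket, the unchanged window is 7:30:14, which is consistent with the 480-minute build timeout.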

      A similar ticket, MB-58342, is already open against 7.2.1. However, we did not see this issue in our previous weekly pipeline on 7.6.0-1363, so it is unclear whether this is a new issue in 7.6 or shares a root cause with MB-58342. I opened this ticket to track it separately.

      Build: 7.6.0-1419

      Jobs: 

      http://perf.jenkins.couchbase.com/job/magma-nvme-ycsb/4462/

      http://perf.jenkins.couchbase.com/job/magma-nvme-ycsb/4459/

      Logs:

      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-08-30T040233-ns_1%40172.23.100.135.zip

      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-08-30T040233-ns_1%40172.23.100.136.zip

      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-08-30T040233-ns_1%40172.23.100.137.zip

       


            People

              bo-chun.wang Bo-Chun Wang