  1. Couchbase Server
  2. MB-7287

XDCR: replication lag on ec2 periodically reaches 1 minute and more

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0, 2.2.0
    • Fix Version/s: 3.0
    • Component/s: XDCR
    • Security Level: Public
    • Labels:
    • Environment:
      ec2 High-Memory Double Extra Large Instance, 4 east <-> 4 west, bidir
      build 1967
    • Triage:
      Untriaged
    • Sprint:
      PCI Team - Sprint 2, PCI Team - Sprint 3

      Description

      While the mutation rate is about 6K ops/sec, the replication queue reaches 400K items from time to time (see page 37; the east coast (source) side is worse for some reason).

      It doesn't seem to be an issue, but I have never seen any requirements for acceptable delays, so I am wondering.

      This is an EC2-specific issue (taking the laws of physics into account).
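
The queue depth and mutation rate quoted above can be related to the lag in the title with a back-of-the-envelope estimate. The sketch below is only an illustration: it assumes the queue drains at roughly the mutation rate (a steady-state, Little's-law style approximation), and the function name `estimated_lag_seconds` is hypothetical, not part of any Couchbase tool.

```python
def estimated_lag_seconds(queue_items: int, drain_rate_per_sec: float) -> float:
    """Rough time for a newly queued item to be replicated.

    Assumption (not from the ticket): items leave the replication queue
    at a steady rate equal to drain_rate_per_sec.
    """
    if drain_rate_per_sec <= 0:
        raise ValueError("drain rate must be positive")
    return queue_items / drain_rate_per_sec


# Figures from the description: ~400K queued items, ~6K ops/sec.
lag = estimated_lag_seconds(400_000, 6_000)
print(f"estimated lag: {lag:.0f} s")  # estimated lag: 67 s
```

Under that assumption, a 400K-item queue at 6K ops/sec corresponds to a lag of a little over one minute, which is in line with the figure in the ticket title.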

        Activity

        junyi Junyi Xie (Inactive) added a comment -

        Pavel, I suggested running the test without compaction only to isolate the problem for diagnosis. It does not mean we should turn off compaction in production.

        Maria,

        To fundamentally solve the lag, we need to be able to replicate directly from memory (the UPR project). Other ad-hoc improvements at this time may not solve the issue because of limitations of the current design. Thus I am wondering if we can defer it to 3.0. Thanks.

        junyi Junyi Xie (Inactive) added a comment -

        Please advise. Thanks.

        junyi Junyi Xie (Inactive) added a comment -

        Better hardware would certainly help with a lot of things. But requirements also go up with better hardware, e.g., users may want much higher writes/sec after spending $$$ upgrading to SSD. Thus we need in-memory XDCR instead of asking users to upgrade their hardware. Deferred to 3.0.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        As part of investigating this issue, I'd start by figuring out whether this is RAM -> disk latency or something else.

        One way to observe when something hits disk is through views. There may be others.

        Feel free to approach me for discussion.

        pavelpaulau Pavel Paulau added a comment -

        Closing as irrelevant.

        We have a separate ticket for "ec2" issues, and we already know that UPR-based XDCR eliminated the persistence issue.


          People

          • Assignee:
            alkondratenko Aleksey Kondratenko (Inactive)
            Reporter:
            pavelpaulau Pavel Paulau
          • Votes:
            0
            Watchers:
            11
