Couchbase Server / MB-29325

Intra-cluster replication uses only a fraction of network and disk before bottlenecking


Details

    Description

      Experiments with bulk import and rebalancing have shown that the sustainable bulk-import rate is capped at about 10K docs/s (document size 15KB) on a 2-node cluster of i3.2xlarge nodes (10Gbit network, 8 CPU cores, 64GB RAM, NVMe disk, 180,000 IOPS).

      At higher import rates, the replication DCP queue grows continuously until it reaches levels that trigger temp OOM errors. Throughout, the drain rate of this DCP queue stays very stable at about 5K items/s per node. The rate is the same regardless of whether the client generates additional network traffic.
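To make the queue behaviour concrete, here is a simplified back-of-the-envelope model in Python. It assumes that the two nodes split the import load evenly (so a 12K docs/s cluster-wide import puts roughly 6K items/s on each node's replication stream), that 15KB means 15,000 bytes, and that each queued item holds roughly one document's worth of memory. The 10 GB headroom figure and the function names are hypothetical and do not come from Couchbase Server.

```python
# Simplified model of a replication queue whose producer (client writes)
# outpaces its consumer (the replication stream). Hypothetical helpers,
# not Couchbase Server code.


def backlog_growth_per_s(ingest_items_per_s: float, drain_items_per_s: float) -> float:
    """Net growth of the replication queue in items per second.

    Returns 0.0 when the drain rate keeps up with ingest.
    """
    return max(0.0, float(ingest_items_per_s - drain_items_per_s))


def seconds_until_full(ingest_items_per_s: float,
                       drain_items_per_s: float,
                       doc_size_bytes: int,
                       headroom_bytes: int) -> float:
    """Seconds until the queued backlog fills `headroom_bytes` of memory.

    Assumes each queued item holds about one document's worth of memory.
    Returns float("inf") if the queue never grows.
    """
    growth = backlog_growth_per_s(ingest_items_per_s, drain_items_per_s)
    if growth == 0.0:
        return float("inf")
    return headroom_bytes / (growth * doc_size_bytes)


if __name__ == "__main__":
    DOC_SIZE = 15_000           # 15KB documents, taken as 15,000 bytes
    DRAIN = 5_000               # observed drain rate, items/s per node
    HEADROOM = 10_000_000_000   # hypothetical 10 GB of spare bucket memory

    # 12K docs/s cluster-wide, split evenly across 2 nodes -> 6K items/s per node.
    t = seconds_until_full(6_000, DRAIN, DOC_SIZE, HEADROOM)
    print(f"backlog fills headroom after ~{t:.0f} s")  # prints: backlog fills headroom after ~667 s
```

Under the same assumptions, a cluster-wide import of 10K docs/s puts about 5K items/s on each node, which is right at the observed drain rate. This is consistent with the ~10K docs/s cap reported above: any higher rate makes the backlog grow linearly until memory runs out.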

      The 10Gbit connection should support a replication rate of about 1GB/s, but the actual rate is only 75MB/s per node (5K items/s × 15KB). To rule out network bandwidth as the limiting resource, we repeated the same experiment on i3.16xlarge nodes, which are connected by a 25Gbit network. The replication rate increased only marginally, to 5.3K items/s per node.
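The arithmetic behind the 75MB/s figure can be checked with a short Python sketch. It assumes 15KB = 15,000 bytes, counts only document payload (no DCP or TCP overhead), and treats each link as full duplex so that replication in one direction can use the full line rate; the function names are hypothetical.

```python
# Back-of-the-envelope check of how much of the link the replication
# stream actually uses. Payload only: protocol overhead is ignored.

BITS_PER_GBIT = 1_000_000_000


def replication_bytes_per_s(items_per_s: float, doc_size_bytes: int) -> float:
    """Replication payload rate in bytes per second."""
    return float(items_per_s * doc_size_bytes)


def link_utilization(items_per_s: float, doc_size_bytes: int, link_gbit: float) -> float:
    """Fraction of one direction of a full-duplex link used by the payload."""
    link_bytes_per_s = link_gbit * BITS_PER_GBIT / 8
    return replication_bytes_per_s(items_per_s, doc_size_bytes) / link_bytes_per_s


if __name__ == "__main__":
    DOC_SIZE = 15_000  # 15KB documents, taken as 15,000 bytes
    for label, rate, gbit in [("i3.2xlarge", 5_000, 10), ("i3.16xlarge", 5_300, 25)]:
        mb_s = replication_bytes_per_s(rate, DOC_SIZE) / 1_000_000
        util = link_utilization(rate, DOC_SIZE, gbit)
        print(f"{label}: {mb_s:.1f} MB/s, {util:.1%} of {gbit} Gbit")
```

Under these assumptions it prints 75.0 MB/s (6.0% of 10 Gbit) for i3.2xlarge and 79.5 MB/s (2.5% of 25 Gbit) for i3.16xlarge, so the observed replication rate is far below what either network should sustain.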

      There is no other activity on the cluster: no indexes and no views, only the Data Service.


            People

              shivani.gupta Shivani Gupta
              shivani.gupta Shivani Gupta
              Votes:
              0
              Watchers:
              9

              Dates

                Created:
                Updated:
                Resolved:
