  1. Couchbase Server
  2. MB-6563

Replication ops/sec drops to 0, on stop/start load from source cluster. [Load with only creates looks okay, load with expired items--> replication rate drops very low.]


Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 2.0-beta-2
    • 2.0-beta
    • XDCR
    • Security Level: Public
    • None
    • Build - 2.0-1696
      vbuckets 1024
      unidirectional replication.
      4G, 4 Core machines.
      No Swap

    Description

      • Set up a 2:3 unidirectional replication between two clusters.
      • Start a mixed load on the source cluster.

      On the initial start of replication, xdc ops/sec and creates/sec look very good, ranging from 4k to 8k ops/sec.
      Stop the load on the source.

      Start the load on the source again (note that these operations are now treated as updates).

      • Seeing 8-10k xdc ops/sec on the destination cluster, but almost no updates/creates (0-30 items) are being applied on the destination cluster.

      Can reproduce this with
      nohup lib/perf_engines/mcsoda.py localhost:41208 vbuckets=1024 doc-gen=0 doc-cache=0 ratio-creates=1 ratio-sets=1 ratio-expirations=0.03 expiration=30 ratio-deletes=0.04 min-value-size=2,3 max-items=2000000 exit-after-creates=0 prefix=k_two&
      nohup lib/perf_engines/mcsoda.py localhost:41208 vbuckets=1024 doc-gen=0 doc-cache=0 ratio-creates=1 ratio-sets=1 ratio-expirations=0.03 expiration=30 ratio-deletes=0.04 min-value-size=2,3 max-items=2000000 exit-after-creates=0 prefix=k_one&

      (This also reproduces with or without updates/deletes.)
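
      As a reading aid, the two loader invocations above can be compared option by option. The sketch below only parses the command lines quoted in this report; the helper name `parse_mcsoda_args` is hypothetical and is not part of mcsoda, and no claim is made here about how mcsoda interprets each ratio.

```python
import shlex

# Options shared by both loader commands quoted in this report.
COMMON = (
    "lib/perf_engines/mcsoda.py localhost:41208 vbuckets=1024 doc-gen=0 "
    "doc-cache=0 ratio-creates=1 ratio-sets=1 ratio-expirations=0.03 "
    "expiration=30 ratio-deletes=0.04 min-value-size=2,3 max-items=2000000 "
    "exit-after-creates=0"
)
COMMANDS = [COMMON + " prefix=k_two", COMMON + " prefix=k_one"]


def parse_mcsoda_args(cmd):
    """Hypothetical helper: split a command line into (target, options)."""
    tokens = shlex.split(cmd)
    target = tokens[1]  # tokens[0] is the script path, tokens[1] is host:port
    options = dict(tok.split("=", 1) for tok in tokens[2:])
    return target, options


def differing_options(cmd_a, cmd_b):
    """Return the option names whose values differ between two commands."""
    _, a = parse_mcsoda_args(cmd_a)
    _, b = parse_mcsoda_args(cmd_b)
    return {key for key in a.keys() | b.keys() if a.get(key) != b.get(key)}


if __name__ == "__main__":
    target, options = parse_mcsoda_args(COMMANDS[0])
    print("target:", target)
    for name, value in sorted(options.items()):
        print(f"  {name} = {value}")
    print("differs between loaders:", differing_options(*COMMANDS))
```

      Both loaders point at the same target with identical ratios; only the key prefix differs, so the two processes write disjoint key spaces with the same mix of expirations and deletes.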

      Note: For about 10-15 minutes, there were no crash reports on the source side.
      Now seeing "unable to POST" error messages on the source side.

      Source logs show

          ** Reason for termination ==
          ** {http_request_failed,"POST",
          "http://Administrator:*****@10.3.121.38:8092/default%2f907%3bdc8cfd8bf825ca8adece5b7387af2afd/_bulk_docs",
          {error,{error,timeout}}}

          [error_logger:error,2012-09-07T13:11:25.697,ns_1@10.3.121.32:error_logger:ale_error_logger_handler:log_report:72]
          =========================CRASH REPORT=========================
          crasher:
          initial call: xdc_vbucket_rep:init/1
          pid: <0.7311.2>
          registered_name: []
          exception exit: {http_request_failed,"POST",
          "http://Administrator:*****@10.3.121.38:8092/default%2f907%3bdc8cfd8bf825ca8adece5b7387af2afd/_bulk_docs",
          {error,{error,timeout}}}
          in function gen_server:terminate/6
          ancestors: [<0.6821.2>,<0.6816.2>,xdc_replication_sup,ns_server_sup,
          ns_server_cluster_sup,<0.60.0>]
          messages: []
          links: [<0.6821.2>]
          dictionary: []
          trap_exit: true
          status: running
          heap_size: 28657
          stack_size: 24
          reductions: 247791
          neighbours:
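
      The failing request path in the crash report is percent-encoded, which makes it hard to see which replication target timed out. The sketch below decodes it with the Python standard library. Reading the number after the bucket name as the vbucket id, and the trailing hex string as an identifier attached to the destination bucket, is an assumption based on the shape of the path, not something the log states; the function name `describe_bulk_docs_target` is hypothetical.

```python
from urllib.parse import urlsplit, unquote

# URL copied verbatim from the crash report (password already masked there).
FAILED_URL = (
    "http://Administrator:*****@10.3.121.38:8092/"
    "default%2f907%3bdc8cfd8bf825ca8adece5b7387af2afd/_bulk_docs"
)


def describe_bulk_docs_target(url):
    """Hypothetical helper: break a failed _bulk_docs URL into its parts."""
    parts = urlsplit(url)
    # The first path segment is the percent-encoded database name,
    # the remainder is the endpoint that was being POSTed to.
    db_segment, endpoint = parts.path.lstrip("/").split("/", 1)
    decoded = unquote(db_segment)  # "%2f" -> "/", "%3b" -> ";"
    bucket, rest = decoded.split("/", 1)
    vbucket, _, suffix = rest.partition(";")
    return {
        "host": parts.hostname,
        "port": parts.port,
        "bucket": bucket,
        "vbucket": int(vbucket),  # assumed meaning: vbucket id
        "suffix": suffix,         # assumed meaning: destination identifier
        "endpoint": endpoint,
    }


if __name__ == "__main__":
    for field, value in describe_bulk_docs_target(FAILED_URL).items():
        print(f"{field}: {value}")
```

      Under those assumptions, the timeout hit a `_bulk_docs` POST for vbucket 907 of bucket `default` on the destination node 10.3.121.38, port 8092.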

      Adding atop output from the source; seeing no major CPU contention.

      Adding screenshot from destination.

      Live cluster: 10.3.121.31 (source), 10.3.121.38 (destination).

      Adding ns_server logs.


          People

            ketaki Ketaki Gangal (Inactive)