Couchbase Server
MB-7757

Growing queue on 2 of 11 nodes on an xdcr-cluster. Seeing constant replicator crashes on the xdcr-cluster

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: XDCR
    • Security Level: Public
    • Labels:
      None
    • Environment:
      2.0.1-153-rel
      1 bucket, bidirectional replication.

      Description

      System Test

      Front-end load @ 10k on both clusters.
      Bidirectional replication ongoing between the clusters.

      Deleted the replication and recreated it after a few hours; replication resumed from the last checkpoints.

      Most of the nodes, except 2, have successfully replicated data.

      Sample output from the logs:
      debug.4:=========================CRASH REPORT=========================
      debug.4- crasher:
      debug.4- initial call: xdc_vbucket_rep:init/1
      debug.4- pid: <0.4116.455>
      debug.4- registered_name: []
      debug.4- exception exit: {function_clause,
      debug.4- [{couch_api_wrap,'get_missing_revs/2-fun-1',
      debug.4- [400,
      debug.4- [{"Cache-Control","must-revalidate"},
      debug.4- {"Content-Length","50"},
      debug.4- {"Content-Type","application/json"},

      Adding more logs from both the clusters.
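
      The function_clause exit above appears to come from the anonymous fun inside
      couch_api_wrap:get_missing_revs/2 being handed an HTTP 400 response it has no
      clause for. The following is a minimal, self-contained Erlang sketch, not the
      actual couch_api_wrap code (the module and function names are made up for
      illustration), showing how a reply handler that only matches the expected
      status codes raises function_clause on a 400:

      %% missing_revs_sketch.erl -- hypothetical illustration only,
      %% NOT the real couch_api_wrap implementation.
      -module(missing_revs_sketch).
      -export([demo/0]).

      %% A reply handler with clauses only for the statuses the author
      %% expected (200/201). Any other status code makes the fun raise
      %% function_clause, as in the crash report above.
      reply_handler() ->
          fun(200, _Headers, Body) -> {ok, Body};
             (201, _Headers, Body) -> {ok, Body}
          end.

      demo() ->
          Handler = reply_handler(),
          Headers = [{"Cache-Control", "must-revalidate"},
                     {"Content-Length", "50"},
                     {"Content-Type", "application/json"}],
          %% The remote cluster answers 400; no clause matches, so the
          %% call raises error:function_clause.
          try
              Handler(400, Headers, <<"{\"error\":\"bad_request\"}">>)
          catch
              error:function_clause ->
                  io:format("unhandled HTTP 400 -> function_clause~n")
          end.

      Compiled and called from an Erlang shell (missing_revs_sketch:demo()), this
      reproduces the same error class seen in the crash report; in the replicator
      the exception is not caught, so the xdc_vbucket_rep process exits instead.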


        Activity

        Junyi Xie (Inactive) added a comment - edited

        Still do not see any skew or growing mutations to replicate as of now. The disk write queue, however, is highly skewed.

        I am not sure why it is a bug.

        Ketaki Gangal added a comment -

        Hi Junyi,

        The errors on replication were seen during the initial xdcr-start load phase. The system tests run the next phases with a smaller mutation rate, so we likely will not see the same skew on this run again.

        The XDCR queue is very small (< 10k and < 1k on the two clusters), so one would likely not see unevenness in this cluster state.

        A lot of these XDCR errors are seen at particular points in replication and not continuously throughout the replication for most tests.

        -Ketaki

        Jin Lim added a comment - edited

        Per bug scrubs, Ketaki will restart the test and ping Junyi when he should start monitoring this issue.

        Ketaki Gangal added a comment -

        Duplicate of MB-7657

        Maria McDuff (Inactive) added a comment -

        Closing as a duplicate.


          People

          • Assignee:
            Ketaki Gangal
          • Reporter:
            Ketaki Gangal
          • Votes:
            0
          • Watchers:
            5

            Dates

            • Created:
            • Updated:
            • Resolved:

              Gerrit Reviews

              There are no open Gerrit changes