  Couchbase Server
  MB-6919

Possibly incorrect OUTBOUND XDCR OPERATIONS seen on source cluster; very slow replication (very low number of sets) observed with number of items > 200M

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0
    • Component/s: XDCR
    • Security Level: Public
    • Labels:
      None
    • Environment:
      64bit ec2-ubuntu-12.04 LTS
      Build 1844

      Description

      Cluster setup:
      c1 : c2 :: 10 : 10 (10 nodes in each cluster)

      sbucket: c1 -> c2
      default: c2 -> c1

      >> Replication set up with continuous front end load
      >> Front end load for default = ~10K ops per sec
      >> Front end load for sbucket = ~4-5K ops per sec
      >> Average replication seen on c1 (for default): ~12-14K ops per sec
      >> Average replication seen on c2 (for sbucket): ~15-18K ops per sec
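
      For reference, the two continuous replications above can be set up through the cluster REST API. The sketch below is a rough reconstruction, assuming the standard /pools/default/remoteClusters and /controller/createReplication endpoints and placeholder credentials; it is not the exact procedure used for this run.

      import requests

      C1 = "http://ec2-50-18-140-172.us-west-1.compute.amazonaws.com:8091"
      C2 = "http://ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com:8091"
      AUTH = ("Administrator", "password")  # placeholder credentials

      def create_replication(src, remote_name, remote_hostport, bucket):
          # Assumes the standard remoteClusters / createReplication REST endpoints.
          # Register the remote cluster reference on the source cluster.
          requests.post(src + "/pools/default/remoteClusters", auth=AUTH,
                        data={"name": remote_name, "hostname": remote_hostport,
                              "username": AUTH[0], "password": AUTH[1]}).raise_for_status()
          # Start a continuous XDCR replication for the given bucket.
          requests.post(src + "/controller/createReplication", auth=AUTH,
                        data={"fromBucket": bucket, "toCluster": remote_name,
                              "toBucket": bucket, "replicationType": "continuous"}).raise_for_status()

      # sbucket: c1 -> c2
      create_replication(C1, "c2", "ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com:8091", "sbucket")
      # default: c2 -> c1
      create_replication(C2, "c1", "ec2-50-18-140-172.us-west-1.compute.amazonaws.com:8091", "default")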

      At a particular snapshot on c1, with the same (mixed) load, on bucket "sbucket":

      No. of items: 214M
      No. of items in replication queue: 136M (way too high)
      Secs in replicating = 0 (!?)
      Secs in checkpointing = 385 (!?)
      Checkpoints issued = 79 (!?)

      These stats are from a cluster that has been up and running with continuous load and replication for ~65 hours.

      Also seen on the destination c2, on bucket "sbucket":

      Gets per sec: 19.2K
      Sets per sec: 347 (seems very low)
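
      To put that set rate in perspective, here is a back-of-the-envelope check using the numbers from the snapshots above (illustrative only):

      # Rough catch-up estimate from the snapshots above.
      queue_items = 136_000_000    # outbound XDCR queue on c1 for sbucket
      dest_sets_per_sec = 347      # sets/sec observed on c2 for sbucket
      src_load_per_sec = 4500      # ~4-5K front-end ops/sec still arriving on c1

      drain_secs = queue_items / dest_sets_per_sec
      print("%.1f days to drain, even with no incoming load" % (drain_secs / 86400.0))  # ~4.5 days

      # With the front-end load still running, the backlog cannot drain at all:
      # new mutations arrive roughly 13x faster than they are applied remotely.
      print(src_load_per_sec > dest_sets_per_sec)  # True -> queue keeps growing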

      Also seeing a number of these errors on the XDCR tab on the source:

      2012-10-15 19:17:50 - Error replicating vbucket 397: {http_request_failed, "POST", "http://Administrator:*****@ec2-175-41-177-173.ap-southeast-1.compute.amazonaws.com:8092/sbucket%2f397%3bc8731525718bcbdd0bf0382e420c453f/_revs_diff", {error,{error,timeout}}}

      2012-10-15 19:17:50 - Error replicating vbucket 381: {http_request_failed, "POST", "http://Administrator:*****@ec2-175-41-177-173.ap-southeast-1.compute.amazonaws.com:8092/sbucket%2f381%3bc8731525718bcbdd0bf0382e420c453f/_revs_diff", {error,{error,timeout}}}

      ....
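
      The failing calls are CouchDB-style _revs_diff POSTs against the destination's CAPI port (8092). One way to check whether that endpoint itself is responding slowly, independent of the XDCR replicator, is to time a small hand-built request against the same vbucket database; the document id and revision below are made-up placeholders, not keys from this run.

      import json, time, requests

      # vbucket database name copied from the error above (sbucket, vbucket 397)
      URL = ("http://ec2-175-41-177-173.ap-southeast-1.compute.amazonaws.com:8092/"
             "sbucket%2f397%3bc8731525718bcbdd0bf0382e420c453f/_revs_diff")
      AUTH = ("Administrator", "password")  # placeholder credentials

      # Hypothetical doc id and revision, just to exercise the endpoint.
      body = {"KEY1_0000000001": ["1-00000000000000000000000000000000"]}

      start = time.time()
      resp = requests.post(URL, auth=AUTH, data=json.dumps(body), timeout=60,
                           headers={"Content-Type": "application/json"})
      print(resp.status_code, "%.1fs" % (time.time() - start), resp.text[:200])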

      Load on sbucket with mcsoda:
      lib/perf_engines/mcsoda.py sbucket@ec2-50-18-140-172.us-west-1.compute.amazonaws.com:11211 vbuckets=1024 doc-gen=0 doc-cache=0 ratio-creates=1 ratio-sets=1 ratio-expirations=0.03 expiration=60 ratio-deletes=0.5 min-value-size=1000 threads=30 max-items=100000000 exit-after-creates=2 prefix=KEY1_ max-creates=100000000

      Load on default with cbworkloadgen:
      /opt/couchbase/bin/tools/cbworkloadgen -n ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com:8091 -r .7 -i 400000000 -s 128 -t 30 -p KEY3_

      Killed the front-end load on both buckets for now; replication now appears to be catching up.
      Live clusters:
      c1: http://ec2-50-18-140-172.us-west-1.compute.amazonaws.com:8091/
      c2: http://ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com:8091/
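
      With the load stopped, the drain of the outbound queue can be watched from the bucket stats REST endpoint rather than the UI. A minimal polling sketch, assuming the backlog is exported as the replication_changes_left stat in /pools/default/buckets/<bucket>/stats (placeholder credentials):

      import time, requests

      C1 = "http://ec2-50-18-140-172.us-west-1.compute.amazonaws.com:8091"
      AUTH = ("Administrator", "password")  # placeholder credentials

      def outbound_queue(cluster, bucket):
          # Read the latest sample of the (assumed) replication_changes_left stat.
          stats = requests.get("%s/pools/default/buckets/%s/stats" % (cluster, bucket),
                               auth=AUTH, timeout=30).json()
          samples = stats["op"]["samples"].get("replication_changes_left", [0])
          return samples[-1]

      # Print the backlog once a minute; it should trend toward zero now that
      # the front-end load on both buckets has been stopped.
      while True:
          print(time.strftime("%H:%M:%S"), outbound_queue(C1, "sbucket"))
          time.sleep(60)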

      Attached the diags grabbed from the orchestrator node on c1.

      1. Screen Shot 2012-10-15 at 11.49.00 AM.png
        96 kB
      2. Screen Shot 2012-10-15 at 11.49.29 AM.png
        98 kB
      3. Screen Shot 2012-10-16 at 2.23.33 PM.png
        114 kB
      4. Screen Shot 2012-10-16 at 2.25.04 PM.png
        57 kB

        Activity

        abhinav Abhinav Dangeti created issue -
        abhinav Abhinav Dangeti made changes -
        Field Original Value New Value
        Attachment ns-diag-20121015185325.txt.zip [ 15433 ]
        Description (edited; appended the note about the attached diags)
        abhinav Abhinav Dangeti made changes -
        Field Original Value New Value
        Fix Version/s 2.0-beta-2 [ 10385 ] 2.0 [ 10114 ]
        Affects Version/s 2.0-beta-2 [ 10385 ] 2.0 [ 10114 ]
        abhinav Abhinav Dangeti made changes -
        Priority Major [ 3 ] Critical [ 2 ]
        abhinav Abhinav Dangeti made changes -
        Field Original Value New Value
        Description (edited; appended the note about killing the front-end load and the live cluster URLs)
        ketaki Ketaki Gangal made changes -
        junyi Junyi Xie (Inactive) made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        farshid Farshid Ghods (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            junyi Junyi Xie (Inactive)
          • Reporter:
            abhinav Abhinav Dangeti
          • Votes:
            0
          • Watchers:
            0

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes