Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-7133

Low XDCR replication rate, seen with high number of gets but very low number of sets

    Details

      Description

      • ~11M items and ~4M items pre-existing on C1 (Contained creates-updates-deletes)
      • Set up unidirectional replication for both buckets from C1 to C2 without any front end loads.
      • After a point very slow replication rate observed::
      • Very high number of gets (expected as item count on C2 is initially zero)
      • Extremely low number of sets

      On C2:
      Noticed: capi_replication:is_missing_rev call
      on <ec2-54-251-5-97.ap-southeast-1.compute.amazonaws.com>:

      [couchdb:error,2012-11-09T1:52:05.139,ns_1@ec2-175-41-185-71.ap-southeast-1.compute.amazonaws.com:<0.19876.1>:couch_log:error:42]Uncaught error in HTTP request: {exit,
                                      {shutdown,
                                       {gen_server,call,
                                        [

      {'ns_memcached-uniXDCR_dest',                                     'ns_1@ec2-175-41-185-71.ap-southeast-1.compute.amazonaws.com'},
                                         {get_meta,<<"C2_U_-b6b33ea1ae541ef8">>,                                     714},
                                         60000]}}}

      Stacktrace: [{gen_server,call,3},
                  {ns_memcached,do_call,3},
                  {capi_replication,get_meta,3},
                  {capi_replication,is_missing_rev,4},
                  {capi_replication,'-get_missing_revs/2-fun-0-',4},
                  {lists,foldr,3},
                  {capi_replication,get_missing_revs,2},
                  {capi_frontend,get_missing_revs,2}]
      [couchdb:error,2012-11-09T1:52:05.128,ns_1@ec2-175-41-185-71.ap-southeast-1.compute.amazonaws.com:<0.16701.1>:couch_log:error:42]Uncaught error in HTTP request: {exit,
                                      {shutdown,
                                       {gen_server,call,
                                        [{'ns_memcached-uniXDCR_dest',                                     'ns_1@ec2-175-41-185-71.ap-southeast-1.compute.amazonaws.com'}

      ,
                                         

      {get_meta,<<"C2_U_-80ad36e893534134">>,                                     673}

      ,
                                         60000]}}}

      XDCR errors:

      2012-11-09 04:19:34 - Error replicating vbucket 599: {http_request_failed, "POST", "http://Administrator:*****@ec2-175-41-184-132.ap-southeast-1.compute.amazonaws.com:8092/biXDCR_bucket%2f599%3b72fc27d81b8fd47c7fa4de3f138da083/_revs_diff", {error,

      {error,timeout}}}

      - beam.smp usage very high on certain nodes on C2:

      <ec2-122-248-221-237.ap-southeast-1.compute.amazonaws.com>

      807 couchbas 20 0 7333m 1.9g 6372 S 261 12.9 383:11.36 beam.smp
      1007 couchbas 20 0 1190m 986m 2960 S 21 6.6 38:25.10 memcached

      KVSTORE

      /opt/couchbase/bin/cbstats 127.0.0.1:11210 kvstore -b biXDCR_bucket | grep fail
      ro:failure_get: 476877
      ro:failure_open: 0
      rw:failure_del: 0
      rw:failure_get: 0
      rw:failure_open: 0
      rw:failure_set: 0
      rw:failure_vbset: 0

      Large number of get-meta failures: expected

      TIMINGS

      /opt/couchbase/bin/cbstats 127.0.0.1:11210 timings -b biXDCR_bucket
      bg_load (19368 total)
      32us - 64us : ( 14.86%) 2879 ########################
      64us - 128us : ( 97.43%) 15991 ##########################################################################################################################################
      128us - 256us : ( 99.99%) 497 ####
      1ms - 2ms : (100.00%) 1
      bg_wait (19368 total)
      16us - 32us : ( 0.01%) 2
      32us - 64us : ( 69.61%) 13481 ####################################################################################################################
      64us - 128us : ( 99.73%) 5833 ##################################################
      128us - 256us : ( 99.90%) 33
      256us - 512us : ( 99.95%) 9
      512us - 1ms : ( 99.97%) 4
      1ms - 2ms : ( 99.99%) 4
      2ms - 4ms : (100.00%) 2
      data_age (536 total)
      0 - 1s : ( 26.87%) 144 #############################################
      1s - 2s : ( 77.80%) 273 ######################################################################################
      2s - 4s : (100.00%) 119 #####################################
      disk_commit (526 total)
      0 - 1s : (100.00%) 526 ##########################################################################################################################################################################
      disk_insert (536 total)
      8us - 16us : ( 1.31%) 7 ##
      16us - 32us : ( 68.47%) 360 ##################################################################################################################
      32us - 64us : ( 99.44%) 166 ####################################################
      64us - 128us : ( 99.63%) 1
      2ms - 4ms : (100.00%) 2
      disk_vbstate_snapshot (307 total)
      1us - 2us : ( 0.33%) 1
      1ms - 2ms : ( 9.77%) 29 ################
      2ms - 4ms : ( 75.90%) 203 ################################################################################################################
      4ms - 8ms : ( 87.30%) 35 ###################
      8ms - 16ms : ( 93.49%) 19 ##########
      16ms - 32ms : ( 97.39%) 12 ######
      32ms - 65ms : ( 97.72%) 1
      65ms - 131ms : ( 98.05%) 1
      131ms - 262ms : ( 98.37%) 1
      262ms - 524ms : ( 98.70%) 1
      524ms - 1s : ( 99.35%) 2 #
      1s - 2s : (100.00%) 2 #
      get_stats_cmd (11540 total)
      64us - 128us : ( 0.02%) 2
      128us - 256us : ( 18.80%) 2167 ###############################
      256us - 512us : ( 34.64%) 1828 ##########################
      512us - 1ms : ( 96.88%) 7183 #########################################################################################################
      1ms - 2ms : ( 99.71%) 326 ####
      2ms - 4ms : ( 99.99%) 33
      4ms - 8ms : (100.00%) 1
      get_vb_cmd (3600 total)
      2us - 4us : ( 2.17%) 78 ###
      4us - 8us : ( 34.28%) 1156 ######################################################
      8us - 16us : ( 99.50%) 2348 ##############################################################################################################
      16us - 32us : ( 99.86%) 13
      32us - 64us : ( 99.94%) 3
      64us - 128us : (100.00%) 2
      notify_io (19368 total)
      2us - 4us : ( 0.02%) 4
      4us - 8us : ( 0.41%) 76
      8us - 16us : ( 87.84%) 16933 ##################################################################################################################################################
      16us - 32us : ( 96.92%) 1758 ###############
      32us - 64us : ( 99.49%) 498 ####
      64us - 128us : ( 99.98%) 96
      128us - 256us : ( 99.99%) 1
      256us - 512us : ( 99.99%) 1
      1ms - 2ms : (100.00%) 1
      storage_age (536 total)
      0 - 1s : ( 99.81%) 535 #########################################################################################################################################################################
      1s - 2s : (100.00%) 1
      tap_mutation (281 total)
      4us - 8us : ( 2.14%) 6 ###
      8us - 16us : ( 4.98%) 8 ####
      16us - 32us : ( 58.72%) 151 ###########################################################################################
      32us - 64us : ( 92.17%) 94 ########################################################
      64us - 128us : ( 93.59%) 4 ##
      512us - 1ms : (100.00%) 18 ##########

      KVTIMINGS


      /opt/couchbase/bin/cbstats 127.0.0.1:11210 kvtimings -b biXDCR_bucket
      rw:commit (67884 total)
      1ms - 2ms : ( 18.17%) 12332 ##############################
      2ms - 4ms : ( 55.91%) 25621 ###############################################################
      4ms - 8ms : ( 67.48%) 7854 ###################
      8ms - 16ms : ( 79.01%) 7831 ###################
      16ms - 32ms : ( 91.72%) 8623 #####################
      32ms - 65ms : ( 98.55%) 4637 ###########
      65ms - 131ms : ( 99.47%) 626 #
      131ms - 262ms : ( 99.94%) 319
      262ms - 524ms : (100.00%) 38
      524ms - 1s : (100.00%) 3
      rw:commitRetry (7 total)
      4ms - 8ms : ( 57.14%) 4 ##################################################################################################
      8ms - 16ms : ( 71.43%) 1 ########################
      32ms - 65ms : ( 85.71%) 1 ########################
      131ms - 262ms : (100.00%) 1 ########################
      rw:delete (8618 total)
      1us - 2us : ( 0.01%) 1
      2us - 4us : ( 1.47%) 126 ##
      4us - 8us : ( 17.24%) 1359 ##########################
      8us - 16us : ( 54.86%) 3242 ###############################################################
      16us - 32us : ( 92.49%) 3243 ###############################################################
      32us - 64us : ( 98.17%) 489 #########
      64us - 128us : ( 99.58%) 122 ##
      128us - 256us : ( 99.91%) 28
      256us - 512us : (100.00%) 8
      rw:fsReadSeek (177413 total)
      8KB - 16KB : ( 46.33%) 82193 #############################################################################
      16KB - 32KB : ( 52.21%) 10437 #########
      32KB - 64KB : ( 56.80%) 8135 #######
      64KB - 128KB : ( 62.96%) 10936 ##########
      128KB - 256KB : ( 71.70%) 15504 ##############
      256KB - 512KB : ( 81.97%) 18221 #################
      512KB - 1MB : ( 92.02%) 17833 ################
      1MB - 2MB : ( 98.63%) 11729 ###########
      2MB - 4MB : ( 99.98%) 2385 ##
      4MB - 8MB : (100.00%) 40
      rw:fsReadSize (246025 total)
      8KB - 16KB : (100.00%) 246025 #######################################################################################################################################################################
      rw:fsReadTime (246025 total)
      0 - 1us : ( 0.00%) 1
      1us - 2us : ( 0.00%) 1
      2us - 4us : ( 2.33%) 5734 ###
      4us - 8us : ( 82.65%) 197604 ######################################################################################################################################
      8us - 16us : ( 96.47%) 34008 #######################
      16us - 32us : ( 99.78%) 8124 #####
      32us - 64us : ( 99.86%) 197
      64us - 128us : ( 99.89%) 79
      128us - 256us : ( 99.90%) 43
      256us - 512us : ( 99.92%) 32
      512us - 1ms : ( 99.94%) 62
      1ms - 2ms : ( 99.96%) 34
      2ms - 4ms : ( 99.97%) 23
      4ms - 8ms : ( 99.98%) 39
      8ms - 16ms : ( 99.99%) 30
      16ms - 32ms : (100.00%) 6
      32ms - 65ms : (100.00%) 2
      65ms - 131ms : (100.00%) 3
      131ms - 262ms : (100.00%) 3
      rw:fsSyncTime (137636 total)
      8us - 16us : ( 0.01%) 18
      16us - 32us : ( 0.01%) 2
      128us - 256us : ( 0.02%) 4
      256us - 512us : ( 11.73%) 16118 ###################
      512us - 1ms : ( 16.31%) 6310 #######
      1ms - 2ms : ( 62.16%) 63105 #############################################################################
      2ms - 4ms : ( 73.34%) 15381 ##################
      4ms - 8ms : ( 81.10%) 10686 #############
      8ms - 16ms : ( 91.09%) 13746 ################
      16ms - 32ms : ( 98.26%) 9873 ############
      32ms - 65ms : ( 99.47%) 1669 ##
      65ms - 131ms : ( 99.80%) 452
      131ms - 262ms : ( 99.98%) 240
      262ms - 524ms : (100.00%) 31
      524ms - 1s : (100.00%) 1
      rw:fsWriteSize (520492 total)
      8 - 16 : ( 13.22%) 68818 ######################
      32 - 64 : ( 13.30%) 412
      64 - 128 : ( 27.07%) 71655 ######################
      128 - 256 : ( 28.90%) 9525 ###
      256 - 512 : ( 39.01%) 52631 ################
      512 - 1KB : ( 56.81%) 92627 #############################
      1KB - 2KB : ( 86.02%) 152041 ################################################
      2KB - 4KB : ( 95.48%) 49238 ###############
      4KB - 8KB : ( 98.18%) 14090 ####
      8KB - 16KB : ( 99.33%) 5987 #
      16KB - 32KB : ( 99.65%) 1652
      32KB - 64KB : ( 99.78%) 651
      64KB - 128KB : ( 99.91%) 710
      128KB - 256KB : (100.00%) 455
      rw:fsWriteTime (520492 total)
      1us - 2us : ( 0.00%) 1
      2us - 4us : ( 0.01%) 41
      4us - 8us : ( 51.50%) 268024 #####################################################################################
      8us - 16us : ( 67.09%) 81127 ##########################
      16us - 32us : ( 93.26%) 136226 ###########################################
      32us - 64us : ( 98.59%) 27720 ########
      64us - 128us : ( 98.87%) 1446
      128us - 256us : ( 99.10%) 1245
      256us - 512us : ( 99.30%) 1036
      512us - 1ms : ( 99.48%) 928
      1ms - 2ms : ( 99.71%) 1212
      2ms - 4ms : ( 99.80%) 450
      4ms - 8ms : ( 99.91%) 546
      8ms - 16ms : ( 99.98%) 384
      16ms - 32ms : ( 99.99%) 76
      32ms - 65ms : (100.00%) 7
      65ms - 131ms : (100.00%) 12
      131ms - 262ms : (100.00%) 7
      262ms - 524ms : (100.00%) 4
      rw:save_documents (67884 total)
      16us - 32us : ( 0.17%) 117
      32us - 64us : ( 0.38%) 144
      64us - 128us : ( 8.84%) 5740 ##############
      128us - 256us : ( 54.66%) 31104 ############################################################################
      256us - 512us : ( 84.47%) 20235 ##################################################
      512us - 1ms : ( 94.25%) 6642 ################
      1ms - 2ms : ( 96.87%) 1778 ####
      2ms - 4ms : ( 98.32%) 984 ##
      4ms - 8ms : ( 99.33%) 683 #
      8ms - 16ms : ( 99.89%) 379
      16ms - 32ms : ( 99.97%) 61
      32ms - 65ms : ( 99.98%) 6
      65ms - 131ms : ( 99.99%) 4
      131ms - 262ms : (100.00%) 4
      262ms - 524ms : (100.00%) 3
      rw:writeSize (760509 total)
      256 - 512 : ( 81.55%) 620167 ########################################################################################################################################
      512 - 1KB : (100.00%) 140342 ##############################
      rw:writeTime (760509 total)
      1us - 2us : ( 0.05%) 371
      2us - 4us : ( 10.16%) 76859 ################
      4us - 8us : ( 34.38%) 184200 ########################################
      8us - 16us : ( 65.51%) 236794 ###################################################
      16us - 32us : ( 88.58%) 175421 ######################################
      32us - 64us : ( 97.94%) 71214 ###############
      64us - 128us : ( 99.60%) 12588 ##
      128us - 256us : ( 99.95%) 2685
      256us - 512us : (100.00%) 360
      512us - 1ms : (100.00%) 17


      Reproduced this in ~15min on a 2:2 cluster set up on local VMs
      C1: 10.1.3.235, 10.1.3.236
      C2: 10.1.3.237, 10.1.3.238

      bucket: C1 --> C2

      - Load ~1M items on C1.
      - Stop load
      - Start replication from C1 to C2.
      - After a point a number of these errors seen in XDCR tab:
      2012-11-08 20:17:17 - Error replicating vbucket 552: {http_request_failed, "POST", "http://Administrator:*****@10.1.3.238:8092/bucket%2f552%3bb77a081e2e609bc48c228fab8b3c3161/_revs_diff", {error,{error,timeout}

      }}

      • Number of sets became extremely low.
      1. 10.1.3.235-8091-diag.txt.gz
        2.00 MB
        Abhinav Dangeti
      2. 10.1.3.236-8091-diag.txt.gz
        2.17 MB
        Abhinav Dangeti
      3. 10.1.3.237-8091-diag.txt.gz
        2.09 MB
        Abhinav Dangeti
      4. 10.1.3.238-8091-diag.txt.gz
        2.22 MB
        Abhinav Dangeti
      1. Screen Shot 2012-11-08 at 7.42.40 PM.png
        92 kB
      2. Screen Shot 2012-11-14 at 4.37.38 PM.png
        181 kB
      No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

        Hide
        dipti Dipti Borkar added a comment -

        what impact will users see with this scenario change? will performance of specific scenarios will be impacted?

        Show
        dipti Dipti Borkar added a comment - what impact will users see with this scenario change? will performance of specific scenarios will be impacted?
        Hide
        junyi Junyi Xie (Inactive) added a comment -

        Dipti, It should be no much impact to users. Today the batch size of XDCR is 500 docs and 512K, assuming the average size of doc is 1K. By Fashid, it make more sense to assume average of doc is 4K, thus we increase the batch size to 2048K. For most cases users should not see much difference, users may see some difference for big doc workload. It should be improvement since we batch more than before.

        Show
        junyi Junyi Xie (Inactive) added a comment - Dipti, It should be no much impact to users. Today the batch size of XDCR is 500 docs and 512K, assuming the average size of doc is 1K. By Fashid, it make more sense to assume average of doc is 4K, thus we increase the batch size to 2048K. For most cases users should not see much difference, users may see some difference for big doc workload. It should be improvement since we batch more than before.
        Hide
        pavelpaulau Pavel Paulau added a comment - - edited

        Observing performance regression in recent builds.

        Growing XDCR queue is the case again, even on EC2.

        Show
        pavelpaulau Pavel Paulau added a comment - - edited Observing performance regression in recent builds. Growing XDCR queue is the case again, even on EC2.
        Hide
        ketaki Ketaki Gangal added a comment - - edited

        Pavel, what build did you run the latest tests on? Could you describe the workload/test.

        Also do we see higher gets v/s sets as well on the run?

        Show
        ketaki Ketaki Gangal added a comment - - edited Pavel, what build did you run the latest tests on? Could you describe the workload/test. Also do we see higher gets v/s sets as well on the run?
        Hide
        junyi Junyi Xie (Inactive) added a comment -

        Pavel, please open a different bug and attach detailed problem description, diag logs and steps to reproduce.

        Show
        junyi Junyi Xie (Inactive) added a comment - Pavel, please open a different bug and attach detailed problem description, diag logs and steps to reproduce.

          People

          • Assignee:
            junyi Junyi Xie (Inactive)
            Reporter:
            abhinav Abhinav Dangeti
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes