Couchbase Server / MB-7677

[system test] "Target database out of sync" when loading items at source

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: XDCR
    • Security Level: Public
    • Labels:
    • Environment:
      centos 5.8 64bit

      Description

      Environment:

      • Source:
        Each node has a 4-core CPU and 4GB RAM
        Installed Couchbase Server 2.0.0-1976 on a 2-node cluster
        The 2-node cluster uses host names (not IPs); one node uses the default data path and the other uses a custom data path
        Created 2 buckets: one default bucket (2GB) with one replica and one sasl bucket (1.1GB) with 2 replicas
      • Destination:
        Each node has a 4-core CPU and 4GB RAM
        Installed Couchbase Server 2.0.0-1976 on a 2-node cluster
        The 2-node cluster uses IPs; one node uses the default data path and the other uses a custom data path
        Created 2 buckets: one default bucket (2GB) with one replica and one sasl bucket (1.1GB) with 2 replicas

      When loading items into the sasl bucket at the source, I saw errors like the following at the destination:

      2013-02-04 15:21:59 - Error replicating vbucket 469: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:59 - Error replicating vbucket 329: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:59 - Error replicating vbucket 231: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:56 - Error replicating vbucket 507: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:55 - Error replicating vbucket 487: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:54 - Error replicating vbucket 485: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:53 - Error replicating vbucket 430: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:53 - Error replicating vbucket 233: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:53 - Error replicating vbucket 18: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>
      2013-02-04 15:21:49 - Error replicating vbucket 476: <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>

      Diags will be uploaded soon.
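For triage, the error lines above can be reduced to the set of affected vbuckets. The following is a minimal Python sketch, assuming the log lines have exactly the format shown above; the function name `out_of_sync_vbuckets` and the `MSG`/`SAMPLE` constants are hypothetical helpers, not part of Couchbase Server.

```python
import re

# Matches the destination log format shown above:
#   <date> <time> - Error replicating vbucket <N>: <<"<message>">>
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) - '
    r'Error replicating vbucket (?P<vb>\d+): <<"(?P<msg>.*)">>$'
)


def out_of_sync_vbuckets(lines):
    """Return sorted, de-duplicated vbucket IDs whose error message says 'out of sync'."""
    vbuckets = set()
    for line in lines:
        match = LINE_RE.match(line.strip())
        # Skip lines that are not replication errors or report a different error.
        if match and "out of sync" in match.group("msg"):
            vbuckets.add(int(match.group("vb")))
    return sorted(vbuckets)


# Hypothetical sample built from three of the reported log lines.
MSG = "Target database out of sync. Try to increase max_dbs_open at the target's server."
SAMPLE = [
    f'2013-02-04 15:21:59 - Error replicating vbucket 469: <<"{MSG}">>',
    f'2013-02-04 15:21:59 - Error replicating vbucket 329: <<"{MSG}">>',
    f'2013-02-04 15:21:56 - Error replicating vbucket 507: <<"{MSG}">>',
]

if __name__ == "__main__":
    print(out_of_sync_vbuckets(SAMPLE))  # [329, 469, 507]
```

Applied to all ten lines in the report, this yields ten distinct vbuckets, all logged within about ten seconds.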


        Activity

        Thuan Nguyen added a comment -

        Link to collect info of all nodes:
        https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_1/201302/2nodes-200GA-des-xdcr-database-outofsync-20130204-173825.tgz
        https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_1/201302/2nodes-200GA-src-xdcr-database-outofsync-20130204-174021.tgz
        Thuan Nguyen added a comment -

        The XDCR replications were created last week:

        Replication from bucket "sasl" to bucket "sasl" on cluster "src" created. menelaus_web_create_replication000 ns_1@10.3.3.7 19:04:06 - Wed Jan 30, 2013
        Replication from bucket "default" to bucket "default" on cluster "src" created. menelaus_web_create_replication000 ns_1@10.3.3.7 19:02:44 - Wed Jan 30, 2013

        Ketaki Gangal added a comment -

        Hi Junyi,

        We see a lot of these errors when replication first starts:

        • "Target DB out of sync"
        • "Failed to grab remote bucket info"

        Why do we see these errors?
        In most cases the destination bucket is ready (i.e., the cluster and bucket were set up at least 10 minutes before the replication started, and on some occasions the bucket already contains data).

        Most of the time the replication eventually recovers, but seeing these errors suggests that an underlying call is being made too soon. What should users ensure before starting replication?

        Junyi Xie (Inactive) added a comment -

        Ketaki, do you still see the errors? In your test on EC2 yesterday (MB-7657, Feb 13th), I did not see these errors in the logs.

        I do not know whether there is any way to check that the destination bucket is ready on all nodes. The ns_server folks probably have more input.

        Ketaki Gangal added a comment -

        These errors appear to come from 2.0 clusters only; different timeouts are the likely cause.

        I don't see them anymore in current 2.0.1 runs.


          People

          • Assignee:
            Ketaki Gangal
            Reporter:
            Thuan Nguyen
          • Votes:
            0
            Watchers:
            3

