MB-6041: XDC replication keeps on replicating even after replication document is removed

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0
    • Component/s: XDCR
    • Security Level: Public
    • Labels:
      None

      Description

      • create replication
      • upload some data into the source bucket
      • remove the replication (replication document is not present in _replicator/_all_docs anymore)
      • observe that number of items in the destination bucket keeps growing

      Seeing this on current HEAD.
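      For reference, one illustrative way to watch whether the destination keeps gaining items after the replication document is gone is to poll the destination bucket's itemCount over the REST API. This is only a sketch, not part of the repro above: the port (9001), bucket name (default), and Administrator:asdasd credentials are assumptions to adjust for your cluster, and it assumes the bucket endpoint exposes an itemCount stat in its JSON.

      %% watch_items.erl - illustrative sketch only; polls the destination
      %% bucket's itemCount so you can see whether it keeps growing after the
      %% replication document has been deleted on the source.
      %% Assumed: destination REST API at http://127.0.0.1:9001, bucket "default",
      %% credentials Administrator:asdasd.
      -module(watch_items).
      -export([watch/0]).

      -define(URL, "http://127.0.0.1:9001/pools/default/buckets/default").

      watch() ->
          inets:start(),
          loop(undefined).

      loop(Prev) ->
          Auth = {"Authorization",
                  "Basic " ++ base64:encode_to_string("Administrator:asdasd")},
          {ok, {{_, 200, _}, _, Body}} = httpc:request(get, {?URL, [Auth]}, [], []),
          %% crude extraction of "itemCount":N from the JSON body; a real script
          %% would use a proper JSON parser such as mochijson2
          {match, [Count]} = re:run(Body, "\"itemCount\":\\s*(\\d+)",
                                    [{capture, all_but_first, list}]),
          io:format("itemCount=~s (previously ~p)~n", [Count, Prev]),
          timer:sleep(5000),
          loop(Count).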

      Attachments

      1. ns-diag-20120823192112.txt.bz2 (1.77 MB, uploaded by Aliaksey Artamonau)
      2. ns-diag-20120727231728.txt.xz (661 kB, uploaded by Aliaksey Artamonau)

        Activity

        Aliaksey Artamonau added a comment -

        I was able to reproduce it by creating two replications from the same bucket on the source to two different buckets on the destination. It's probably not a very realistic scenario, but it might uncover an important issue. I will attach a diag from the source cluster shortly.

        Aliaksey Artamonau added a comment -

        The replications finally stopped several minutes after I removed the corresponding replication documents.

        Junyi Xie (Inactive) added a comment - edited

        I tried the same setup as yours (1 -> 1 replication: default@node1 -> default@node2 and default@node1 -> default2@node2), and it seems there is nothing wrong.

        From the log below, the XDCR replication manager was notified by ns_server immediately after I deleted the replication doc from the UI, and it shut down all ongoing bucket replication processes with no delay. All XDCR activity stopped at the source right after that. However, there could be some activity on the destination cluster even after XDCR stopped replicating on the source side, because it may take a while to persist all in-memory items to storage. I am not sure whether there is any delay between the UI stats and the real activity. Also, if both nodes in your test are on the same local machine with 1024 vbuckets, it may take longer to finish; the delay should be much shorter if we use VMs to conduct the test.

        At this time I am not sure what to fix. I merged a change that adds some logs for timing purposes, and will ask Ketaki to do the same test on VMs. If it is really an issue, we will reopen this bug and investigate the logs from the VMs.

        [couchdb:info,2012-08-28T14:43:47.255,n_0@127.0.0.1:<0.742.0>:couch_log:info:39]127.0.0.1 - - DELETE /_replicator/1d38c26cdc5c5bb0e6be126e8ae272be%2Fdefault%2Fdefault?rev=1-9ee1a1c9 200
        [xdcr:debug,2012-08-28T14:43:47.257,n_0@127.0.0.1:xdc_rep_manager:xdc_rep_manager:process_update:174]replication doc deleted (docId: <<"1d38c26cdc5c5bb0e6be126e8ae272be/default/default">>), stop all replications
        [xdcr:debug,2012-08-28T14:43:47.258,n_0@127.0.0.1:xdc_rep_manager:xdc_replication_sup:stop_replication:49]all replications for DocId <<"1d38c26cdc5c5bb0e6be126e8ae272be/default/default">> have been stopped

        [ns_server:debug,2012-08-28T14:43:47.259,n_0@127.0.0.1:<0.2113.0>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_config_events,<0.2112.0>} exited with reason shutdown
        [ns_server:debug,2012-08-28T14:43:47.260,n_0@127.0.0.1:<0.2113.0>:ns_pubsub:do_subscribe_link:149]Deleting {ns_config_events,<0.2112.0>} event handler: ok
        [xdcr:debug,2012-08-28T14:43:47.296,n_0@127.0.0.1:<0.11655.0>:xdc_vbucket_rep_worker:find_missing:121]after conflict resolution at target ("http://Administrator:asdasd@127.0.0.1:9501/default%2f87%3b5816f256233b9dffc119c2c32325a512/"), out of all 396 docs the number of docs we need to replicate is: 396
        [couchdb:info,2012-08-28T14:43:47.304,n_0@127.0.0.1:<0.1858.0>:couch_log:info:39]checkpointing view update at seq 5 for _replicator _design/_replicator_info
        [couchdb:info,2012-08-28T14:43:47.320,n_0@127.0.0.1:<0.1852.0>:couch_log:info:39]127.0.0.1 - - GET /replicator/_design/_replicator_info/_view/infos?group_level=1&=1346179427278 200
        [ns_server:debug,2012-08-28T14:44:00.037,n_0@127.0.0.1:compaction_daemon:compaction_daemon:handle_info:269]Starting compaction for the following buckets:
        [<<"default">>]
        [ns_server:info,2012-08-28T14:44:00.074,n_0@127.0.0.1:<0.13612.0>:compaction_daemon:try_to_cleanup_indexes:439]Cleaning up indexes for bucket `default`
        [ns_server:info,2012-08-28T14:44:00.164,n_0@127.0.0.1:<0.13612.0>:compaction_daemon:spawn_bucket_compactor:404]Compacting bucket default with config:
        [{database_fragmentation_threshold,{30,undefined}},
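
        The two xdcr:debug lines at 14:43:47.257 and 14:43:47.258 correspond to the doc-deletion path described in this comment. A minimal sketch of the shape of that logic follows; it is not the actual xdc_rep_manager / xdc_replication_sup code. The rep_sup supervisor name and the {DocId, VBucket} child id layout are assumptions made only for illustration.

        %% Sketch only (not the real product code): when a _replicator doc change
        %% turns out to be a deletion, stop every replication registered under
        %% that DocId; otherwise (re)start the replication described by the doc.
        -module(rep_doc_watcher).
        -export([process_update/1]).

        %% DocInfo is assumed to be a {DocId, Deleted, Body} triple here.
        process_update({DocId, _Deleted = true, _Body}) ->
            error_logger:info_msg("replication doc deleted (docId: ~p), "
                                  "stop all replications~n", [DocId]),
            stop_replication(DocId);
        process_update({DocId, false, Body}) ->
            %% a live (non-deleted) doc would (re)start a replication instead
            maybe_start_replication(DocId, Body).

        stop_replication(DocId) ->
            %% terminate and delete every supervised child whose id is {DocId, Vb}
            lists:foreach(
              fun({{D, _Vb} = Id, _Child, _Type, _Mods}) when D =:= DocId ->
                      supervisor:terminate_child(rep_sup, Id),
                      supervisor:delete_child(rep_sup, Id);
                 (_) ->
                      ok
              end,
              supervisor:which_children(rep_sup)),
            error_logger:info_msg("all replications for DocId ~p have been "
                                  "stopped~n", [DocId]).

        maybe_start_replication(_DocId, _Body) ->
            ok.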

        Junyi Xie (Inactive) added a comment -

        http://review.couchbase.org/#/c/20196/5
        Thuan Nguyen added a comment -

        Integrated in github-ns-server-2-0 #456 (See http://qa.hq.northscale.net/job/github-ns-server-2-0/456/)
        MB-6041: add logs to time replication stop (Revision 1b1cf1f99f6e84b0baaa90a9ac2504b46e1d583a)

        Result = SUCCESS
        Junyi Xie:
        Files :

        • src/xdc_rep_manager.erl
        • src/xdc_replication_sup.erl
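
        The change referenced above ("add logs to time replication stop") is about measuring how long shutdown takes once the replication document is deleted. A minimal, self-contained sketch of that idea, not the actual diff to the two files listed, might look like this; the module and function names are placeholders.

        %% Sketch of the timing idea only, not the actual change to
        %% src/xdc_rep_manager.erl / src/xdc_replication_sup.erl.
        -module(rep_stop_timing).
        -export([timed_stop/2]).

        %% StopFun is whatever function actually stops all replications for DocId;
        %% this wrapper just records timestamps around it and logs the elapsed time.
        timed_stop(DocId, StopFun) ->
            Start = erlang:now(),
            Result = StopFun(DocId),
            ElapsedMs = timer:now_diff(erlang:now(), Start) div 1000,
            error_logger:info_msg("stopping all replications for ~p took ~p ms~n",
                                  [DocId, ElapsedMs]),
            Result.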

          People

          • Assignee: Junyi Xie (Inactive)
          • Reporter: Aliaksey Artamonau
          • Votes: 0
          • Watchers: 0

            Dates

              • Created:
              • Updated:
              • Resolved:

              Gerrit Reviews

              There are no open Gerrit changes