Couchbase Server / MB-7266

XDCR never starts up after a bucket is deleted and recreated with the same name in a unidirectional scenario

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 2.0
    • Fix Version/s: 3.0
    • Component/s: ns_server, UI, XDCR
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Ubuntu/Mac/Windows

      Description

      • Create a unidirectional XDCR replication from a bucket on one cluster to another cluster.
      • On the source cluster, delete the bucket and create a new one with the same name.
      • In the XDCR tab of the UI, the status of the ongoing replication shows "Starting up", but replication never actually starts.

      Many errors like the following appear on the source node:

      [xdcr:error,2012-11-26T16:15:43.020,ns_1@10.3.3.150:<0.8215.1>:xdc_vbucket_rep:terminate:284]Shutting xdcr vb replicator ({init_state,
      {rep,
      <<"7554df57af6a5ac799f90c77413e5c88/default/default">>,
      <<"default">>,
      <<"/remoteClusters/7554df57af6a5ac799f90c77413e5c88/buckets/default">>,
      [{connection_timeout,180000},
      {continuous,true},
      {http_connections,20},
      {retries,2},
      {socket_options,[{keepalive,true},{nodelay,false}]},
      {worker_batch_size,500},
      {worker_processes,4}]},
      187,<0.4677.0>,<0.4678.0>,<0.4674.0>}) down without ever successfully initializing: {db_not_found, <<"default/187">>}
      [xdcr:error,2012-11-26T16:15:43.074,ns_1@10.3.3.150:<0.6622.1>:xdc_vbucket_rep:handle_info:83]Error initializing vb replicator ({init_state,
      {rep,
      <<"7554df57af6a5ac799f90c77413e5c88/default/default">>,
      <<"default">>,
      <<"/remoteClusters/7554df57af6a5ac799f90c77413e5c88/buckets/default">>,
      [{connection_timeout,180000},
      {continuous,true},
      {http_connections,20},
      {retries,2},
      {socket_options,[{keepalive,true},{nodelay,false}]},
      {worker_batch_size,500},
      {worker_processes,4}]},
      176,<0.4677.0>,<0.4678.0>,<0.4674.0>}):{throw,{db_not_found,<<"default/176">>}}
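To see how many vbuckets are affected, the db_not_found errors in a log excerpt like the one above can be tallied per vbucket. This is a minimal sketch, not part of any Couchbase tooling: `missing_vbuckets` is a hypothetical helper, and it assumes the errors appear in the `{db_not_found, <<"bucket/vb">>}` form shown in these entries.

```python
import re
from collections import Counter

# Matches errors such as {db_not_found, <<"default/187">>}. \s* also
# absorbs the line breaks that split some entries across several lines.
DB_NOT_FOUND = re.compile(r'\{db_not_found,\s*<<"([^/"]+)/(\d+)">>\}')


def missing_vbuckets(log_text: str) -> Counter:
    """Count db_not_found errors per (bucket, vbucket id) in a log excerpt."""
    return Counter(
        (bucket, int(vb)) for bucket, vb in DB_NOT_FOUND.findall(log_text)
    )


if __name__ == "__main__":
    excerpt = """... initializing: {db_not_found, <<"default/187">>}
    ... :{throw,
    {db_not_found,
    <<"default/176">>}}"""
    for (bucket, vb), n in sorted(missing_vbuckets(excerpt).items()):
        print(f"{bucket} vb {vb}: {n} error(s)")
```

If every vbucket of the recreated bucket shows up in the tally, that supports the observation that no vb replicator ever initializes, rather than a few isolated failures.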

      However, if the replication itself is deleted and recreated, it starts up normally.
      Possible solutions:

      • Disallow deleting a bucket while an ongoing replication originates from it.
      • When a bucket is recreated with the same name, automatically recreate its replications as well.
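The two proposed solutions boil down to finding the replications whose source is the bucket being deleted. Below is a minimal sketch of that check, assuming a simple in-memory list of replications; the `Replication` record, `BucketDeletionRefused`, and `check_bucket_deletion` are hypothetical names for illustration and do not reflect how ns_server or the UI actually store or validate replications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replication:
    # Hypothetical replication record; the field names are illustrative
    # and do not reflect ns_server's internal representation.
    source_bucket: str
    remote_cluster: str
    target_bucket: str


class BucketDeletionRefused(Exception):
    """Raised when deleting a bucket that feeds a live replication."""


def replications_from(bucket, replications):
    """All replications whose source is the given bucket."""
    return [r for r in replications if r.source_bucket == bucket]


def check_bucket_deletion(bucket, replications, force=False):
    """Solution 1: refuse to delete a bucket that is the source of a live
    replication unless the caller explicitly forces it (e.g. after the UI
    has shown a warning). Returns the replications affected by the deletion."""
    affected = replications_from(bucket, replications)
    if affected and not force:
        raise BucketDeletionRefused(
            f"bucket {bucket!r} is the source of {len(affected)} "
            "XDC replication(s); delete them first or force the deletion"
        )
    return affected
```

For solution 2, the list returned by a forced deletion is exactly the set of replications that would need to be recreated once a bucket with the same name exists again.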

        Activity

        Junyi Xie (Inactive) added a comment:

        On the source side, if you delete and recreate a bucket with the same name, the XDCR bucket replicator cannot initialize successfully because the source bucket's UUID has changed. Every vbucket replicator crashes during initialization:

        debug.1:=========================CRASH REPORT=========================
        debug.1- crasher:
        debug.1- initial call: xdc_vbucket_rep:init/1
        debug.1- pid: <0.8610.0>
        debug.1- registered_name: []
        debug.1- exception exit: {bad_return_value,
        debug.1- {db_not_found,
        debug.1- <<"http://Administrator:*****@10.3.3.61:8092/default%2f670%3b536782493226c7ca2cc53dac86bf84ef/">>}}
        debug.1- in function gen_server:terminate/6
        debug.1- ancestors: [<0.8224.0>,<0.8219.0>,xdc_replication_sup,ns_server_sup,
        debug.1- ns_server_cluster_sup,<0.58.0>]
        debug.1- messages: []
        debug.1- links: [<0.5152.2>,<0.8224.0>]
        debug.1- dictionary: []
        debug.1- trap_exit: true
        debug.1- status: running
        debug.1- heap_size: 75025
        debug.1- stack_size: 24
        debug.1- reductions: 177292
        debug.1- neighbours:
        debug.1- neighbour: [{pid,<0.5152.2>},

        This is probably why the status shown in the UI never changes from "Starting up" to "Replicating".

        Since a bucket is identified by its UUID rather than its name, XDCR is, in this sense, doing exactly what it is supposed to do.

        IMO, the fix belongs in the UI: we should not allow users to delete a bucket while a live XDC replication originates from it. At the very least, we should warn users about the consequences of deleting a bucket during XDCR.
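Junyi's explanation, that a replication keeps referring to the bucket by the UUID it had when the replication was created, can be illustrated with a toy model. This is a sketch of that hypothesis only: `Cluster`, `Replication`, `open_vbucket_db`, and `init_vbucket` are hypothetical names, not ns_server code, and Aleksey's later comment disputes whether the UUID is checked on the source side at all.

```python
import uuid


class Cluster:
    """Toy model of one cluster: maps bucket name -> bucket UUID."""

    def __init__(self):
        self.buckets = {}

    def create_bucket(self, name):
        self.buckets[name] = uuid.uuid4().hex  # fresh UUID on every creation
        return self.buckets[name]

    def delete_bucket(self, name):
        del self.buckets[name]

    def open_vbucket_db(self, name, bucket_uuid, vb):
        # The lookup succeeds only if a bucket with this name exists AND
        # its UUID matches the one the caller captured earlier.
        if self.buckets.get(name) != bucket_uuid:
            raise LookupError(("db_not_found", f"{name}/{vb}"))
        return f"{name}/{vb}"


class Replication:
    """Captures the source bucket's UUID once, when the replication is created."""

    def __init__(self, cluster, bucket):
        self.cluster = cluster
        self.bucket = bucket
        self.bucket_uuid = cluster.buckets[bucket]

    def init_vbucket(self, vb):
        return self.cluster.open_vbucket_db(self.bucket, self.bucket_uuid, vb)


if __name__ == "__main__":
    src = Cluster()
    src.create_bucket("default")
    rep = Replication(src, "default")
    print(rep.init_vbucket(187))          # opens fine
    src.delete_bucket("default")
    src.create_bucket("default")          # same name, new UUID
    try:
        rep.init_vbucket(187)
    except LookupError as err:
        print("init failed:", err.args[0])
    print(Replication(src, "default").init_vbucket(187))  # recreated replication works
```

In this model the stale replication fails on every vbucket while a freshly created one succeeds, which mirrors the reported behaviour that recreating the replication fixes the problem.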

        Aleksey Kondratenko (Inactive) added a comment:

        Moving to .next, as it's not a blocker and is therefore out of scope for 2.0.x.

        Junyi Xie (Inactive) added a comment:

        Per Alk's comment, setting the fix version to 2.1, since this is out of scope for 2.0.x.

        Aleksey Kondratenko (Inactive) added a comment:

        I don't have logs to verify anything, but the description and Junyi's comments are not consistent with my understanding of how things work.

        Specifically, the UUID is used to identify the destination bucket, but the description says the bucket was recreated on the source side.
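One way to weigh this against Junyi's crash report is to decode the percent-encoded database URL it contains. The sketch below does that with the standard library; `parse_vbucket_db_url` is a hypothetical helper, and the `bucket/vbucket;uuid` layout is inferred from this single URL rather than from documentation. Decoded, the URL names host 10.3.3.61 on port 8092 (the port Couchbase uses for view/CAPI traffic, which XDCR writes through), a different address from the source node 10.3.3.150 in the description, and the database `default/670;536782493226c7ca2cc53dac86bf84ef`. That is consistent with the failing lookup targeting a remote vbucket, although the material here cannot confirm which cluster 10.3.3.61 belongs to.

```python
from urllib.parse import unquote, urlsplit


def parse_vbucket_db_url(url):
    """Split a vbucket database URL into (host, port, bucket, vbucket, uuid).

    Assumes the path is a single percent-encoded segment of the form
    <bucket>/<vbucket>;<uuid>, as in the crash report's URL.
    """
    parts = urlsplit(url)
    db_name = unquote(parts.path.strip("/"))  # e.g. "default/670;5367..."
    name_vb, _, bucket_uuid = db_name.partition(";")
    bucket, _, vb = name_vb.partition("/")
    return parts.hostname, parts.port, bucket, int(vb), bucket_uuid


if __name__ == "__main__":
    url = ("http://Administrator:*****@10.3.3.61:8092/"
           "default%2f670%3b536782493226c7ca2cc53dac86bf84ef/")
    print(parse_vbucket_db_url(url))
```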


          People

          • Assignee: Aleksey Kondratenko (Inactive)
          • Reporter: Abhinav Dangeti
          • Votes: 0
          • Watchers: 2

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes