Couchbase Server / MB-6240

XDCR stops replicating (only 2.3M of 3.5M items are replicated) while/after rebalancing one node into the destination cluster.


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major
    • Affects Version: 2.0
    • Fix Version: 2.0
    • Component: XDCR
    • Security Level: Public
    • Labels: None
    • Environment: 2.0-1587-rel
      Unidirectional replication
      1024 vbuckets, CentOS

    Description

      Setup
      1. Set up unidirectional replication between a 2-node source cluster and a 2-node destination cluster.
      2. Load 2M items at the source; all 2M items are replicated to the destination cluster.
      3. Load 2M more items on the source and start mutating data at the source.
      4. Add 1 node to the destination cluster and rebalance (a hedged REST sketch of steps 1 and 4 follows this list).
      5. Expect 4M items in total on the destination.
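
      For reference, a minimal sketch of how steps 1 and 4 can be driven through the Couchbase REST API, written in Python with the requests library. The hostnames, credentials, and bucket names are placeholders, not the ones from this run, and error handling is reduced to raise_for_status():

          import requests

          SRC = "http://10.1.1.1:8091"          # placeholder source-cluster node
          DST = "http://10.1.1.2:8091"          # placeholder destination-cluster node
          AUTH = ("Administrator", "password")  # placeholder credentials

          # Step 1: register the destination cluster on the source, then start a
          # unidirectional (source -> destination) replication of bucket "default".
          requests.post(SRC + "/pools/default/remoteClusters", auth=AUTH,
                        data={"name": "dest", "hostname": "10.1.1.2:8091",
                              "username": "Administrator",
                              "password": "password"}).raise_for_status()
          requests.post(SRC + "/controller/createReplication", auth=AUTH,
                        data={"fromBucket": "default", "toCluster": "dest",
                              "toBucket": "default",
                              "replicationType": "continuous"}).raise_for_status()

          # Step 4: add one node to the destination cluster and rebalance it in.
          requests.post(DST + "/controller/addNode", auth=AUTH,
                        data={"hostname": "10.1.1.3", "user": "Administrator",
                              "password": "password"}).raise_for_status()
          nodes = requests.get(DST + "/pools/default", auth=AUTH).json()["nodes"]
          known = ",".join(n["otpNode"] for n in nodes)
          requests.post(DST + "/controller/rebalance", auth=AUTH,
                        data={"knownNodes": known,
                              "ejectedNodes": ""}).raise_for_status()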

      Error
      1. Only 2.31M items are replicated to the destination cluster, and replication appears to have stopped.

      Logs show multiple crash reports stating "worker died ... timeout error" while connecting to the destination nodes.

        • Reason for termination ==
          {worker_died,<0.14149.70>,
              {http_request_failed,"POST",
                  "http://Administrator:*****@10.3.121.33:8092/default%2f843%3b6faefb8ab82b9b2b9f2ecb8381ce4a94/_revs_diff",
                  {error,{error,timeout}}}}

      [xdcr:info,2012-08-15T15:53:54.544,ns_1@10.3.121.38:xdc_rep_manager:xdc_rep_manager:handle_info:244]9a3d22328fbbcd0a28e91164dc000b21: replication of vbucket 843 failed due to reason:
      {worker_died,<0.14149.70>,
          {http_request_failed,"POST",
              "http://Administrator:*****@10.3.121.33:8092/default%2f843%3b6faefb8ab82b9b2b9f2ecb8381ce4a94/_revs_diff",
              {error,{error,timeout}}}}
      [xdcr:info,2012-08-15T15:53:54.544,ns_1@10.3.121.38:xdc_rep_manager:xdc_rep_manager:max_concurrent_reps:604]MAX_CONCURRENT_REPS_PER_DOC set to 8
      [error_logger:error,2012-08-15T15:53:54.549,ns_1@10.3.121.38:error_logger:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
        crasher:
          initial call: xdc_replicator:init/1
          pid: <0.14138.70>
          registered_name: []
          exception exit: {worker_died,<0.14149.70>,
                              {http_request_failed,"POST",
                                  "http://Administrator:*****@10.3.121.33:8092/default%2f843%3b6faefb8ab82b9b2b9f2ecb8381ce4a94/_revs_diff",
                                  {error,{error,timeout}}}}
            in function gen_server:terminate/6
          ancestors: [xdc_rep_sup,ns_server_sup,ns_server_cluster_sup,<0.60.0>]
          messages: [{'EXIT',<0.14145.70>,normal}]
          links: [<0.14148.70>,<0.14151.70>,<0.14153.70>,<0.14150.70>,<0.408.0>,<0.14146.70>]
          dictionary: [{task_status_props,
                           [{checkpointed_source_seq,4722},
                            {continuous,false},
                            {doc_write_failures,0},
                            {docs_read,31},
                            {docs_written,31},
                            {missing_revisions_found,31},
                            {progress,51},
                            {replication_id,<<"3ed992b3af448ef7170c09b2d472ed16">>},
                            {revisions_checked,31},
                            {source,<<"default/843">>},
                            {source_seq,9193},
                            {started_on,1345070646},
                            {target,<<"http://Administrator:*****@10.3.121.33:8092/default%2f843%3b6faefb8ab82b9b2b9f2ecb8381ce4a94/">>},
                            {type,replication},
                            {updated_on,1345070646}]},
                       {task_status_update,{1345,71234,539370},1000000}]
          trap_exit: true
          status: running
          heap_size: 4181
          stack_size: 24
          reductions: 58230
        neighbours:
          neighbour: [{pid,<0.18978.70>},
                      {registered_name,[]},
                      {initial_call,{lhttpc_client,request,9}},
                      {current_function,{prim_inet,recv0,3}},
                      {ancestors,[]},
                      {messages,[]},
                      {links,[<0.14151.70>,#Port<0.2352439>]},
                      {dictionary,[]},
                      {trap_exit,false},
                      {status,waiting},
                      {heap_size,610},
                      {stack_size,36},
                      {reductions,921}]
          neighbour: [{pid,<0.18984.70>},
      Attaching logs from the nodes: https://s3.amazonaws.com/bugdb/jira/rebalance_1/temp.tar

      Rebalance started: 11:56:22
      Rebalance completed successfully: 12:05:23
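
      The call that times out in the crash reports above is the _revs_diff check that the XDCR worker (xdc_replicator, via lhttpc) issues against the destination vbucket database on the CAPI port 8092 to ask which of the revisions it is about to push are missing there. As a rough illustration only, the shape of such a request is sketched below in Python with the requests library; the URL, document ID, and revision are made-up placeholders, and the payload format is assumed to follow the CouchDB _revs_diff convention:

          import json
          import requests

          # Placeholder endpoint: the real request in the crash report targets the
          # URL-encoded vbucket database "default/843;<uuid>" on port 8092.
          url = "http://10.3.121.33:8092/default%2F843%3B.../_revs_diff"

          # Assumed CouchDB-style body: map each document ID to the revisions the
          # source wants to push to the destination.
          body = {"doc-0001": ["1-abcdef0123456789"]}

          resp = requests.post(url,
                               auth=("Administrator", "password"),  # placeholder credentials
                               headers={"Content-Type": "application/json"},
                               data=json.dumps(body),
                               timeout=30)
          # A healthy destination answers with the revisions it is missing, e.g.
          # {"doc-0001": {"missing": ["1-abcdef0123456789"]}}; in this bug the POST
          # never completes and lhttpc reports {error,timeout}.
          print(resp.status_code, resp.json())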


          People

            Junyi Xie (Inactive)
            Ketaki Gangal (Inactive)
