Couchbase Server / MB-5611

XDCR: UI should show the reason why replication failed

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: 2.0
    • Component/s: XDCR
    • Security Level: Public
    • Labels:
      None

      Description

      Simply showing the XDCR status "failed" is not enough. It would be helpful to let users know why a replication failed. The XDCR manager already logs these errors; they need to be exposed in the UI.

      The 5 most common errors need to be displayed. QE should provide feedback about what these errors are.

      Abhinav, as we discussed, please add the top 5 errors here. In addition, we have discussed the errors and warnings to raise when buckets are flushed or deleted (that is being tracked as a separate bug).


        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Dipti, we haven't planned this for the current sprint. It doesn't feel like a big task, but it's not trivial either.

        ketaki Ketaki Gangal added a comment -

        Add more information on what type of errors to display on the UI.

        abhinav Abhinav Dangeti added a comment -

        Some common XDCR failure reasons:

        1 db_not_found error: when a node is unresponsive, e.g.:
        "could not open http://Administrator:*****@10.3.3.28:8092/default%2f120%3b093c0a978eb59342ea52d87eae424bb3/"

        2 badmatch, {error,corrupted_data}, Erlang-related corruption
        [{couch_compress,decompress,1},
         {couch_doc,with_uncompressed_body,1},
         {couch_doc,to_json_base64,1},
         {xdc_vbucket_rep_worker,maybe_flush_docs,3},
         {lists,foldl,3},
         {xdc_vbucket_rep_worker,local_process_batch,5},
         {xdc_vbucket_rep_worker,queue_fetch_loop,4}]

        3 checkpoint_commit_failure
        {bad_return_value,
         {checkpoint_commit_failure,
          <<"Failure on target commit: {error,<<\"not_found\">>}">>}}

        4 http_request_failed
        xdc_replicator:handle_info:282] Worker <0.11173.72> died with reason: {http_request_failed,"POST",
        "http://10.3.121.33:8092/default%2F684/_bulk_docs",
        {error,{code,500}}}

        Replicator: couldn't write document
        xdc_replicator_worker:flush_docs:111] Replicator: couldn't
        write document ``, revision ``,
        to target database `http://10.3.121.33:8092/default%2F683/`. Error: ``, reason: ``.

        5 replicator_died
        {replicator_died, {'EXIT',<15849.2212.0>, {badmatch,{error,closed}}}}

        6 bulk_set_vbucket_state_failed
        General error seen when rebalance fails, possibly because the vbucket_map is not ready;
        this may cause replication to fail.
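
        As a rough illustration of how the reasons above could be grouped before being shown in the UI, here is a minimal Python sketch. It is not the XDCR manager's code: the names `CATEGORIES`, `classify` and `top_reasons` are hypothetical, and the marker substrings are simply copied from the log samples in this comment, so real logs may need different or additional markers.

        ```python
        from collections import Counter

        # Hypothetical mapping of category name -> marker substrings.
        # Markers are taken from the log samples listed in this comment only.
        CATEGORIES = [
            ("db_not_found", ("db_not_found", "could not open")),
            ("corrupted_data", ("corrupted_data",)),
            ("checkpoint_commit_failure", ("checkpoint_commit_failure",)),
            ("http_request_failed", ("http_request_failed",)),
            ("replicator_died", ("replicator_died",)),
            ("bulk_set_vbucket_state_failed", ("bulk_set_vbucket_state_failed",)),
        ]


        def classify(log_line):
            """Return the first category whose marker appears in the line, else 'unknown'."""
            for category, markers in CATEGORIES:
                if any(marker in log_line for marker in markers):
                    return category
            return "unknown"


        def top_reasons(log_lines, limit=5):
            """Count classified lines and return the `limit` most frequent known categories."""
            counts = Counter(classify(line) for line in log_lines)
            counts.pop("unknown", None)  # unclassified lines are not reported
            return counts.most_common(limit)
        ```

        Matching on "corrupted_data" rather than "badmatch" is deliberate: the replicator_died sample also contains a badmatch term, so the broader marker would misfile it.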

        junyi Junyi Xie (Inactive) added a comment -

        We already have a few bugs filed for XDCR error/warning messages on the UI. Let's put everything in one placeholder (MB-6763).

        junyi Junyi Xie (Inactive) added a comment -

        MB-6763

          People

          • Assignee: dipti Dipti Borkar
          • Reporter: junyi Junyi Xie (Inactive)
          • Votes: 0
          • Watchers: 3
