Details

Type: Bug
Resolution: Fixed
Priority: Blocker
Fix Version/s: 2.0
Affects Version/s: 2.0
Component/s: UI, XDCR
Security Level: Public
Labels:
None
Environment:
2.0-1856

Description

Hi,

With the new error logging code, we now display the "recent 10 errors". I have added a screenshot at the end of this email.

At any point, the last 10 errors are displayed for the replication. These 10 errors may or may not still be valid, depending on the current time.
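
To illustrate the staleness concern, here is a minimal Python sketch of a "recent 10 errors" buffer that also tracks when each error occurred, so that old entries can be hidden. This is not the actual ns_server code; the class name `RecentErrors` and the 10-minute cutoff `MAX_AGE_SECONDS` are assumptions made up for the example.

```python
import time
from collections import deque

MAX_RECENT = 10         # matches the "recent 10 errors" display
MAX_AGE_SECONDS = 600   # hypothetical cutoff, not taken from the product


class RecentErrors:
    """Keeps the last N errors together with the time each one occurred."""

    def __init__(self, max_recent=MAX_RECENT):
        # A bounded deque silently drops the oldest entry once it is full.
        self._entries = deque(maxlen=max_recent)

    def record(self, message, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self._entries.append((ts, message))

    def still_relevant(self, now=None, max_age=MAX_AGE_SECONDS):
        """Return only the messages that are at most max_age seconds old."""
        now = time.time() if now is None else now
        return [msg for ts, msg in self._entries if now - ts <= max_age]
```

With something like this, the UI could show `still_relevant()` rather than the raw last 10 entries, so an error from hours ago would not be presented as the current state.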

This issue needs to be addressed at two levels:
1. Level of error logging - Currently too much information is displayed, which also gives a misleading idea of the state of replication.
2. Classification of errors vs. warnings.

Having lower-level information in the ns_logs can help with troubleshooting, but having all of that information on the web console might just confuse and overwhelm the end user, IMO.

XDCR can have an error at any of the following levels:

xdc vbucket replicators - timing out, checkpoint failures, db_not_found
xdc replication manager
ns_server level - where it is unable to talk to the remote cluster, and so on.

In some recent trials of the new code, we see a lot of errors at the level of the vbucket replicators, for example: vbucket XXX commit_checkpoint_failure.
But the replication is continuing as expected. Replication has not failed; it is continuing despite the above checkpoint failure.

It might be nicer to classify errors vs. warnings.

Errors - When XDCR has finally stopped working. No more data is being sent over to the destination.
Replication will be attempted X number of times, and then finally given up?

Warnings - When there are timeouts, but it is a recoverable situation.
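
The proposed split could be sketched roughly as follows. This is only an illustration in Python, not the XDCR implementation: the `ReplicationEvent` fields, the `classify` function, and the retry limit `MAX_ATTEMPTS` (standing in for "X" above) are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


# Hypothetical retry limit, standing in for "X" in the description.
MAX_ATTEMPTS = 5


@dataclass
class ReplicationEvent:
    source: str         # e.g. "vbucket_replicator", "replication_manager", "ns_server"
    reason: str         # e.g. "commit_checkpoint_failure", "timeout", "db_not_found"
    attempts: int       # how many times replication has been attempted so far
    data_flowing: bool  # whether data is still reaching the destination


def classify(event: ReplicationEvent, max_attempts: int = MAX_ATTEMPTS) -> Severity:
    """Report ERROR only once replication has stopped and retries are exhausted."""
    if not event.data_flowing and event.attempts >= max_attempts:
        return Severity.ERROR
    # Anything still recoverable (timeouts, checkpoint failures) is a warning.
    return Severity.WARNING
```

Under this sketch, a vbucket commit_checkpoint_failure while data is still flowing would be shown as a warning, and only a replication that has been given up on would be shown as an error.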

-Ketaki

Screenshot