Copy and paste discussion with Damien
Junyi, can you write jira ticket that documentation and implementation in this area don't agree? Ideally also fix it and get Filipe to review it, but either way we need the ticket to track the discrepancy so that even when fixed when have a record that some versions don't have the behavior for support and historical purposes.
Couchbase CTO | http://damienkatz.com | 510-421-8914
As you said, there was a discussion about it and it was determined that upper (ep_engine, couchstore, or couchdb) layer will check non-utf-8 ids and log them before these ids reach XDCR so XDCR won't see any non-utf8 ids. Thus XDCR itself does not validate and log them. Looks like the validation does happen in CouchDB layer (look at couchdb/src/couchdb/couch_doc:json_id()), but I cannot find where the non-utf8 ids are logged.
If this design does change, seems to me we shall log non-utf8 ids in CouchDB layer, the couch_db:changes_since() function called by XDCR to read changes from CouchDB should filter out and log all non-utf8 ids in CouchDB logs, is that correct?
|For Gerrit Dashboard: &For+MB-8427=message:MB-8427|
|27866,4||MB-8427: filter out non-UTF-8 keys in XDCR and log them||ns_server||Status: ABANDONED||0||0|