Details
-
Bug
-
Resolution: Fixed
-
Minor
-
7.1.0
-
None
-
Untriaged
-
1
-
Unknown
-
KV 2021-Dec
Description
What is the issue?
Currently, couch_dbdump does not handle escape sequences that include '\x' correctly. Instead it outputs "\\u00ffffff" (an unidentified character), which breaks many UTF-8 symbols. I am almost certain that '\\u00ffffff' is always output in place of '\x': at least for the travel-sample bucket, replacing all of its occurrences with '\x' makes the contents of all documents in the couch_dbdump output match the contents of the corresponding documents in the cbriftdump output for a full backup of the same data.
Example:
Instead of outputting the escape sequence
\xe2\x80\x93
which stands for the "en dash" symbol, couch_dbdump outputs
\\u00ffffffe2\\u00ffffff80\\u00ffffff93
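The substitution described above can be checked on the example with a minimal Python sketch. The helper name `repair_bytes` is hypothetical, and the pattern assumes each broken escape is one or more backslashes, then `u00ffffff`, then exactly two hex digits. That matches the example in this ticket but has not been verified against every document.

```python
import re

# Assumption: a broken escape is one or more backslashes, "u00ffffff",
# and two hex digits holding the original byte value.
BROKEN_ESCAPE = re.compile(r"\\+u00ffffff([0-9a-fA-F]{2})")


def repair_bytes(text: str) -> bytes:
    """Rebuild the raw bytes: broken escapes become single bytes,
    everything else is encoded as UTF-8 unchanged."""
    out = bytearray()
    pos = 0
    for match in BROKEN_ESCAPE.finditer(text):
        out += text[pos:match.start()].encode("utf-8")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("utf-8")
    return bytes(out)


# The broken output quoted in this ticket.
sample = r"\\u00ffffffe2\\u00ffffff80\\u00ffffff93"
repaired = repair_bytes(sample)
print(repaired)                  # b'\xe2\x80\x93'
print(repaired.decode("utf-8"))  # the en dash character (U+2013)
```

The recovered bytes e2 80 93 are the UTF-8 encoding of the en dash, which is consistent with the claim that the marker stands in for '\x'.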
Steps to reproduce:
- Set up and configure a cluster with one data node
- Import the travel-sample sample bucket using the web UI
- Run something like
couch_dbdump --json ~/cb/source/ns_server/data/n_0/data/travel-sample/*.couch.1 | grep "\\u00ffffff"
to get all of the documents that contain '\\u00ffffff'.
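As an alternative to grep, the dump output can be scanned with a short Python sketch that reports which lines contain the marker and how often. The function name `affected_lines` and the sample lines are hypothetical (they are not real couch_dbdump output), and the pattern assumes the marker is preceded by one or more backslashes and followed by two hex digits, as in the example in this ticket.

```python
import re

# Assumption: one or more backslashes, "u00ffffff", then two hex digits.
MARKER = re.compile(r"\\+u00ffffff[0-9a-fA-F]{2}")


def affected_lines(lines):
    """Yield (line_number, hit_count) for each line containing the marker."""
    for number, line in enumerate(lines, start=1):
        hits = len(MARKER.findall(line))
        if hits:
            yield number, hits


# Made-up lines for illustration only.
sample_dump = [
    '{"name":"plain ascii"}',
    r'{"name":"A \\u00ffffffe2\\u00ffffff80\\u00ffffff93 B"}',
]
print(list(affected_lines(sample_dump)))  # [(2, 3)]
```

To scan real output, the same function can be given `sys.stdin` while the couch_dbdump command from the step above is piped into the script.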