Description
When performing a backup with cbbackup, we stream the documents including their xattrs (because, of course, these need backing up too). However, we store these directly in the val column of the sqlite cbb_msg table.
In some cases, the value seems to be stored in roughly this format:
NULL ... NULL XATTR_NAME NULL {XATTR} NULL {BODY}
This seems in line with the binary protocol format.
In fact, when stored in this way, the restore seems to work fine for some keys (more on these below). There are cases, though, where the padding seems different, which leads to the document being restored as binary.
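For reference, the NULL-separated pattern above resembles the value layout the KV engine uses when the XATTR datatype bit is set: a 4-byte big-endian length of the whole xattr section, then for each attribute a 4-byte length followed by key NUL value NUL, then the document body. The "NULL"s shown are presumably the zero high bytes of those length prefixes plus the separators. The sketch below packs and unpacks that layout; the helper names are hypothetical and this is not cbbackup code. The decoder rejects a value whose length prefixes don't agree with the bytes that follow, which is the kind of mismatch that differently padded values would cause.

```python
import struct


def encode_xattr_value(xattrs, body):
    """Pack xattrs and a body into one value:
    <uint32 section length> { <uint32 pair length> key NUL value NUL }* body
    All lengths are big-endian (network byte order)."""
    section = b""
    for key, value in xattrs.items():
        pair = key.encode() + b"\x00" + value.encode() + b"\x00"
        section += struct.pack(">I", len(pair)) + pair
    return struct.pack(">I", len(section)) + section + body


def decode_xattr_value(blob):
    """Inverse of encode_xattr_value; raises ValueError when the length
    prefixes do not line up with the bytes that follow them."""
    if len(blob) < 4:
        raise ValueError("value too short for an xattr length prefix")
    (total,) = struct.unpack_from(">I", blob, 0)
    end = 4 + total
    if end > len(blob):
        raise ValueError("xattr section runs past the end of the value")
    xattrs, offset = {}, 4
    while offset < end:
        if offset + 4 > end:
            raise ValueError("truncated pair length at offset %d" % offset)
        (pair_len,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        pair = blob[offset:offset + pair_len]
        # Each pair must fit in the section and look like key NUL value NUL.
        if offset + pair_len > end or not pair.endswith(b"\x00") or pair.count(b"\x00") < 2:
            raise ValueError("malformed xattr pair at offset %d" % (offset - 4))
        key, value = pair[:-1].split(b"\x00", 1)
        xattrs[key.decode()] = value.decode()
        offset += pair_len
    # Everything after the xattr section is the document body.
    return xattrs, blob[end:]
```

A value that round-trips through both functions has consistent prefixes; one that raises (for example, a truncated or shifted value) would plausibly be restored as opaque binary.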
Steps to reproduce
- Create a cluster with two buckets, and load one with data including xattrs (I've used SG as a real-world example here).
- Picking a key (key_123), we can see that it has both a body and an xattr:
subdoc> get key_123 -x _sync
key_123 CAS=0x1507d50c13e60000
0. Size=327, RC=0x00 Success (Not an error)
{"rev":"2-7a40fa2b51531d65895b0f64d652bd20","sequence":391,"recent_sequences":[125,391],"history":{"revs":["1-ca9ad22802b66f662ff171f226211d5c","2-7a40fa2b51531d65895b0f64d652bd20"],"parents":[-1,0],"channels":[null,["04070"]]},"channels":{"04070":null},"cas":"0x0000e6130cd50715","time_saved":"2018-01-08T12:20:48.819534016Z"}
1. Size=22, RC=0x00 Success (Not an error)
{"channels":["04070"]}
- Perform a backup of the bucket:
$ /opt/couchbase/bin/cbbackup http://localhost:8091/ /vagrant/backup -b sg1 -u Administrator -p password
- Restore the backup to the second bucket:
$ /opt/couchbase/bin/cbrestore /vagrant/backup/2018-01-08T122403Z/2018-01-08T122403Z-full/ http://localhost:8091/ -b sg1 -B sg2 -u Administrator -p password
- Verify the document in the second bucket. Note that the xattr (_sync) no longer exists, and also note the spurious RN at the start of the body:
subdoc> get key_123 -x _sync
key_123 CAS=0x1507d50c13e60000
0. Size=0, RC=0x3f Sub-document path does not exist
1. Size=364, RC=0x00 Success (Not an error)
RN_sync{"rev":"2-7a40fa2b51531d65895b0f64d652bd20","sequence":391,"recent_sequences":[125,391],"history":{"revs":["1-ca9ad22802b66f662ff171f226211d5c","2-7a40fa2b51531d65895b0f64d652bd20"],"parents":[-1,0],"channels":[null,["04070"]]},"channels":{"04070":null},"cas":"0x0000e6130cd50715","time_saved":"2018-01-08T12:20:48.819534016Z"}{"channels":["04070"]}
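Assuming the restored body is simply the whole xattr-encoded value (length prefixes included) stored as a plain document, the sizes reported by subdoc add up exactly. The snippet below recomputes them from the source bucket's sizes (327-byte _sync value, 22-byte body). The printable low bytes of the two length prefixes (0x52, 0x4E) are ASCII "R" and "N", which would explain the spurious RN. The function name is hypothetical.

```python
import struct

XATTR_KEY = "_sync"
XATTR_VALUE_SIZE = 327  # Size reported for _sync by `get key_123 -x _sync` in sg1
BODY_SIZE = 22          # Size of {"channels":["04070"]}


def xattr_layout_sizes(key_len, value_len, body_len):
    pair_len = key_len + 1 + value_len + 1   # key NUL value NUL
    section_len = 4 + pair_len               # per-pair length prefix + pair
    total_len = 4 + section_len + body_len   # section length prefix + section + body
    return pair_len, section_len, total_len


pair_len, section_len, total_len = xattr_layout_sizes(len(XATTR_KEY), XATTR_VALUE_SIZE, BODY_SIZE)

# Bytes as they would appear at the start of the value: section length, then pair length.
prefix = struct.pack(">I", section_len) + struct.pack(">I", pair_len)
# Keep only printable ASCII, roughly what a terminal shows for these bytes.
printable = bytes(b for b in prefix if 0x20 <= b < 0x7F).decode("ascii")

print(total_len, hex(section_len), hex(pair_len), printable)  # 364 0x152 0x14e RN
```

The computed total of 364 matches both the restored body Size=364 and the couch_dbdump size for key_123.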
This is perhaps seen more clearly by comparing the couch_dbdump output for both buckets, which also shows that the restored document now has the raw datatype:
# /opt/couchbase/bin/couch_dbdump --key key_123 /opt/couchbase/var/lib/couchbase/data/sg1/65.couch.1
Dumping "/opt/couchbase/var/lib/couchbase/data/sg1/65.couch.1":
Doc ID: key_123
seq: 2
rev: 2
content_meta: 128
size (on disk): 312
cas: 1515414047483625472, expiry: 0, flags: 0, datatype: 0x05 (json,xattr)
size: 364
xattrs: {"_sync":{"rev":"2-7a40fa2b51531d65895b0f64d652bd20","sequence":391,"recent_sequences":[125,391],"history":{"revs":["1-ca9ad22802b66f662ff171f226211d5c","2-7a40fa2b51531d65895b0f64d652bd20"],"parents":[-1,0],"channels":[null,["04070"]]},"channels":{"04070":null},"cas":"0x0000e6130cd50715","time_saved":"2018-01-08T12:20:48.819534016Z"}}
data: (snappy) {"channels":["04070"]}
# /opt/couchbase/bin/couch_dbdump --key key_123 /opt/couchbase/var/lib/couchbase/data/sg2/65.couch.1
Dumping "/opt/couchbase/var/lib/couchbase/data/sg2/65.couch.1":
Doc ID: key_123
seq: 1
rev: 2
content_meta: 131
size (on disk): 312
cas: 1515414047483625472, expiry: 0, flags: 0, datatype: 0x00 (raw)
size: 364
data: (snappy)
Total docs: 1
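The datatype field is a bitmask from the memcached binary protocol (0x01 JSON, 0x02 Snappy, 0x04 XATTR), so 0x05 in sg1 means JSON plus XATTR, while 0x00 in sg2 means neither bit is set. Without the XATTR bit, readers treat the whole value, length prefixes included, as an opaque body. A small decoder (hypothetical helper, mirroring couch_dbdump's labels):

```python
# Datatype bits as defined by the memcached binary protocol.
DATATYPE_FLAGS = [(0x01, "json"), (0x02, "snappy"), (0x04, "xattr")]


def describe_datatype(datatype):
    """Render a datatype byte the way couch_dbdump labels it, e.g. 0x05 -> 'json,xattr'."""
    if datatype == 0:
        return "raw"
    names = [name for bit, name in DATATYPE_FLAGS if datatype & bit]
    unknown = datatype & ~0x07  # any bits outside the three known flags
    if unknown:
        names.append(hex(unknown))
    return ",".join(names)


print(describe_datatype(0x05))  # json,xattr  (key_123 in sg1)
print(describe_datatype(0x00))  # raw         (key_123 in sg2)
```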
Interestingly, as mentioned, this doesn't seem to affect all documents:
# /opt/couchbase/bin/couch_dbdump --no-body /opt/couchbase/var/lib/couchbase/data/sg1/* | grep 'datatype:.*' -o | sort | uniq -c
Failed to open "/opt/couchbase/var/lib/couchbase/data/sg1/stats.json": malformed data in file
Failed to open "/opt/couchbase/var/lib/couchbase/data/sg1/stats.json.old": malformed data in file
267 datatype: 0x00 (raw)
829 datatype: 0x01 (json)
2048 datatype: 0x05 (json,xattr)

# /opt/couchbase/bin/couch_dbdump --no-body /opt/couchbase/var/lib/couchbase/data/sg2/* | grep 'datatype:.*' -o | sort | uniq -c
Failed to open "/opt/couchbase/var/lib/couchbase/data/sg2/stats.json": malformed data in file
Failed to open "/opt/couchbase/var/lib/couchbase/data/sg2/stats.json.old": malformed data in file
1639 datatype: 0x00 (raw)
830 datatype: 0x01 (json)
675 datatype: 0x05 (json,xattr)
Taking one of the unaffected documents, key_1236, we can see that it was restored correctly:
# /opt/couchbase/bin/couch_dbdump --key key_1236 /opt/couchbase/var/lib/couchbase/data/sg2/1002.couch.1
Dumping "/opt/couchbase/var/lib/couchbase/data/sg2/1002.couch.1":
Doc ID: key_1236
seq: 2
rev: 1
content_meta: 128
size (on disk): 278
cas: 1515414065754734592, expiry: 0, flags: 0, datatype: 0x05 (json,xattr)
size: 318
xattrs: {"_sync":{"rev":"1-7e5eb5682c9532f9907c8255e725cb54","sequence":1504,"recent_sequences":[1504],"history":{"revs":["1-7e5eb5682c9532f9907c8255e725cb54"],"parents":[-1],"channels":[["15966"]]},"channels":{"15966":null},"cas":"0x0000f15410d50715","time_saved":"2018-01-08T12:21:07.080455581Z"}}
data: (snappy) {"channels":["15966"]}

Total docs: 1
Inspecting these keys in sqlitebrowser shows that the difference is the padding:
I've also attached a repro backup: backup.zip
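To look for rows like these in the attached backup without sqlitebrowser, one option is to scan cbb_msg and flag values that mention an xattr name but whose leading length prefixes don't parse. This is a heuristic sketch only: the val column and cbb_msg table come from the description above, but the "key" column name is an assumption about the cbbackup schema, and checking for the marker bytes "_sync" + NUL is a rough stand-in for knowing which rows really carry xattrs.

```python
import sqlite3
import struct


def xattr_prefix_ok(val):
    """True if val parses as <uint32 section len>{<uint32 pair len> key NUL value NUL}*<body>
    with every length consistent (big-endian, as in the XATTR value layout)."""
    if len(val) < 4:
        return False
    (total,) = struct.unpack_from(">I", val, 0)
    end = 4 + total
    if total == 0 or end > len(val):
        return False
    offset = 4
    while offset < end:
        if offset + 4 > end:
            return False
        (pair_len,) = struct.unpack_from(">I", val, offset)
        offset += 4
        pair = val[offset:offset + pair_len]
        if offset + pair_len > end or pair.count(b"\x00") < 2 or not pair.endswith(b"\x00"):
            return False
        offset += pair_len
    return offset == end


def find_suspect_rows(conn, xattr_name):
    """Return keys of cbb_msg rows that contain `xattr_name` NUL but fail the prefix check.
    ASSUMPTION: the table has a column named "key" alongside val."""
    marker = xattr_name.encode() + b"\x00"
    suspects = []
    for key, val in conn.execute('SELECT "key", val FROM cbb_msg'):
        raw = val.encode() if isinstance(val, str) else (val or b"")
        if marker in raw and not xattr_prefix_ok(raw):
            suspects.append(key)
    return suspects
```

Against a real backup this would be called as find_suspect_rows(sqlite3.connect(path_to_data_file), "_sync"), where the path to the backup's sqlite file is left to the reader.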