Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-27459

cbbackup can corrupt documents with xattrs

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Blocker
    • Resolution: Fixed
    • 5.0.0, 5.0.1
    • 5.1.0
    • tools
    • Untriaged
    • Unknown

    Description

      When performing a backup with cbbackup, we stream the documents including xattrs (because, of course, these need backing up too). However we store these directly in the val column of the sqlite cbb_msg table.

      In some cases, this seems to get stored in the rough format:

      NULL  ...  NULL  XATTR_NAME  NULL  {XATTR}  NULL  {BODY}
      

      This seems in line with the binary protocol format.

      In fact, when stored in this way, the restore seems to work fine for some keys (more on these below. There are cases though where the padding seems different, which leads to the document being restored as binary.

      Steps to reproduce

      • Create a cluster with two buckets, and load one with data including xattrs (I've used SG as a real world example here).
      • Picking a key (key_123) we can see that it has both a body and xattr:

        subdoc> get key_123 -x _sync
        key_123              CAS=0x1507d50c13e60000
        0. Size=327, RC=0x00 Success (Not an error)
        {"rev":"2-7a40fa2b51531d65895b0f64d652bd20","sequence":391,"recent_sequences":[125,391],"history":{"revs":["1-ca9ad22802b66f662ff171f226211d5c","2-7a40fa2b51531d65895b0f64d652bd20"],"parents":[-1,0],"channels":[null,["04070"]]},"channels":{"04070":null},"cas":"0x0000e6130cd50715","time_saved":"2018-01-08T12:20:48.819534016Z"}
        1. Size=22, RC=0x00 Success (Not an error)
        {"channels":["04070"]}
        

      • Perform a backup of the bucket:

        $  /opt/couchbase/bin/cbbackup http://localhost:8091/ /vagrant/backup -b sg1 -u Administrator -p password
        

      • Restore the backup to the second bucket:

        $ /opt/couchbase/bin/cbrestore /vagrant/backup/2018-01-08T122403Z/2018-01-08T122403Z-full/ http://localhost:8091/ -b sg1 -B sg2 -u Administrator -p password
        

      • Verify the document in the second bucket - note that the xattr (_sync) no longer exists. Also note the spurious RN:

        subdoc> get key_123 -x _sync
        key_123              CAS=0x1507d50c13e60000
        0. Size=0, RC=0x3f Sub-document path does not exist
        1. Size=364, RC=0x00 Success (Not an error)
        RN_sync{"rev":"2-7a40fa2b51531d65895b0f64d652bd20","sequence":391,"recent_sequences":[125,391],"history":{"revs":["1-ca9ad22802b66f662ff171f226211d5c","2-7a40fa2b51531d65895b0f64d652bd20"],"parents":[-1,0],"channels":[null,["04070"]]},"channels":{"04070":null},"cas":"0x0000e6130cd50715","time_saved":"2018-01-08T12:20:48.819534016Z"}{"channels":["04070"]}
        

        Perhaps more clearly seen comparing the output couch_dbdump on both buckets, where we can also see that it's now of raw datatype:

        # /opt/couchbase/bin/couch_dbdump --key key_123 /opt/couchbase/var/lib/couchbase/data/sg1/65.couch.1
        Dumping "/opt/couchbase/var/lib/couchbase/data/sg1/65.couch.1":
          Doc ID: key_123
             seq: 2
             rev: 2
             content_meta: 128
             size (on disk): 312
             cas: 1515414047483625472, expiry: 0, flags: 0, datatype: 0x05 (json,xattr)
             size: 364
             xattrs: {"_sync":{"rev":"2-7a40fa2b51531d65895b0f64d652bd20","sequence":391,"recent_sequences":[125,391],"history":{"revs":["1-ca9ad22802b66f662ff171f226211d5c","2-7a40fa2b51531d65895b0f64d652bd20"],"parents":[-1,0],"channels":[null,["04070"]]},"channels":{"04070":null},"cas":"0x0000e6130cd50715","time_saved":"2018-01-08T12:20:48.819534016Z"}}
             data: (snappy) {"channels":["04070"]}
        

        # /opt/couchbase/bin/couch_dbdump --key key_123 /opt/couchbase/var/lib/couchbase/data/sg2/65.couch.1
        Dumping "/opt/couchbase/var/lib/couchbase/data/sg2/65.couch.1":
          Doc ID: key_123
             seq: 1
             rev: 2
             content_meta: 131
             size (on disk): 312
             cas: 1515414047483625472, expiry: 0, flags: 0, datatype: 0x00 (raw)
             size: 364
             data: (snappy)
         
        Total docs: 1
        


      Interestingly, as mentioned, this doesn't seem to affect all documents:

      # /opt/couchbase/bin/couch_dbdump --no-body /opt/couchbase/var/lib/couchbase/data/sg1/* | grep 'datatype:.*' -o | sort | uniq -c
      Failed to open "/opt/couchbase/var/lib/couchbase/data/sg1/stats.json": malformed data in file
      Failed to open "/opt/couchbase/var/lib/couchbase/data/sg1/stats.json.old": malformed data in file
          267 datatype: 0x00 (raw)
          829 datatype: 0x01 (json)
         2048 datatype: 0x05 (json,xattr)
       
      # /opt/couchbase/bin/couch_dbdump --no-body /opt/couchbase/var/lib/couchbase/data/sg2/* | grep 'datatype:.*' -o | sort | uniq -c
      Failed to open "/opt/couchbase/var/lib/couchbase/data/sg2/stats.json": malformed data in file
      Failed to open "/opt/couchbase/var/lib/couchbase/data/sg2/stats.json.old": malformed data in file
         1639 datatype: 0x00 (raw)
          830 datatype: 0x01 (json)
          675 datatype: 0x05 (json,xattr)
      

      Taking one such example, key_1236, we can see that it's restored correctly:

      # /opt/couchbase/bin/couch_dbdump --key key_1236 /opt/couchbase/var/lib/couchbase/data/sg2/1002.couch.1
      Dumping "/opt/couchbase/var/lib/couchbase/data/sg2/1002.couch.1":
        Doc ID: key_1236
           seq: 2
           rev: 1
           content_meta: 128
           size (on disk): 278
           cas: 1515414065754734592, expiry: 0, flags: 0, datatype: 0x05 (json,xattr)
           size: 318
           xattrs: {"_sync":{"rev":"1-7e5eb5682c9532f9907c8255e725cb54","sequence":1504,"recent_sequences":[1504],"history":{"revs":["1-7e5eb5682c9532f9907c8255e725cb54"],"parents":[-1],"channels":[["15966"]]},"channels":{"15966":null},"cas":"0x0000f15410d50715","time_saved":"2018-01-08T12:21:07.080455581Z"}}
           data: (snappy) {"channels":["15966"]}
       
      Total docs: 1
      

      Inspecting these keys in sqlitebrowser shows that the difference is the padding:

      I've also attached a repro backup: backup.zip

      Attachments

        Issue Links

          For Gerrit Dashboard: MB-27459
          # Subject Branch Project Status CR V

          Activity

            Build couchbase-server-5.1.0-5526 contains couchbase-cli commit 0bed2e193dbd466404cf0efe4fed52a2d96633a7 with commit message:
            MB-27459: Track the sqlite version during restore
            https://github.com/couchbase/couchbase-cli/commit/0bed2e193dbd466404cf0efe4fed52a2d96633a7

            build-team Couchbase Build Team added a comment - Build couchbase-server-5.1.0-5526 contains couchbase-cli commit 0bed2e193dbd466404cf0efe4fed52a2d96633a7 with commit message: MB-27459 : Track the sqlite version during restore https://github.com/couchbase/couchbase-cli/commit/0bed2e193dbd466404cf0efe4fed52a2d96633a7

            Verified in 5.1.0-5536:

            1. Setup default bucket with 10k items and sync gateway enabled
            2. couch_dbdump on key pymc0:

            [root@node1-cb500-testing-centos6 tools]# /opt/couchbase/bin/couch_dbdump --key pymc0 /opt/couchbase/var/lib/couchbase/data/default/449.couch.1
            Dumping "/opt/couchbase/var/lib/couchbase/data/default/449.couch.1":
              Doc ID: pymc0
                 seq: 18
                 rev: 2
                 content_meta: 128
                 size (on disk): 286
                 cas: 1516060263255375872, expiry: 0, flags: 0, datatype: 0x05 (json,xattr)
                 size: 317
                 xattrs: {"_sync":{"rev":"1-6f0f37c188033fc5b75796eeec33383d","sequence":449,"recent_sequences":[449],"history":{"revs":["1-6f0f37c188033fc5b75796eeec33383d"],"parents":[-1],"channels":[null]},"cas":"0x00006ce9c6200a15","time_saved":"2018-01-15T23:51:03.120864277Z"}}
                 data: (snappy) {"age":0,"body":"VTKGNKUHMP","index":0,"name":"pymc0"}
             
            Total docs: 1
            

            3. cbbackup the data
            4. cbrestore the data to another bucket
            5. couch_dbdump on same key:

            [root@node3-cb500-testing-centos6 ~]#  /opt/couchbase/bin/couch_dbdump --key pymc0 /opt/couchbase/var/lib/couchbase/data/default/449.couch.2
            Dumping "/opt/couchbase/var/lib/couchbase/data/default/449.couch.2":
              Doc ID: pymc0
                 seq: 1
                 rev: 2
                 content_meta: 128
                 size (on disk): 286
                 cas: 1516060263255375872, expiry: 0, flags: 0, datatype: 0x05 (json,xattr)
                 size: 317
                 xattrs: {"_sync":{"rev":"1-6f0f37c188033fc5b75796eeec33383d","sequence":449,"recent_sequences":[449],"history":{"revs":["1-6f0f37c188033fc5b75796eeec33383d"],"parents":[-1],"channels":[null]},"cas":"0x00006ce9c6200a15","time_saved":"2018-01-15T23:51:03.120864277Z"}}
                 data: (snappy) {"age":0,"body":"VTKGNKUHMP","index":0,"name":"pymc0"}
             
            Total docs: 1
            

            Xattrs are intact - checked couple of random docs for same result

            arunkumar Arunkumar Senthilnathan added a comment - Verified in 5.1.0-5536: 1. Setup default bucket with 10k items and sync gateway enabled 2. couch_dbdump on key pymc0: [root@node1-cb500-testing-centos6 tools]# /opt/couchbase/bin/couch_dbdump --key pymc0 /opt/couchbase/var/lib/couchbase/data/default/449.couch.1 Dumping "/opt/couchbase/var/lib/couchbase/data/default/449.couch.1": Doc ID: pymc0 seq: 18 rev: 2 content_meta: 128 size (on disk): 286 cas: 1516060263255375872, expiry: 0, flags: 0, datatype: 0x05 (json,xattr) size: 317 xattrs: {"_sync":{"rev":"1-6f0f37c188033fc5b75796eeec33383d","sequence":449,"recent_sequences":[449],"history":{"revs":["1-6f0f37c188033fc5b75796eeec33383d"],"parents":[-1],"channels":[null]},"cas":"0x00006ce9c6200a15","time_saved":"2018-01-15T23:51:03.120864277Z"}} data: (snappy) {"age":0,"body":"VTKGNKUHMP","index":0,"name":"pymc0"}   Total docs: 1 3. cbbackup the data 4. cbrestore the data to another bucket 5. couch_dbdump on same key: [root@node3-cb500-testing-centos6 ~]# /opt/couchbase/bin/couch_dbdump --key pymc0 /opt/couchbase/var/lib/couchbase/data/default/449.couch.2 Dumping "/opt/couchbase/var/lib/couchbase/data/default/449.couch.2": Doc ID: pymc0 seq: 1 rev: 2 content_meta: 128 size (on disk): 286 cas: 1516060263255375872, expiry: 0, flags: 0, datatype: 0x05 (json,xattr) size: 317 xattrs: {"_sync":{"rev":"1-6f0f37c188033fc5b75796eeec33383d","sequence":449,"recent_sequences":[449],"history":{"revs":["1-6f0f37c188033fc5b75796eeec33383d"],"parents":[-1],"channels":[null]},"cas":"0x00006ce9c6200a15","time_saved":"2018-01-15T23:51:03.120864277Z"}} data: (snappy) {"age":0,"body":"VTKGNKUHMP","index":0,"name":"pymc0"}   Total docs: 1 Xattrs are intact - checked couple of random docs for same result

            Sorry, I think I might've been a little vague with my testing steps - I think it's important to validate the entire dataset rather than just individual docs. Obviously this is a little difficult when we're testing a backup itself.

            A reasonably heavyweight solution would be to restore to a second bucket and then iterate over every key, comparing both body and xattr.

            A slightly more lightweight solution might be to use a much larger dataset, and validate the datatypes across all vBuckets ( see the couch_dbdump | grep | ... in the issue description).

            James Flather James Flather added a comment - Sorry, I think I might've been a little vague with my testing steps - I think it's important to validate the entire dataset rather than just individual docs. Obviously this is a little difficult when we're testing a backup itself. A reasonably heavyweight solution would be to restore to a second bucket and then iterate over every key, comparing both body and xattr. A slightly more lightweight solution might be to use a much larger dataset, and validate the datatypes across all vBuckets ( see the couch_dbdump | grep | ... in the issue description).

            From node1 where my source bucket was:

            [root@node1-cb500-testing-centos6 ~]# /opt/couchbase/bin/couch_dbdump --no-body /opt/couchbase/var/lib/couchbase/data/default/* | grep 'datatype:.*' -o | sort | uniq -c
            Failed to open "/opt/couchbase/var/lib/couchbase/data/default/stats.json": malformed data in file
            Failed to open "/opt/couchbase/var/lib/couchbase/data/default/stats.json.old": malformed data in file
            1 datatype: 0x00 (raw)
            1027 datatype: 0x01 (json)
            10000 datatype: 0x05 (json,xattr)

            From node3 where my target bucket was:

            [root@node3-cb500-testing-centos6 ~]# /opt/couchbase/bin/couch_dbdump --no-body /opt/couchbase/var/lib/couchbase/data/default/* | grep 'datatype:.*' -o | sort | uniq -c
            Failed to open "/opt/couchbase/var/lib/couchbase/data/default/flushseq": error reading file
            Failed to open "/opt/couchbase/var/lib/couchbase/data/default/stats.json": malformed data in file
            Failed to open "/opt/couchbase/var/lib/couchbase/data/default/stats.json.old": malformed data in file
            1028 datatype: 0x01 (json)
            10000 datatype: 0x05 (json,xattr)

            Looks like all the documents are in sync with lightweight approach - I am gonna consider this enough verification - we will do the heavyweight verification when we automate this testcase as part of CBQE-4421

            arunkumar Arunkumar Senthilnathan added a comment - From node1 where my source bucket was: [root@node1-cb500-testing-centos6 ~] # /opt/couchbase/bin/couch_dbdump --no-body /opt/couchbase/var/lib/couchbase/data/default/* | grep 'datatype:.*' -o | sort | uniq -c Failed to open "/opt/couchbase/var/lib/couchbase/data/default/stats.json": malformed data in file Failed to open "/opt/couchbase/var/lib/couchbase/data/default/stats.json.old": malformed data in file 1 datatype: 0x00 (raw) 1027 datatype: 0x01 (json) 10000 datatype: 0x05 (json,xattr) From node3 where my target bucket was: [root@node3-cb500-testing-centos6 ~] # /opt/couchbase/bin/couch_dbdump --no-body /opt/couchbase/var/lib/couchbase/data/default/* | grep 'datatype:.*' -o | sort | uniq -c Failed to open "/opt/couchbase/var/lib/couchbase/data/default/flushseq": error reading file Failed to open "/opt/couchbase/var/lib/couchbase/data/default/stats.json": malformed data in file Failed to open "/opt/couchbase/var/lib/couchbase/data/default/stats.json.old": malformed data in file 1028 datatype: 0x01 (json) 10000 datatype: 0x05 (json,xattr) Looks like all the documents are in sync with lightweight approach - I am gonna consider this enough verification - we will do the heavyweight verification when we automate this testcase as part of CBQE-4421

            Build couchbase-server-5.5.0-2376 contains couchbase-cli commit 0bed2e193dbd466404cf0efe4fed52a2d96633a7 with commit message:
            MB-27459: Track the sqlite version during restore
            https://github.com/couchbase/couchbase-cli/commit/0bed2e193dbd466404cf0efe4fed52a2d96633a7

            build-team Couchbase Build Team added a comment - Build couchbase-server-5.5.0-2376 contains couchbase-cli commit 0bed2e193dbd466404cf0efe4fed52a2d96633a7 with commit message: MB-27459 : Track the sqlite version during restore https://github.com/couchbase/couchbase-cli/commit/0bed2e193dbd466404cf0efe4fed52a2d96633a7

            People

              arunkumar Arunkumar Senthilnathan
              James Flather James Flather
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty