
On a full file scan the chunks should be iterated in order

Details

    Description

      When performing a full file scan we iterate over all chunks to get all the data for the restore or merge. The chunks are stored in a map[int]BackupFile, which means they will not necessarily be iterated in order. This causes an issue in the merge: the last item read is not necessarily the last item that was backed up, so the last seqno is invalid and the snapshot sequence numbers are incorrect.
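
      As an illustration of the ordering problem, the following is a minimal Go sketch of iterating map keys in ascending order. The BackupFile struct, its fields, and the sortedChunkIDs helper are hypothetical stand-ins for this example, not the actual cbbackupmgr types or the actual fix.

```go
package main

import (
	"fmt"
	"sort"
)

// BackupFile is a hypothetical stand-in for the real chunk type; only the
// fields needed for the illustration are included.
type BackupFile struct {
	Name      string
	LastSeqno uint64
}

// sortedChunkIDs returns the map keys in ascending order. Go does not
// guarantee any iteration order for maps, so the keys are collected and
// sorted explicitly.
func sortedChunkIDs(chunks map[int]BackupFile) []int {
	ids := make([]int, 0, len(chunks))
	for id := range chunks {
		ids = append(ids, id)
	}
	sort.Ints(ids)
	return ids
}

func main() {
	chunks := map[int]BackupFile{
		2: {Name: "chunk_2", LastSeqno: 880},
		0: {Name: "chunk_0", LastSeqno: 110},
		1: {Name: "chunk_1", LastSeqno: 450},
	}
	// Ranging over the sorted IDs visits chunk 0, then 1, then 2, so the
	// last item seen belongs to the highest-numbered chunk.
	for _, id := range sortedChunkIDs(chunks) {
		c := chunks[id]
		fmt.Printf("%d %s last seqno %d\n", id, c.Name, c.LastSeqno)
	}
}
```

      Ranging directly over the map could visit the chunks in any order, in which case the last seqno seen would not have to be the highest one.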

      This issue can be hit by any merge where one of the previous backups has more than one chunk (the more chunks the backup has, the more likely it is to hit this issue). By default cbbackupmgr will chunk each vBucket after 100K mutations. This means the issue can be hit by any customer that backs up more than 103M items (roughly 1024 vBuckets × 100K mutations, assuming the default vBucket count) and then tries to do a merge.

      Steps to reproduce:
      Note that to reproduce using less data we set the environment variable CB_CHUNK_SIZE=X to a very low number, which means that each chunk will contain at most X items.

      1. Setup a 6.5.0 cluster
      2. Load the beer-sample bucket
      3. Set the environment variable CB_CHUNK_SIZE to a low value such as 5. This can be done by running export CB_CHUNK_SIZE=5
      4. Configure a repo and do the first backup as follows:

        $ cbbackupmgr config -a /archive -r test-repo
        Backup repository `test-repo` created successfully in archive `/archive`
         
        $ cbbackupmgr backup -a /archive -r test-repo -c localhost:8091 -u Administrator -p password
        

      5. Load some more data into the cluster. This can be done using cbworkloadgen as follows:

        $ cbworkloadgen -n localhost:8091 -u Administrator -p password -b beer-sample -i 5000 -s 100 -j -r 1 -t 5
        

      6. Perform an incremental backup:

        $ cbbackupmgr backup -a /archive -r test-repo -c localhost:8091 -u Administrator -p password
        

      7. Perform a merge:

        $ cbbackupmgr merge -a /archive -r test-repo --start oldest --end latest
        Error merging data: Failed to transfer snapshot marker with key due to error `Snapshot Marker for vb 37 (25 to 880) does not progress from old marker (0 to 110)`
        

      This issue will also affect restores, but this will be less common. For restores there is an edge case where the latest version of a document may not be restored. For this to happen, all of the things below must happen first:

      1. The user does a large backup with several chunks that fails halfway through.
      2. Before the user resumes, a lot of mutations have to happen in each vBucket that alter some of the keys that we had already backed up.
      3. The user resumes the backup and we receive a copy of the same key but with a different value, which gets backed up to a different chunk.
      4. The user does a restore using --force-updates.

      In these cases there is a chance that instead of restoring the latest version of the key we end up overwriting it with an older one.
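
      The restore edge case can be sketched with a small Go simulation. The Mutation type, the replay function, and the "beer-1" key are hypothetical and only model "apply every chunk and always overwrite"; they are not the real restore code path.

```go
package main

import "fmt"

// Mutation is a hypothetical, simplified record of one backed-up document
// version.
type Mutation struct {
	Key   string
	Value string
}

// replay applies the chunks in the given order and always overwrites
// existing keys, loosely mimicking a restore that forces updates.
func replay(chunks map[int][]Mutation, order []int) map[string]string {
	bucket := make(map[string]string)
	for _, id := range order {
		for _, m := range chunks[id] {
			bucket[m.Key] = m.Value
		}
	}
	return bucket
}

func main() {
	chunks := map[int][]Mutation{
		0: {{Key: "beer-1", Value: "v1"}}, // backed up before the failure
		1: {{Key: "beer-1", Value: "v2"}}, // backed up after the resume
	}
	fmt.Println(replay(chunks, []int{0, 1})["beer-1"]) // in order: v2
	fmt.Println(replay(chunks, []int{1, 0})["beer-1"]) // out of order: v1
}
```

      When the chunks are replayed out of order, the older value written last wins, which is the stale overwrite described above.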

      Issue was introduced by http://review.couchbase.org/c/backup/+/113068

          People

            carlos.gonzalez Carlos Gonzalez Betancort (Inactive)