Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-44772

Backup Service test one_off_merge fails with permission denied

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Not a Bug
    • Cheshire-Cat
    • None
    • tools
    • centos

    Description

      Summary
      As soon as the test finishes transferring all the data for a cluster, it attempts to access .collection_manifest and fails with permission denied.
       
      Steps to Reproduce

      Run the centos-backup_recovery_backup-service-general-samba test
       
      {Expected Results

      I expected the one off merge task to succeed{{}}
       
      Actual Results

      On trying to access .collection_manifest, fails with

      (Cmd) Error merging data: open /tmp/entbackup_172.23.106.116/da7b7aec-99ab-489e-8895-7e5095bf2ad6/.merge/default-4d38be5f5319834083138df1de7b2365/data/.collection_manifest: permission denied
      

      (Taken from backup-0.log)

      Artefacts
      Same error in test.log:

      [backup_service_base:538] INFO - 
      {'end': '2021-03-02T05:38:50.247087719-08:00',
       'error': 'exit status 1: open '
                '/tmp/entbackup_172.23.96.183/fa05a8de-0b39-45a3-84e3-18bb9f20c8a0/.merge/default-f38ccf88b049edee9dbab304849902b1/data/.collection_manifest: '
                'permission denied',
       'error_code': 2,
       'node_runs': [{'end': '2021-03-02T05:38:50.228419783-08:00',
                      'error': 'exit status 1: open '
                               '/tmp/entbackup_172.23.96.183/fa05a8de-0b39-45a3-84e3-18bb9f20c8a0/.merge/default-f38ccf88b049edee9dbab304849902b1/data/.collection_manifest: '
                               'permission denied',
                      'error_code': 2,
                      'node_id': '1ea820d0fd35ff5f3944cf7ca32560a5',
                      'out': None,
                      'progress': 0.0,
                      'start': '2021-03-02T05:37:03.179951981-08:00',
                      'stats': {'id': None, 'transfers': None},
                      'status': 'failed'}],
       'start': '2021-03-02T05:37:03.180290159-08:00',
       'status': 'failed',
       'target_cluster': None,
       'task_name': 'MERGE-fc769fc2-9f36-4e8e-a235-3b7f5061d5bb',
       'type': 'MERGE'}
      

      Attachments

        1. backup-0.log
          1.68 MB
          Joe Mitchell Jones
        2. backup-0-1.log
          1.68 MB
          James Lee
        3. dirlisting.txt
          1.23 MB
          James Lee
        4. merge-complete.log
          1.10 MB
          James Lee
        5. straces.tar.zst
          14.91 MB
          James Lee
        6. test.log
          202 kB
          Joe Mitchell Jones

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            Joe Mitchell Jones Could we get the cluster logs as well I am interested to see the backup service logs.

            carlos.gonzalez Carlos Gonzalez Betancort (Inactive) added a comment - Joe Mitchell Jones Could we get the cluster logs as well I am interested to see the backup service logs.

            Just uploaded the full log set for a new run as test_4.zip

            joe.mitchelljones Joe Mitchell Jones added a comment - Just uploaded the full log set for a new run as test_4.zip
            james.lee James Lee added a comment - - edited

            Joe Mitchell Jones,

            This looks like an interesting issue, I'm not exactly sure whats going on here at the moment; although here are a few observations.

            We appear to be failing to write the collection manifests from the final backup into the merge directory. We know this because we can see that we've successfully completed the merge data transfer.

            Merge data transfer complete

            2021-03-10T08:45:16.697-08:00 (Plan) Transfer of all data complete
            

            After this we read then write an update '.info' file, interestingly enough this process completes as expected.

            Write updated .info file

            err = ioutil.WriteFile(resourceBackupInfo.Path(mergeDir.path), iJSON, utils.DefaultDirMode)
            if err != nil {
                return fmt.Errorf("failed to write merge backup info: %w", err)
            }
            

            We then attempt to write the latest manifest from the last backup into the merged backup:

            err := utils.WriteJSONFile(filepath.Join(mergeDir.path, fmt.Sprintf("%s-%s", bucket, uuid), "data", storage.ManifestFileName), lastManifests[bucket])
            if err != nil {
                return err
            }
            

            This appears to be the bit that's failing. Interesting to note that internally the 'WriteJSONFile' uses 'ioutil.WriteFile' so this doesn't appear to be an issue with the way we're writing the file, perhaps just the location which is the major different between these two operations.

            The first is writing to '/tmp/entbackup_172.23.100.20/843a4ed1-c9ea-4858-ae1a-20249e10330e/.merge/.info', the second '/tmp/entbackup_172.23.100.20/843a4ed1-c9ea-4858-ae1a-20249e10330e/.merge/default-8abdb94c68b1547611fa4e8429a41c93/data/.collection_manifest'.

            In theory, any manifest file that already existed would have been created with the permissions '0o660', please note this file would have also been written using 'ioutil.WriteFile'.

            Would if be possible to provide a directory listing using 'ls -R', and possibly the samba configuration used here (this would allow me to reproduce the issue locally)?

            james.lee James Lee added a comment - - edited Joe Mitchell Jones , This looks like an interesting issue, I'm not exactly sure whats going on here at the moment; although here are a few observations. We appear to be failing to write the collection manifests from the final backup into the merge directory. We know this because we can see that we've successfully completed the merge data transfer. Merge data transfer complete 2021-03-10T08:45:16.697-08:00 (Plan) Transfer of all data complete After this we read then write an update ' .info ' file, interestingly enough this process completes as expected. Write updated .info file err = ioutil.WriteFile(resourceBackupInfo.Path(mergeDir.path), iJSON, utils.DefaultDirMode) if err != nil { return fmt.Errorf("failed to write merge backup info: %w", err) } We then attempt to write the latest manifest from the last backup into the merged backup: err := utils.WriteJSONFile(filepath.Join(mergeDir.path, fmt.Sprintf("%s-%s", bucket, uuid), "data", storage.ManifestFileName), lastManifests[bucket]) if err != nil { return err } This appears to be the bit that's failing. Interesting to note that internally the ' WriteJSONFile ' uses ' ioutil.WriteFile ' so this doesn't appear to be an issue with the way we're writing the file, perhaps just the location which is the major different between these two operations. The first is writing to ' /tmp/entbackup_172.23.100.20/843a4ed1-c9ea-4858-ae1a-20249e10330e/.merge/.info ', the second ' /tmp/entbackup_172.23.100.20/843a4ed1-c9ea-4858-ae1a-20249e10330e/.merge/default-8abdb94c68b1547611fa4e8429a41c93/data/.collection_manifest '. In theory, any manifest file that already existed would have been created with the permissions ' 0o660 ', please note this file would have also been written using ' ioutil.WriteFile '. Would if be possible to provide a directory listing using ' ls -R ', and possibly the samba configuration used here (this would allow me to reproduce the issue locally)?
            james.lee James Lee added a comment -

            Hi Joe Mitchell Jones,

            I'm going to close this as not a bug, Asad Zaidi and I have been investigating this; I was able to reproduce the issue outside of 'cbbackupmgr' using a simple test Go program and with 'echo'.

            echo reproduction

            [root@localhost ~]# echo "test" > /tmp/entbackup_172.23.136.111/55aa0c18-69fa-4df7-ad16-5496256ac2d8/.merge/default-87ac7eef4181148ec81c3ed98dbb76a2/data/.collection_manifest                                                                                                                                                 
            -bash: /tmp/entbackup_172.23.136.111/55aa0c18-69fa-4df7-ad16-5496256ac2d8/.merge/default-87ac7eef4181148ec81c3ed98dbb76a2/data/.collection_manifest: Permission denied
            

            With the samba logging set to be more verbose, we see:

            Samba Logging

            [2021/03/15 11:14:22.531734, 2] ../../source3/smbd/open.c:1447(open_file)
             couchbase opened file 55aa0c18-69fa-4df7-ad16-5496256ac2d8/.merge/default-87ac7eef4181148ec81c3ed98dbb76a2/data/.collection_manifest read=No write=No (numopen=1)
            

            We therefore believe this is an issue with testrunners samba permissions and not with 'cbbackupmgr'. Once we've got an appropriate fix, we'll open a DOC bug (if applicable) to extend our documentation with the required information.

            Thanks,
            James

            james.lee James Lee added a comment - Hi Joe Mitchell Jones , I'm going to close this as not a bug, Asad Zaidi and I have been investigating this; I was able to reproduce the issue outside of ' cbbackupmgr ' using a simple test Go program and with ' echo '. echo reproduction [root@localhost ~]# echo "test" > /tmp/entbackup_172.23.136.111/55aa0c18-69fa-4df7-ad16-5496256ac2d8/.merge/default-87ac7eef4181148ec81c3ed98dbb76a2/data/.collection_manifest -bash: /tmp/entbackup_172.23.136.111/55aa0c18-69fa-4df7-ad16-5496256ac2d8/.merge/default-87ac7eef4181148ec81c3ed98dbb76a2/data/.collection_manifest: Permission denied With the samba logging set to be more verbose, we see: Samba Logging [2021/03/15 11:14:22.531734, 2] ../../source3/smbd/open.c:1447(open_file) couchbase opened file 55aa0c18-69fa-4df7-ad16-5496256ac2d8/.merge/default-87ac7eef4181148ec81c3ed98dbb76a2/data/.collection_manifest read=No write=No (numopen=1) We therefore believe this is an issue with testrunners samba permissions and not with ' cbbackupmgr '. Once we've got an appropriate fix, we'll open a DOC bug (if applicable) to extend our documentation with the required information. Thanks, James
            james.lee James Lee added a comment - - edited

            Quick update one this, I currently believe it's an issues with how the umasks have been setup of the samba server; when running the same test with our 'master' branch it passes as expected. I currently believe this is due to the 'fsutil' changes which ensure filesystem operations correctly run 'chmod' after file creation (which was previously, a underused pattern).

            We're currently running some tests to grab an 'ls -R' of the created files, to determine whether we're correct. Our assumption being that we'll need to document the umask requirements for samba for older versions of 'cbbackupmgr'.

            james.lee James Lee added a comment - - edited Quick update one this, I currently believe it's an issues with how the umasks have been setup of the samba server; when running the same test with our ' master ' branch it passes as expected. I currently believe this is due to the ' fsutil ' changes which ensure filesystem operations correctly run ' chmod ' after file creation (which was previously, a underused pattern). We're currently running some tests to grab an ' ls -R ' of the created files, to determine whether we're correct. Our assumption being that we'll need to document the umask requirements for samba for older versions of ' cbbackupmgr '.
            james.lee James Lee added a comment -

            Hi Joe Mitchell Jones,

            First things first, I'll clarify a few things since (as I now know) my previous comment is slightly inaccurate. This isn't directly an issue with 'cbbackupmgr' but is an issue with the way the samba server is setup:
            1) We're using '/tmp' for both the exported share/the mounted share (not directly an issue, but we should fix this)
            2) We're using 'hide dot files = yes' (which is the default)
            3) This affects both 'cheshire-cat' and 'neo' (on the contrary to my last comment)

            After some investigation using the provided machines, I spotted an interesting log line (after enabling the most verbose log level).

            attributes mismatch

            open_file_ntcreate: attributes mismatch for file 8cec5a09-f6a7-4cef-9634-9dc170acfb39/.merge/default-3a2eb4029d960884550dc3b163c4d74b/data/.collection_manifest (22 0) (0100777, 0777)
            

            This lead me to this issue that users may face issues when overwriting hidden files on a samba share; I believe this is due to an attribute which is set which allows the files to be hidden on Windows clients.

            After running with 'hide dot files = no', I was able to complete the merge as expected (please see 'merge-complete.log').

            For those that are interested, I've attached a file called 'straces.tar.zstd' which contains an strace for 'cheshire-cat' and 'neo' showing both fail when attempting to overwrite the collection manifest in the latest stages of the merge.

            Thanks,
            James

            james.lee James Lee added a comment - Hi Joe Mitchell Jones , First things first, I'll clarify a few things since (as I now know) my previous comment is slightly inaccurate. This isn't directly an issue with ' cbbackupmgr ' but is an issue with the way the samba server is setup: 1) We're using ' /tmp ' for both the exported share/the mounted share (not directly an issue, but we should fix this) 2) We're using ' hide dot files = yes ' (which is the default) 3) This affects both ' cheshire-cat ' and ' neo ' (on the contrary to my last comment) After some investigation using the provided machines, I spotted an interesting log line (after enabling the most verbose log level). attributes mismatch open_file_ntcreate: attributes mismatch for file 8cec5a09-f6a7-4cef-9634-9dc170acfb39/.merge/default-3a2eb4029d960884550dc3b163c4d74b/data/.collection_manifest (22 0) (0100777, 0777) This lead me to this issue that users may face issues when overwriting hidden files on a samba share; I believe this is due to an attribute which is set which allows the files to be hidden on Windows clients. After running with ' hide dot files = no ', I was able to complete the merge as expected (please see ' merge-complete.log '). For those that are interested, I've attached a file called ' straces.tar.zstd ' which contains an strace for ' cheshire-cat ' and ' neo ' showing both fail when attempting to overwrite the collection manifest in the latest stages of the merge. Thanks, James

            People

              joe.mitchelljones Joe Mitchell Jones
              joe.mitchelljones Joe Mitchell Jones
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty