Couchbase Server / MB-40764

cbbackupmgr backup to S3 fails to perform sub-command operations due to file being used by another process

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.6.0
    • Fix Version/s: 6.6.0
    • Component/s: tools
    • Environment: Windows 2016 64-bit
    • Triage: Untriaged
    • Story Points: 1
    • Is this a Regression?: No

      Description

      Install Couchbase Server 6.6.0-7897 on Windows Server 2016.
      Create the travel-sample bucket.
      Run a backup to S3. All data in the bucket is backed up, but the final post sub-command task fails.

       
      C:\Users\Administrator>cd "c:\Program Files\Couchbase\Server\bin"
       
      c:\Program Files\Couchbase\Server\bin>.\cbbackupmgr config -r backup -a s3://bkrepo --obj-access-key-id AKIAJP --obj-secret-access-key xzsNfaT --obj-staging-dir /root/bk-staging  --obj-region us-west-2
      Backup repository `backup` created successfully in archive `s3://bkrepo`
       
      c:\Program Files\Couchbase\Server\bin>.\cbbackupmgr backup -c localhost -u Administrator -p password  -r backup -a s3://bkrepo --obj-access-key-id AKIAJP --obj-secret-access-key xzsNfaT  --obj-staging-dir /root/bk-staging  --obj-region us-west-2
      Backing up to '2020-08-04T22_48_59.4251885Z'
      Copied all data in 3m38.0058358s (Avg. 98.61KB/Sec)                                               31591 items / 20.99MB
      [=============================================================================================================] 100.00%
      Backup successfully completed
      Backed up bucket "travel-sample" succeeded
      Mutations backed up: 31591, Mutations failed to backup: 0
      Deletions backed up: 0, Deletions failed to backup: 0
      Skipped due to purge number or conflict resolution: Mutations: 0 Deletions: 0
      Post sub-command object store task failed: The process cannot access the file because it is being used by another process.
       
      c:\Program Files\Couchbase\Server\bin>
      
      

      I will upload the logs soon.


          Activity

          thuan Thuan Nguyen added a comment -

          I have this live cluster in AWS. If you need access to this node to debug, let me know and I will provide login credentials.

          james.lee James Lee added a comment -

          Hi Thuan Nguyen,

          Please could you attach the logs collected using 'cbbackupmgr collect-logs' as well.

          james.lee James Lee added a comment - - edited

          Hi Thuan Nguyen,

          I've reproduced the issue with a Windows virtual machine; however, in the future could you include the logs? It would have saved setting up a Windows machine locally (or borrowing the one you've got in AWS). Please could you still provide the logs so that I can compare and be sure that what I have reproduced is in fact the issue.

          This issue is quite subtle and is caused by the slightly different ways in which Linux/Unix and Windows handle removing files. Below is a minimal example of exactly what we are seeing:

          package main
           
          import "os"
           
          func main() {
          	file, err := os.OpenFile("test.file", os.O_CREATE|os.O_WRONLY, 0755)
          	if err != nil {
          		panic(err)
          	}
           
          	err = os.Remove("test.file")
          	if err != nil {
          		panic(err)
          	}
           
          	file.Close()
          }
          

          If we run this snippet using Go on Windows, we should see:

          panic: remove test.file: The process cannot access the file because it is being used by another process.
           
          goroutine 1 [running]:
          main.main()
          	C:/Users/User/Documents/test.go:13 +0xd4
          exit status 2
          

          The issue here is that we are trying to remove the file before it has been closed. This is the exact behavior that is triggered when compressing the repository metadata (this is where the logs were required) before it is uploaded to S3. The fix is simple: we should just ensure that we close the file handle before we call 'os.Remove()'.
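
For illustration, the close-before-remove ordering described above could look roughly like the sketch below. This is not the code from the actual patch; the helper names `removeAfterClose` and `demo`, the temporary directory, and the file name are all hypothetical and exist only to make the example self-contained.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeAfterClose (hypothetical helper) closes the handle first and only
// then removes the file. Closing first means no open handle from this
// process is left to block the removal on Windows, and the same ordering
// works on Linux/Unix.
func removeAfterClose(file *os.File) error {
	name := file.Name()

	if err := file.Close(); err != nil {
		return fmt.Errorf("closing %s: %w", name, err)
	}

	if err := os.Remove(name); err != nil {
		return fmt.Errorf("removing %s: %w", name, err)
	}

	return nil
}

// demo (hypothetical) creates a scratch file in a temporary directory,
// removes it with removeAfterClose and confirms that it no longer exists.
func demo() error {
	dir, err := os.MkdirTemp("", "close-then-remove")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	file, err := os.OpenFile(filepath.Join(dir, "test.file"), os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}

	if err := removeAfterClose(file); err != nil {
		return err
	}

	// The file should be gone, so Stat is expected to report "not exist".
	if _, err := os.Stat(file.Name()); !os.IsNotExist(err) {
		return fmt.Errorf("expected %s to be removed, stat returned: %v", file.Name(), err)
	}

	return nil
}

func main() {
	if err := demo(); err != nil {
		panic(err)
	}

	fmt.Println("file closed and removed")
}
```

Compared with the reproduction snippet, the only change in ordering is that `Close()` runs before `os.Remove()`, so the removal no longer races with the process's own open handle.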

          thuan Thuan Nguyen added a comment -

          I uploaded the collect-logs output from the Windows node in AWS.

          james.lee James Lee added a comment -

          Thanks Thuan Nguyen, the issue you encountered appears to be the one that I reproduced locally. I'm currently creating a toy build so that I can test the fix before I submit the patch.

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-6.6.0-7905 contains backup commit 44128c1 with commit message:
          MB-40764 Fix Windows specific issues when backing up to cloud

          thuan Thuan Nguyen added a comment - - edited

          Verified on bkrs in build 6.6.0-7905 with an S3 bucket on Windows 2016.

          c:\Program Files\Couchbase\Server\bin>.\cbbackupmgr config -r backup -a s3://bkrepo --obj-access-key-id xxxxxxx --obj-secret-access-key xxxxxxxx  --obj-staging-dir /root/bk-staging  --obj-region us-west-2
          Backup repository `backup` created successfully in archive `s3://bkrepo`
           
          c:\Program Files\Couchbase\Server\bin>.\cbbackupmgr backup -c localhost -u Administrator -p password  -r backup -a s3://bkrepo --obj-access-key-id xxxxxxx --obj-secret-access-key xxxxxx  --obj-staging-dir /root/bk-staging  --obj-region us-west-2
          Backing up to '2020-08-05T20_41_03.8759241Z'
          Copied all data in 3m36.7648568s (Avg. 99.52KB/Sec)                                                               31591 items / 20.99MB
          [=============================================================================================================================] 100.00%
          Backup successfully completed
          Backed up bucket "travel-sample" succeeded
          Mutations backed up: 31591, Mutations failed to backup: 0
          Deletions backed up: 0, Deletions failed to backup: 0
          Skipped due to purge number or conflict resolution: Mutations: 0 Deletions: 0
           
          c:\Program Files\Couchbase\Server\bin>
          c:\Program Files\Couchbase\Server\bin>
          c:\Program Files\Couchbase\Server\bin>
          c:\Program Files\Couchbase\Server\bin>.\cbbackupmgr restore  -c localhost -u Administrator -p password  -r backup -a s3://bkrepo --obj-access-key-id xxxxxx --obj-secret-access-key xxxxxx --obj-staging-dir /root/bk-staging  --obj-region us-west-2 --start 1 --end 1
          (1/1) Restoring backup 2020-08-05T20_41_03.8759241Z '2020-08-05T20_41_03.8759241Z'
          Copied all data in 2m12.9281117s (Avg. 162.64KB/Sec)                                                              31591 items / 20.97MB
          [=============================================================================================================================] 100.00%
          Restore bucket 'travel-sample' succeeded
          Mutations restored: 31591, Mutations failed to restore: 0
          Deletions restored: 0, Deletions failed to restore: 0
          Skipped due to purge number or conflict resolution: Mutations: 0 Deletions: 0
          Restore completed successfully
           
          c:\Program Files\Couchbase\Server\bin>
          

          build-team Couchbase Build Team added a comment -

          Build couchbase-server-7.0.0-2772 contains backup commit 44128c1 with commit message:
          MB-40764 Fix Windows specific issues when backing up to cloud


            People

            Assignee:
            thuan Thuan Nguyen
            Reporter:
            thuan Thuan Nguyen