Couchbase Server / MB-39653

[CBM] cbbackupmgr failed to backup data

Details

    • Untriaged
    • Centos 64-bit
    • 1
    • Yes

    Description

      Install Couchbase Server 7.0.0-2208 on a CentOS 7.6 server.

      Create a default bucket and load 1K docs.

      Run a backup; the backup fails.

      2020-05-29T10:42:09.555-07:00 (Cmd) Error backing up cluster: failed to execute cluster operations: failed to execute bucket operations: failed to transfer bucket data for bucket 'default': failed to transfer key value data: failed to transfer key value data: failed to initilise worker 0: failed to get gocbcore DCP agent: agent failed to connect to the cluster: unambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},"OperationID":"WaitUntilReady","Opaque":"","TimeObserved":30000231971,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"","LastDispatchedFrom":"","LastConnectionID":""}
      2020-05-29T10:42:09.555-07:00 (Cmd) Backed up bucket "default" failed
      2020-05-29T10:42:09.555-07:00 (Cmd) Mutations backed up: 0, Mutations failed to backup: 0
      2020-05-29T10:42:09.555-07:00 (Cmd) Deletions backed up: 0, Deletions failed to backup: 0
      2020-05-29T10:42:09.555-07:00 (Cmd) Skipped due to purge number or conflict resolution: Mutations: 0 Deletions: 0 
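
      For triage, the JSON payload appended to the first log line above can be unpacked to see which operation timed out and for how long. The following is a minimal sketch, not part of cbbackupmgr or gocbcore: the function names are hypothetical, the log prefix is abbreviated, and it assumes TimeObserved is expressed in nanoseconds (which would make the value above roughly 30 seconds).

```python
import json

# Abbreviated copy of the first log line from the description; only the part
# after " | " (the JSON payload) matters for this sketch.
LOG_LINE = (
    "(Cmd) Error backing up cluster: agent failed to connect to the cluster: "
    "unambiguous timeout | "
    '{"InnerError":{"InnerError":{"InnerError":{},"Message":"unambiguous timeout"}},'
    '"OperationID":"WaitUntilReady","Opaque":"","TimeObserved":30000231971,'
    '"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"",'
    '"LastDispatchedFrom":"","LastConnectionID":""}'
)


def innermost_message(err):
    """Walk the nested InnerError chain and return the deepest Message found."""
    message = None
    while isinstance(err, dict) and err:
        message = err.get("Message", message)
        err = err.get("InnerError")
    return message


def parse_error_payload(line):
    """Return (message, operation, seconds) from a log line with a JSON suffix."""
    _, _, payload = line.partition(" | ")
    data = json.loads(payload)
    # Assumption: TimeObserved is a duration in nanoseconds.
    seconds = data["TimeObserved"] / 1e9
    return innermost_message(data), data["OperationID"], seconds


message, operation, seconds = parse_error_payload(LOG_LINE)
print(f"{operation} failed with '{message}' after about {seconds:.1f}s")
```

      Read this way, the payload shows only that the WaitUntilReady operation timed out; it carries no dispatch or connection details, so the cause has to come from the server-side logs.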

          Activity

            thuan Thuan Nguyen added a comment - This is a simple test; it does not kill memcached or erlang in 7.0.0.

            pvarley Patrick Varley added a comment - Need the cluster logs as gocbcore is reporting a timeout issue.
            james.lee James Lee added a comment -

            Hi Thuan Nguyen, please could you provide detailed steps to reproduce with cbbackupmgr and Couchbase Server collect logs.

            pvarley Patrick Varley added a comment - - edited

            Reproduced the issue locally; here are my steps:

            1. Downloaded 2208 onto a CentOS 7 Vagrant machine
            2. Configured a one-node cluster with just the Data service
            3. Created a bucket called default
            4. Ran cbworkloadgen

            /opt/couchbase/bin/cbworkloadgen -u Administrator -p password

            5. Created a backup repo

            /opt/couchbase/bin/cbbackupmgr config -a backup -r MB-39653

            6. Ran a backup

             /opt/couchbase/bin/cbbackupmgr backup -u Administrator -p password -c localhost -a backup -r MB-39653
            Backing up to '2020-05-29T18_17_20.039976728Z'
            Copying at 0B/s (about 0s remaining) - Transferring key value data for 'default'                                                                                                                                                                                             0 items / 0B
            [===============================================================================================================================================================================================================================================================================] 100.00%
            Error backing up cluster: operation has timed out
            Backed up bucket "default" failed
            Mutations backed up: 0, Mutations failed to backup: 0
            Deletions backed up: 0, Deletions failed to backup: 0
            Skipped due to purge number or conflict resolution: Mutations: 0 Deletions: 0
            

            The issue is that the cluster does not support the backfill_order control message:

            2020-05-29T18:17:45.505412+00:00 INFO 44: DCP connection opened successfully. PRODUCER, INCLUDE_XATTRS [ [::1]:57896 - [::1]:11210 (<ud>Administrator</ud>) ]
            2020-05-29T18:17:45.505588+00:00 WARNING 44: (default) DCP (Producer) eq_dcpq:cbbackupmgr_2020-05-29T18:17:20Z_19653_0 - Invalid ctrl parameter 'sequential' for backfill_order
            2020-05-29T18:17:45.505734+00:00 INFO 44: (No Engine) DCP (Producer) eq_dcpq:cbbackupmgr_2020-05-29T18:17:20Z_19653_0 - Removing connection [ [::1]:57896 - [::1]:11210 (<ud>Administrator</ud>) ]
            

            I suspect KV-Engine has not yet merged that change forward into 6.6.0.

            In any case, the error message produced was not useful; we will open a defect with gocb and see how that is bubbled up to cbbackupmgr.

            thuan Thuan Nguyen added a comment - So I will assign it to kv team.
            owend Daniel Owen added a comment -

            We should not fail like this, because pre-6.6.0 clusters will not have the backfill_order control message.

            james.lee James Lee added a comment -

            Hi Daniel Owen,

            We won't fail backing up clusters that don't support sequential backfilling because there is a version check to handle this case. We will only send the control flag if the version is greater than 6.6.0.

            drigby Dave Rigby added a comment -

            James Lee In general I wouldn't use version checks if possible - they are brittle for exactly this kind of reason.

            The whole point of things like DCP_CONTROL is that the client and server can negotiate functionality based on the server's response. cbbackupmgr could try to use the new scheduling order and adapt as appropriate given the response (in this case it might be sufficient to just accept either a success or EINVAL result and continue regardless).

            As another example, what if in future (7.x?) KV-Engine decided we needed to change the permitted values of backfill_order. If you did the feature negotiation (and didn't mind if the request failed) then you're automatically future-compatible for this particular control message.
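
            The negotiation approach described above can be sketched as follows. This is an illustration only, not gocbcore or cbbackupmgr code: the Status enum, the send_control callable, and the two stand-in servers are all hypothetical, and it assumes the server answers an unsupported value with EINVAL while keeping the connection open.

```python
from enum import Enum, auto
from typing import Callable


class Status(Enum):
    """Hypothetical response codes for a control message."""
    SUCCESS = auto()
    EINVAL = auto()
    OTHER_ERROR = auto()


# Hypothetical transport: sends (key, value) and returns the server's status.
SendControl = Callable[[str, str], Status]


def negotiate_backfill_order(send_control: SendControl,
                             wanted: str = "sequential") -> bool:
    """Ask for a backfill order and adapt to the answer instead of checking versions.

    Returns True if the server accepted the value, False if it rejected it
    with EINVAL (the client then continues with the server default).
    Any other status is treated as a real failure.
    """
    status = send_control("backfill_order", wanted)
    if status is Status.SUCCESS:
        return True
    if status is Status.EINVAL:
        return False
    raise RuntimeError(f"backfill_order control failed: {status.name}")


def server_without_feature(key: str, value: str) -> Status:
    """Stand-in for a server that does not know this control value."""
    return Status.EINVAL


def server_with_feature(key: str, value: str) -> Status:
    """Stand-in for a server that accepts backfill_order=sequential."""
    if (key, value) == ("backfill_order", "sequential"):
        return Status.SUCCESS
    return Status.EINVAL


for name, server in [("without feature", server_without_feature),
                     ("with feature", server_with_feature)]:
    accepted = negotiate_backfill_order(server)
    print(f"server {name}: sequential backfill accepted = {accepted}")
```

            In this sketch the client never inspects a version string, so a build that reports a newer version but lacks the feature is handled the same way as an older cluster.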

            james.lee James Lee added a comment -

            Marking as resolved since KV have merged the 'backfill_order' control flag forward into master. I completely agree, Dave Rigby; in an ideal world that's exactly how it should be done. However, that isn't the interface exposed by gocbcore.

            arunkumar Arunkumar Senthilnathan (Inactive) added a comment - James Lee Dave Rigby do we know the ticket responsible for merging the 'backfill_order' control flag forward into master? I also see a GOCBC linked - will the testcase fail until this issue is resolved?

            pvarley Patrick Varley added a comment -

            "I also see a GOCBC linked - will the testcase fail until this issue is resolved?"

            GOCBC-905 is just to improve the error message if this happens again.

            "do we know the ticket responsible for merging the 'backfill_order' control flag forward into master?"

            It's in 7.0.0-2192 - see MB-39529

            arunkumar Arunkumar Senthilnathan (Inactive) added a comment - Are there more fixes expected from KV side which will fix this issue? We still see the testcases failing in 7.0.0-2221
            james.lee James Lee added a comment -

            Arunkumar Senthilnathan It looks like I closed this prematurely; not all of the KV changes have been merged forward into master yet. I'll keep an eye on MB-37680 and close this issue once the required change has been merged forward.

            james.lee James Lee added a comment -

            Marking as resolved; the required KV change has now been merged into master.

            thuan Thuan Nguyen added a comment - I will test again in a build that has this fix.

            thuan Thuan Nguyen added a comment - Verified on build 7.0.0-2253. cbbackupmgr could run backup without error.

            People

              thuan Thuan Nguyen
              thuan Thuan Nguyen