Couchbase Server / MB-31830

CLONE - cbbackupmgr backup did not fail or warn when a node was down


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.0.0, 5.1.0, 5.5.0
    • Fix Version/s: 5.5.3
    • Component/s: tools
    • Environment: 4 data nodes, 2 query + index nodes
    • Triage: Triaged
    • Is this a Regression?: No

    Description

      Problem

      Backup hangs instead of returning an error when trying to back up a cluster that is not at full capacity (one data node is down, not failed over):

      Copying at 0B (estimating time remaining) 0 items / 0B
      beer-sample [ ] 100.00%
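
      Until a fixed build is in place, one way to keep this hang from blocking automation is to bound the backup's runtime from the calling side. A minimal shell sketch using GNU coreutils timeout; the 600-second limit and the archive/repository names (taken from the logs below) are assumptions, not recommendations:

      # Abort the backup if it has not finished within 10 minutes.
      # timeout exits with status 124 when the time limit is hit.
      timeout 600 cbbackupmgr backup -a /home/vagrant/backup -r test \
          -c localhost -u Administrator -p password
      if [ $? -eq 124 ]; then
          echo "backup timed out - possibly an unreachable node" >&2
      fi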
      

      Background

      When backing up a cluster, if one or more nodes are down and failover has not occurred, the backup should fail.

      Steps to reproduce

      1. Turn off auto-failover.
      2. Stop couchbase-server on one data node.
      3. Try a backup of that cluster with cbbackupmgr (see the shell sketch after this list).
      4. The backup hangs without an error message.
      5. Starting the server again on that node does not change the state.
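
      A minimal shell sketch of these steps, assuming a systemd-managed install and the archive path and repository name from the logs below; setting-autofailover and the config/backup subcommands are standard couchbase-cli and cbbackupmgr invocations:

      # 1. Disable auto-failover (run against any live node).
      couchbase-cli setting-autofailover -c localhost:8091 \
          -u Administrator -p password --enable 0

      # 2. On one data node, stop the Couchbase service.
      systemctl stop couchbase-server

      # 3. Create a repository and run the backup.
      cbbackupmgr config -a /home/vagrant/backup -r test
      cbbackupmgr backup -a /home/vagrant/backup -r test \
          -c localhost -u Administrator -p password

      # On affected builds the backup command never returns; it sits
      # at "Copying at 0B (estimating time remaining)".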

      Request

      cbbackupmgr should fail with an error when a node is down, and tell the client that there is a data integrity issue and a backup cannot be completed at this time.
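
      With that behavior in place, callers could rely on the exit status rather than a watchdog. A minimal sketch, assuming cbbackupmgr exits non-zero when it reports the error (consistent with the fixed builds shown in the Activity section below):

      if ! cbbackupmgr backup -a /home/vagrant/backup -r test \
               -c localhost -u Administrator -p password; then
          echo "backup failed - cluster may have an unreachable node" >&2
          exit 1
      fi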

      Logs

      2018-06-07T16:47:40.461+00:00 (Cmd) config -a /home/vagrant/backup -r test 
      2018-06-07T16:47:40.461+00:00 (Cmd) Backup repository `test` created successfully in archive `/home/vagrant/backup`
      2018-06-07T16:48:22.858+00:00 (Cmd) backup -a /home/vagrant/backup -r test -c localhost -u Administrator -p ******** 
      2018-06-07T16:48:22.859+00:00 (Rest) GET http://localhost:8091/pools 200
      2018-06-07T16:48:22.863+00:00 (Rest) GET http://localhost:8091/pools/default/buckets 200
      2018-06-07T16:48:22.864+00:00 (Plan) Executing transfer plan
      2018-06-07T16:48:22.864+00:00 (Plan) Checking for data movement restrictions between beer-sample and beer-sample
      2018-06-07T16:48:22.868+00:00 (Rest) GET http://localhost:8091/pools/default/buckets 200
      2018-06-07T16:48:22.870+00:00 (Rest) GET http://localhost:8091/pools/default 200
      2018-06-07T16:48:22.870+00:00 Transfering from Couchbase Server 5.1.0
      2018-06-07T16:48:22.871+00:00 Lowering forestdb buffer cache size to 1073741824 due to insufficient memory
      2018-06-07T16:48:22.871+00:00 Lowering forestdb buffer cache size to 536870912 due to insufficient memory
      2018-06-07T16:48:22.900+00:00 [INFO][FDB] Forestdb blockcache size 536870912 initialized in 28688 us
       
      2018-06-07T16:48:22.900+00:00 [INFO][FDB] Forestdb opened database file /home/vagrant/backup/test/2018-06-07T16_48_22.864320064Z/beer-sample-bde74a655e67fb765d08580b94aeb41a/data/shard_0.fdb
      2018-06-07T16:48:22.904+00:00 [INFO][FDB] Forestdb closed database file /home/vagrant/backup/test/2018-06-07T16_48_22.864320064Z/beer-sample-bde74a655e67fb765d08580b94aeb41a/data/shard_0.fdb
      2018-06-07T16:48:22.911+00:00 (Plan) Transfering bucket configuration for beer-sample to beer-sample
      2018-06-07T16:48:22.916+00:00 (Rest) GET http://localhost:8091/pools/default/buckets 200
      2018-06-07T16:48:22.917+00:00 (Plan) Transfering views definitions for beer-sample to beer-sample
      2018-06-07T16:48:22.921+00:00 (Rest) GET http://localhost:8091/pools/default/buckets 200
      2018-06-07T16:48:22.922+00:00 (Rest) GET http://localhost:8091/pools/default/nodeServices 200
      2018-06-07T16:48:22.926+00:00 (Rest) GET http://10.112.175.101:8091/pools/default/buckets/beer-sample/ddocs 200
      2018-06-07T16:48:22.927+00:00 (Plan) Transfering GSI index definitions for beer-sample to beer-sample
      2018-06-07T16:48:22.927+00:00 (Rest) GET http://localhost:8091/pools/default/nodeServices 200
      2018-06-07T16:48:22.928+00:00 (Plan) Transfering full text index definitions for beer-sample to beer-sample
      2018-06-07T16:48:22.928+00:00 (Rest) GET http://localhost:8091/pools/default/nodeServices 200
      2018-06-07T16:48:22.928+00:00 (Plan) Deciding which key value data to transfer for beer-sample
      2018-06-07T16:48:22.928+00:00 (Rest) GET http://localhost:8091/pools/default/nodeServices 200
      


          Activity

            Arunkumar Senthilnathan added a comment -

            Seems to be working fine in 5.1.3-6210:

            [root@node2-cb500-testing-centos6 bin]# ./cbbackupmgr backup -a /tmp/entbackup -r backup -c 10.111.170.101:8091 -u Administrator -p password

            Backing up to 2018-11-21T23_22_41.409543964Z
            Copying at 0B (estimating time remaining) 0 items / 0B
            default [ ] 100.00%
            Error backing up cluster: Unable to find the latest vbucket sequence numbers. This might be due to a node in the cluster being unreachable.
            [root@node2-cb500-testing-centos6 bin]#
            Wayne Siu added a comment -

            Patrick Varley

            Did you have a chance to review the ticket? Arun reported that the issue is still there in 5.5.3. Thanks.

            Patrick Varley added a comment -

            It was because the 5.5.3 branch did not contain the fix. Mike Wiederhold [X] has merged the Vulcan branch into 5.5.3; see the original MB-30013.

            Arunkumar Senthilnathan added a comment -

            Build couchbase-server-5.5.3-4035 contains backup commit e19c4bb with commit message:
            MB-30013 Handle timeouts correctly on GetSequenceNumbers()

            Arunkumar Senthilnathan added a comment -

            Verified in 5.5.3-4035:

            [root@node2-cb500-testing-centos6 bin]# ./cbbackupmgr backup -a /tmp/entbackup -r backup -u Administrator -p password -c 10.111.170.101:8091

            Backing up to 2018-11-27T23_33_20.880082272Z
            Copying at 0B (estimating time remaining) 0 items / 0B
            default [ ] 100.00%
            Error backing up cluster: Unable to find the latest vbucket sequence numbers. This might be due to a node in the cluster being unreachable.

            People

              Assignee: Patrick Varley
              Reporter: Arunkumar Senthilnathan
              Votes: 0
              Watchers: 3
