Description
What's the problem?
When using 'WaitUntilReady' to wait for the gocbcore agent to connect to the cluster, we see an 'unambiguous timeout' error when in reality the server disconnected us because we were using the 'backfill_order' control flag (unmerged in CC).
What do we expect to see?
When we get disconnected from the server, the error should be bubbled up to cbbackupmgr so that it can be handled correctly and returned to the user. I imagine this isn't the only case in which a timeout masks an error that occurred behind the scenes.
Steps to reproduce
Patrick Varley has posted a concise set of steps to reproduce this issue with cbbackupmgr in MB-39653; to briefly recap:
1) Install CC build 2208 onto a CentOS 7 vagrant
2) Configure a one node cluster with only the data service
3) Create a bucket
4) Load some data in the bucket using cbworkloadgen
5) Run a backup
If we look in the memcached logs we will see:
2020-05-29T18:17:45.505412+00:00 INFO 44: DCP connection opened successfully. PRODUCER, INCLUDE_XATTRS [ [::1]:57896 - [::1]:11210 (<ud>Administrator</ud>) ]
2020-05-29T18:17:45.505588+00:00 WARNING 44: (default) DCP (Producer) eq_dcpq:cbbackupmgr_2020-05-29T18:17:20Z_19653_0 - Invalid ctrl parameter 'sequential' for backfill_order
2020-05-29T18:17:45.505734+00:00 INFO 44: (No Engine) DCP (Producer) eq_dcpq:cbbackupmgr_2020-05-29T18:17:20Z_19653_0 - Removing connection [ [::1]:57896 - [::1]:11210 (<ud>Administrator</ud>) ]
However, cbbackupmgr will display:
/opt/couchbase/bin/cbbackupmgr backup -u Administrator -p password -c localhost -a backup -r MB-39653
Backing up to '2020-05-29T18_17_20.039976728Z'
Copying at 0B/s (about 0s remaining) - Transferring key value data for 'default' 0 items / 0B
[===============================================================================================================================================================================================================================================================================] 100.00%
Error backing up cluster: operation has timed out
Backed up bucket "default" failed
Mutations backed up: 0, Mutations failed to backup: 0
Deletions backed up: 0, Deletions failed to backup: 0
Skipped due to purge number or conflict resolution: Mutations: 0 Deletions: 0