Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.6.0
Affects Version/s: 7.1.0
Component/s: tools
Labels:
- couchbase

Triage:
Untriaged
Story Points:
1
Is this a Regression?:
Unknown

Description

What happens
In a forum post Hyun-Ju Vega noticed that we see a weird series of task statuses for the given logs:

 ❯ rg "status\"|cbbackupmgr backup|Received store event|Added task|Running task|Dispatching task|Set running task|Task dispatched"

CBLogs.txt

2:2022-09-12T10:00:05.198Z INFO (Dispatcher) Dispatching task {"clusterName": "self", "repositoryID": "PersistentStorage_06_09_2022", "task": "Hourly"}

3:2022-09-12T10:00:05.678Z INFO (Dispatcher) Dispatching task {"targetNode": {"node_id":"5dfdb3ca7f84ab86904f08fac9d9eef3","host":"10.100.110.129","http_port":8097,"grpc_port":9124,"status":"","leader":false}}

5:2022-09-12T10:00:05.811Z INFO (Worker) Added task to queue {"cluster": "self", "repository": "PersistentStorage_06_09_2022", "task": "Hourly"}

6:2022-09-12T10:00:05.811Z INFO (Worker) Running task {"cluster": "self", "repositoryID": "PersistentStorage_06_09_2022", "state": "active", "taskName": "Hourly", "taskType": "BACKUP"}

7:2022-09-12T10:00:05.812Z DEBUG (Leader Manager) Received store event {"eventType": 5}

8:2022-09-12T10:00:05.812Z DEBUG (Event Handler) (Node Run) Task updates {"cluster": "self", "repository": "PersistentStorage_06_09_2022", "task": "Hourly", "status": "running"}

10:2022-09-12T10:00:05.813Z INFO (Runner) Running command /opt/couchbase/bin/cbbackupmgr backup --cluster http://10.100.110.129:8091 --archive s3://<obscured-s3-bucket>/PersistentStorage --repo 0ef4be99-30f6-46b3-a1e3-8abf0af43ace --purge --auto-select-threads --json-progress 994e4338-8f08-4530-adb5-04880e8a3e5d --obj-region eu-west-1 --obj-staging-dir /cb_backup/PersistentStorage_06_09_2022-0ef4be99-30f6-46b3-a1e3-8abf0af43ace --obj-auth-by-instance-metadata

12:2022-09-12T10:00:05.815Z INFO (Dispatcher) Set running task {"task": {"task_name":"Hourly","status":"waiting","start":"2022-09-12T10:00:05.230739377Z","error_code":0,"type":"BACKUP"}}

13:2022-09-12T10:00:05.822Z INFO (Dispatcher) Task dispatched {"nodeID": "5dfdb3ca7f84ab86904f08fac9d9eef3"}

14:2022-09-12T10:00:05.839Z DEBUG (Leader Manager) Received store event {"eventType": 5}

15:2022-09-12T10:00:05.839Z DEBUG (Event Handler) (Node Run) Task updates {"cluster": "self", "repository": "PersistentStorage_06_09_2022", "task": "Hourly", "status": "waiting"}

It appears the task status goes waiting->running->waiting, which should not happen.

Explanation
The interesting parts here are when we update node runs (eventType 5). Tasks come in to a worker in the Add method. Here we add the task to a channel (which the worker drains), take a lock to update some state and then call SetNodeRun. The channel is drained and tasks are Run. This is the main method in running tasks and we SetNodeRun through the call to updateWorkerCurrentTask.

I think what is happening is Add and Run are on different coroutines. When we send over the channel the Run goroutine wakes up immediately and executes - managing to call updateWorkerCurrentTask before Add has called SetNodeRun.

Fix
We could check the task isn't already running before updating it.

Attachments

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews
- Show All Issues
- Show Open Issues

No reviews matched the request. Check your Options in the drop-down menu of this sections header.

Activity

People

Assignee:: Gilad Kalchheim

Reporter:: Matt Hall

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 15/Sep/22 1:48 AM

Updated:: 12/Feb/24 2:39 AM

Resolved:: 20/Sep/23 1:39 AM

Gerrit Reviews

There are no open Gerrit changes

Show There are 2 closed Gerrit changes

Hide There are 2 closed Gerrit changes

MB-53695 Make test store more flexible: Gerrit Review:

MB-53695 Fix race condition w/ incorrect run status: Gerrit Review:

[CBBS] Running task can have status waiting due to concurrency

Details

Description

Attachments

Gerrit Reviews

Activity

People

Dates

Gerrit Reviews

PagerDuty