Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 6.6.0
- Untriaged
- 1
- Unknown
Description
Repo "c360_no_analytics" had a number of failures and timeouts, and ended up repeatedly failing fast on a client rollback error.
Server logs:
https://s3.amazonaws.com/cb-engineering/perry/timers_lost/collectinfo-2021-01-25T091645-ns_1%40172.23.97.84.zip
https://s3.amazonaws.com/cb-engineering/perry/timers_lost/collectinfo-2021-01-25T091645-ns_1%40172.23.97.85.zip
https://s3.amazonaws.com/cb-engineering/perry/timers_lost/collectinfo-2021-01-25T091645-ns_1%40172.23.97.86.zip
and backup logs attached to this ticket
For Gerrit Dashboard: MB-43845

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
144352,2 | MB-43845 Improve the DCP rollback error message | master | backup | MERGED | +2 | +1 |
144382,4 | MB-43845 Document the possible solutions to receiving a rollback | master | backup | MERGED | +2 | +1 |
Hi Perry Krug,
Could you please expand on what the issue is here (and, if possible, the use case/situation)? I've had a quick look at the logs (assuming the logs from MB-43844 are the correct ones) and I can see that 'cbbackupmgr' is having to handle a lot of timeouts; in this case, the cluster appears to be taking a very long time to respond to a simple "vbucket-details" stats call.
Timeouts Calculating Data Range
2020-11-09T07:08:38.154-08:00 WARN: (Couchbase) Unexpected error 'operation timed out after 5s' while trying to get sequence numbers, will retry in 5s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:38
...
2020-11-09T07:11:03.535-08:00 WARN: (Couchbase) Unexpected error 'operation timed out after 5s' while trying to get sequence numbers, will retry in 5s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:38
2020-11-09T07:11:18.541-08:00 WARN: (Couchbase) Unexpected error 'operation timed out after 10s' while trying to get sequence numbers, will retry in 10s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:38
...
2021-01-22T08:17:19.998-08:00 WARN: (Couchbase) Unexpected error 'operation timed out after 5s' while trying to get sequence numbers, will retry in 5s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:38
2021-01-22T08:17:35.001-08:00 WARN: (Couchbase) Unexpected error 'operation timed out after 10s' while trying to get sequence numbers, will retry in 10s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:38
2021-01-22T08:18:00.012-08:00 WARN: (Couchbase) Unexpected error 'operation timed out after 15s' while trying to get sequence numbers, will retry in 15s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:38
2021-01-22T08:18:35.014-08:00 WARN: (Couchbase) Unexpected error 'operation timed out after 20s' while trying to get sequence numbers, will retry in 20s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:38
2021-01-22T08:19:20.018-08:00 WARN: (Couchbase) Unexpected error 'operation timed out after 25s' while trying to get sequence numbers, will retry in 25s -- couchbase.GetSequenceNumbers() at sequence_numbers.go:38
We see these log lines scattered throughout the logs; however, it looks like 'cbbackupmgr' is behaving as expected. Please note that failing fast upon receiving a rollback is the intended behavior in 'cbbackupmgr' (whether we're hitting the rollback for valid reasons is another matter).
I'll have a look through the cluster logs (after looking at MB-43846).
Thanks,
James