Details
-
Bug
-
Resolution: Fixed
-
Critical
-
6.6.0
-
Triaged
-
1
-
Yes
-
KV Sprint 2020-June
Description
A couple of our regression tests started intermittently failing recently because of some missing documents that are supposed to be streamed via DCP within a certain timeout (60 seconds).
The tests use cluster_run with 3 nodes as follows:
Node n0: data + analytics
Node n1: data
Node n2: analytics
At around time 2020-06-14T19:52:40.348-07:00, the beer-sample bucket is created and loaded. After that, Analytics starts DCP streams to the bucket.
When I checked the logs, I noticed that we are receiving all the documents from one of the data nodes but none from the other.
Following the memcached logs for a vbucket from each node, we can see the following:
Node n0 (vb:1):
2020-06-14T19:52:40.471388-07:00 INFO 57: HELO [cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1] XERROR, Select bucket [ 172.22.0.2:35852 - 172.22.0.2:12000 (<ud>@cbas</ud>) ]
|
2020-06-14T19:52:40.528744-07:00 INFO 57: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1 - (vb:1) Creating stream with start seqno 0 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
|
2020-06-14T19:52:40.528785-07:00 INFO 57: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1 - (vb:1) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1" backfill:true, seqno:9
|
2020-06-14T19:52:40.528801-07:00 INFO 57: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1 - (vb:1) Scheduling backfill from 1 to 8, reschedule flag : False
|
2020-06-14T19:52:40.528977-07:00 INFO 57: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1 - (vb:1) ActiveStream::markDiskSnapshot: Sending disk snapshot with start 0, end 8, and high completed --, max visible 8
|
2020-06-14T19:52:40.528986-07:00 INFO 57: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1 - (vb:1) ActiveStream::registerCursor name "eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1", backfill:true, seqno:9
|
2020-06-14T19:52:40.639592-07:00 INFO 57: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1 - (vb:1) Backfill complete, 0 items read from disk, 8 from memory, last seqno read: 8, pendingBackfill : False
|
2020-06-14T19:52:40.639657-07:00 INFO 57: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1 - (vb:1) ActiveStream::transitionState: Transitioning from backfilling to in-memory
|
2020-06-14T19:53:40.931213-07:00 WARNING 57: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:1 - (vb:1) Stream closing, sent until seqno 8 remaining items 0, reason: The stream closed early because the conn was disconnected
|
Node: n1 (vb:516) the backfill snapshot is never sent:
2020-06-14T19:52:40.508361-07:00 INFO 50: HELO [cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:0] XERROR, Select bucket [ 172.22.0.2:37014 - 172.22.0.2:12002 (<ud>@cbas</ud>) ]
|
2020-06-14T19:52:40.541900-07:00 INFO 50: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:0 - (vb:516) Creating stream with start seqno 0 and end seqno 18446744073709551615; requested end seqno was 18446744073709551615, collections-manifest uid:none, sid:none
|
2020-06-14T19:52:40.541936-07:00 INFO 50: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:0 - (vb:516) ActiveStream::scheduleBackfill_UNLOCKED register cursor with name "eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:0" backfill:true, seqno:6
|
2020-06-14T19:52:40.541951-07:00 INFO (beer-sample) Backfill for eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:0 vb:516 is pending
|
2020-06-14T19:52:40.541956-07:00 INFO 50: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:0 - (vb:516) Scheduling backfill from 1 to 5, reschedule flag : False
|
2020-06-14T19:53:40.916919-07:00 WARNING 50: (beer-sample) DCP (Producer) eq_dcpq:cbas:Default.Local:beer-sample:78d4ab44c7eb544b6cd41208352a4783:0 - (vb:516) Stream closing, sent until seqno 0 remaining items 0, reason: The stream closed early because the conn was disconnected
|
We haven't made any major changes to the DCP streaming client and that test used to pass all the time within that configured timeout of 60 seconds.
cbcollect_info from the nodes is attached.