Operations are silently ignored if the send queue is full

Description

If the send queue is full, operations are not sent and no error is reported back to the sender.
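To make the failure mode concrete, here is a minimal C# sketch of the pattern described above. It uses System.Threading.Channels as a stand-in for the SDK's Dataflow-based _sendQueue: a bounded channel's TryWrite returns false when full, much like Post on a full bounded block. The enqueue helper is hypothetical, not the SDK's code. Because the result of TryWrite is ignored, the rejected operation's task is never completed and the caller simply waits.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Stand-in for the SDK's bounded send queue (capacity 1). FullMode.Wait means
// TryWrite returns false instead of overwriting when the queue is full.
var sendQueue = Channel.CreateBounded<TaskCompletionSource<string>>(
    new BoundedChannelOptions(1) { FullMode = BoundedChannelFullMode.Wait });

// Hypothetical enqueue helper reproducing the reported bug: the TryWrite result
// is discarded, so a rejected operation is dropped and its task never completes.
Task<string> SendIgnoringResult()
{
    var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
    sendQueue.Writer.TryWrite(tcs); // return value ignored: silent drop
    return tcs.Task;
}

var first = SendIgnoringResult();  // accepted into the queue
var second = SendIgnoringResult(); // rejected, but the caller is never told

// Nothing holds a reference to the second operation's TaskCompletionSource,
// so awaiting it would hang; bound the wait to show it is still pending.
var winner = await Task.WhenAny(second, Task.Delay(TimeSpan.FromMilliseconds(200)));
Console.WriteLine(winner == second ? "second completed" : "second is still pending");
```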

Code to reproduce (irrelevant code removed):

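// Disable the circuit breaker so it does not open because of timeouts
CircuitBreakerConfiguration.Default.Enabled = false;
var clusterOptions = new ClusterOptions()
{
    Password = "xxx",
    UserName = "yyy",
    CircuitBreakerConfiguration = CircuitBreakerConfiguration.Default,
};

// ... cluster connection and bucket setup omitted ...

// 100 s timeout for each get
var getOptions = new GetOptions().Timeout(TimeSpan.FromSeconds(100));

var collection = bucket.DefaultCollection();
var sw = new Stopwatch(); // System.Diagnostics

try
{
    Console.WriteLine("Reading key:xyz");
    sw.Restart();

    // Start 10,000 concurrent gets of the same key. ConfigureAwait(false) yields
    // ConfiguredTaskAwaitable<IGetResult>, so the async lambda below projects
    // each one back into a Task<IGetResult> for Task.WhenAll.
    var getTasks = Enumerable.Range(1, 10000)
        .Select(i => collection.GetAsync("key:xyz", getOptions).ConfigureAwait(false))
        .ToArray();
    await Task.WhenAll(getTasks.Select(async t => await t));

    sw.Stop();

    // raw = doc.ContentAs<JToken>();
    Console.Write($"GetAsync done in {sw.ElapsedMilliseconds} ms : ");
}
catch (Exception ex)
{
    // Report any failure (for example, a timeout) instead of losing it
    Console.WriteLine(ex);
}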


Environment

None

Gerrit Reviews

None

Release Notes Description

None


Activity


Brant Burnett October 30, 2020 at 12:53 AM

Here are some additional settings to help with the repro:

In DataFlowConnectionPool, reduce the _sendQueue BoundedCapacity to 1.

Under this test scenario, operations do complete, but only at a trickle. No errors appear to be reported for the operations that are dropped because the _sendQueue is full.
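A quick way to see why a capacity of 1 hurts: during a burst, only the first write fits until something drains the queue. The sketch below again uses a bounded System.Threading.Channels channel as an assumed stand-in for the Dataflow block, with TryWrite in place of Post and no reader running during the burst.

```csharp
using System;
using System.Threading.Channels;

// Stand-in for _sendQueue with BoundedCapacity = 1.
var sendQueue = Channel.CreateBounded<int>(new BoundedChannelOptions(1)
{
    FullMode = BoundedChannelFullMode.Wait // reject (TryWrite returns false) rather than overwrite
});

int accepted = 0, rejected = 0;

// Simulate a burst of 10 operations arriving before the queue is drained.
for (var op = 1; op <= 10; op++)
{
    if (sendQueue.Writer.TryWrite(op)) accepted++;
    else rejected++; // each of these would be silently dropped if the result were ignored
}

Console.WriteLine($"accepted={accepted}, rejected={rejected}"); // prints "accepted=1, rejected=9"
```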

Tommy Jakobsen October 12, 2020 at 1:36 PM

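An alternative to projecting with Select (without ConfigureAwait(false), GetAsync returns a Task<IGetResult> that Task.WhenAll accepts directly):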
var getTasks = Enumerable.Range(1, 10000).Select(i => collection.GetAsync("key:xyz", getOptions)).ToArray();
await Task.WhenAll(getTasks);

Tommy Jakobsen October 12, 2020 at 1:33 PM

Notes on how to reproduce:

  • Couchbase is running in Docker Desktop on the local dev PC; it may need more load to reproduce in a more powerful environment.
  • The document is present in the DB.
  • getTasks.Select(async t => await t) just projects the configured tasks into a new sequence of tasks; no waiting happens until WhenAll. It is only there to convert the configured (ConfigureAwait) awaitables back into tasks.
  • Outside the dev environment, it typically happens during server warmup, i.e. a temporary failure lasting a few minutes, or under very heavy server load.

I am using a local fix with an infinite queue size.

I am not sure what you mean by "the operation is retried just after that in CleanupDeadConnectionsAsync (line 122)". The connection is not dead in this scenario.

When the queue is full, the operation is not queued and is therefore never sent to the server, but the caller still waits for it to complete.
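One possible way to surface the rejection to the caller, sketched under the same assumptions (a Channel standing in for the Dataflow block; SendOrFail and the exception type are hypothetical and not necessarily how this ticket was fixed): when TryWrite returns false, fault the operation's TaskCompletionSource immediately, so the awaiting caller gets an exception instead of waiting indefinitely.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Stand-in for the bounded send queue (capacity 1).
var sendQueue = Channel.CreateBounded<TaskCompletionSource<string>>(
    new BoundedChannelOptions(1) { FullMode = BoundedChannelFullMode.Wait });

// Hypothetical fail-fast enqueue: if the queue rejects the operation,
// complete its task with an exception so the caller is not left waiting.
Task<string> SendOrFail()
{
    var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
    if (!sendQueue.Writer.TryWrite(tcs))
    {
        // Placeholder exception type; a real client would use its own (possibly retryable) error.
        tcs.TrySetException(new InvalidOperationException("Send queue is full."));
    }
    return tcs.Task;
}

var first = SendOrFail();  // queued, waits for a (not modeled) send loop
var second = SendOrFail(); // rejected, faulted immediately

try
{
    await second;
}
catch (InvalidOperationException ex)
{
    Console.WriteLine($"caller observed: {ex.Message}"); // prints "caller observed: Send queue is full."
}
```

Whether such a rejection should be retried internally or returned to the caller is a separate design choice; the sketch only shows that the dropped operation becomes observable.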

Jeffry Morris October 9, 2020 at 3:52 PM

Hi -

I spent some time looking into this and I don't see operations being ignored when the queue is full:

  • If I run your code above without adding the "key:xyz" document to Couchbase, we end up with 10k DocumentNotFoundExceptions being thrown. This takes an incredibly long time to process, but they all eventually fire, and the initial exception is bubbled up when they complete. If I add the "key:xyz" document, all operations succeed rather quickly.

  • If I turn the queue size down from the static default of 1024 to 1, I eventually see _sendQueue.Post(operation) return false; however, the operation is retried just after that in CleanupDeadConnectionsAsync (line 122). By adding unique keys I can trace this in the log, as well as by debugging through it.

  • If I make the queue size -1 (unlimited) and add the "key:xyz" document, _sendQueue is never full, so Post never returns false.
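For comparison, the unlimited setting never rejects (-1 is the value of DataflowBlockOptions.Unbounded). The sketch below uses an unbounded Channel as a stand-in: every write is accepted, at the cost of having no upper bound on how much work can pile up in memory.

```csharp
using System;
using System.Threading.Channels;

// Unbounded stand-in for a send queue with BoundedCapacity = -1.
var sendQueue = Channel.CreateUnbounded<int>();

var rejected = 0;
for (var op = 1; op <= 10_000; op++)
{
    // On an unbounded channel, TryWrite only fails after the writer has been completed.
    if (!sendQueue.Writer.TryWrite(op)) rejected++;
}

// Drain the queue to confirm every operation is still there.
var queued = 0;
while (sendQueue.Reader.TryRead(out _)) queued++;

Console.WriteLine($"rejected={rejected}, queued={queued}"); // prints "rejected=0, queued=10000"
```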

All that being said, not everything is working correctly. If the queue size is 1, the first op to skip the queue is retried on line 122 and succeeds; subsequent docs time out and the exception is bubbled up.

To be honest, I don't know why they time out after the initial operation succeeds; this needs more investigation. Separately, the slowness of the DocumentNotFoundExceptions could probably be reduced substantially by adding an option to suppress certain exceptions and return null in that case.
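Besides dropping or making the queue unbounded, a third option is backpressure: wait for space instead of rejecting. In Dataflow that corresponds to DataflowBlock.SendAsync rather than Post; in the Channel-based sketch below (an assumed stand-in, with a hypothetical consumer loop playing the connection's send loop) it is ChannelWriter.WriteAsync. With capacity 1 and a slow consumer, every operation is still delivered; the producer just slows down.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded stand-in for _sendQueue with capacity 1; FullMode.Wait makes WriteAsync wait for room.
var sendQueue = Channel.CreateBounded<int>(new BoundedChannelOptions(1)
{
    FullMode = BoundedChannelFullMode.Wait
});

// Hypothetical consumer playing the role of the connection's send loop.
var consumer = Task.Run(async () =>
{
    var sent = 0;
    await foreach (var item in sendQueue.Reader.ReadAllAsync())
    {
        await Task.Delay(1); // simulate the time taken to write one operation
        sent++;
    }
    return sent;
});

// Producer: WriteAsync waits until there is room instead of rejecting the write,
// so no operation is dropped even though the queue holds only one item.
for (var op = 1; op <= 50; op++)
{
    await sendQueue.Writer.WriteAsync(op);
}
sendQueue.Writer.Complete(); // lets ReadAllAsync finish once the queue is drained

var sentCount = await consumer;
Console.WriteLine($"sent={sentCount}"); // prints "sent=50"
```

The trade-off is that callers now wait for queue space, so that wait needs to be bounded, for example by passing a CancellationToken tied to the operation's timeout to WriteAsync.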

Another aside: in your code above, you build a list of Task<IGetResult> and await them with Task.WhenAll; however, before that is awaited, you loop through and await each item with getTasks.Select(async t => await t).

Personally, I would remove the getTasks.Select(async t => await t) and just await the tasks in Task.WhenAll.
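The two shapes discussed in this thread give the same results: all operations are started when ToArray() runs, and the async projection only converts each ConfiguredTaskAwaitable back into a Task, so nothing is awaited one at a time. A small sketch with a hypothetical FakeGetAsync in place of collection.GetAsync:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for collection.GetAsync: completes after a short delay.
async Task<int> FakeGetAsync(int i)
{
    await Task.Delay(50);
    return i;
}

// Reporter's shape: ConfigureAwait(false) yields ConfiguredTaskAwaitable<int>, not Task<int>,
// so each item is projected back into a Task before Task.WhenAll.
var configured = Enumerable.Range(1, 5).Select(i => FakeGetAsync(i).ConfigureAwait(false)).ToArray();
var viaProjection = await Task.WhenAll(configured.Select(async t => await t));

// Suggested shape: pass the tasks straight to Task.WhenAll.
var tasks = Enumerable.Range(1, 5).Select(i => FakeGetAsync(i)).ToArray();
var direct = await Task.WhenAll(tasks);

Console.WriteLine(string.Join(",", viaProjection)); // prints "1,2,3,4,5"
Console.WriteLine(string.Join(",", direct));        // prints "1,2,3,4,5"
```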

Jeffry Morris October 5, 2020 at 4:50 PM

-

Thanks for reporting, we'll get this fixed ASAP.

-Jeff

Fixed

Details

Created October 5, 2020 at 4:41 PM
Updated November 2, 2020 at 12:33 PM
Resolved November 2, 2020 at 12:33 PM