Couchbase .NET client library: NCBC-2664

Operations are silently ignored if the send queue is full


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.5
    • Fix Version/s: 3.0.7
    • Component/s: library
    • Labels: None
    • 1

    Description

      If the send queue is full, operations are not sent and no error is reported back to the sender.

       

      Code to reproduce: (irrelevant code removed)

      // Disable the circuit breaker to prevent it from opening due to timeouts
      CircuitBreakerConfiguration.Default.Enabled = false;
      var clusterOptions = new ClusterOptions
      {
          Password = "xxx",
          UserName = "yyy",
          CircuitBreakerConfiguration = CircuitBreakerConfiguration.Default,
      };

      var getOptions = new GetOptions().Timeout(TimeSpan.FromSeconds(100));

      // bucket and sw (a Stopwatch) come from the code removed from this report
      var collection = bucket.DefaultCollection();

      try
      {
          Console.WriteLine("Reading key:xyz");
          sw.Restart();
          // Start 10,000 concurrent gets of the same key
          var getTasks = Enumerable.Range(1, 10000)
              .Select(i => collection.GetAsync("key:xyz", getOptions).ConfigureAwait(false))
              .ToArray();
          await Task.WhenAll(getTasks.Select(async t => await t));
          sw.Stop();
          // raw = doc.ContentAs<JToken>();
          Console.Write($"GetAsync done in {sw.ElapsedMilliseconds} ms : ");
      }
      catch (Exception)
      {
          // (exception handling omitted in the original report)
      }


        Activity

          tommyja Tommy Jakobsen added a comment -

          _sendQueue.Post(operation) returns false if the queue is full, and this return value is ignored.
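A minimal, self-contained illustration of that return value, assuming _sendQueue is a TPL Dataflow BufferBlock with a BoundedCapacity (the pool's name, DataFlowConnectionPool, and the BoundedCapacity setting mentioned later in this thread suggest so). The name sendQueue and the string payloads are placeholders standing in for the real queue and operations. With no consumer attached, Post accepts one item and then declines the next by returning false:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// A bounded BufferBlock with no consumer linked, standing in for a full send queue.
var sendQueue = new BufferBlock<string>(new DataflowBlockOptions { BoundedCapacity = 1 });

bool first = sendQueue.Post("op-1");   // accepted: the buffer has room for one item
bool second = sendQueue.Post("op-2");  // declined: the buffer is already at capacity

// Ignoring 'second' is the problem reported here: "op-2" is never queued,
// so nothing will ever send it or complete it.
Console.WriteLine($"first={first}, second={second}, queued={sendQueue.Count}");
// prints: first=True, second=False, queued=1
```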
          jmorris Jeff Morris added a comment -

          Tommy Jakobsen -

          Thanks for reporting, we'll get this fixed ASAP.

          -Jeff

          jmorris Jeff Morris added a comment -

          Hi Tommy Jakobsen -

          I spent some time looking into this and I don't see operations being ignored when the queue is full:

          • If I run your code above without adding the "key:xyz" document to Couchbase, we end up with 10k DocumentNotFoundExceptions being thrown. This takes an incredibly long time to process, but they all end up firing, and the initial exception is bubbled up when they complete. If I add the "key:xyz" document, all operations succeed rather quickly.
          • If I turn the queue size down to 1 (the static default is 1024), I eventually see _sendQueue.Post(operation) return false; however, the operation is retried just after that in CleanupDeadConnectionsAsync (line 122). Using unique keys, I can trace this in the log as well as by debugging through it.
          • If I set the queue size to -1 (unbounded) and add the "key:xyz" document, _sendQueue is never full, so Post never returns false.
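To put the bounded and unbounded settings side by side, this sketch posts 10,000 operations (as in the repro) into a BufferBlock with a capacity of 1024 and into one with DataflowBlockOptions.Unbounded (-1). It models only the worst case, where nothing drains the queue (for example, stalled connections during warmup); the real pool drains continuously, so actual rejection counts would be far lower. CountAccepted is a hypothetical helper:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: how many Post calls a BufferBlock accepts when nothing drains it.
static int CountAccepted(int boundedCapacity, int operations)
{
    var queue = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = boundedCapacity });
    int accepted = 0;
    for (int i = 0; i < operations; i++)
    {
        if (queue.Post(i)) accepted++; // a false return here is an operation that was never queued
    }
    return accepted;
}

int bounded = CountAccepted(1024, 10_000);
int unbounded = CountAccepted(DataflowBlockOptions.Unbounded, 10_000);

Console.WriteLine($"capacity 1024: {bounded} accepted, {10_000 - bounded} rejected");
Console.WriteLine($"unbounded: {unbounded} accepted");
// prints:
// capacity 1024: 1024 accepted, 8976 rejected
// unbounded: 10000 accepted
```

An unbounded queue never rejects, which is why it hides the problem, but it also means memory use grows without limit while the server is slow.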

          All that being said, not everything is working correctly. If the queue size is 1, the first operation that fails to enter the queue is retried on line 122 and succeeds; subsequent operations time out and the exception is bubbled up.

          To be honest, I don't know why they time out after the initial operation succeeds; this needs more investigation. Separately, the slowness of the DocumentNotFoundExceptions could probably be reduced substantially by adding an option to suppress certain exceptions and return null in this case.

          Another aside: in your code above, you build a list of Task<IGetResult> and await them using Task.WhenAll; however, before that is awaited, you loop through and await each item:

           await Task.WhenAll(getTasks.Select(async t => await t));
          

          Personally, I would remove the getTasks.Select(async t => await t) and just await the tasks in Task.WhenAll.
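Whether the projection serializes the calls can be checked outside Couchbase. In this sketch, FakeGetAsync is a hypothetical stand-in for collection.GetAsync that records how many calls overlap. It compares the repro's ConfigureAwait(false) plus Select(async t => await t) form with passing a plain Task<int>[] to Task.WhenAll. Since ToArray() starts every call before WhenAll runs, both forms should show the calls overlapping rather than running one at a time:

```csharp
using System;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

var gate = new object();
int inFlight = 0, maxInFlight = 0;

// Hypothetical stand-in for collection.GetAsync: records the peak number of overlapping calls.
async Task<int> FakeGetAsync(int i)
{
    lock (gate) { inFlight++; maxInFlight = Math.Max(maxInFlight, inFlight); }
    await Task.Delay(50); // simulated server round trip
    lock (gate) { inFlight--; }
    return i;
}

// Form used in the report: the array holds ConfiguredTaskAwaitable<int>, not Task<int>.
ConfiguredTaskAwaitable<int>[] configured = Enumerable.Range(1, 100)
    .Select(i => FakeGetAsync(i).ConfigureAwait(false))
    .ToArray(); // all 100 calls have already started here
// The async lambda only re-wraps each awaitable as a Task<int> so WhenAll can accept it.
int[] viaProjection = await Task.WhenAll(configured.Select(async t => await t));
int peakProjection = maxInFlight;

maxInFlight = 0; // all calls have finished, so reset the peak for the second run

// Alternative form: keep Task<int> and pass the array straight to WhenAll.
Task<int>[] plain = Enumerable.Range(1, 100).Select(i => FakeGetAsync(i)).ToArray();
int[] viaPlain = await Task.WhenAll(plain);
int peakPlain = maxInFlight;

Console.WriteLine($"peak overlap: projection={peakProjection}, plain={peakPlain}");
```

The only real difference is the element type: ConfigureAwait(false) yields ConfiguredTaskAwaitable<T>, which Task.WhenAll does not accept, so the extra lambda is a conversion step rather than a sequential loop.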


          tommyja Tommy Jakobsen added a comment -

          Notes on how to reproduce:
          • Couchbase is running in Docker Desktop on the local dev PC; it may need more load to reproduce in a more powerful environment.
          • The document is present in the DB.
          • getTasks.Select(async t => await t) just projects to a new list of tasks; no waiting happens until WhenAll. It is only there to convert from the configured task (ConfiguredTaskAwaitable<IGetResult>) back to a Task.
          • Outside the dev environment, it typically happens during server warmup, i.e. a temporary failure lasting a few minutes, or under very heavy server load.

          I am using a local fix with an infinite (unbounded) queue size.

          I am not sure what you mean by "the operation is retried just after that in CleanupDeadConnectionsAsync (line 122)". The connection is not dead in this scenario.

          When the queue is full, the operation is not queued and thus never sent to the server, but the caller still waits for it to complete.
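A declined Post therefore leaves the caller awaiting an operation that nobody will send. Below is a sketch of two ways a caller could be protected, under stated assumptions: the queue is modelled as a TPL Dataflow BufferBlock of TaskCompletionSource<string>, and EnqueueOrFail, EnqueueWithBackpressureAsync and the exception messages are hypothetical. It is not the change that shipped in 3.0.7. Option 1 checks Post's return value and faults the operation at once; option 2 uses SendAsync, which waits for room instead of declining:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical model: each queued operation is a TaskCompletionSource the caller awaits.
var queue = new BufferBlock<TaskCompletionSource<string>>(
    new DataflowBlockOptions { BoundedCapacity = 1 });

// Option 1 (fail fast): if Post declines the operation, fault it so the caller is not left waiting.
Task<string> EnqueueOrFail()
{
    var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
    if (!queue.Post(tcs))
        tcs.TrySetException(new InvalidOperationException("Send queue is full; the operation was not sent."));
    return tcs.Task;
}

// Option 2 (backpressure): SendAsync waits asynchronously for room in the queue instead of declining.
async Task<Task<string>> EnqueueWithBackpressureAsync()
{
    var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
    if (!await queue.SendAsync(tcs))
        tcs.TrySetException(new InvalidOperationException("Send queue no longer accepts operations."));
    return tcs.Task;
}

Task<string> first = EnqueueOrFail();   // accepted; pending until something sends it
Task<string> second = EnqueueOrFail();  // declined; faulted immediately instead of hanging
Console.WriteLine($"first: {first.Status}, second: {second.Status}");

// Backpressure: this call waits because the queue still holds 'first'.
Task<Task<string>> waiting = EnqueueWithBackpressureAsync();
// Simulate a connection draining one operation and completing it.
TaskCompletionSource<string> sent = await queue.ReceiveAsync();
sent.TrySetResult("sent");
Task<string> third = await waiting;     // accepted into the freed slot, still pending
Console.WriteLine($"first: {first.Status}, third: {third.Status}");
```

Failing fast reports the problem to the caller immediately but turns a short backlog into errors; SendAsync instead slows callers down, so their own operation timeout becomes what bounds the wait.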

          tommyja Tommy Jakobsen added a comment -

          An alternative to projecting with Select:

          var getTasks = Enumerable.Range(1, 10000).Select(i => collection.GetAsync("key:xyz", getOptions)).ToArray();
          await Task.WhenAll(getTasks);

          btburnett3 Brant Burnett added a comment -

          Here are some additional settings to help with the repro:

          In DataFlowConnectionPool, reduce the _sendQueue BoundedCapacity to 1.

          // Only use one processor core
          var proc = Process.GetCurrentProcess();
          long affinity = (long) proc.ProcessorAffinity;
          affinity &= 1; // keep only the bit for the first core in the affinity mask
          proc.ProcessorAffinity = (IntPtr) affinity;

          // Limit to 2 connections; we can't use 1 because we need to exercise the pool
          options.NumKvConnections = 2;
          options.MaxKvConnections = 2;
          options.CircuitBreakerConfiguration.Enabled = false;

          Under this test scenario, operations do complete, but only at a trickle. No errors appear to be reported for the operations that are dropped because the _sendQueue is full.

          People

            btburnett3 Brant Burnett
            tommyja Tommy Jakobsen
            Votes: 0
            Watchers: 4
