
MB-31869: Remove DCPConsumer internal buffer


    Description

      Summary

      The DCP consumer has a replication throttle that allows new items to be placed in a buffer and processed at a later time.

      The throttle was designed to be based on the amount of free memory and the disk queue size. In reality the disk queue element is not used.

      The problem with having a buffer (and one that is only employed when we have very high memory usage) is that it introduces the potential for bugs, as we have to ensure that all the items in the buffer are processed before any new items; see MB-31547.

      Also, buffering and returning ENGINE_SUCCESS when we are out of memory provides no benefit, as the buffering itself consumes additional memory.

       

      Details

      The DCPConsumer Buffer is integral to our FlowControl implementation. Thus, before removing the Buffer we need to understand what role it plays in FlowControl and replace that role with something else.

      A couple of preliminary points:

      • Purpose of FlowControl is to impose a limit on the DCP allocations at Consumer.
      • A DCPConsumer cannot reject any inbound DCP message in the current DCP replication design, because we don't have any re-send mechanism at the Producer other than going through a stream shutdown/reopen.

      So FlowControl isn't implemented as a direct DCPConsumer backoff mechanism. Instead, FlowControl is based on Consumer->Producer buffer-acknowledgement of received DCP bytes. The Consumer's backoff mechanism consists of ceasing to acknowledge received bytes to the Producer. The Producer stops sending DCP data after accumulating some amount of unacked bytes over the stream; that amount is defined at the Consumer and set at the Producer via a DcpControl message at stream handshake.
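      As a rough illustration of that mechanism, the Producer-side accounting can be modelled as a window over sent-but-unacked bytes. The sketch below is illustrative only: FlowControlWindow and its members are invented names, not the actual kv_engine types, and it assumes the window size is the value the Consumer advertised via DcpControl.

```cpp
#include <cstddef>

// Minimal sketch of Producer-side flow control (illustrative names only).
class FlowControlWindow {
public:
    // 'bufferSize' is the limit the Consumer set via a DcpControl message
    // at stream handshake.
    explicit FlowControlWindow(size_t bufferSize) : bufferSize(bufferSize) {
    }

    // The Producer checks this before sending a DCP message of 'bytes'
    // size; if it returns false the Producer stops sending on the stream.
    bool canSend(size_t bytes) const {
        return unackedBytes + bytes <= bufferSize;
    }

    // Called after a message has been sent.
    void onSent(size_t bytes) {
        unackedBytes += bytes;
    }

    // Called when a BufferAck arrives from the Consumer; frees window
    // space so the Producer can resume sending.
    void onBufferAck(size_t bytes) {
        unackedBytes -= bytes;
    }

private:
    const size_t bufferSize;
    size_t unackedBytes = 0;
};
```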

      Quick collateral note: Historically we have discussed completely removing any internal FlowControl implementation and just relying on the TCP congestion control at the OS level. Unfortunately we can't do that in the current ns_server/memcached design; see the "Original proposal" section below for why that proposal was discarded.

      So, given what has been described so far, our FlowControl implementation must have two characteristics:

      1. It has to implement the BufferAck mechanism (ie, the Consumer acks bytes of received DCP data back to the Producer when the Consumer's node is healthy; the Consumer stops acking bytes when the bucket hits some OOM-condition)
      2. It can't reject any DCP data

      The DCP Consumer's Buffer satisfies both requirements, as follows (see the sketch after this list):

      • The Consumer receives some DCP message (eg mutation) over a PassiveStream
      • If the bucket memory state is under control then PassiveStream processes the message (ie, insert into the HashTable and queue into Checkpoint) and sends a BufferAck message back to the Producer
      • Else (OOM-condition) then PassiveStream inserts the message into its Buffer (ie, no processing, no BufferAck)
      • At OOM PassiveStream also schedules the DcpConsumerTask, which is responsible for checking whether the bucket has recovered from the OOM-condition and for processing the buffered messages. Processing the buffered messages also means acking those bytes back to the Producer.
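      The sketch below condenses that flow. It is purely illustrative: Message, PassiveStream, bucketIsOOM and scheduleDcpConsumerTask are invented stand-ins for the real kv_engine types and functions.

```cpp
#include <cstddef>
#include <queue>
#include <utility>

struct Message {
    size_t bytes; // wire size of the DCP message, later acked via BufferAck
    // payload omitted
};

struct PassiveStream {
    std::queue<Message> buffer; // the internal Buffer this MB removes

    void process(const Message&) {
        // insert into HashTable, queue into Checkpoint
    }
    void sendBufferAck(size_t /*bytes*/) {
        // DCP buffer-acknowledgement back to the Producer
    }
};

// Placeholder: in reality this compares bucket memory usage against the
// replication throttle threshold.
bool bucketIsOOM() {
    return false;
}

// Placeholder: wakes the task that later drains the buffer and acks bytes.
void scheduleDcpConsumerTask() {
}

void onDcpMessage(PassiveStream& stream, Message msg) {
    if (!bucketIsOOM()) {
        // Healthy: process and ack immediately.
        stream.process(msg);
        stream.sendBufferAck(msg.bytes);
        return;
    }
    // OOM: no processing and no BufferAck; park the message until the
    // DcpConsumerTask replays (and acks) it.
    stream.buffer.push(std::move(msg));
    scheduleDcpConsumerTask();
}
```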

      Proposed alternative for getting rid of the Consumer's Buffer (sketched after this list):

      • The Consumer receives some DCP message (eg mutation) over a PassiveStream (not changed)
      • If the bucket memory state is under control then PassiveStream processes the message (ie, insert into the HashTable and queue into Checkpoint) and sends a BufferAck message back to the Producer (not changed)
      • Else (OOM-condition) then PassiveStream forcibly processes the message (ie, insert into the HashTable and queue into Checkpoint) but it does NOT send any BufferAck to the Producer. Outstanding unacked bytes are added into some PassiveStream accumulator member (a simple size_t)
      • At OOM PassiveStream also schedules the DcpConsumerTask, which is responsible for checking whether the bucket has recovered from the OOM-condition and only for acking outstanding unacked bytes back to the Producer
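      Reusing the illustrative names from the previous sketch, the proposed flow would look roughly as follows: the Buffer disappears, processing always happens, and an OOM-condition only defers the BufferAck.

```cpp
struct PassiveStreamNoBuffer {
    size_t unackedBytes = 0; // the "simple size_t" accumulator from the list

    void process(const Message&) {
        // insert into HashTable, queue into Checkpoint
    }
    void sendBufferAck(size_t /*bytes*/) {
        // DCP buffer-acknowledgement back to the Producer
    }
};

void onDcpMessage(PassiveStreamNoBuffer& stream, const Message& msg) {
    // Always processed, even at OOM: there is no Buffer any more.
    stream.process(msg);
    if (!bucketIsOOM()) {
        stream.sendBufferAck(msg.bytes);
        return;
    }
    // OOM: defer only the ack. The Producer's flow-control window fills
    // up and it stops sending, which gives natural back-pressure.
    stream.unackedBytes += msg.bytes;
    // The task later acks stream.unackedBytes once memory recovers.
    scheduleDcpConsumerTask();
}
```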

       

      Original proposal (Won't do)

      Reason for discarding:

      Removing FlowControl and DCP buffering at the Consumer may lead to filling the kernel socket buffer on the Consumer node. While that is perfectly fine from memcached's point of view, it may block ns_server when it wants to send DCP messages such as AddStream/CloseStream over the proxy connection (eg, when tearing down replication).

      The original proposal: by removing the DCP consumer buffer we simplify the control flow within the passive_stream and can remove the code associated with ensuring that the items in the buffer are processed before new incoming items.

      Instead of using a separate buffer that needs to be managed, we can keep the incoming item(s) in the libevent buffer and return ENGINE_EWOULDBLOCK. Then, when the consumer frees some memory, we can notify libevent to restart processing the items.
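      A hedged sketch of what that (discarded) approach might have looked like, reusing the illustrative names from the sketches above plus an invented engine_error type:

```cpp
enum class engine_error { success, would_block };

engine_error onDcpMessageNoBuffering(PassiveStreamNoBuffer& stream,
                                     const Message& msg) {
    if (bucketIsOOM()) {
        // Do not consume the message: it stays in the connection's
        // libevent input buffer and is re-driven once the connection is
        // notified that memory has been freed.
        return engine_error::would_block;
    }
    stream.process(msg);
    return engine_error::success;
}
```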

      Removing the DCP consumer buffer also allows us to remove the replication_throttle, which was only used to decide whether we should buffer when hitting the memory threshold.
