Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Fix Version: master
- Sprints: KV Sprint 2020-Oct, KV 2023-4, March-June 24
Description
Summary
The DCP consumer has a replication throttle that allows new items to be placed on a buffer and processed at a later time.
The throttle was designed to take account of both the amount of free memory and the disk queue size; in practice the disk-queue component is not used.
The problem with having a buffer (and one that is only employed when memory usage is very high) is that it introduces the potential for bugs, as we have to ensure that all items in the buffer are processed before any new items; see MB-31547.
Moreover, buffering and returning ENGINE_SUCCESS when we are out of memory provides no benefit, as the buffering itself consumes additional memory.
Details
The DCPConsumer Buffer is integral to our FlowControl implementation. Thus, before removing the Buffer we need to understand what role it plays in FlowControl and replace that role with something else.
A couple of preliminary points:
- Purpose of FlowControl is to impose a limit on the DCP allocations at Consumer.
- A DCPConsumer cannot reject any inbound DCP message in the current DCP replication design, because the Producer has no re-send mechanism other than a full stream shutdown/reopen.
So FlowControl is not implemented as a direct DCPConsumer backoff mechanism. Instead, FlowControl is based on Consumer->Producer buffer-acknowledgement of received DCP bytes. The Consumer's backoff mechanism consists of withholding acknowledgements of received bytes from the Producer. The Producer stops sending DCP data after accumulating "some" amount of unacked bytes over the stream. That "some" is a number defined at the Consumer and set at the Producer via a DcpControl message during the stream handshake.
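The Producer-side half of this buffer-ack mechanism can be sketched as follows. This is a minimal illustrative model, not the actual KV-Engine API; the class and method names (ProducerFlowControl, onSend, onBufferAck, canSend) are hypothetical:

```cpp
#include <cstddef>

// Hypothetical sketch: the Producer counts bytes sent but not yet
// buffer-acked by the Consumer, and pauses the stream once the
// unacked total reaches the buffer size negotiated via DcpControl
// at stream handshake.
class ProducerFlowControl {
public:
    explicit ProducerFlowControl(size_t bufferSize)
        : bufferSize(bufferSize) {}

    // Called when a DCP message of 'bytes' is sent over the stream.
    void onSend(size_t bytes) {
        unackedBytes += bytes;
    }

    // Called when the Consumer buffer-acks 'bytes' back.
    void onBufferAck(size_t bytes) {
        unackedBytes -= (bytes < unackedBytes ? bytes : unackedBytes);
    }

    // The Producer pauses the stream while this returns false; it
    // resumes only when the Consumer acks enough bytes.
    bool canSend() const {
        return unackedBytes < bufferSize;
    }

    size_t getUnackedBytes() const {
        return unackedBytes;
    }

private:
    const size_t bufferSize; // negotiated via DcpControl at handshake
    size_t unackedBytes = 0;
};
```

Note how a Consumer that simply stops acking (its only backoff option, since it cannot reject messages) eventually drives canSend() to false and stalls the Producer.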
Quick collateral note: historically we have discussed completely removing any internal FlowControl implementation and just relying on the OS-level TCP congestion control. Unfortunately we cannot do that in the current ns_server/memcached design; see the "Original proposal" section below for why that proposal was discarded.
Given what is described so far, our FlowControl implementation must have two characteristics:
- It has to implement the BufferAck mechanism (ie, the Consumer acks bytes of received DCP data back to the Producer when the Consumer's node is healthy; the Consumer stops acking bytes when the bucket hits some OOM-condition)
- It can't reject any DCP data
The DCP Consumer's Buffer solves those two problems:
- The Consumer receives some DCP message (eg mutation) over a PassiveStream
- If the bucket memory state is under control then PassiveStream processes the message (ie, insert into the HashTable and queue into Checkpoint) and sends a BufferAck message back to the Producer
- Else (OOM-condition) then PassiveStream inserts the message into its Buffer (ie, no processing, no BufferAck)
- At OOM, PassiveStream also schedules the DcpConsumerTask, which is responsible for checking whether the bucket has recovered from the OOM-condition and for processing the buffered messages. Processing the buffered messages also means acking those bytes back to the Producer.
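The current behaviour described above can be modelled with a minimal sketch. All names here are hypothetical, and real processing (inserting into the HashTable and queueing into a Checkpoint) is reduced to a counter; the point is the split between the normal path (process + ack) and the OOM path (park in the Buffer, no processing, no ack):

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Illustrative message: only its size matters for buffer-acking.
struct Message {
    std::string payload;
    size_t size() const { return payload.size(); }
};

// Hypothetical model of today's buffered PassiveStream.
class BufferedPassiveStream {
public:
    // Returns the number of bytes buffer-acked for this call.
    size_t receive(const Message& msg, bool oom) {
        if (!oom) {
            processed++;       // insert into HashTable, queue into Checkpoint
            return msg.size(); // BufferAck back to the Producer
        }
        buffer.push_back(msg); // OOM: no processing, no BufferAck
        return 0;
    }

    // Run by the (hypothetical) DcpConsumerTask once memory recovers:
    // buffered messages must be processed *before* any new ones, and
    // their bytes acked back to the Producer.
    size_t drainBuffer() {
        size_t acked = 0;
        while (!buffer.empty()) {
            acked += buffer.front().size();
            buffer.pop_front();
            processed++;
        }
        return acked;
    }

    size_t processedCount() const { return processed; }
    size_t bufferedCount() const { return buffer.size(); }

private:
    std::deque<Message> buffer; // the Consumer Buffer we want to remove
    size_t processed = 0;
};
```

The ordering constraint enforced by drainBuffer() (buffered items before new items) is exactly the source of bugs like MB-31547, and the std::deque itself consumes memory precisely when the bucket is already at OOM.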
Proposed alternative for getting rid of the Consumer's Buffer:
- The Consumer receives some DCP message (eg mutation) over a PassiveStream (not changed)
- If the bucket memory state is under control then PassiveStream processes the message (ie, insert into the HashTable and queue into Checkpoint) and sends a BufferAck message back to the Producer (not changed)
- Else (OOM-condition) then PassiveStream forcibly processes the message (ie, insert into the HashTable and queue into Checkpoint) but it does NOT send any BufferAck to the Producer. Outstanding unacked bytes are added into some PassiveStream accumulator member (a simple size_t)
- At OOM, PassiveStream also schedules the DcpConsumerTask, which is responsible for checking whether the bucket has recovered from the OOM-condition and, now, only for acking the outstanding unacked bytes back to the Producer
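The proposed alternative can be sketched the same way (again, all names are illustrative and processing is reduced to a counter). The message is always processed immediately; under OOM only the BufferAck is deferred, tracked by a plain size_t accumulator instead of a buffer of messages:

```cpp
#include <cstddef>

// Hypothetical model of the proposed buffer-free PassiveStream.
class UnbufferedPassiveStream {
public:
    // Returns the number of bytes buffer-acked for this call.
    size_t receive(size_t msgBytes, bool oom) {
        processed++;                  // always processed, even at OOM
        if (oom) {
            unackedBytes += msgBytes; // ack deferred, nothing buffered
            return 0;
        }
        return msgBytes;              // acked right away
    }

    // Run by the DcpConsumerTask once the OOM-condition clears; its
    // only remaining job is to ack the outstanding bytes.
    size_t ackOutstandingBytes() {
        const size_t toAck = unackedBytes;
        unackedBytes = 0;
        return toAck;
    }

    size_t processedCount() const { return processed; }

private:
    size_t unackedBytes = 0; // the "simple size_t accumulator"
    size_t processed = 0;
};
```

Because nothing is ever parked unprocessed, the in-order constraint disappears, while withholding the ack still stalls the Producer exactly as before once its negotiated unacked-bytes limit is reached.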
Original proposal (Won't do)
Reason for discarding:
Removing FlowControl and DCP buffering at Consumer may lead to filling the kernel socket buffer on the Consumer node.
While that is perfectly fine from memcached's point of view, it may block ns_server when it wants to send DCP messages over the proxy-connection, like AddStream/CloseStream (eg, tearing down replication).
By removing the DCP consumer buffer we simplify the control flow within PassiveStream and can remove the code that ensures items in the buffer are processed before new incoming items.
Instead of managing a separate buffer, we can keep the incoming item(s) in the libevent buffer and return EWOULDBLOCK; then, when the consumer frees some memory, we can notify libevent to resume processing the items.
Removing the DCP consumer buffer also allows us to remove the replication_throttle, which was only used to decide that we should buffer when hitting the memory threshold.
Issue Links
- causes
  - MB-61200 DCP replication can allocate over the Checkpoint Quota (Closed)
- is duplicated by
  - MB-59012 Logic data race in PassiveStream::processBufferedMessages might cause stream receiving duplicate seqnos (Closed)
- is triggering
  - MB-60436 Review and possibly remove dcp_consumer_process_unacked_bytes_yield_limit (Open)
  - MB-60438 Review and possibly remove legacy consumer-buffer logic in PassiveStream::streamEnd (Open)
  - MB-60415 Review and possibly remove the collateral code needed to handle the old DCP Consumer buffering (Open)
  - MB-46388 Investigate re-design of DCP for splitting data/control message over dedicated physical links (Closed)
- relates to
  - MB-60420 STDcpTest.ReplicateJustBeforeThrottleThreshold/7 test failure on Windows CV job (Resolved)