  1. Couchbase Server
  2. MB-43046

[BP 6.5.2 MB-38269] - Projector goes into a stream termination loop while trying to stream a near 20 MB document


Details

    • Untriaged
    • Unknown

    Description

      For any document, the 20 MB limit applies to document body + User xattrs.
      Memcached reserves a buffer of 1 MB for System xattrs.
      While streaming a document to a DCP consumer, KV will stream body + user xattrs + system xattrs.
      Hence, consider a case where the document body is 19.9 MB. With the 1 MB system xattr buffer fully used, KV will send 19.9 + 1 = 20.9 MB to the consumer.
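
      The arithmetic above can be sketched in Go. This is only an illustration: the constant and function names are hypothetical, and the sizes are taken from the limits described in this ticket (20 MB = 20971520 bytes, 1 MB reserved for system xattrs).

```go
package main

import "fmt"

const (
	mib = 1 << 20
	// maxBodyAndUserXattrs is the 20 MB limit on body + user xattrs (20971520 bytes).
	maxBodyAndUserXattrs = 20 * mib
	// systemXattrReserve is the 1 MB buffer reserved for system xattrs.
	systemXattrReserve = 1 * mib
)

// dcpPayloadSize returns the number of bytes streamed to a DCP consumer for
// one document: body + user xattrs + system xattrs. (Hypothetical helper.)
func dcpPayloadSize(body, userXattrs, systemXattrs int) int {
	return body + userXattrs + systemXattrs
}

func main() {
	// Worst case: the 20 MB limit and the system xattr buffer are both fully used.
	worst := dcpPayloadSize(maxBodyAndUserXattrs, 0, systemXattrReserve)
	fmt.Println("worst-case payload bytes:", worst)
	fmt.Println("exceeds a 20 MB consumer-side limit:", worst > maxBodyAndUserXattrs)
}
```

      Any document that is valid for KV but carries enough system xattrs can therefore exceed a consumer-side limit that is applied to the whole payload.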

      Actual behaviour:

      When projector receives this mutation, it extracts the payload (which also includes system xattrs) and compares its size to the hard-coded limit of 20 MB:

      https://github.com/couchbase/indexing/blob/c8065f887280aad5106d65c087d72415ff8f366c/secondary/dcp/transport/mc_req.go#L152-L153

      For a document like the one described above, the payload exceeds this limit, and projector logs the following message:

      2020-02-27T19:30:23.716+00:00 [Error] DCPT[secidx:proj-sxoprd_posentities-MAINT_STREAM_TOPIC_ea678d87a86f96705627397d634ec781-1339850182609080814/1] doReceive(): 20976104 is too big (max 20971520)
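
      A minimal sketch of such a check, using the two numbers from the log line above. The function and constant names are hypothetical and this is not the code at the linked location; it only shows why a payload of 20976104 bytes trips a limit of 20971520 bytes.

```go
package main

import "fmt"

// maxBodyLen mirrors the hard-coded 20 MB limit reported in the log (20971520 bytes).
const maxBodyLen = 20 * 1024 * 1024

// checkPayload is a hypothetical stand-in for the consumer-side size check.
// payloadLen covers the whole payload, including system xattrs.
func checkPayload(payloadLen int) error {
	if payloadLen > maxBodyLen {
		return fmt.Errorf("%d is too big (max %d)", payloadLen, maxBodyLen)
	}
	return nil
}

func main() {
	const observed = 20976104 // payload size taken from the log line
	fmt.Println(checkPayload(observed))
	fmt.Println("bytes over the limit:", observed-maxBodyLen)
}
```

      The payload in the log is only 4584 bytes over the limit, which is well within the 1 MB that system xattrs may add.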
      

      More importantly, it then terminates this DCP stream and tries to recreate it, hence going into a loop:

       2020-02-27T19:30:23.716+00:00 [Info] DCPT[secidx:proj-sxoprd_posentities-MAINT_STREAM_TOPIC_ea678d87a86f96705627397d634ec781-1339850182609080814/1] ##45fe ... stopped
      

      Expected behaviour:

      1. Checking the document size at the consumer level is redundant, because KV already filters by size before sending the document.

      2. If a consumer does check the size, it should distinguish between document body size and system xattr size, and reject only mutations whose body size exceeds 20 MB (although this, again, would be dead code).

      3. Most importantly, projector mishandles this document. Views currently just log the document ID and skip the document. Projector instead goes into a loop of recreating DCPT streams, so it streams no further sequence numbers from that vbucket. This stalls index builds and, eventually, request_plus queries, causing an outage.

      Instead, projector should also simply skip such a mutation.
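
      A rough sketch of the "log and skip" behaviour follows. It is an illustration only: the framing (a 4-byte big-endian length followed by the body) is hypothetical and is NOT the DCP wire format, and all names are made up. It assumes the size is known before the body is read, in which case the oversized body has to be discarded so that the next item is read from the correct offset.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readFrames reads hypothetical length-prefixed frames from r. Oversized
// frames are logged and discarded; the stream itself stays open.
func readFrames(r io.Reader, maxBody uint32) (kept [][]byte, skipped int, err error) {
	var hdr [4]byte
	for {
		if _, err = io.ReadFull(r, hdr[:]); err != nil {
			if err == io.EOF {
				return kept, skipped, nil // clean end of stream
			}
			return kept, skipped, err
		}
		n := binary.BigEndian.Uint32(hdr[:])
		if n > maxBody {
			// Log and drain the body instead of terminating the stream.
			fmt.Printf("skipping oversized item: %d is too big (max %d)\n", n, maxBody)
			if _, err = io.CopyN(io.Discard, r, int64(n)); err != nil {
				return kept, skipped, err
			}
			skipped++
			continue
		}
		body := make([]byte, n)
		if _, err = io.ReadFull(r, body); err != nil {
			return kept, skipped, err
		}
		kept = append(kept, body)
	}
}

// frame builds one hypothetical frame for the demo.
func frame(body []byte) []byte {
	out := make([]byte, 4, 4+len(body))
	binary.BigEndian.PutUint32(out, uint32(len(body)))
	return append(out, body...)
}

// demo streams three items, the second of which exceeds a demo limit of 8 bytes.
func demo() (keptCount, skipped int, err error) {
	var stream bytes.Buffer
	stream.Write(frame([]byte("a")))
	stream.Write(frame(bytes.Repeat([]byte("x"), 16)))
	stream.Write(frame([]byte("b")))
	kept, skipped, err := readFrames(&stream, 8)
	return len(kept), skipped, err
}

func main() {
	fmt.Println(demo())
}
```

      In this sketch the items after the oversized one are still delivered, which is the property the ticket asks for: later sequence numbers from the same vbucket keep flowing.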

            People

              girish.benakappa Girish Benakappa
              jeelan.poola Jeelan Poola