  1. Couchbase Server
  2. MB-40016

[BP to 6.6.1] Projector goes into a stream termination loop while trying to stream a near 20 MB document


    Details

    • Triage:
      Untriaged
    • Is this a Regression?:
      Unknown

      Description

      For any document, the 20 MB limit applies to the document body plus user xattrs.
      Memcached reserves a separate 1 MB buffer for system xattrs.
      While streaming a document to a DCP consumer, KV streams the body, the user xattrs, and the system xattrs together.
      Hence, consider a case where the document body is 19.9 MB. With the 1 MB system xattr buffer fully used, KV will send 19.9 + 1 = 20.9 MB to the consumer.
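
The arithmetic above can be sketched in Go. This is only an illustration of the sizes involved, not KV code: the names `maxDocSize`, `sysXattrBuffer`, `acceptedByKV`, and `wireSize` are hypothetical, and the inclusive comparison in `acceptedByKV` is an assumption. The 20 MB value is taken as 20 * 1024 * 1024 bytes, which matches the `max 20971520` printed in the projector log.

```go
package main

import "fmt"

const (
	mb             = 1024 * 1024
	maxDocSize     = 20 * mb // limit on body + user xattrs (20971520 bytes)
	sysXattrBuffer = 1 * mb  // separate buffer reserved for system xattrs
)

// acceptedByKV reports whether body + user xattrs fit within the 20 MB limit.
// Hypothetical helper; whether the real check is inclusive is an assumption.
func acceptedByKV(body, userXattrs int) bool {
	return body+userXattrs <= maxDocSize
}

// wireSize returns the payload bytes streamed to a DCP consumer:
// body + user xattrs + system xattrs.
func wireSize(body, userXattrs, sysXattrs int) int {
	return body + userXattrs + sysXattrs
}

func main() {
	body := 199 * mb / 10 // roughly 19.9 MB
	sent := wireSize(body, 0, sysXattrBuffer)

	fmt.Println("accepted by KV:", acceptedByKV(body, 0))
	fmt.Println("bytes on the wire:", sent)
	fmt.Println("exceeds 20 MB on the wire:", sent > maxDocSize)
}
```

A document can therefore be valid for KV and still arrive at the consumer as a payload larger than 20 MB.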

      Actual behaviour:

      When projector receives this mutation, it extracts the payload (which also includes the system xattrs) and compares its size to a hard-coded value of 20 MB:

      https://github.com/couchbase/indexing/blob/c8065f887280aad5106d65c087d72415ff8f366c/secondary/dcp/transport/mc_req.go#L152-L153

      For the document described above, this condition is met, and projector logs a message such as:

      2020-02-27T19:30:23.716+00:00 [Error] DCPT[secidx:proj-sxoprd_posentities-MAINT_STREAM_TOPIC_ea678d87a86f96705627397d634ec781-1339850182609080814/1] doReceive(): 20976104 is too big (max 20971520)
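
The numbers in that log line are consistent with a comparison of the full payload length against 20 * 1024 * 1024 bytes. The following Go sketch shows a check of that shape. It is not the code at the link above: `maxPayload` and `checkPayloadSize` are hypothetical names, and the strict `>` comparison is an assumption. Only the message format is taken from the log.

```go
package main

import "fmt"

// maxPayload is the hard-coded limit; 20 * 1024 * 1024 = 20971520,
// the "max" value printed in the projector log.
const maxPayload = 20 * 1024 * 1024

// checkPayloadSize is a hypothetical stand-in for the projector's size check.
// It sees only the total payload length, so it cannot tell body bytes
// from system xattr bytes.
func checkPayloadSize(payloadLen int) error {
	if payloadLen > maxPayload {
		return fmt.Errorf("%d is too big (max %d)", payloadLen, maxPayload)
	}
	return nil
}

func main() {
	// 20976104 is the payload size reported in the log line above.
	if err := checkPayloadSize(20976104); err != nil {
		fmt.Println("doReceive():", err)
	}
}
```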
      

      More importantly, it then terminates this DCP stream and tries to recreate it, hence going into a loop:

       2020-02-27T19:30:23.716+00:00 [Info] DCPT[secidx:proj-sxoprd_posentities-MAINT_STREAM_TOPIC_ea678d87a86f96705627397d634ec781-1339850182609080814/1] ##45fe ... stopped
      

      Expected behaviour:

      1. Checking the document size at the consumer level is redundant, as KV filters oversized documents itself before sending them.

      2. Even if a consumer does check the size, it should distinguish between document body size and system xattr size and reject only those mutations whose body size is > 20 MB (although this, again, would be dead code).

      3. The most important aspect here is how projector handles this document. Views currently just log the document ID and skip processing the document. Projector, on the other hand, goes into a loop of recreating DCPT streams. As a result, it does not stream any further sequence numbers from that particular vbucket, which stalls index builds and eventually request_plus queries, causing an outage.

      Instead, projector should also simply skip such a mutation.
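
A minimal Go sketch of the "log and skip" behaviour proposed above, as opposed to terminating the stream. All names here (`mutation`, `processStream`, `maxPayload`) are hypothetical and do not come from the indexing code base. Note also that the commit message recorded later in this ticket indicates the merged change removed the size checks in projector altogether, so this illustrates the proposal only.

```go
package main

import "fmt"

const maxPayload = 20 * 1024 * 1024 // hypothetical consumer-side limit

// mutation is a simplified, hypothetical view of a DCP mutation.
type mutation struct {
	docID string
	seqno uint64
	size  int // payload bytes, including system xattrs
}

// processStream forwards every mutation that fits the limit and skips
// oversized ones after logging the document ID. The loop never aborts,
// so later sequence numbers from the same vbucket are still processed.
func processStream(muts []mutation) (forwarded []uint64, skipped []string) {
	for _, m := range muts {
		if m.size > maxPayload {
			fmt.Printf("skipping oversized mutation docid=%q seqno=%d size=%d\n",
				m.docID, m.seqno, m.size)
			skipped = append(skipped, m.docID)
			continue
		}
		forwarded = append(forwarded, m.seqno)
	}
	return forwarded, skipped
}

func main() {
	muts := []mutation{
		{docID: "a", seqno: 1, size: 512},
		{docID: "big", seqno: 2, size: 20976104},
		{docID: "b", seqno: 3, size: 1024},
	}
	forwarded, skipped := processStream(muts)
	fmt.Println("forwarded seqnos:", forwarded)
	fmt.Println("skipped docs:", skipped)
}
```

The point of the design is that one oversized document costs a single skipped mutation rather than blocking every later sequence number in the vbucket.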


            Activity

            jeelan.poola Jeelan Poola added a comment -

            Discussed during GSI scrum today. Adding this to 6.6.1 approved list. Agreed to by Mihir Kamdar.

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.6.1-9083 contains indexing commit 8776cb6 with commit message:
            MB-40016 Remove doc size checks in projector

            girish.benakappa Girish Benakappa added a comment -

            Was able to reproduce with 6.6.1-9082 and saw the messages below. Verified with 6.6.1-9083 and no longer see these messages.

            2020-11-23T23:33:32.161-08:00 [Error] DCPT[secidx:proj-default-MAINT_STREAM_TOPIC_9b6d749258fc75f90d1c030fdeeeb273-15104621787000396076/1] doReceive(): 21892092 is too big (max 20971520)
            2020-11-23T23:33:32.162-08:00 [Info] DCPT[secidx:proj-default-MAINT_STREAM_TOPIC_9b6d749258fc75f90d1c030fdeeeb273-15104621787000396076/1] ##3 ... stopped
            2020-11-23T23:33:33.211-08:00 [Info] DCPT[secidx:getfailoverlog-default-1606203213183012587] ##19 ... stopped
            2020-11-23T23:33:33.338-08:00 [Error] DCPT[secidx:proj-default-MAINT_STREAM_TOPIC_9b6d749258fc75f90d1c030fdeeeb273-15104621787000396076/1] doReceive(): 21892092 is too big (max 20971520)
            2020-11-23T23:33:33.339-08:00 [Info] DCPT[secidx:proj-default-MAINT_STREAM_TOPIC_9b6d749258fc75f90d1c030fdeeeb273-15104621787000396076/1] ##3 ... stopped
            


              People

              Assignee:
              girish.benakappa Girish Benakappa
              Reporter:
              varun.velamuri Varun Velamuri