Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-41255

Value eviction bucket - Replication of deletes with XATTRS will fail if return EWOULD_BLOCK

    XMLWordPrintable

Details

    • Triaged
    • 1
    • Yes

    Description

      Problem

      As described in CBSE-8828, for a replica DCP stream. If a previous item that uses XATTRs is to be deleted, however it is not resident - then we must perform a bgfetch - see MB-36087. When the bgfetch is complete we replay PassiveStream::messageReceived. However this time it fails because the last seqno now equals the sequence number of the deletion and so it is discarded.

      Suggested Possible Solution

      • only set the last seqno to the seqno of the message just received, once the message has been completed or
      • If an EWOULD_BLOCK occurs reserve the setting of the last seqno so the message can be received again.

      Attachments

        1. image001.png
          90 kB
        2. image001.png
          90 kB
        3. image001.png
          90 kB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            owend Daniel Owen created issue -
            owend Daniel Owen made changes -
            Field Original Value New Value
            Link This issue relates to CBSE-8828 [ CBSE-8828 ]
            owend Daniel Owen made changes -
            Link This issue relates to MB-36087 [ MB-36087 ]
            owend Daniel Owen made changes -
            Fix Version/s 6.6.1 [ 17002 ]
            owend Daniel Owen made changes -
            Is this a Regression? Unknown [ 10452 ] Yes [ 10450 ]
            drigby Dave Rigby added a comment -
            • only set the last seqno to the seqno of the message just received, one the message has been completed or
            • If an EWOULD_BLOCK occurs reserve the setting of the last seqno so the message can be received again.

            The former makes most sense to me - note that EWOULDBLOCK can in theory happen mutliple times (say we are unlucky and it gets evicted before the request is scheduled again.

            drigby Dave Rigby added a comment - only set the last seqno to the seqno of the message just received, one the message has been completed or If an EWOULD_BLOCK occurs reserve the setting of the last seqno so the message can be received again. The former makes most sense to me - note that EWOULDBLOCK can in theory happen mutliple times (say we are unlucky and it gets evicted before the request is scheduled again.
            owend Daniel Owen made changes -
            Triage Untriaged [ 10351 ] Triaged [ 10350 ]
            owend Daniel Owen made changes -
            Summary Replication of deletes with XATTRS can fail due to EWOULD_BLOCK Replication of deletes with XATTRS will fail if return EWOULD_BLOCK
            owend Daniel Owen made changes -
            Description +Problem+

            As described in CBSE-8828, for a replica DCP stream. If a previous item that uses XATTRs is to be deleted, however it is not resident - then we must perform a bgfetch - see MB-36087. When the bgfetch is complete we replay PassiveStream::messageReceived. However this time it fails because the last seqno now equals the sequence number of the deletion and so it is discarded.

            +Suggested Possible Solution+

            - only set the last seqno to the seqno of the message just received, one the message has been completed or
            - If an EWOULD_BLOCK occurs reserve the setting of the last seqno so the message can be received again.
            +Problem+

            As described in CBSE-8828, for a replica DCP stream. If a previous item that uses XATTRs is to be deleted, however it is not resident - then we must perform a bgfetch - see MB-36087. When the bgfetch is complete we replay PassiveStream::messageReceived. However this time it fails because the last seqno now equals the sequence number of the deletion and so it is discarded.

            +Suggested Possible Solution+

            - only set the last seqno to the seqno of the message just received, once the message has been completed or
            - If an EWOULD_BLOCK occurs reserve the setting of the last seqno so the message can be received again.
            owend Daniel Owen made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            drigby Dave Rigby made changes -
            Rank Ranked higher
            drigby Dave Rigby made changes -
            Link This issue is caused by MB-36087 [ MB-36087 ]
            drigby Dave Rigby made changes -
            Link This issue relates to MB-36087 [ MB-36087 ]
            owend Daniel Owen made changes -
            Assignee Daniel Owen [ owend ] Jim Walker [ jwalker ]
            jwalker Jim Walker added a comment - - edited

            The fix is actually inside deleteWithMeta - when that function is invoked for a replica delete, it shouldn't be trying to 'prune' xattrs (i.e. need to bg-fetch existing data). As per comments already in that function, when invoked for a replica, just take what you're given and store it.

            The issue here is that a clause earlier in the function triggers the bg-fetch for either replica or not

            jwalker Jim Walker added a comment - - edited The fix is actually inside deleteWithMeta - when that function is invoked for a replica delete, it shouldn't be trying to 'prune' xattrs (i.e. need to bg-fetch existing data). As per comments already in that function, when invoked for a replica, just take what you're given and store it. The issue here is that a clause earlier in the function triggers the bg-fetch for either replica or not
            jwalker Jim Walker made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            perry Perry Krug made changes -
            Attachment image001.png [ 106493 ]
            perry Perry Krug made changes -
            Attachment image001.png [ 106524 ]
            perry Perry Krug made changes -
            Attachment image001.png [ 106531 ]
            James Flather James Flather made changes -
            Comment [ A comment with security level 'Membase Inc' was removed. ]
            James Flather James Flather made changes -
            Comment [ A comment with security level 'Membase Inc' was removed. ]
            James Flather James Flather made changes -
            Comment [ A comment with security level 'Membase Inc' was removed. ]
            James Flather James Flather made changes -
            Security Private [ 10010 ]
            wayne Wayne Siu made changes -
            Link This issue blocks MB-41310 [ MB-41310 ]
            wayne Wayne Siu made changes -
            Link This issue blocks MB-40528 [ MB-40528 ]
            wayne Wayne Siu made changes -
            Labels approved-for-6.0.5 approved-for-6.6.1
            wayne Wayne Siu made changes -
            Link This issue blocks MB-40511 [ MB-40511 ]
            James Flather James Flather made changes -
            Link This issue relates to CBSP-3173 [ CBSP-3173 ]

            Build couchbase-server-6.6.1-9038 contains kv_engine commit 8c73162 with commit message:
            MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.1-9038 contains kv_engine commit 8c73162 with commit message: MB-41255 : Don't bgfetch for a replica delete of an xattr
            ianmccloy Ian McCloy made changes -
            Security Private [ 10010 ]

            Build couchbase-server-7.0.0-3058 contains kv_engine commit 8c73162 with commit message:
            MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-3058 contains kv_engine commit 8c73162 with commit message: MB-41255 : Don't bgfetch for a replica delete of an xattr
            jwalker Jim Walker added a comment -
            • Merged to mad-hatter (for 6.6.1)
            • Forward merged to master
            jwalker Jim Walker added a comment - Merged to mad-hatter (for 6.6.1) Forward merged to master
            jwalker Jim Walker made changes -
            Assignee Jim Walker [ jwalker ] Daniel Owen [ owend ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            owend Daniel Owen made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            owend Daniel Owen made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]

            Build couchbase-server-6.0.4-3098 contains kv_engine commit 829f660 with commit message:
            [BP] MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.0.4-3098 contains kv_engine commit 829f660 with commit message: [BP] MB-41255 : Don't bgfetch for a replica delete of an xattr

            Build couchbase-server-6.0.5-3302 contains kv_engine commit 829f660 with commit message:
            [BP] MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.0.5-3302 contains kv_engine commit 829f660 with commit message: [BP] MB-41255 : Don't bgfetch for a replica delete of an xattr
            wayne Wayne Siu made changes -
            Fix Version/s 6.0.5 [ 16729 ]
            wayne Wayne Siu made changes -
            Remote Link This issue links to "Page (Couchbase, Inc. Wiki)" [ 20577 ]
            wayne Wayne Siu made changes -
            Link This issue blocks MB-41489 [ MB-41489 ]
            owend Daniel Owen added a comment -

            Hey Jim Walker Could you merge the patch into the 6.5.1 branch that Chris Hillery created? (see MB-41489) thanks

            owend Daniel Owen added a comment - Hey Jim Walker Could you merge the patch into the 6.5.1 branch that Chris Hillery created? (see MB-41489) thanks
            owend Daniel Owen made changes -
            Assignee Daniel Owen [ owend ] Jim Walker [ jwalker ]
            owend Daniel Owen made changes -
            Link This issue relates to MB-41471 [ MB-41471 ]
            jwalker Jim Walker made changes -
            Summary Replication of deletes with XATTRS will fail if return EWOULD_BLOCK Value eviction bucket - Replication of deletes with XATTRS will fail if return EWOULD_BLOCK

            Build couchbase-server-6.5.1-6303 contains kv_engine commit 5958386 with commit message:
            [BP] MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.1-6303 contains kv_engine commit 5958386 with commit message: [BP] MB-41255 : Don't bgfetch for a replica delete of an xattr

            Build couchbase-server-7.0.0-3163 contains kv_engine commit a242708 with commit message:
            MB-41255: Create an improve unit test that has more coverage

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-3163 contains kv_engine commit a242708 with commit message: MB-41255 : Create an improve unit test that has more coverage
            jwalker Jim Walker made changes -
            Assignee Jim Walker [ jwalker ] Daniel Owen [ owend ]
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            owend Daniel Owen made changes -
            Link This issue backports to MB-41387 [ MB-41387 ]

            Build couchbase-server-6.6.1-9110 contains kv_engine commit 829f660 with commit message:
            [BP] MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.1-9110 contains kv_engine commit 829f660 with commit message: [BP] MB-41255 : Don't bgfetch for a replica delete of an xattr

            Build couchbase-server-6.5.1-6307 contains kv_engine commit 5958386 with commit message:
            [BP] MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.1-6307 contains kv_engine commit 5958386 with commit message: [BP] MB-41255 : Don't bgfetch for a replica delete of an xattr
            owend Daniel Owen made changes -
            Link This issue relates to CBSE-9100 [ CBSE-9100 ]

            Build couchbase-server-7.0.0-3630 contains kv_engine commit 829f660 with commit message:
            [BP] MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-3630 contains kv_engine commit 829f660 with commit message: [BP] MB-41255 : Don't bgfetch for a replica delete of an xattr
            wayne Wayne Siu made changes -
            Link This issue blocks MB-42583 [ MB-42583 ]
            jwalker Jim Walker added a comment -

            To reproduce the orginal issue something like the following is required.

            • At least 2 nodes (1 value eviction bucket) are needed as we need replication.
            • Bucket should have a very low quota to make this quicket/easier to trigger.
            • Fill the bucket with items that have xattrs (this bit requires a client that can create items with xattrs).
              • The bucket become less than 100% resident
              • The items that we write in this stage will later need to be deleted, so it maybe convenient to set an expiry value, but if you record the keys that are written it is more reliably to explicitly delete them later.
            • Once the bucket has a good number of non-resident items start deleting the items (or wait for the expiry interval and force compaction)
            • The issue is reproduced once one of the items is deleted and replicated, provided that on the replica the item is not resident the bug occurs.
            jwalker Jim Walker added a comment - To reproduce the orginal issue something like the following is required. At least 2 nodes (1 value eviction bucket) are needed as we need replication. Bucket should have a very low quota to make this quicket/easier to trigger. Fill the bucket with items that have xattrs (this bit requires a client that can create items with xattrs). The bucket become less than 100% resident The items that we write in this stage will later need to be deleted, so it maybe convenient to set an expiry value, but if you record the keys that are written it is more reliably to explicitly delete them later. Once the bucket has a good number of non-resident items start deleting the items (or wait for the expiry interval and force compaction) The issue is reproduced once one of the items is deleted and replicated, provided that on the replica the item is not resident the bug occurs.

            Thanks for providing the validation steps Jim Walker

            Validated the fix using 6.6.1-9182 build.

            Closing this ticket

            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Thanks for providing the validation steps Jim Walker Validated the fix using 6.6.1-9182 build. Closing this ticket
            ashwin.govindarajulu Ashwin Govindarajulu made changes -
            Assignee Daniel Owen [ owend ] Ashwin Govindarajulu [ ashwin.govindarajulu ]
            Status Resolved [ 5 ] Closed [ 6 ]
            wayne Wayne Siu made changes -
            Fix Version/s 6.5.2 [ 17223 ]
            abhishek.jindal Abhishek Jindal made changes -
            Link This issue relates to CBSE-9361 [ CBSE-9361 ]
            ashwin.govindarajulu Ashwin Govindarajulu added a comment - Validated the fix against "6.0.5-3333-enterprise" http://qa.sc.couchbase.com/job/oel6-4node-rebalance_in_jython/1480/console

            Build couchbase-server-6.6.2-9411 contains kv_engine commit c690fba with commit message:
            Merge "MB-41255: Merge branch '6.5.2' into mad-hatter" into mad-hatter

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.2-9411 contains kv_engine commit c690fba with commit message: Merge " MB-41255 : Merge branch '6.5.2' into mad-hatter" into mad-hatter

            Build couchbase-server-6.6.2-9411 contains kv_engine commit d88694a with commit message:
            MB-41255: Merge branch '6.5.2' into mad-hatter

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.2-9411 contains kv_engine commit d88694a with commit message: MB-41255 : Merge branch '6.5.2' into mad-hatter

            Build couchbase-server-6.6.2-9411 contains kv_engine commit 5958386 with commit message:
            [BP] MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.6.2-9411 contains kv_engine commit 5958386 with commit message: [BP] MB-41255 : Don't bgfetch for a replica delete of an xattr
            arunkumar Arunkumar Senthilnathan made changes -
            Labels approved-for-6.0.5 approved-for-6.6.1 approved-for-6.0.5 approved-for-6.6.1 releasenote

            Build couchbase-server-7.0.0-4255 contains kv_engine commit c690fba with commit message:
            Merge "MB-41255: Merge branch '6.5.2' into mad-hatter" into mad-hatter

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4255 contains kv_engine commit c690fba with commit message: Merge " MB-41255 : Merge branch '6.5.2' into mad-hatter" into mad-hatter

            Build couchbase-server-7.0.0-4255 contains kv_engine commit d88694a with commit message:
            MB-41255: Merge branch '6.5.2' into mad-hatter

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4255 contains kv_engine commit d88694a with commit message: MB-41255 : Merge branch '6.5.2' into mad-hatter

            Build couchbase-server-7.0.0-4255 contains kv_engine commit 5958386 with commit message:
            [BP] MB-41255: Don't bgfetch for a replica delete of an xattr

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.0-4255 contains kv_engine commit 5958386 with commit message: [BP] MB-41255 : Don't bgfetch for a replica delete of an xattr

            Thanks Dave Rigby.

            Depends on how you define "regression" . I believe the issue was introduced with XATTR support; when replicating deletes to items which previously had XATTRs in them. To hit this issue you need to be using XATTRs, which were new in 5.0.0 - so while it's true we didn't encounter this issue prior to v5, you also need to be using the new v5 feature to hit it.

            Customer was upgrading from 6.0 to 6.6. Given that the fix was backported to 6.0.5 also, looks like the issue existed in 6.0.x as well. Likely that the customer was not using XATTRs and their previous upgrades (to 6.0.x) were hence not impacted. So yes, probably not a regression.

            shivani.gupta Shivani Gupta added a comment - Thanks Dave Rigby . Depends on how you define "regression"  . I believe the issue was introduced with XATTR support; when replicating deletes to items which previously had XATTRs in them. To hit this issue you need to be using XATTRs, which were new in 5.0.0 - so while it's true we didn't encounter this issue prior to v5, you also need to be using the new v5 feature to hit it. Customer was upgrading from 6.0 to 6.6. Given that the fix was backported to 6.0.5 also, looks like the issue existed in 6.0.x as well. Likely that the customer was not using XATTRs and their previous upgrades (to 6.0.x) were hence not impacted. So yes, probably not a regression.

            People

              ashwin.govindarajulu Ashwin Govindarajulu
              owend Daniel Owen
              Votes:
              0 Vote for this issue
              Watchers:
              17 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                PagerDuty