Couchbase Server / MB-4366

ns_server is reusing tap names unsafely which causes data loss or inconsistency in replication when a node is removed and added back

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.7.2, 1.8.0
    • Fix Version/s: 1.8.1
    • Component/s: ns_server
    • Security Level: Public

      Description

      screenshot attached

      NOTE: we're converting this to the main 'named tap issues' ticket.

      So what's not safe about reusing named taps as of 1.8.0?

      Suppose something happens to the destination node after the tap is disconnected, and it affects data for the vbuckets replicated as part of the named tap. A subsequent reuse of the named tap will then incorrectly assume that we can continue sending from where we left off, instead of re-negotiating which data needs to be resent.
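
The failure mode described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the `Producer` and `Destination` classes, the per-name cursor, and the tap name `replication_n1` are hypothetical and are not taken from ns_server or ep-engine, whose real TAP bookkeeping is more involved. It shows how resuming under a reused name skips items that the destination has since lost, while starting a fresh stream resends them.

```python
class Destination:
    """Toy replica node: stores the set of keys it holds per vbucket."""

    def __init__(self):
        self.items = {}

    def receive(self, vbucket, key):
        self.items.setdefault(vbucket, set()).add(key)

    def lose_vbucket(self, vbucket):
        # Models "something happened" to the destination while the tap was disconnected.
        self.items.pop(vbucket, None)


class Producer:
    """Toy source node: an ordered mutation log per vbucket and a cursor per tap name."""

    def __init__(self):
        self.log = {}      # vbucket -> list of keys in mutation order
        self.cursors = {}  # tap name -> {vbucket: index of next item to send}

    def mutate(self, vbucket, key):
        self.log.setdefault(vbucket, []).append(key)

    def stream(self, tap_name, vbuckets, dest, reuse_name=True):
        if not reuse_name:
            # Forget the old state so everything is sent again from the start.
            self.cursors.pop(tap_name, None)
        cursor = self.cursors.setdefault(tap_name, {})
        for vb in vbuckets:
            entries = self.log.get(vb, [])
            for key in entries[cursor.get(vb, 0):]:
                dest.receive(vb, key)
            cursor[vb] = len(entries)


def run(reuse_name):
    src, dst = Producer(), Destination()
    src.mutate(0, "a")
    src.mutate(0, "b")
    src.stream("replication_n1", [0], dst)   # initial replication
    dst.lose_vbucket(0)                      # destination loses data while disconnected
    src.mutate(0, "c")
    src.stream("replication_n1", [0], dst, reuse_name=reuse_name)
    return dst.items.get(0, set())


print(sorted(run(reuse_name=True)))   # ['c']: "a" and "b" are never resent
print(sorted(run(reuse_name=False)))  # ['a', 'b', 'c']
```

In this model the producer cannot tell that the destination's state changed, which is why the reused name silently resumes from a stale position.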


        Activity

        farshid Farshid Ghods (Inactive) created issue -
        farshid Farshid Ghods (Inactive) added a comment -

        another screenshot : 5 minutes after stopping the rebalance

        farshid Farshid Ghods (Inactive) made changes -
        Field Original Value New Value
        Attachment Screen Shot 2011-10-19 at 5.23.19 PM.png [ 11766 ]
        farshid Farshid Ghods (Inactive) made changes -
        farshid Farshid Ghods (Inactive) added a comment -

        The tap stream only stops if no items are added to the backlog.
        If the user keeps the load running, this tap stream remains alive forever.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Farshid, I cannot make sense of these screenshots. Can you elaborate?

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Assignee Farshid Ghods [ farshid ]
        farshid Farshid Ghods (Inactive) added a comment -

        Basically, that means there is still one tap_rebalance stream open and running even after the rebalance was stopped.

        We seem to be stopping the streams, except for one.

        farshid Farshid Ghods (Inactive) added a comment -

        Waiting 5 minutes will not work if there are ongoing mutations in the cluster, because this tap stream only times out after 5 minutes of inactivity.
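
To illustrate the comment above, here is a minimal sketch of an inactivity timer that is reset by every mutation. The function `close_time`, its parameters, and the one-hour `horizon` are hypothetical and only model the behaviour described in this thread; they are not ep-engine code.

```python
IDLE_TIMEOUT = 5 * 60  # seconds; the 5 minutes of inactivity mentioned above


def close_time(mutation_times, idle_timeout=IDLE_TIMEOUT, horizon=3600):
    """Return when an idle timer would close the stream, or None if it
    does not fire within `horizon` seconds.

    mutation_times: sorted timestamps (in seconds) at which items are added
    to the backlog; time 0 is the moment the rebalance was stopped.
    """
    last_activity = 0
    for t in mutation_times:
        if t - last_activity >= idle_timeout:
            break  # the timer fired before this mutation arrived
        last_activity = t  # activity resets the idle timer
    deadline = last_activity + idle_timeout
    return deadline if deadline <= horizon else None


print(close_time([]))                    # 300: no load, closes after 5 minutes
print(close_time(range(0, 3600, 60)))    # None: steady load keeps it alive
```

With a mutation every minute the timer never reaches its deadline, which matches the observation that the stream stays alive as long as the load keeps running.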

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        So it's an ep-engine issue then? I mean, we close tap streams as much as possible in ns_server. Named tap streams are kept alive by ep-engine. If there's anything ns_server can do to really stop those tap producers, I'll be happy to do that.

        farshid Farshid Ghods (Inactive) made changes -
        Labels 1.7.2-release-notes 1.8.0-release-notes
        perry Perry Krug made changes -
        Component/s ns_server [ 10019 ]
        dipti Dipti Borkar made changes -
        Fix Version/s 1.8.1 [ 10249 ]
        Affects Version/s 1.8.0 [ 10248 ]
        Affects Version/s 1.7.2 [ 10203 ]
        Priority Major [ 3 ] Blocker [ 1 ]
        dipti Dipti Borkar made changes -
        Link This issue blocks CBSE-114 [ CBSE-114 ]
        dipti Dipti Borkar made changes -
        Assignee Farshid Ghods [ farshid ] Aleksey Kondratenko [ alkondratenko ]
        Fix Version/s 1.8.1 [ 10295 ]
        steve Steve Yen added a comment -

        this is the main ticket for the named tap approach/fix

        steve Steve Yen added a comment -

        is this a blocker for 1.8.1?

        dipti Dipti Borkar added a comment -

        Yes, because this may be causing data loss in some conditions.

        Farshid, I believe there are a few other tickets where this is the underlying problem. Can you reference them here for completeness? Thanks

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Summary: "ns_server doesn't shut down rebalance tap streams when user stops rebalance" -> "ns_server is reusing tap names unsafely"
        Description: "screenshot attached" -> "screenshot attached", followed by:

        NOTE: we're converting this to the main 'named tap issues' ticket.

        So what's not safe about reusing named taps as of 1.8.0?

        Suppose something happens to the destination node after the tap is disconnected, and it affects data for the vbuckets replicated as part of the named tap. A subsequent reuse of the named tap will then incorrectly assume that we can continue sending from where we left off, instead of re-negotiating which data needs to be resent.
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        http://review.couchbase.org/14555 fixes it on 1.8.1.

        1.8 and master have slightly different code in this area, so this work still needs some forward-porting.

        steve Steve Yen added a comment -

        fix is in gerrit (but more work still needed to enable 1.8.2)

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Let's keep this open for now. While adapting it for 1.8.2, I may have to change the 1.8.1 code to enable forward compatibility with 1.8.2 and master.

        dipti Dipti Borkar made changes -
        Labels 1.7.2-release-notes 1.8.0-release-notes next_sprint
        dipti Dipti Borkar made changes -
        Labels next_sprint current_sprint
        dipti Dipti Borkar made changes -
        Labels current_sprint next_sprint
        dipti Dipti Borkar made changes -
        Sprint Priority 7
        dipti Dipti Borkar added a comment -

        Aliaksey, code complete is Friday, and we need to merge everything in by then.
        What changes need to be made to ensure forward compatibility?

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Minor. I'll be doing that tomorrow as first priority.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        I've found that no further changes to 1.8.1 are needed. The 1.8.2 implementation is here: http://review.couchbase.org/14827

        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        thuan Thuan Nguyen added a comment -

        Integrated in github-ns-server-2-0 #333 (See http://qa.hq.northscale.net/job/github-ns-server-2-0/333/)
        only reuse tap name when changing vbucket filter. MB-4366 (Revision 61bf78355e64fff2e807939fea385862ca6919d5)
        reimplemented named tap fix for branch-18. MB-4366 (Revision e3b833480ceb5b7832e22131ed5d3fb532e6ea83)

        Result = SUCCESS
        Aliaksey Artamonau :
        Files :

        • src/ns_server_cluster_sup.erl
        • src/ebucketmigrator_srv.erl
        • src/ns_vbm_sup.erl

        Aliaksey Artamonau :
        Files :

        • src/ns_vbm_new_sup.erl
        • src/ns_vbm_sup.erl
        • src/ebucketmigrator_srv.erl
        • src/ns_server_cluster_sup.erl
        • src/cb_gen_vbm_sup.erl
        thuan Thuan Nguyen added a comment -

        Integrated in github-ns-server-2-0 #337 (See http://qa.hq.northscale.net/job/github-ns-server-2-0/337/)
        fixed typo in start_vbucket_filter_change. MB-4366 (Revision 5db3c35e8a5ff6a5885271df4466b30c5369fa38)

        Result = SUCCESS
        Steve Yen :
        Files :

        • src/ebucketmigrator_srv.erl
        farshid Farshid Ghods (Inactive) made changes -
        Summary: "ns_server is reusing tap names unsafely" -> "ns_server is reusing tap names unsafely which causes data loss or inconsistency in replication when a node is removed and added back"
        farshid Farshid Ghods (Inactive) made changes -
        Labels next_sprint 1.8.1-release-notes
        dipti Dipti Borkar made changes -
        Fix Version/s 1.8.2 [ 10249 ]
        farshid Farshid Ghods (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            alkondratenko Aleksey Kondratenko (Inactive)
            Reporter:
            farshid Farshid Ghods (Inactive)