Couchbase Server / MB-35908

2x increase in delta recovery time after graceful failover in 6.5.0-4218


Details

    • Untriaged
    • Unknown
    • KV-Engine MH 2nd Beta

    Attachments


      Activity

        Aliaksey Artamonau (Inactive) added a comment - edited

        The issue got exposed by this change of mine: http://review.couchbase.org/#/c/113796/ (which is a special case of the more general change that might or might not happen in madhatter: MB-35782).

        The underlying issue does appear to be an ep-engine issue though. Since this is a graceful failover followed by a delta recovery, ideally we don't expect to find any vbuckets with diverged histories during delta recovery (note, due to MB-35407 that's not going to be quite true, but the number of affected vbuckets shouldn't be large). In this case, out of 512 vbuckets that had to be recovered, 177 were found to have diverged.

        [ns_server:info,2019-09-06T22:53:20.134-07:00,ns_1@172.23.99.206:rebalance_agent-worker<0.17320.4>:rebalance_agent:find_diverged_vbuckets:575]Found diverged vbuckets in bucket "bucket-1". VBuckets:
        [{439,[{8968159175748,0}],[{230363863584385,0}],103696},
         {724,[{157387194889432,0}],[{52110476215576,0}],103723},
         {230,[{32677229703634,0}],[{101323308188231,0}],103640},
         {464,[{250620930099785,0}],[{148370438418802,0}],103804},
         {749,[{95023536180161,0}],[{189218869046192,0}],103736},
         {255,[{190640454172129,0}],[{61561632926425,0}],103845},
         {489,[{186984667174414,0}],[{182423637108604,0}],103638},
         {178,[{270303978685558,0}],[{8700194518495,0}],103624},
         {723,[{163683204348626,0}],[{92780527535927,0}],103659},
         {229,[{128339327605831,0}],[{117254116392904,0}],103523},
        ...
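
        For context, here is a minimal, illustrative sketch of the kind of check implied by the log excerpt above: compare the two failover histories (lists of {uuid, start seqno} entries, newest first) and look for the newest entry they share; if there is no shared entry, or the shared point is below the replica's high seqno, the copies have diverged and the replica cannot be delta-recovered. All type and function names below are made up for illustration; this is not the actual rebalance_agent or ep-engine code.

        // Illustrative sketch only -- not the actual ns_server/ep-engine code.
        // A failover history is a list of {vbucket UUID, start seqno} entries,
        // newest first, as printed in the log excerpt above.
        #include <algorithm>
        #include <cstdint>
        #include <iostream>
        #include <optional>
        #include <vector>

        struct FailoverEntry {
            uint64_t uuid;
            uint64_t seqno; // seqno at which this branch of history started
        };

        using FailoverTable = std::vector<FailoverEntry>;

        // Find the newest entry present in both histories; its seqno is the point
        // up to which the two copies are known to share history.
        std::optional<uint64_t> commonHistorySeqno(const FailoverTable& active,
                                                   const FailoverTable& replica) {
            for (const auto& a : active) {          // newest-first
                for (const auto& r : replica) {
                    if (a.uuid == r.uuid) {
                        return std::min(a.seqno, r.seqno);
                    }
                }
            }
            return std::nullopt; // no shared history at all
        }

        int main() {
            // vb:439 from the log: active copy (.204) vs recovered node (.206)
            FailoverTable active{{8968159175748ULL, 0}};
            FailoverTable replica{{230363863584385ULL, 0}};
            uint64_t replicaHighSeqno = 103696;

            auto common = commonHistorySeqno(active, replica);
            if (!common || *common < replicaHighSeqno) {
                std::cout << "vb:439 diverged; delta recovery must discard it\n";
            }
        }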
        

        Let's look at vbucket 439. The information in the log message tells us that, at the moment delta recovery was performed, the failover history of the vbucket on the recovered node (node .206) was:

        [{230363863584385,0}]
        

        While the failover history in the active copy (node .204) was:

        [{8968159175748,0}]
        

        The high seqno on node .206 was 103696.

        So according to these failover histories, the two vbuckets diverged at sequence number 0, which is weird to say the least.

        Looking at this further: prior to the graceful failover, the chain for this vbucket was:

        [172.23.99.204,172.23.99.206]
        

        After graceful failover it became:

        [172.23.99.204,undefined]
        

        Then after the recovery, it went back to:

        [172.23.99.204,172.23.99.206]
        

        So this vbucket has always been a replica on .206. Now if I search for 230363863584385 (.206's vbucket uuid) in the memcached log, I find the following:

        2019-09-06T21:54:18.218725-07:00 INFO (bucket-1) VBucket: created vb:439 with state:replica initialState:dead lastSeqno:0 persistedRange:{0,0} max_cas:0 uuid:230363863584385 topology:null
        ...
        2019-09-06T21:54:18.364017-07:00 INFO 160: (bucket-1) DCP (Consumer) eq_dcpq:replication:ns_1@172.23.99.204->ns_1@172.23.99.206:bucket-1 - (vb:439) Attempting to add stream: opaque_:15, start_seqno_:0, end_seqno_:18446744073709551615, vb_uuid:230363863584385, snap_start_seqno_:0, snap_end_seqno_:0, last_seqno:0, stream_req_value:{"uid":"0"}
        ...
        2019-09-06T22:53:19.896538-07:00 INFO (bucket-1) VBucket: created vb:439 with state:replica initialState:replica lastSeqno:103696 persistedRange:{103696,103696} max_cas:1567834383669723136 uuid:230363863584385 topology:null
        

        21:54:18 is when the bucket was created. A replication from .204 to .206 then existed for an hour before the graceful failover was performed, yet the vbucket on .206 never synced its failover history with the active vbucket, which then led delta recovery to determine that the vbucket needed to be deleted.


        Aliaksey Artamonau (Inactive) added a comment

        Jim Walker, could you please look into the issue with replicas not picking up correct failover histories?

        Ben Huddleston added a comment - edited

        Found a bug whilst looking into this and expanding a unit test. I'm not sure whether it is the cause of the rebalance slowdown seen in the perf tests, although it certainly could cause this issue.

        We could end up with two set_vbucket_state operations being flushed in the same batch, for example a state change and a change to the failover table. Let's say that chronologically the state change takes place first, followed by the failover table change. Due to the code added in this patch http://review.couchbase.org/#/c/113904/ (http://src.couchbase.org/source/xref/trunk/kv_engine/engines/ep/src/ep_bucket.cc#464), we will skip the second item that we see because it has the same seqno. This could cause us to lose the state change should we shut down or, as is the case for vb:439 in the logs for node .206, delete the vBucket from memory and then warm it back up from the dead state.
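
        A hedged sketch of the failure mode described above. The struct and function names are illustrative, not the actual ep_bucket.cc code: if the flusher dedupes pending set_vbucket_state items purely by seqno, two updates queued at the same seqno collapse into one, and whichever update comes second in the batch is silently dropped.

        // Illustrative sketch of the dedup-by-seqno hazard; not the real flusher code.
        #include <cstdint>
        #include <iostream>
        #include <string>
        #include <vector>

        struct SetVBStateItem {
            uint64_t seqno;      // seqno the state update was queued at
            uint64_t cas;        // monotonically increasing, unique per update
            std::string payload; // e.g. "state change" or "failover table update"
        };

        // Buggy variant: keep only the first item seen for a given seqno.
        // If a state change and a failover-table change are queued at the same
        // seqno in one flush batch, the later one is silently dropped.
        std::vector<SetVBStateItem> dedupBySeqno(const std::vector<SetVBStateItem>& batch) {
            std::vector<SetVBStateItem> out;
            for (const auto& item : batch) {
                if (!out.empty() && out.back().seqno == item.seqno) {
                    continue; // skip the "duplicate" -- this is where an update is lost
                }
                out.push_back(item);
            }
            return out;
        }

        int main() {
            std::vector<SetVBStateItem> batch{
                    {0, 101, "state change"},
                    {0, 102, "failover table update"}, // same seqno, later cas
            };
            for (const auto& item : dedupBySeqno(batch)) {
                std::cout << "persisted: " << item.payload << '\n';
            }
            // Only the first item is persisted; the later update never reaches disk.
        }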


        Couchbase Build Team added a comment

        Build couchbase-server-6.5.0-4293 contains kv_engine commit 50fa82a with commit message:
        MB-35908: Compare cas instead of seqno when deduping setVBState
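
        For completeness, a rough sketch of the direction that commit message describes; the names below are made up and this is not the actual kv_engine change. It also assumes that each queued set_vbucket_state carries the complete vbucket state (state plus failover table), so keeping only the newest item per seqno, ordered by cas, is enough for the latest state to be the one persisted.

        // Illustrative counterpart to the earlier sketch; not the real kv_engine code.
        #include <cstdint>
        #include <string>
        #include <vector>

        struct SetVBStateItem {
            uint64_t seqno;
            uint64_t cas;        // unique and increasing, so it orders same-seqno updates
            std::string payload; // assumed to be a full snapshot of the vbucket state
        };

        // Fixed variant: when two set_vbucket_state items share a seqno, keep the
        // one with the newer cas so the most recent snapshot is the one persisted.
        std::vector<SetVBStateItem> dedupByCas(const std::vector<SetVBStateItem>& batch) {
            std::vector<SetVBStateItem> out;
            for (const auto& item : batch) {
                if (!out.empty() && out.back().seqno == item.seqno) {
                    if (item.cas > out.back().cas) {
                        out.back() = item; // newer update wins instead of being dropped
                    }
                    continue;
                }
                out.push_back(item);
            }
            return out;
        }

        int main() {
            std::vector<SetVBStateItem> batch{
                    {0, 101, "vbstate snapshot (state change)"},
                    {0, 102, "vbstate snapshot (state change + failover table)"},
            };
            // With cas-based dedup the later snapshot (cas 102) survives the flush.
            auto deduped = dedupByCas(batch);
            return deduped.front().cas == 102 ? 0 : 1;
        }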

        Ben Huddleston added a comment - edited

        Regression looks to have been fixed by my patch to persist the latest set_vbucket_state.


        People

          Sharath Sulochana (Inactive)
          Sharath Sulochana (Inactive)
          Votes: 0
          Watchers: 5
