Couchbase Server / MB-35407

graceful failover deletes bucket with force=true and can lead to poor delta recovery


Details

    Description

      The RCA for MB-35326 showed that a dead->replica state change occurred shortly before the bucket was deleted; the replica state change was never flushed because a force delete was used.

      see comment here

      The result of this is that at restart/bucket creation, KV's warmup will ignore dead vbuckets, and as such the vbuckets which could have been warmed up (because they were correctly flushed as active or replica) must now be rebuilt.
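
      For illustration, a minimal sketch of the behaviour described above (hypothetical C++, not actual KV-Engine code; the types and vbucket numbers are made up): at warmup, any vbucket whose last persisted state is dead is skipped, so although its data is still on disk it is never loaded back and must later be rebuilt.

      // Minimal sketch (not actual kv_engine code) of the warmup behaviour
      // described above: vbuckets whose last persisted state is "dead" are
      // skipped and their on-disk data is never warmed up.
      #include <cstdio>
      #include <map>

      enum class VBState { Active, Replica, Dead };

      int main() {
          // Hypothetical on-disk states at restart; vb 1 was persisted as dead
          // and its later dead->replica change was never flushed (force delete).
          std::map<int, VBState> persistedState = {
              {0, VBState::Active}, {1, VBState::Dead}, {2, VBState::Replica}};

          for (const auto& [vbid, state] : persistedState) {
              if (state == VBState::Dead) {
                  // Skipped: when ns_server later sets this vbucket to
                  // active/replica it is recreated empty and must be rebuilt.
                  std::printf("vb:%d dead -> not warmed up\n", vbid);
                  continue;
              }
              std::printf("vb:%d %s -> warmed up\n", vbid,
                          state == VBState::Active ? "active" : "replica");
          }
          return 0;
      }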

      Raising this MB as a forced bucket delete seems counterproductive for a graceful failover -> delta recovery workflow.

      Note-1: force shutdown noticed in ns_server.debug.log

      [ns_server:info,2019-07-27T23:34:56.175-07:00,ns_1@172.23.105.47:ns_memcached-default<0.20811.9>:ns_memcached:delete_bucket:770]Deleting bucket "default" from memcached (force = true)
      

       
      Note-2: Assuming graceful failover as it's mentioned in logging e.g.

      [ns_server:debug,2019-07-27T23:34:56.831-07:00,ns_1@172.23.105.47:ns_config_log<0.199.0>:ns_config_log:log_common:231]config change:
      counters ->
      [{'_vclock',[{<<"66664331ef97ea7bd2fdcdbe29e89002">>,{54,63731514896}}]},
       {failover_complete,2},
       {graceful_failover_start,2},
       {rebalance_success,24},
       {rebalance_start,24},
       {graceful_failover_success,1},
       {failover,1}]
      

       


        Activity

          James Flather added a comment -

          Note-2: Assuming graceful failover as it's mentioned in logging e.g.

          Yup - slightly more explicit reference in diag

          2019-07-27T23:34:17.505-07:00, ns_orchestrator:0:info:message(ns_1@172.23.105.105) - Starting graceful failover of nodes ['ns_1@172.23.105.47']. Operation Id = a5447c50d1e82b8739d0dd3efd101965
          2019-07-27T23:34:17.506-07:00, ns_rebalancer:0:info:message(ns_1@172.23.105.105) - Starting vbucket moves for graceful failover of ['ns_1@172.23.105.47']
          2019-07-27T23:34:18.081-07:00, ns_vbucket_mover:0:info:message(ns_1@172.23.105.105) - Bucket "default" rebalance does not seem to be swap rebalance
          2019-07-27T23:34:55.796-07:00, failover:0:info:message(ns_1@172.23.105.105) - Starting failing over ['ns_1@172.23.105.47']
          2019-07-27T23:34:56.175-07:00, ns_memcached:0:info:message(ns_1@172.23.105.47) - Shutting down bucket "default" on 'ns_1@172.23.105.47' for deletion
          2019-07-27T23:34:56.808-07:00, failover:0:info:message(ns_1@172.23.105.105) - Failed over ['ns_1@172.23.105.47']: ok
          2019-07-27T23:34:56.838-07:00, failover:0:info:message(ns_1@172.23.105.105) - Deactivating failed over nodes ['ns_1@172.23.105.47']
          2019-07-27T23:34:56.842-07:00, ns_orchestrator:0:info:message(ns_1@172.23.105.105) - Graceful failover completed successfully.
          


          steve.watanabe Steve Watanabe added a comment -

          When ns_memcached:do_terminate is called with a Reason other than {shutdown, reconfig}, we do the bucket deletion with force = true. {shutdown, reconfig} occurs when either:

          handle_cast({connect_done... does an ensure_bucket and gets an exception from memcached_bucket_config:get when it can't get the bucket info

          or

          do_ensure_bucket gets "restart" back from memcached_bucket_config:ensure, which happens if any param has changed and has the [restart] property (see memcached_bucket_config:params(membase...

          I'll have to consult with Aliaksey Artamonau to get his thoughts on this.
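
          For illustration only, a simplified model of the decision described above (written in C++ as a sketch; the real logic is Erlang in ns_server's ns_memcached module, and the "failover" reason value here is made up): only a {shutdown, reconfig} terminate leads to a non-forced delete.

          // Simplified, hypothetical model of the decision described above --
          // not the real ns_server code. Any terminate reason other than
          // {shutdown, reconfig} results in a forced bucket delete.
          #include <cstdio>
          #include <string>

          struct TerminateReason {
              std::string kind;    // e.g. "shutdown"
              std::string detail;  // e.g. "reconfig" (other values are made up)
          };

          static bool shouldForceDelete(const TerminateReason& r) {
              return !(r.kind == "shutdown" && r.detail == "reconfig");
          }

          int main() {
              std::printf("reconfig -> force=%s\n",
                          shouldForceDelete({"shutdown", "reconfig"}) ? "true" : "false");
              std::printf("failover -> force=%s\n",
                          shouldForceDelete({"shutdown", "failover"}) ? "true" : "false");
              return 0;
          }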


          Aliaksey Artamonau (Inactive) added a comment -

          The result of this is that at restart/bucket creation, KV's warmup will ignore dead vbuckets, and as such the vbuckets which could have been warmed up (because they were correctly flushed as active or replica) must now be rebuilt.

          Is this a new behavior for memcached to not warmup dead vbuckets? What will happen when ns_server then changes the vbucket state to either active or replica? Will the vbucket wind up being empty? If so, then this behavior is unacceptable because it may arise in other scenarios.


          Aliaksey Artamonau (Inactive) added a comment -

          Reading the linked ticket, it appears that the answer to my question is: yes, a vbucket that was dead at warmup time will lose its data even if turned into an active or a replica vbucket. I think this is a problem and needs to be fixed. Otherwise in certain cases we will lose data when we are not supposed to.

          As to the issue of bucket deletion after delta recovery: yes, we could do better. Unfortunately the code is written in such a way that it'll be hard to change the way buckets are deleted. But it should be relatively easy to add an extra step to the delta recovery rebalance that waits until all the data is persisted on the delta nodes. Is it worth the effort? Not sure.
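
          A rough sketch of what that extra wait-for-persistence step could look like. The fetchSeqnos() helper is hypothetical; in practice the numbers would come from memcached's vbucket-seqno stats for the bucket on the delta node.

          #include <chrono>
          #include <cstdint>
          #include <cstdio>
          #include <thread>
          #include <vector>

          struct VbSeqnos {
              std::uint64_t highSeqno;
              std::uint64_t persistedSeqno;
          };

          // Hypothetical stand-in for querying per-vbucket seqnos from the
          // data service (e.g. via "vbucket-seqno" stats).
          static std::vector<VbSeqnos> fetchSeqnos() {
              return {{100, 100}, {250, 250}};
          }

          // Poll until every vbucket's persisted seqno has caught up with its
          // high seqno, i.e. everything is on disk and a forced bucket delete
          // would not drop any mutations or state changes.
          static bool waitForPersistence(std::chrono::seconds timeout) {
              const auto deadline = std::chrono::steady_clock::now() + timeout;
              while (std::chrono::steady_clock::now() < deadline) {
                  bool allPersisted = true;
                  for (const auto& vb : fetchSeqnos()) {
                      if (vb.persistedSeqno < vb.highSeqno) {
                          allPersisted = false;
                          break;
                      }
                  }
                  if (allPersisted) {
                      return true;
                  }
                  std::this_thread::sleep_for(std::chrono::milliseconds(100));
              }
              return false;  // timed out; caller decides how to proceed
          }

          int main() {
              std::printf("persisted: %s\n",
                          waitForPersistence(std::chrono::seconds(30)) ? "yes" : "no");
              return 0;
          }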

          jwalker Jim Walker added a comment -

          Aliaksey Artamonau

          Is this a new behavior for memcached to not warmup dead vbuckets?

          It's about 3 years old and has been in since 5.0; there's no MB or significant background to the change: https://github.com/couchbase/kv_engine/commit/b379c2525d9b91bdf441c350d1b739fcd6ebcb22

          Will the vbucket wind up being empty?

          Yes. We will start up with no vbucket at all for any of the dead ones; ns_server then corrects KV and a new, empty vbucket is created.

          In the discussions of MB-35326 it was a surprise that this happens; I had certainly concluded that a VB state transition active->dead->replica with no crash should be logically equivalent to active->dead->warmup->replica, i.e. no loss of data from the node.

          Dave Rigby I vote that we effectively revert https://github.com/couchbase/kv_engine/commit/b379c2525d9b91bdf441c350d1b739fcd6ebcb22

          drigby Dave Rigby added a comment -

          Dave Rigby I vote that we effectively revert https://github.com/couchbase/kv_engine/commit/b379c2525d9b91bdf441c350d1b739fcd6ebcb22

          SGTM.

          Although I do think that ns_server should be more careful with controlled shutdown - force=true won't wait for outstanding mutations etc. to flush, so it can lead to data loss (assuming durability isn't used).
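
          To illustrate the point (hypothetical types only, not kv_engine/ns_server code): a graceful shutdown drains anything still queued before the bucket goes away, whereas force=true discards it, which is how a recent but unflushed dead->replica state change can be lost.

          #include <cstdio>
          #include <deque>
          #include <string>

          struct Bucket {
              // Mutations/state changes accepted but not yet flushed to disk.
              std::deque<std::string> dirtyQueue;

              void flushAll() {
                  while (!dirtyQueue.empty()) {
                      std::printf("flushed: %s\n", dirtyQueue.front().c_str());
                      dirtyQueue.pop_front();
                  }
              }

              void shutdown(bool force) {
                  if (!force) {
                      flushAll();  // graceful: nothing pending is lost
                  }
                  // force=true: whatever is still queued is discarded with the bucket.
                  std::printf("bucket deleted (force=%s), %zu queued items lost\n",
                              force ? "true" : "false", dirtyQueue.size());
              }
          };

          int main() {
              Bucket b{{"vb:1 set_state(replica)", "mutation: key doc-42"}};
              b.shutdown(/*force=*/true);
              return 0;
          }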

          jwalker Jim Walker added a comment -

          Dave Rigby I agree; a graceful failover with a non-graceful shutdown still seems like a legit problem.

          I'll move the warmup revert to a separate MB and this one can track the non-graceful issue - back to ns_server for this MB.

          KV warmup issue tracked as https://issues.couchbase.com/browse/MB-35599

          Aliaksey Artamonau (Inactive) added a comment -

          We only use force=true for failover (graceful or otherwise). So I don't think it's a critical issue: the recovery should ensure that no data (in case of graceful failover) is actually lost.


          People

            dfinlay Dave Finlay
            jwalker Jim Walker