Couchbase Server / MB-53081

DCP Consumers can successfully StreamRequest from KV node in warmup which is about-to-be failed over


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version: 6.6.2
    • Fix Version: Morpheus
    • Component: couchbase-bucket
    • Labels: None
    • Triage: Untriaged
    • 1
    • Is this a Regression?: Unknown

    Description

      This MB is spun out from MB-53076, where the following sequence showed that Indexing managed to reconnect to a KV node and perform DCP StreamRequests before ns_server had enabled traffic on that node (trafficEnabled) - and, crucially, only a few seconds before the KV node was failed over and the vBucket was promoted elsewhere:

      a. At 2022-07-21T17:39:12.120-07:00, memcached exited (exit status 137, i.e. the process was killed with SIGKILL):

      2022-07-21T17:39:12.120-07:00, ns_log:0:info:message(ns_1@172.23.100.15) - Service 'memcached' exited with status 137. Restarting. Messages:

      b. Due to the memcached exit, the indexer received a StreamEnd. Considering vb:141 for this analysis, the last received seqno is 4168:

      2022-07-21T17:39:12.111-07:00 [Info] TK StreamEnd MAINT_STREAM test3 141 77409272469274 4168

      c. The indexer triggered stream repair due to the StreamEnd. For vb:141, it restarted the stream with the last received seqno 4168 and a snapshot of {4163, 4168}:

      2022-07-21T17:39:16.284-07:00 [Info] Indexer::handleKVStreamRepair Initiate Stream Repair MAINT_STREAM Bucket test3 StreamId 1
      2022-07-21T17:39:16.284-07:00 [Info] Indexer::startBucketStream Stream: MAINT_STREAM Bucket: test3 SessionId 1 RestartTS bucket: test3, vbuckets: 1024 Crc64: 0 snapType NO_SNAP -
          {vbno, vbuuid, seqno, snapshot-start, snapshot-end}
          {  141     46674030171a       4168       4163       4168}
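
      Each field of that RestartTS entry maps directly onto a DCP stream-request. As a point of reference only (the struct below is an illustrative sketch, not projector or KV-engine source; the field names follow the DCP protocol), the repair request for vb:141 carries:

      // Illustrative sketch of the parameters a DCP stream-request carries,
      // populated from the indexer's RestartTS entry for vb:141. Note the
      // vbuuid is logged in hex by the indexer (46674030171a) and in decimal
      // by memcached (77409272469274) - they are the same value.
      #include <cstdint>

      struct DcpStreamRequest {
          uint16_t vbucket;           // vbno
          uint64_t start_seqno;       // resume from the last seqno the consumer received
          uint64_t end_seqno;         // "no end" == UINT64_MAX (18446744073709551615)
          uint64_t vbucket_uuid;      // identifies the branch of vBucket history
          uint64_t snap_start_seqno;  // start of the last snapshot the consumer saw
          uint64_t snap_end_seqno;    // end of that snapshot
      };

      const DcpStreamRequest repairRequest{
          141,                // vbno
          4168,               // seqno
          UINT64_MAX,         // stream "to infinity"
          0x46674030171aULL,  // vbuuid (== 77409272469274)
          4163,               // snapshot-start
          4168,               // snapshot-end
      };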

      d. For this stream request, memcached (on node 100.15) asked the projector to roll back to seqno 4109:

      2022-07-21T17:39:17.706652-07:00 WARNING 382: (test3) DCP (Producer) eq_dcpq:secidx:proj-test3-MAINT_STREAM_TOPIC_0765f2d280c663c382dea1a1b49ab1cd-2556451229124417097/1 - (vb:141) Stream request requires rollback to seqno:4109 because consumer ahead of producer - producer upper at 4109. Client requested seqnos:{4168,18446744073709551615} snapshot:{4163,4168} uuid:77409272469274
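
      The "consumer ahead of producer" condition is simple to state: after restarting from disk, the producer's high seqno for vb:141 was only 4109, below the 4168 the consumer asked to resume from, so the producer tells the consumer to roll back to its own high seqno. A minimal sketch of just that check (illustrative; the real ep-engine rollback logic also compares the vbucket_uuid against the failover table and validates the snapshot range):

      #include <cstdint>
      #include <optional>

      // Returns the seqno the consumer must roll back to, or nullopt if the
      // stream-request can proceed. Only models the "consumer ahead of
      // producer" case seen in the log above.
      std::optional<uint64_t> checkRollback(uint64_t requestedStartSeqno,
                                            uint64_t producerHighSeqno) {
          if (requestedStartSeqno > producerHighSeqno) {
              // e.g. requested 4168, but the producer's upper seqno is only
              // 4109 after warmup -> roll back to 4109.
              return producerHighSeqno;
          }
          return std::nullopt;
      }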

      e. After this, the KV node 100.15 was failed over:

      2022-07-21T17:39:18.234-07:00, failover:0:info:message(ns_1@172.23.100.15) - Starting failing over ['ns_1@172.23.100.15'] 
      ...
      2022-07-21T17:39:19.308-07:00, ns_orchestrator:0:info:message(ns_1@172.23.100.15) - Failover completed successfully.
      

      ... the indexer goes on to connect to the new node holding vb:141, issues a StreamRequest, and is told to roll back again ...

      The problem here is that Indexing has had to perform an extra rollback / repair based on the VBucket UUID & sequence numbers of the "just crashed" active node, before ns_server had decided the node was healthy and enabled traffic on it.

      I believe this is a long-standing behaviour in KV - from the comments in the code:

       * KV-engine has the following behaviour as warmup runs.
       *
       * Whilst the following phases are incomplete:
       *
       *    Initialise
       *    CreateVBuckets
       *    LoadingCollectionCounts
       *    EstimateDatabaseItemCount
       *    LoadPreparedSyncWrites
       *    PopulateVBucketMap
       *
       *  1) setVBucket requests are queued (using the EWOULDBLOCK mechanism)
       *  2) The VBucket map is empty. No operation can find a VBucket object to
       *     operate on. For CRUD operations externally clients will see
       *     'temporary_failure' instead of 'not_my_vbucket'.
       *  3) DCP Consumers cannot be created.
       *  4) DCP Producers can be created, but stream-request will fail with
       *     'not_my_vbucket'.
       *
       * On completion of PopulateVBucketMap:
       *
       *  1) All queued setVBucket requests are notified and all new setVBucket
       *     requests can be processed.
       *  2) The VBucket map is populated and operations can now find VBucket objects.
       *     * DCP producer stream-requests can now be processed.
       <cut>
      

      Once the PopulateVBucketMap step has completed, DCP StreamRequests can be (successfully) processed.

      While this is desirable in terms of bringing replicas / other DCP consumers back online and consistent as soon as possible, it has the downside seen in the above scenario - a DCP consumer can end up having to discard data / perform costly updates of its own state based on information from a node which is not "ready" from ns_server's point of view and is soon to be failed over.
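
      For illustration only, one possible alternative (delaying DCP stream-requests until warmup has fully completed, as discussed in the comments below) might look something like this minimal sketch. The names and the simple flag-based state are hypothetical, not the actual KV-engine warmup / DCP code:

      enum class Status { Success, TemporaryFailure, NotMyVbucket };

      // Hypothetical warmup state; the real implementation tracks many more phases.
      struct WarmupState {
          bool vbucketMapPopulated = false;  // PopulateVBucketMap has finished
          bool warmupComplete = false;       // all warmup phases have finished
          bool trafficEnabled = false;       // ns_server has enabled traffic
      };

      Status handleStreamRequest(const WarmupState& warmup) {
          // Current behaviour (per the code comment above): stream-requests are
          // accepted as soon as the VBucket map has been populated.
          if (!warmup.vbucketMapPopulated) {
              return Status::NotMyVbucket;
          }
          // Possible alternative: return a temporary failure until warmup has
          // fully completed (or traffic has been enabled), so consumers such as
          // the indexer don't rebuild state from a node that may be about to be
          // failed over.
          if (!warmup.warmupComplete /* or: !warmup.trafficEnabled */) {
              return Status::TemporaryFailure;
          }
          return Status::Success;  // proceed with normal rollback / backfill checks
      }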

      Supportal
      http://supportal.couchbase.com/snapshot/551490e6c919e74d343537e9e2e38887::0

      https://cb-engineering.s3.amazonaws.com/MB-53075_functional/collectinfo-2022-07-22T004733-ns_1%40172.23.100.15.zip
      https://cb-engineering.s3.amazonaws.com/MB-53075_functional/collectinfo-2022-07-22T004733-ns_1%40172.23.100.16.zip
      https://cb-engineering.s3.amazonaws.com/MB-53075_functional/collectinfo-2022-07-22T004733-ns_1%40172.23.100.17.zip
      https://cb-engineering.s3.amazonaws.com/MB-53075_functional/collectinfo-2022-07-22T004733-ns_1%40172.23.100.19.zip
      https://cb-engineering.s3.amazonaws.com/MB-53075_functional/collectinfo-2022-07-22T004733-ns_1%40172.23.100.22.zip
      https://cb-engineering.s3.amazonaws.com/MB-53075_functional/collectinfo-2022-07-22T004733-ns_1%40172.23.121.215.zip


        Activity

          drigby Dave Rigby added a comment -

          CC Dave Finlay / Deepkaran Salooja / Varun Velamuri.

          Should KV delay accepting StreamRequests until some later point?

          One slightly confusing thing here is exactly why ns_server triggered an auto-failover - it explicitly told KV to create a bunch of buckets (post-crash), so why does it go on to fail over this node?

          dfinlay Dave Finlay added a comment -

          Hey Dave

          I haven't looked at the logs in detail - but this is how it works. ns_server doesn't know that the node will later be failed over.

          It's possible that:

          • the buckets will be ready before the auto failover timeout elapses
          • another node will become unreachable at which point auto-failover of any one node is not possible

          Etc. So the bucket is created immediately in memcached post-crash – and then failover will happen if all the conditions are met and the autofailover timeout has passed. (And there are actually a lot of conditions to check.)

          drigby Dave Rigby added a comment -

          Agreed - obviously one cannot predict the future, i.e. whether other nodes will have problems or whether the buckets will be ready in time.

          I guess the question here is - do we want to allow (non-replica?) DCP consumers to connect to KV this early in warmup?

          I'm not sure what the answer is - or even exactly what the alternative might be - it just "feels" like KV is not really helping DCP consumers here by telling them to roll back after it has just crashed / restarted, while it is essentially on a "ticking clock" to complete warmup within some small window before it gets failed over.

          Perhaps this should be marked as an Improvement - I don't think there's strictly a "Bug" here, but we could potentially increase the chances that Indexing et al don't need to roll back as much if this situation didn't occur...


          deepkaran.salooja Deepkaran Salooja added a comment -

          Dave Rigby, I think I would agree that it is probably better to let warmup finish before allowing non-replica DCP consumers to connect. A lot of the time, the vbuuids, snapshot markers etc. received in this state (especially if the node eventually fails over) have a good chance of getting the DCP consumer into a state where it needs some sort of rollback.

          dfinlay Dave Finlay added a comment - - edited

          Well, this is a pretty interesting question.

          If the vbucket is about to be failed over, waiting seems smart because the replica may be in better shape. If the vbucket is not going to be failed over, then waiting just adds extra time during which the vbucket is unavailable. I'm hesitant to make a call on what exactly is best here - still thinking - but I can see merit in changing the behavior to disallow DCP streams from connecting until the bucket is ready.


          People

            Assignee: drigby Dave Rigby
            Reporter: drigby Dave Rigby
            Votes: 0
            Watchers: 6
