  1. Couchbase Server
  2. MB-11331

Issue with failover log generating new request with the same seqno

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 3.0
    • 3.0
    • couchbase-bucket
    • Security Level: Public
    • None
    • Untriaged
    • Unknown

    Description

      From Alk,

      I spotted this:

      MB-11085: Always create a new failover entry on unclean shutdowns

      In the past we wouldn't generate a new failover entry if the high
      seqno number on disk was the same after a crash. This is incorrect
      because it is possible that the server did receive mutations and
      replicated them without persisting them before the crash. If this
      happens the consumers of upr streams will not roll back their data
      properly because the failover entry will not change on the server.

      Change-Id: I8c6bab504f0be3298e1e888dbe6f3fac9c3fa905
      Reviewed-on: http://review.couchbase.org/37670
      Reviewed-by: Chiyoung Seo <chiyoung@couchbase.com>
      Tested-by: Michael Wiederhold <mike@couchbase.com>

      And tried its behavior in practice. It looks like it has reverted to the old behavior, where it silently overwrites the uuid of the last failover-history entry if the last seqno equals that entry's seqno.
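To make the difference concrete, here is a minimal sketch of the two restart behaviors described above. It is a toy model, not ep-engine code: the `(uuid, seqno)` tuple layout, the oldest-first ordering, and the names `restart_append` and `restart_overwrite` are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical representation: (vbucket uuid, seqno), oldest entry first.
Entry = Tuple[int, int]


def new_uuid() -> int:
    """Stand-in for however the server generates a fresh vbucket uuid."""
    return random.getrandbits(64)


def restart_append(log: List[Entry], high_seqno: int) -> List[Entry]:
    """Intent of MB-11085: always add a new entry after an unclean shutdown."""
    return log + [(new_uuid(), high_seqno)]


def restart_overwrite(log: List[Entry], high_seqno: int) -> List[Entry]:
    """Behavior observed in practice: if the last entry already has this
    seqno, its uuid is replaced instead of a new entry being appended."""
    if log and log[-1][1] == high_seqno:
        return log[:-1] + [(new_uuid(), high_seqno)]
    return log + [(new_uuid(), high_seqno)]
```

With a log of `[(1, 0), (2, 5)]` and a high seqno of 5, `restart_append` returns three entries and keeps `(2, 5)`, while `restart_overwrite` returns two entries and the uuid previously paired with seqno 5 is gone.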

      Thinking about this more I believe it might be fine. But it has interesting consequences.

      If I understand the seqno of a failover-history entry as "the seqno just before the start of a new failover 'era'", then it appears perfectly fine to do that.

      However, I think I'll need to change my code to accommodate that. And some other upr consumers might have to as well. This is because my checkpointing code always assumes that the latest seqno "belongs" to the latest failover-history entry. Which is clearly not the case when last seqno = seqno-of-last-failover-history-entry. In the latter case the seqno actually belongs to the previous entry.
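The ownership rule implied here can be sketched as follows, assuming an entry's seqno is the last seqno before its era begins. The `FailoverEntry` type, the oldest-first ordering, and both function names are hypothetical and exist only for this illustration.

```python
from typing import List, NamedTuple, Optional


class FailoverEntry(NamedTuple):
    uuid: int
    seqno: int  # assumed meaning: last seqno *before* this entry's era starts


def naive_owner(log: List[FailoverEntry]) -> int:
    """What a checkpointer assumes if the latest seqno always belongs
    to the latest failover-history entry."""
    return log[-1].uuid


def owning_uuid(log: List[FailoverEntry], seqno: int) -> Optional[int]:
    """Return the uuid of the era that produced `seqno`.

    `log` is ordered oldest first. A seqno belongs to the newest entry
    whose own seqno is strictly smaller; None means no era covers it.
    """
    owner = None
    for entry in log:
        if entry.seqno < seqno:
            owner = entry.uuid
        else:
            break
    return owner
```

For a log of `[(0xA, 0), (0xB, 5)]`, seqno 6 is owned by `0xB`, but seqno 5 is owned by `0xA`, whereas `naive_owner` would report `0xB` in both cases.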

      I can adapt my code. Or we can add a simple tweak to upr where it'll create an empty "bubble" seqno when it starts a new failover-history entry. In that case you will never have a case where, on restart, your last seqno = last-failover-history-entry-seqno. And there's no problem.
      I'm pretty sure that this corner case affects not just xdcr. And I'm willing to bet that nobody handles it right yet. So we need to resolve this case asap.
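A rough sketch of the "bubble" seqno idea, under the assumption that the server simply consumes one seqno with no mutation attached whenever it opens a new failover-history entry. `ToyVBucket` and its methods are invented for this example and do not correspond to any real upr or ep-engine interface.

```python
import random
from typing import List, Tuple


class ToyVBucket:
    """Toy model of a vbucket's high seqno and failover history."""

    def __init__(self) -> None:
        self.high_seqno = 0
        # (uuid, seqno) pairs, oldest first; layout is an assumption.
        self.failover_log: List[Tuple[int, int]] = [(random.getrandbits(64), 0)]

    def mutate(self) -> int:
        """Assign the next seqno to a mutation."""
        self.high_seqno += 1
        return self.high_seqno

    def start_new_era(self, bubble: bool) -> None:
        """Open a new failover-history entry at the current high seqno.

        With bubble=True, one empty seqno is consumed right away, so the
        high seqno is always strictly greater than the new entry's seqno.
        """
        self.failover_log.append((random.getrandbits(64), self.high_seqno))
        if bubble:
            self.high_seqno += 1
```

After five mutations, `start_new_era(bubble=False)` leaves the high seqno equal to the newest entry's seqno (the ambiguous case), while `start_new_era(bubble=True)` leaves it one higher, so the latest seqno always falls inside the latest era.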

            People

              mikew Mike Wiederhold [X] (Inactive)