Couchbase Server / MB-19635

Offline upgrade from 2.5 to any 3.x or 4.x release results in potential data loss

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 3.1.3, 3.1.4, 3.1.5, 4.0.0, 4.1.0, 4.1.1, 4.5.0
    • Fix Version/s: 4.1.2, 4.5.0
    • Component/s: couchbase-bucket
    • Labels: None
    • Untriaged
    • Yes

    Description

      Offline upgrade from 2.5 to any 3.x+ release results in active and replica vbuckets with UUIDs of zero. Aliaksey noticed this odd behavior and Jim Walker found the buggy code:

      vbuuids are broken when warming up a 3.x node with 2.x vbucket files.
      http://src.couchbase.org/source/xref/3.1.3/ep-engine/src/couch-kvstore/couch-kvstore.cc#1883
      Here we have a "default" failover table, and if the vbucket file doesn't contain one, we use that. The 2.5.2 vbucket files do not contain a failover table, so we just use the UUID from that default JSON and it becomes the vbucket's UUID (stored in cachedVbState).
      This is quite a hole in 2.5.2 to 3.x upgrades and appears to have been there since 3.0.
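The fallback described above can be modelled in a few lines. This is only an illustrative sketch, not the ep-engine code: the function name, the `failover_table` key, the sample vbstate fields, and the exact shape of the default JSON are assumptions made for the example. It shows why every vbucket warmed up from a 2.x file ends up with the same UUID.

```python
import json

# Assumed shape of the shared fallback table. The ticket only states that
# the default carries a UUID, and that the resulting vbucket UUIDs are zero.
DEFAULT_FAILOVER_TABLE = '[{"id": 0, "seq": 0}]'


def vbucket_uuid_on_warmup(vbstate: dict) -> int:
    """Return the vbucket UUID a warmup would cache (simplified model).

    If the persisted state has no failover table, the shared default is
    used, so the UUID comes from the default rather than from the file.
    """
    table_json = vbstate.get("failover_table", DEFAULT_FAILOVER_TABLE)
    table = json.loads(table_json)
    # In this model the first entry of the table supplies the UUID.
    return table[0]["id"]


# Hypothetical 2.x-style states: neither carries a failover table.
active_2x = {"state": "active", "checkpoint_id": "12"}
replica_2x = {"state": "replica", "checkpoint_id": "9"}

# A hypothetical 3.x-style state that does carry its own table.
active_3x = {"state": "active", "failover_table": '[{"id": 1234, "seq": 0}]'}
```

In this model `vbucket_uuid_on_warmup(active_2x)` and `vbucket_uuid_on_warmup(replica_2x)` return the same value, which is the collision the ticket describes.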

      When the TAP -> DCP conversion rebalance happens, we shut down the TAP stream and start a DCP stream. The replica vbucket requests a stream, sending its high sequence number. Because the UUIDs of the active and the replica are the same, the active only rolls back to that high sequence number instead of to zero. A rollback to zero is what should happen, because the high sequence numbers of the active and the replica in the world of TAP bear no relation to each other.

      This means that some of the data from the active vbucket will not make it to the replica and a failover will likely result in data loss.
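The rollback decision described above can be sketched as follows. This is a deliberately simplified model of the check, not the DCP implementation: `FailoverEntry`, `rollback_seqno`, and the sample sequence numbers and UUIDs are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailoverEntry:
    vb_uuid: int
    seqno: int


def rollback_seqno(
    active_table: List[FailoverEntry],
    active_high_seqno: int,
    replica_uuid: int,
    replica_high_seqno: int,
) -> int:
    """Return the seqno the replica may resume from (simplified model).

    If the replica's UUID is unknown to the active, the histories are
    unrelated and the only safe answer is a rollback to zero. If the UUID
    is known, the histories are assumed to be shared, so the stream can
    resume from the lower of the two high sequence numbers.
    """
    known = any(entry.vb_uuid == replica_uuid for entry in active_table)
    if not known:
        return 0
    return min(replica_high_seqno, active_high_seqno)


# After the offline upgrade both sides carry UUID 0, so the histories look
# shared and the replica keeps whatever it holds up to seqno 50.
resume_buggy = rollback_seqno([FailoverEntry(0, 0)], 80, 0, 50)

# With distinct UUIDs (arbitrary example values) the replica is rolled back
# to zero and receives the full data set from the active.
resume_distinct = rollback_seqno([FailoverEntry(111, 0)], 80, 222, 50)
```

In the first call the stream resumes at 50, so anything the active holds at or below that sequence number is never compared with what the replica holds. In the second call the stream restarts from 0.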

      I'll clone a separate ticket to track fixing this in 3.1.6.

          Activity

            drigby Dave Rigby added a comment -

            MB-19636 is the cloned bug tracking the 3.1.x fix version.

            dfinlay Dave Finlay added a comment -

            KV Engine bug scrub: the fix should be small, and because this may result in non-trivial data loss (and we believe customers have run into this in the past), we will keep this and fix it this week.

            dfinlay Dave Finlay added a comment -

            Targeting merge by end of day 5/19.

            drigby Dave Rigby added a comment -

            Committed to ep-engine/watson: http://review.couchbase.org/#/c/64119/

            drigby Dave Rigby added a comment -

            Committed to ep-engine/sherlock (4.1.2): http://review.couchbase.org/#/c/64155/


            pvarley Patrick Varley added a comment -

            Another symptom of this defect is that GSI will not build (i.e. build progress will be zero) when the vbucket UUID is 0.

            People

              jwalker Jim Walker
              dfinlay Dave Finlay