Details
Bug
Resolution: Fixed
Critical
3.1.3, 3.1.4, 3.1.5, 4.0.0, 4.1.0, 4.1.1, 4.5.0
None
Untriaged
Yes
Description
Clone of MB-19635 for 3.x branch
Offline upgrade from 2.5 to any 3.x+ release results in active and replica vbuckets with UUIDs of zero. Aliaksey noticed this odd behavior and Jim Walker found the buggy code:
vbuuids are broken when warming up a 3.x node with 2.x vbucket files.
http://src.couchbase.org/source/xref/3.1.3/ep-engine/src/couch-kvstore/couch-kvstore.cc#1883
Here we have a "default" failover table, and if the vbucket file doesn't contain one, we use that. The 2.5.2 vbucket files do not contain a failover table, so we fall back to the default JSON, and the UUID in it becomes the vbucket's UUID (stored in cachedVbState).
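The fallback described above can be sketched roughly as follows. This is a simplified illustration, not the code in couch-kvstore.cc: `StoredVBState`, `failoverTableOnWarmup`, and the exact contents of `kDefaultFailoverTable` are assumptions made for the example.

```cpp
#include <string>

// Hypothetical stand-in for the state read back from a vbucket file.
struct StoredVBState {
    bool hasFailoverTable;          // false for files written by 2.x
    std::string failoverTableJson;  // only meaningful when hasFailoverTable is true
};

// Assumed shape of the "default" failover table: one entry with UUID 0.
static const char* const kDefaultFailoverTable = "[{\"id\":0,\"seq\":0}]";

// Returns the failover table a warming-up node would adopt for the vbucket.
std::string failoverTableOnWarmup(const StoredVBState& stored) {
    if (stored.hasFailoverTable) {
        return stored.failoverTableJson;
    }
    // A 2.x file has no table, so every such vbucket gets the same default,
    // and therefore the same UUID, on every node.
    return kDefaultFailoverTable;
}
```

Under these assumptions, an active and a replica vbucket that were both written by 2.5.2 warm up with identical failover tables, which is why their UUIDs match.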
This is quite a hole in 2.5.2 to 3.x upgrades and appears to have been there since 3.0.
When the TAP -> DCP conversion rebalance happens, we shut down the TAP stream and start a DCP stream. The replica vbucket requests a stream, sending its high sequence number. Because the UUIDs of the active and replica are the same, the active only rolls back to its high sequence number instead of zero. It should roll back to zero, because the high sequence numbers of the active and replica in the world of TAP bear no relation to each other.
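A deliberately simplified model of that decision is sketched below. It is not the real DCP failover-table logic: the function `resumeSeqno` and its parameters are hypothetical, and it compares a single UUID per side rather than a full failover table.

```cpp
#include <algorithm>
#include <cstdint>

// Sequence number the replica keeps after its stream request is evaluated.
// Simplified model: one UUID per vbucket instead of a full failover table.
std::uint64_t resumeSeqno(std::uint64_t activeUuid,
                          std::uint64_t activeHighSeqno,
                          std::uint64_t replicaUuid,
                          std::uint64_t replicaHighSeqno) {
    if (activeUuid != replicaUuid) {
        // Histories are unrelated: the replica must roll back to zero.
        return 0;
    }
    // UUIDs match, so the histories are assumed to be shared and the replica
    // is rolled back at most to the active's high sequence number.
    return std::min(replicaHighSeqno, activeHighSeqno);
}
```

In this model, two vbuckets that both carry the default UUID of 0 take the matching branch, so `resumeSeqno(0, 100, 0, 80)` leaves the replica at 80 even though its first 80 TAP-era items were never checked against the active. With distinct UUIDs the same request would return 0 and force a full resync.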
This means that some of the data from the active vbucket will not make it to the replica and a failover will likely result in data loss.
I'll clone a separate ticket to track fixing this in 3.1.6.