Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 4.6.0
Affects Version/s: 3.1.6, 4.1.2, 4.5.1, 4.6.0
Component/s: couchbase-bucket
Labels:
- datacorruption

Triage:
Triaged
Flagged:

Release Note
Is this a Regression?:
No

Description

During investigation of MB-21568 I found that when ep-engine performs a rollback, if a key needs to be reverted to a previous value (because it has been modified since the rollbackSeqno), the HashTable update fails because it is called incorrectly.

The effect of this is that after rollback, old values for keys still exist in memory (disk is correct) on the replica vBucket(s). In the event of a subsequent failover (and promotion of the replica -> active), incorrect values will be returned for such keys.
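
The mismatch between memory and disk can be illustrated with a small toy model. This is only a sketch: ToyVBucket, its methods, and the update_hash_table flag are hypothetical names invented for illustration and are not ep-engine code. The flag merely stands in for the HashTable update succeeding or failing; it does not reproduce the actual incorrect call.

```python
class ToyVBucket:
    """Toy replica vBucket: a persisted mutation log plus an in-memory table."""

    def __init__(self):
        self.disk_log = []     # persisted (seqno, key, value) entries, in order
        self.hash_table = {}   # key -> value served to clients from memory
        self.high_seqno = 0

    def set(self, key, value):
        """Apply a mutation to both disk and memory."""
        self.high_seqno += 1
        self.disk_log.append((self.high_seqno, key, value))
        self.hash_table[key] = value

    def disk_value(self, key):
        """Latest persisted value for key, or None if it is not on disk."""
        value = None
        for _, k, v in self.disk_log:
            if k == key:
                value = v
        return value

    def rollback(self, rollback_seqno, update_hash_table=True):
        """Discard everything after rollback_seqno.

        update_hash_table=False stands in for the failed HashTable update.
        """
        modified = {k for seqno, k, _ in self.disk_log if seqno > rollback_seqno}
        self.disk_log = [e for e in self.disk_log if e[0] <= rollback_seqno]
        self.high_seqno = rollback_seqno
        if not update_hash_table:
            return
        for key in modified:
            old = self.disk_value(key)
            if old is None:
                self.hash_table.pop(key, None)   # key did not exist before
            else:
                self.hash_table[key] = old       # revert to the previous value


def run(update_hash_table):
    """Return (in-memory value, on-disk value) of key_0 after a rollback."""
    vb = ToyVBucket()
    for i in range(10):
        vb.set("key_%d" % i, "one")      # seqnos 1..10
    vb.set("key_0", "two")               # seqno 11, rolled back below
    vb.rollback(10, update_hash_table=update_hash_table)
    return vb.hash_table["key_0"], vb.disk_value("key_0")


print(run(update_hash_table=True))    # memory and disk agree
print(run(update_hash_table=False))   # memory keeps the stale value
```

In the second case the toy model serves "two" from memory while disk holds "one", which mirrors the symptom described above.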

Steps to reproduce

1. Start a two-node cluster (./cluster_run --nodes=2, ./cluster_connect -n2).
2. Populate some data. For simplicity, target a single vBucket (vb_0) using the write_to_vb.py script below.
3. Disable persistence on n_0 (the node which has the active instance of vb_0). This simulates the persistence queue being behind replication.
4. Update the values for a subset of the keys created in (2). Note that it is necessary to modify fewer than half of the original keys: if more than 50% of the vBucket is changed, the rollback will simply roll back to zero.
5. Send SIGKILL to n_0's memcached. As this node has persistence disabled, it will restart with a high seqno lower than the replica's, so when n_1 reconnects via DCP it will be told to roll back.
6. Finally, fail over n_0, causing the replica on n_1 to be promoted to active.
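
The 50% condition mentioned in the steps above can be sketched as a simple check. This is a rough illustration only: the function name rolls_back_to_zero is hypothetical, measuring "how much of the vBucket changed" by counting seqnos is an assumption, and the exact boundary behaviour at precisely 50% is not stated in this issue.

```python
def rolls_back_to_zero(high_seqno, rollback_seqno):
    """Return True if more than half of the seqnos would be discarded.

    Assumption: the proportion of the vBucket that changed is approximated
    by (discarded seqnos) / (replica's high seqno).
    """
    discarded = high_seqno - rollback_seqno
    return discarded * 2 > high_seqno


# Repro scenario: 10 keys written, then 1 key updated on the active node.
print(rolls_back_to_zero(high_seqno=11, rollback_seqno=10))

# Counter-example: 10 keys written, then 20 further updates.
print(rolls_back_to_zero(high_seqno=30, rollback_seqno=10))
```

Under these assumptions the repro scenario discards only 1 of 11 seqnos, so a partial rollback (the path affected by this bug) is taken rather than a rollback to zero.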

Script to accomplish the first 5 steps of this (perform failover at the UI):

repro.sh
#!/bin/bash

set -e

BASEDIR=$(dirname "$0")
# Export so that write_to_vb.py can import mc_bin_client.
export PYTHONPATH=/Users/dave/repos/couchbase/server/source/ep-engine/management

${BASEDIR}/write_to_vb.py 127.0.0.1 12000 0 key_ 10 one
../install/bin/cbepctl 127.0.0.1:12000 drain
../install/bin/cbepctl 127.0.0.1:12002 drain
../install/bin/cbepctl 127.0.0.1:12000 stop
${BASEDIR}/write_to_vb.py 127.0.0.1 12000 0 key_ 1 two
# Kill the first node's memcached
kill -9 $(pgrep -l -f memcached | grep /n_0/ | cut -d ' ' -f 1)

Accompanying script to write to a specific vbucket:

write_to_vb.py
#!/usr/bin/env python

# Writes documents to a single vBucket.

import mc_bin_client
import sys

# Six arguments are required, so sys.argv must have at least 7 entries.
if len(sys.argv) < 7:
    print >> sys.stderr, "Usage: {} <host> <port> <vbid> <prefix> <count> <value>".format(sys.argv[0])
    sys.exit(1)
client = mc_bin_client.MemcachedClient(host=sys.argv[1], port=int(sys.argv[2]))
client.vbucketId = int(sys.argv[3])
# Write <count> documents named <prefix>0, <prefix>1, ... with the same value.
for i in range(int(sys.argv[5])):
    client.set(sys.argv[4] + str(i), 0, 0, sys.argv[6])

Expected Result

The value of key_0 should be the original value (i.e. "one") when queried:

$ PYTHONPATH=$SRC_ROOT/ep-engine/management python -c "import mc_bin_client; client = mc_bin_client.MemcachedClient('127.0.0.1', 12002); print client.get('key_0')"

(0, 1478267032502272, 'one')

Actual Result

The value of key_0 has not been rolled back:

$ PYTHONPATH=$SRC_ROOT/ep-engine/management python -c "import mc_bin_client; client = mc_bin_client.MemcachedClient('127.0.0.1', 12002); print client.get('key_0')"

(0, 1478267032502272, 'two')
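
To check every key rather than just key_0, a small helper along these lines could be used. This is a sketch: find_stale_keys and the get_value callable are hypothetical names, and the demo uses a plain dict in place of a live node. Against a real node, get_value could wrap client.get(key)[2], assuming get returns a (flags, cas, value) tuple as in the output shown above.

```python
def find_stale_keys(get_value, expected):
    """Return {key: (expected, actual)} for every key whose value differs.

    get_value is any callable mapping a key to the value currently served.
    """
    stale = {}
    for key, want in sorted(expected.items()):
        got = get_value(key)
        if got != want:
            stale[key] = (want, got)
    return stale


# Stand-in for the promoted replica: key_0 was not rolled back.
served = {"key_%d" % i: "one" for i in range(10)}
served["key_0"] = "two"

# After rollback, every key should hold its original value.
expected = {"key_%d" % i: "one" for i in range(10)}

print(find_stale_keys(served.get, expected))
```

An empty result would indicate that memory matches the expected post-rollback state for all checked keys.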

Issue Links

relates to

MB-21568 rollback may leave hashtable inconsistent with on-disk data

Closed

Sub-Tasks

1.	[BP 4.x] Rollback fails to restore correct version of documents modified since rollback seqno		Closed	Dave Rigby (Inactive)
2.	[BP 3.x] Rollback fails to restore correct version of documents modified since rollback seqno		Closed	Dave Rigby (Inactive)