  1. Couchbase Server
  2. MB-21587

Rollback fails to restore correct version of documents modified since rollback seqno

Details

    • Triaged
    • Release Note
    • No

    Description

      During investigation of MB-21568, I found that when ep-engine performs a rollback and a key needs to be reverted to a previous value (it has been modified since the rollbackSeqno), the HashTable update fails because it is called incorrectly.

      The effect of this is that after rollback, stale values (those written after the rollback seqno) still exist in memory for such keys on the replica vBucket(s); disk is correct. In the event of a subsequent failover (and promotion of the replica -> active), incorrect values will be returned for such keys.
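
      To make the failure mode concrete, here is a toy model of what a rollback has to do to the in-memory view. It is only a sketch: the function, the list-based "disk log" and the dict-based "hash table" are hypothetical stand-ins and do not reflect ep-engine's actual data structures or code.

```python
# Toy model only: not ep-engine code. All names here are hypothetical.

def rollback(disk_log, hash_table, rollback_seqno):
    """Rewind a replica to rollback_seqno.

    disk_log:   list of (seqno, key, value) mutations in seqno order
    hash_table: dict mapping key -> (seqno, value), the in-memory view
    Returns the mutations that survive the rollback.
    """
    kept = [m for m in disk_log if m[0] <= rollback_seqno]
    discarded = [m for m in disk_log if m[0] > rollback_seqno]

    # Latest surviving value for each key at or before the rollback point.
    surviving = {}
    for seqno, key, value in kept:
        surviving[key] = (seqno, value)

    for _, key, _ in discarded:
        if key in surviving:
            # Key was modified after the rollback point: revert it in memory.
            # This is the step the report describes as failing.
            hash_table[key] = surviving[key]
        else:
            # Key was created after the rollback point: drop it from memory.
            hash_table.pop(key, None)
    return kept


disk_log = [(1, "key_0", "one"), (2, "key_1", "one"), (3, "key_0", "two")]
hash_table = {"key_0": (3, "two"), "key_1": (2, "one")}
disk_log = rollback(disk_log, hash_table, rollback_seqno=2)
print(hash_table["key_0"])  # (1, 'one')
```

      If the revert step is skipped or fails, the log is rewound correctly but key_0 keeps the value 'two' in memory, which matches the symptom described above.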

      Steps to reproduce

      1. Start a two-node cluster (./cluster_run --nodes=2, ./cluster_connect -n2).
      2. Populate some data - for simplicity target a single vbucket (vb_0) using the write_to_vb.py script below.
      3. Disable persistence on n_0 (the node which has the active instance of vb_0). This simulates the persistence queue being behind replication.
      4. Update the values for a subset of the keys created in (2) - note it is necessary to modify fewer than half of the original keys - if more than 50% of the vBucket is changed then rollback will simply roll back to zero.
      5. Send SIGKILL to n_0's memcached. As this node has persistence disabled, it will restart with a high seqno lower than the replica's, so when n_1 re-connects to DCP it will be told to roll back.
      6. Finally, failover n_0, causing the replica on n_1 to be promoted to active.
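
      The reasoning behind steps 4 and 5 can be sketched as a small decision function. This is an illustration under stated assumptions, not ep-engine's logic: the function name, its parameters and the way "changed" items are counted are hypothetical, and the behaviour at exactly 50% is not specified in this report. The seqnos 10 and 11 in the example are likewise illustrative.

```python
# Hypothetical sketch of the rollback decision described in steps 4 and 5.

def rollback_plan(active_high_seqno, replica_high_seqno,
                  changed_items, total_items):
    """Decide what a reconnecting replica would be asked to do."""
    if replica_high_seqno <= active_high_seqno:
        # Replica is not ahead of the restarted active: nothing to undo.
        return "none"
    if changed_items * 2 > total_items:
        # More than 50% changed: roll back to zero instead (step 4).
        return "rollback_to_zero"
    # Otherwise roll back to the active's high seqno (step 5).
    return "rollback_to_seqno"


# Repro numbers: 10 keys written and persisted, then 1 key updated
# while persistence was stopped on the active.
print(rollback_plan(10, 11, changed_items=1, total_items=10))  # rollback_to_seqno
```

      Only the "rollback_to_seqno" path exercises the per-key revert that this issue is about, which is why the repro modifies a single key out of ten.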

      Script to accomplish steps 2 to 5 (start the cluster first, and perform the failover from the UI afterwards):

      repro.sh

      #!/bin/bash
       
      set -e
       
      BASEDIR=$(dirname "$0")
      # Exported so that write_to_vb.py can import mc_bin_client
      export PYTHONPATH=/Users/dave/repos/couchbase/server/source/ep-engine/management
       
      ${BASEDIR}/write_to_vb.py 127.0.0.1 12000 0 key_ 10 one
      ../install/bin/cbepctl 127.0.0.1:12000 drain
      ../install/bin/cbepctl 127.0.0.1:12002 drain
      ../install/bin/cbepctl 127.0.0.1:12000 stop
      ${BASEDIR}/write_to_vb.py 127.0.0.1 12000 0 key_ 1 two
      # Kill the first node's memcached
      kill -9 $(pgrep -l -f memcached|grep /n_0/|cut -d ' ' -f 1)
      

      Accompanying script to write to a specific vbucket:

      write_to_vb.py

      #!/usr/bin/env python
       
      # Writes documents to a single vBucket.
       
      import mc_bin_client
      import sys
       
      # Six arguments are required in addition to the script name
      if len(sys.argv) < 7:
          print >> sys.stderr, "Usage: {} <host> <port> <vbid> <prefix> <count> <value>".format(sys.argv[0])
          sys.exit(1)
      client = mc_bin_client.MemcachedClient(host=sys.argv[1], port=int(sys.argv[2]))
      client.vbucketId = int(sys.argv[3])
      for i in range(int(sys.argv[5])):
          client.set(sys.argv[4] + str(i), 0, 0, sys.argv[6])
      

      Expected Result

      The value of key_0 should be the original value (i.e. "one") when queried:

      $ PYTHONPATH=$SRC_ROOT/ep-engine/management python -c "import mc_bin_client; client = mc_bin_client.MemcachedClient('127.0.0.1', 12002); print client.get('key_0')"
      (0, 1478267032502272, 'one')
      

      Actual Result

      The value of key_0 has not been rolled back:

      $ PYTHONPATH=$SRC_ROOT/ep-engine/management python -c "import mc_bin_client; client = mc_bin_client.MemcachedClient('127.0.0.1', 12002); print client.get('key_0')"
      (0, 1478267032502272, 'two')
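
      To check every key rather than just key_0, a small helper along these lines could be used. It is a sketch: find_unreverted and the get_value callable are hypothetical names, and the example runs against an in-memory stand-in rather than a real cluster.

```python
# Hypothetical helper: report keys whose value was not rolled back.

def find_unreverted(get_value, prefix, count, expected):
    """Return {key: actual} for keys whose value differs from expected."""
    wrong = {}
    for i in range(count):
        key = prefix + str(i)
        actual = get_value(key)
        if actual != expected:
            wrong[key] = actual
    return wrong


# Stand-in for the promoted replica after the faulty rollback:
# ten keys written as "one", key_0 still holding the later value.
fake_store = {"key_%d" % i: "one" for i in range(10)}
fake_store["key_0"] = "two"
print(find_unreverted(fake_store.get, "key_", 10, "one"))  # {'key_0': 'two'}
```

      Against a real node, get_value could wrap the mc_bin_client call used above, for example lambda k: client.get(k)[2], since the value is the third element of the tuple shown in the output.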
      

            People

              drigby Dave Rigby (Inactive)