Details
Type: Bug
Resolution: Fixed
Priority: Major
Affects Version/s: 4.6.0, 4.6.2, 5.0.0, 5.0.1
Triage: Untriaged
Is this a Regression?: Unknown
Description
With XDCR, when a target node receives DelWithMeta(key1) and does not know about key1, the requirement is that a delete of key1 is recreated (allowing future conflict resolution to occur).
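A minimal sketch of that requirement, assuming hypothetical names (ItemMeta, Tombstone and recreateDelete are illustrative, not the actual ep-engine types or functions):
{code}
// Illustrative sketch only: names and structure are hypothetical.
#include <cstdint>
#include <ctime>
#include <string>

struct ItemMeta {
    uint64_t cas;
    uint64_t revSeqno;
};

struct Tombstone {
    std::string key;
    ItemMeta meta;       // metadata carried by the incoming DelWithMeta
    time_t creationTime; // stamped locally at creation time
};

// Called when the target receives DelWithMeta(key) but has no record of
// key. A tombstone is recreated so that future conflict resolution for
// this key still has metadata to compare against.
Tombstone recreateDelete(const std::string& key, const ItemMeta& meta) {
    return Tombstone{key, meta, time(nullptr)};
}
{code}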
A problem can occur when the source and target nodes compact and purge deleted documents (tombstones).
- If the target node purges ahead of the source, it may correctly remove tombstones, but this leaves the source and target out of sync.
- If an XDCR disconnect occurs before the source compacts, XDCR may 're-sync', i.e. ask the source for all documents from a historical seqno.
- The source node will then send all mutations and deletions over local DCP to XDCR, which will replicate them to the target node as SetWithMeta/DelWithMeta.
- Many of these with-meta operations will be ignored, because conflict resolution spots that an incoming with-meta operation matches an existing document.
- However, each DelWithMeta matching a key that the target purged means the target will re-create the delete (queueing into checkpoints, writing to disk and streaming to all local DCP clients); a simplified sketch of this decision path follows after this list.
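A simplified sketch of that target-side decision, assuming a bare revSeqno comparison for conflict resolution (the real engine compares more fields; Meta, Outcome and handleDelWithMeta are hypothetical names):
{code}
// Illustrative sketch only: a simplified decision path, not the actual
// ep-engine conflict-resolution code.
#include <cstdint>
#include <optional>

struct Meta {
    uint64_t revSeqno; // revision seqno carried by the with-meta operation
};

enum class Outcome { Ignored, DeleteRecreated };

// 'existing' holds the target's metadata for the key, or std::nullopt if
// the key is unknown (e.g. its tombstone was purged by compaction).
Outcome handleDelWithMeta(const std::optional<Meta>& existing,
                          const Meta& incoming) {
    if (existing && incoming.revSeqno <= existing->revSeqno) {
        // Conflict resolution spots that the incoming delete matches what
        // the target already holds: the operation is ignored.
        return Outcome::Ignored;
    }
    // Key unknown (tombstone purged): the delete is recreated, queueing
    // into checkpoints, writing to disk and streaming to DCP clients.
    return Outcome::DeleteRecreated;
}
{code}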
For workloads which make heavy use of deletion (perhaps where all documents have a fixed time-to-live), this scenario may end up driving very high utilisation, as many deletes are recreated.
The problem occurs because when we create a tombstone, it is always given a creation time of now() (compaction uses the tombstone's creation time and its own now() to work out the age of a tombstone and its eligibility for removal).
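A minimal sketch of that eligibility check (isPurgeEligible and purgeAgeSeconds are hypothetical names, not the actual compaction code; the point is that age is measured from the locally stamped creation time to compaction's own now()):
{code}
// Illustrative sketch only: hypothetical names.
#include <ctime>

// A tombstone may be purged once its age, measured from its locally
// stamped creation time to compaction's own now(), exceeds the
// configured purge age.
bool isPurgeEligible(time_t tombstoneCreationTime, time_t purgeAgeSeconds) {
    const time_t age = time(nullptr) - tombstoneCreationTime;
    return age > purgeAgeSeconds;
}
{code}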
So in our example of the target/source compacting out of sync, we can presume that over time a nice steady stream of deletions builds up, writing tombstones spaced apart by their real deletion times.
When the target compacts, it may purge n deletes that had nicely spread creation times.
When the disconnect and re-sync occur, the target will effectively bulk-recreate tombstones which previously had nicely spread creation times, all at the exact same creation time (depending on how many deletes are generated before the clock ticks).
So those n deletes now all share the same creation time (and hence the same future purge time), and the workload is still creating new deletions. You can visualise it with a simple histogram: before, we had a nice spread of tombstones across creation times; after the compact/re-sync, those tombstones all move to the same timestamp.
            ▲
            │
            │
            │
┌─────────┐ │
│ No. of  │ │
│ Deletes │ │                   ┌───┐
└─────────┘ │       ┌───┐       │   │       ┌───┐
            │ ┌───┐ │   │ ┌───┐ │   │ ┌───┐ │   │
            │ │   │ │   │ │   │ │   │ │   │ │   │
            │ │   │ │   │ │   │ │   │ │   │ │   │
            │ │   │ │   │ │   │ │   │ │   │ │   │
            └─┴───┴─┴───┴─┴───┴─┴───┴─┴───┴─┴───┴────────▶
                       ┌───────────────────────┐
                       │ Delete Creation Time  │
                       └───────────────────────┘

                          ┌───┐
            ▲             │   │
            │             │   │
            │             │   │
            │             │   │
┌─────────┐ │             │   │
│ No. of  │ │             │   │
│ Deletes │ │             │   │ ┌───┐
└─────────┘ │             │   │ │   │       ┌───┐
            │             │   │ │   │ ┌───┐ │   │
            │             │   │ │   │ │   │ │   │
            │             │   │ │   │ │   │ │   │
            │             │   │ │   │ │   │ │   │
            └─────────────┴───┴─┴───┴─┴───┴─┴───┴────────▶
                       ┌───────────────────────┐
                       │ Delete Creation Time  │
                       └───────────────────────┘
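The collapse onto a single bucket in the second histogram follows directly from the timestamping: any tombstones recreated within the same clock tick share one creation time. A trivial, self-contained demonstration of that effect (not ep-engine code):
{code}
// Recreate many "tombstones" back-to-back, stamping each with now(), as
// the target does when replaying DelWithMeta after a re-sync, and count
// how many distinct creation times result.
#include <ctime>
#include <iostream>
#include <set>

int main() {
    std::set<time_t> distinctCreationTimes;
    for (int i = 0; i < 100000; ++i) {
        distinctCreationTimes.insert(time(nullptr));
    }
    // Typically prints 1 (or 2 if a second boundary is crossed): the
    // whole batch collapses onto one histogram bucket.
    std::cout << "distinct creation times: " << distinctCreationTimes.size()
              << std::endl;
    return 0;
}
{code}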
Nothing stops this process from cycling: the large group of tombstones can itself combine with further compact/disconnect cycles and grow, so over time this cluster of tombstones becomes large and expensive to re-sync.