Couchbase Server / MB-7881

Views FAQ from Support/Customer


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: 2.1.0
    • Component/s: None
    • Security Level: Public
    • Labels: None

    Description

      Question #1: The main issue is unexpected behavior of XDCR. My team says they were able to produce a situation in which "old" keys override newer keys on a different cluster.

      I will approach this by explaining how the XDCR conflict resolution works. I hope that will be the information you need.

      If you have further concerns about the behavior, I'll need to know whether you're observing that Couchbase does not behave as documented (i.e., the conflict resolution deviates from the definition below), or whether the conflict resolution Couchbase performs is simply not suitable for your specific needs.

      I'll also ask you to provide a clear description of what behavior you're seeing and how it either conflicts with the expected resolution policy, or how it is causing problems specifically for your use case.

      This is described in the "Document Handling and Conflict Resolution" section on this page:

      http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-admin-tasks-xdcr-functionality.html

      I quote:

      XDCR automatically performs conflict resolution between the source and destination clusters and is designed to ensure that changes to individual documents are replicated successfully. For each stored document, XDCR looks at the following items to create a check value to resolve conflicts:

      • Numerical sequence, which is incremented on each mutation
      • CAS value
      • Document flags
      • Expiration (TTL) value

      During conflict resolution, XDCR sequentially checks the values until it identifies the document with the highest value. XDCR will use this version of the document for replication. The algorithm is designed to consistently select the same document on either a source or destination cluster.
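
      To make the ordering concrete, here is a minimal sketch of that comparison in JavaScript. This is illustrative pseudocode only, not actual Couchbase internals; the metadata field names (revSeq, cas, flags, expiration) are assumptions for the sake of the example.

        // Illustrative sketch of XDCR conflict resolution (NOT actual
        // Couchbase source; the metadata field names are assumed).
        // Fields are compared in order: revision sequence number,
        // then CAS, then flags, then expiration (TTL).
        function resolveConflict(local, remote) {
          var fields = ['revSeq', 'cas', 'flags', 'expiration'];
          for (var i = 0; i < fields.length; i++) {
            var f = fields[i];
            if (local[f] !== remote[f]) {
              // The first field that differs decides: highest value wins.
              return local[f] > remote[f] ? local : remote;
            }
          }
          return local; // metadata identical: both copies are equivalent
        }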

      The key point here is that the main determiner of which document is selected is which one has been mutated the most times, i.e., which has the highest revision sequence number.

      As an example, assume two clusters, A and B, with bi-directional replication streams. At time T1, both clusters hold the same revision of the document; call it revision 8. To keep the example simple, say that at this point the link between the two clusters goes down temporarily while clients modify each cluster independently. Examine these events:

      T1: initial state: A=8, B=8
      T2: link goes down
      T3: client updates doc on A: A=9, B=8
      T4: client updates doc on A: A=10, B=8
      T5: client updates doc on B: A=10, B=9
      T6: link goes up, A's version wins: A=10, B=10

      I think this is the simplest example I can give. Here we see that the most recent change is not necessarily the one that wins; rather, it is the version of the document that has seen the most updates. Indeed, "most recent change" is difficult to determine precisely in a distributed system, and is not reliable for resolving conflicts across N clusters. The algorithm Couchbase uses ensures that each cluster can independently come to the same consistent view of which document wins.

      However, this example does point out a case where that algorithm is counter-intuitive. Particularly if there are several minutes, or hours, between T4 and T5, one would expect the revision from cluster B to win, because it was more recent. There is also an intuitive aspect to the "highest revision number wins" algorithm, in that the change made on B was ignorant of the 2 previous changes on A; cluster B had less information available to it than cluster A.

      Regardless of which makes more sense to you, this is how Couchbase is defined to work.

      In some cases it makes sense to keep some cluster-specific documents to ensure no data is lost to conflict resolution. For example, imagine that in our example the document is a simple integer counter that gets incremented. As above, assume the counter starts out at 100:

      T1: A=100, B=100
      T3: A=101, B=100
      T4: A=102, B=100
      T5: A=102, B=101
      T6: A=102, B=102 // Oops, this should be 103!

      It depends on what the counter represents. If it's a statistic counting some common event, dropping an increment here or there may make no real difference, so this may be acceptable. But if it's counting coins in a bank account, every increment matters. In that case we can split the counter into two documents that track updates on each cluster separately. In the timeline below, "Aa" is cluster A's copy of counter_a, "Bb" is cluster B's copy of counter_b, and so on:

      T1: Aa=50, Ab=50, Ba=50, Bb=50 // Total of counter a + b is 100
      T3: Aa=51, Ab=50, Ba=50, Bb=50
      T4: Aa=52, Ab=50, Ba=50, Bb=50
      T5: Aa=52, Ab=50, Ba=50, Bb=51
      T6: Aa=52, Ab=51, Ba=52, Bb=51 // No conflicts, total is 103 as expected

      So a client that wants the value for this counter needs to get counter_a, counter_b, counter_..., and sum them to get the actual count.
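
      As a sketch, the pattern might look like this in JavaScript, using a hypothetical client object with incr/get methods (the method names are illustrative, not a specific Couchbase SDK):

        // Per-cluster counter pattern. The client's incr/get methods
        // are hypothetical; adapt this to whatever SDK you use.
        var LOCAL_SUFFIX = 'a';         // 'a' on cluster A, 'b' on cluster B
        var ALL_SUFFIXES = ['a', 'b'];

        function incrementCounter(client, name) {
          // Each cluster only ever writes its own shard, so XDCR never
          // has a conflict to resolve on these documents.
          client.incr(name + '_' + LOCAL_SUFFIX, 1);
        }

        function readCounter(client, name) {
          // Sum every cluster's shard to get the actual total.
          var total = 0;
          ALL_SUFFIXES.forEach(function (suffix) {
            total += client.get(name + '_' + suffix) || 0;
          });
          return total;
        }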

      So that's it. Let us know if this resolves the issue for you or not.

      Question #2: We created a simple view that aggregates counters stored in JSON format. See the errors in the log below.

      [couchdb:error,2013-03-06T9:11:47.600,ns_1@172.31.0.65:<0.32666.4809>:couch_log:error:42]Set view `cdb`, main group `_design/dev_counters`, received error from updater:

      {too_large_btree_state, 70447}

      [couchdb:error,2013-03-06T9:11:37.671,ns_1@172.31.0.62:<0.9322.2210>:couch_log:error:42]Set view `cdb`, main group `_design/dev_counters`, writer error
      error: {error, {reduction_too_long, <<0,0,0,2,10,… (many more numbers, very long)>>

      The errors you're getting say: "too_large_btree_state" and "reduction_too_long". This says that you have defined a custom reduce function in your view, and it is growing too fast. It isn't in fact reducing, instead the value computed there is growing larger and larger along with the number of items.

      A proper reduction will be of constant size, regardless of the number of items in the view. A simple example of this is the _sum reduction. It is just a single number, no matter how many items are contributing to the sum.

      An invalid reduction will grow bigger as more items are aggregated. A simple example is "reducing" to a list of values, one for each document. The reduction value at the root of the B-tree (that is, the reduction over all values in the index) will have N elements in that list.
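
      For instance, a reduce along these lines is invalid, because its output has one element per input row:

        // An INVALID reduce: the output grows with the number of items.
        function (keys, values, rereduce) {
          if (rereduce) {
            // Concatenating partial results only makes the value bigger.
            return [].concat.apply([], values);
          }
          return values; // one element per row, growing without bound
        }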

      When you realize that a reduction value is stored in each non-leaf B-tree node, summarizing all the items beneath it, you'll see that such a growing reduction uses up a very large amount of space. The B-tree state grows too large; the reduction grows too long.

      You don't need to limit reduce values to simple integers. It's OK, for example, to have a reduce value that is a dictionary (object) with a fixed number of members, or an array with a small number of elements. But you can't have a reduce value that keeps growing with the size of the input; that will blow up the index very quickly.
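
      By contrast, a constant-size custom reduce such as this sketch (similar in spirit to the built-in _stats) is fine, because its output is the same shape no matter how many rows feed into it:

        // A VALID reduce: the output is a fixed-size object.
        function (keys, values, rereduce) {
          var acc = { sum: 0, count: 0 };
          for (var i = 0; i < values.length; i++) {
            if (rereduce) {
              acc.sum += values[i].sum;     // combine partial reductions
              acc.count += values[i].count;
            } else {
              acc.sum += values[i];         // leaf level: raw emitted values
              acc.count += 1;
            }
          }
          return acc;
        }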

      Question #3: We see errors in the log about the cluster taking too long to retrieve a key (5 seconds!).

      I suspect this problem is related to #2, so we'll need to get that sorted out first. I recommend you stick to the built-in reduce functions (_sum, _count, _stats) at first, and make sure things work with those. It is often more efficient to put some of the summary logic into the client instead of into the index itself. Also, a reasonable ballpark for the number of views is around 10-20; if you're creating 100 views, you probably want a full-text index instead.
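
      For example, a design document along these lines keeps the reduction trivially small by using only the built-in _sum (the doc.type, doc.name, and doc.value fields are assumptions for illustration):

        {
          "views": {
            "counter_totals": {
              "map": "function (doc, meta) { if (doc.type === 'counter') { emit(doc.name, doc.value); } }",
              "reduce": "_sum"
            }
          }
        }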

      Once that kind of issue is addressed, if you're still seeing problems, we'll want to review your design documents in full, examine the complete error logs, and dig deeper.

      Here is a tool we ask customers to use when collecting stats and logs from a Couchbase system. We probably don't need it immediately, but it is helpful to know about for future problem reports:

      http://www.couchbase.com/wiki/display/couchbase/Working+with+the+Couchbase+Technical+Support+Team


          People

            Assignee: Brian Shumate (bshumate) (Inactive)
            Reporter: kzeller
