Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.1.0
Affects Version/s: 7.0.0
Component/s: ns_server
Labels: request-dev-verify
Triage: Untriaged
Story Points: 1
Is this a Regression?: Unknown

Description

I'm filing this as a bug because my current thinking is that it needs to be fixed, but it could also be considered an improvement.

In any case, I hit the issue this ticket describes when I was running some tests that created and dropped scopes and collections in quick succession.

I tried to create collection c_0 in scope s1 of bucket b_3487, and it failed, stating that c_0 already exists:

[ns_server:debug,2021-08-19T11:51:57.149-07:00,n_2@127.0.0.1:collections<0.823.0>:collections:do_update:284]Performing operation {create_collection,"s1","c_0",[]} on bucket "b_3487"

[ns_server:debug,2021-08-19T11:51:57.149-07:00,n_2@127.0.0.1:kv<0.248.0>:collections:do_update_with_manifest:326]Perform operation {create_collection,"s1","c_0",[]} on manifest 34 of bucket "b_3487"

...

[ns_server:debug,2021-08-19T11:51:57.149-07:00,n_2@127.0.0.1:kv<0.248.0>:collections:perform_operations:367]Operation {create_collection,"s1","c_0",[]} failed with error {collection_already_exists,"s1","c_0"}

You can see it's operating on manifest 34.

However, collection c_0 was dropped from the manifest 90 milliseconds earlier on n_1. (Note that these nodes are all running on the same machine, so the timestamps are pretty comparable.)

[ns_server:debug,2021-08-19T11:51:57.059-07:00,n_1@127.0.0.1:kv<0.249.0>:collections:do_update_with_manifest:326]Perform operation {drop_collection,"s1","c_0"} on manifest 34 of bucket "b_3487"

...

[ns_server:debug,2021-08-19T11:51:57.141-07:00,n_1@127.0.0.1:ns_audit<0.619.0>:ns_audit:handle_call:148]Audit drop_collection: [{local,{[{ip,<<"127.0.0.1">>},{port,9001}]}},
                        {remote,{[{ip,<<"127.0.0.1">>},{port,52842}]}},
                        {real_userid,{[{domain,builtin},
                                       {user,<<"<ud>Administrator</ud>">>}]}},
                        {timestamp,<<"2021-08-19T11:51:57.141-07:00">>},
                        {new_manifest_uid,<<"23">>},
                        {collection_name,<<"c_0">>},
                        {scope_name,<<"s1">>},
                        {bucket_name,<<"b_3487">>}]

This was also against manifest 34 and it clearly succeeded.

This is a 3-node cluster, so a collection manifest update only needs to reach 2 nodes before the change is considered committed. That means the third node may not have received the update yet when another manifest update request arrives at it.
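The race can be reproduced in a toy model. The sketch below is not ns_server code: `Replica`, `commit_drop` and `create_with_local_read` are hypothetical names, and the choice of n_0 as the second node that received the drop is an assumption (the logs only show that n_1 applied it and n_2 had not seen it).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One node's local copy of a bucket's collection manifest (toy model)."""
    name: str
    uid: int = 34
    collections: set = field(default_factory=lambda: {("s1", "c_0")})


def commit_drop(replicas, reached, scope, coll):
    """Apply a drop only on the replicas the update has reached so far.

    Returns True when that set is a majority, i.e. the change is committed.
    """
    for r in reached:
        r.collections.discard((scope, coll))
        r.uid += 1
    return len(reached) > len(replicas) // 2


def create_with_local_read(replica, scope, coll):
    """Validate a create against the node's own, possibly stale, manifest."""
    if (scope, coll) in replica.collections:
        return ("error", "collection_already_exists", replica.uid)
    return ("ok", None, replica.uid)


n_0, n_1, n_2 = Replica("n_0"), Replica("n_1"), Replica("n_2")
cluster = [n_0, n_1, n_2]

# Assumption: the drop has reached n_0 and n_1 (2 of 3 nodes), so it is committed.
committed = commit_drop(cluster, [n_0, n_1], "s1", "c_0")

# n_2 has not caught up and still validates the create against manifest 34.
result = create_with_local_read(n_2, "s1", "c_0")
print(committed, result)  # True ('error', 'collection_already_exists', 34)
```

The drop is committed from the cluster's point of view, yet the create sent to the lagging node fails, which matches the sequence in the logs above.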

This wouldn't have happened if the client I used (the Java client) sent the collection changes to the same node every time. However, the client will occasionally need to switch servers for these kinds of requests (due to failover, etc.), so I don't believe that changing the client to always target the same server node is a principled fix for this issue.

My current view is that the way to address this behavior is to do a quorum read on the manifest before performing checks. I think this is quite a bit nicer for users, and changes to the manifest should in general be infrequent enough that we can afford the quorum read.
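A minimal sketch of the idea, assuming a simplified model in which each node holds a (uid, collections) pair and the newest uid wins. The names `quorum_read` and `create_collection` and the data layout are hypothetical and say nothing about how ns_server actually stores or reads manifests. The point it illustrates is that any majority of readers overlaps any majority of writers, so the check always sees the committed drop.

```python
from itertools import combinations

# Toy state after the drop has reached 2 of 3 nodes: node -> (uid, collections).
manifests = {
    "n_0": (35, frozenset()),
    "n_1": (35, frozenset()),
    "n_2": (34, frozenset({("s1", "c_0")})),
}


def quorum_read(manifests, responders):
    """Return the newest manifest seen among a majority of nodes."""
    majority = len(manifests) // 2 + 1
    if len(responders) < majority:
        raise RuntimeError("no quorum")
    return max((manifests[n] for n in responders), key=lambda m: m[0])


def create_collection(manifests, responders, scope, coll):
    """Run the existence check against a quorum-read manifest."""
    uid, colls = quorum_read(manifests, responders)
    if (scope, coll) in colls:
        return ("error", "collection_already_exists", uid)
    return ("ok", None, uid)


# Every possible majority contains at least one node that has seen the drop.
for pair in combinations(sorted(manifests), 2):
    print(pair, create_collection(manifests, pair, "s1", "c_0"))
```

Each of the three possible majorities validates against manifest 35, so the create is accepted no matter which node receives the request. The cost is an extra round trip to a majority per manifest update.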

Alternatively, we could add some kind of read-consistency option to the collection / scope management REST APIs, though even in this case I think the default should be to quorum read.

I am interested in people's opinions on this topic.

Attachments

- n_0.zip (10.32 MB, 19/Aug/21 12:27 PM)
- n_1.zip (6.75 MB, 19/Aug/21 12:27 PM)
- n_2.zip (9.88 MB, 19/Aug/21 12:27 PM)

Issue Links

relates to

MB-46643 Couchbase fails to create collection index right after the creation of a collection

Reopened

People

Assignee: Dave Finlay

Reporter: Dave Finlay

Votes: 0

Watchers: 12

Dates

Created: 19/Aug/21 12:26 PM

Updated: 18/Mar/22 10:30 AM

Resolved: 24/Nov/21 7:25 AM

Gerrit Reviews

There are no open Gerrit changes.

There is 1 closed Gerrit change:

MB-48063: Do quorum read on manifest for collection update

ns_server should do a quorum read on collection manifests before performing checks
