Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 7.1.0
Affects Version/s: 7.0.0
Component/s: ns_server
Labels: request-dev-verify
Triage: Untriaged
Story Points: 1
Is this a Regression?: Unknown

Description

I'm filing this as a bug because my current thinking is that it's something that needs to be fixed, though it could also be considered an improvement.

In any case, I hit the issue this ticket describes when I was running some tests that created and dropped scopes and collections in quick succession.

I tried to create collection c_0 in scope s1 of bucket b_3487, and it failed, stating that c_0 already exists:

[ns_server:debug,2021-08-19T11:51:57.149-07:00,n_2@127.0.0.1:collections<0.823.0>:collections:do_update:284]Performing operation {create_collection,"s1","c_0",[]} on bucket "b_3487"

[ns_server:debug,2021-08-19T11:51:57.149-07:00,n_2@127.0.0.1:kv<0.248.0>:collections:do_update_with_manifest:326]Perform operation {create_collection,"s1","c_0",[]} on manifest 34 of bucket "b_3487"

...

[ns_server:debug,2021-08-19T11:51:57.149-07:00,n_2@127.0.0.1:kv<0.248.0>:collections:perform_operations:367]Operation {create_collection,"s1","c_0",[]} failed with error {collection_already_exists,"s1","c_0"}

You can see it's operating on manifest 34.

However, collection c_0 was dropped from the manifest about 90 milliseconds earlier on n_1. (Note that these nodes are all running on the same machine, so the timestamps are directly comparable.)

[ns_server:debug,2021-08-19T11:51:57.059-07:00,n_1@127.0.0.1:kv<0.249.0>:collections:do_update_with_manifest:326]Perform operation {drop_collection,"s1","c_0"} on manifest 34 of bucket "b_3487"

...

[ns_server:debug,2021-08-19T11:51:57.141-07:00,n_1@127.0.0.1:ns_audit<0.619.0>:ns_audit:handle_call:148]Audit drop_collection: [{local,{[{ip,<<"127.0.0.1">>},{port,9001}]}},
                        {remote,{[{ip,<<"127.0.0.1">>},{port,52842}]}},
                        {real_userid,{[{domain,builtin},
                                       {user,<<"<ud>Administrator</ud>">>}]}},
                        {timestamp,<<"2021-08-19T11:51:57.141-07:00">>},
                        {new_manifest_uid,<<"23">>},
                        {collection_name,<<"c_0">>},
                        {scope_name,<<"s1">>},
                        {bucket_name,<<"b_3487">>}]

This was also against manifest 34 and it clearly succeeded.

This is a 3-node cluster, so a collection manifest update only needs to reach 2 nodes before the change is considered committed. That means the third node may not have received the update before another manifest operation arrives at it.
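To make the race concrete, here is a toy model of it in Python. This is only a sketch: the `Node` class and the `commit` and `create_collection_local` helpers are hypothetical and do not mirror ns_server's code, and the assumption that n_0 was the second node to receive the drop is mine, not something the logs confirm.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy replica holding its own copy of a bucket's collection manifest."""
    name: str
    uid: int = 34
    collections: set = field(default_factory=lambda: {("s1", "c_0")})


def commit(nodes, reached, uid, collections):
    """Apply a manifest update only on the nodes in `reached` (a majority)."""
    assert len(reached) > len(nodes) // 2, "update must reach a majority"
    for node in reached:
        node.uid = uid
        node.collections = set(collections)


def create_collection_local(node, scope, name):
    """Validate against the node's local manifest only (no quorum read)."""
    if (scope, name) in node.collections:
        return ("collection_already_exists", scope, name)
    return "ok"


n_0, n_1, n_2 = Node("n_0"), Node("n_1"), Node("n_2")
cluster = [n_0, n_1, n_2]

# The drop handled by n_1 is committed once two nodes have it.
# Assumption: n_0 is the second node; n_2 has not caught up yet.
commit(cluster, [n_0, n_1], uid=35, collections=set())

# The create request lands on n_2, which still holds manifest 34.
result = create_collection_local(n_2, "s1", "c_0")
print(n_2.uid, result)
```

In this model the drop is fully committed, yet the create is rejected because n_2 validates against its own stale copy of the manifest.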

This wouldn't have happened if the client I used (the Java client) sent the collection changes to the same node every time. However, the client will occasionally need to switch servers for these kinds of requests (due to failover, for example), so I don't believe that changing the client to always target the same server node is a principled fix for this issue.

My current view is that the way to address this behavior is to do a quorum read on the manifest before performing checks. I think this is quite a bit nicer for users, and changes to the manifest should in general be infrequent enough that we can afford the quorum read.
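As a rough illustration of why a quorum read helps, the sketch below reads the manifest from a majority of replicas and validates against the copy with the highest uid. It is a simplification under stated assumptions: `quorum_read` and `create_collection` are hypothetical names, the replica state is hard-coded to match the scenario in this ticket, and the way ns_server actually performs a quorum read may differ.

```python
from itertools import combinations


def quorum_read(replicas, reachable):
    """Return the newest manifest seen across a majority of replicas."""
    majority = len(replicas) // 2 + 1
    if len(reachable) < majority:
        raise RuntimeError("quorum not available")
    return max((replicas[name] for name in reachable), key=lambda m: m["uid"])


def create_collection(replicas, reachable, scope, name):
    """Run the existence check against a quorum-read manifest."""
    manifest = quorum_read(replicas, reachable)
    if (scope, name) in manifest["collections"]:
        return ("collection_already_exists", scope, name)
    return "ok"


# Hypothetical state after the drop committed on two nodes; n_2 is stale.
replicas = {
    "n_0": {"uid": 35, "collections": set()},
    "n_1": {"uid": 35, "collections": set()},
    "n_2": {"uid": 34, "collections": {("s1", "c_0")}},
}

# Whichever majority n_2 consults, it overlaps the majority that took the drop.
results = [
    create_collection(replicas, group, "s1", "c_0")
    for group in combinations(replicas, 2)
    if "n_2" in group
]
print(results)
```

The key property is that any two majorities of the same cluster share at least one node, so a majority read always includes a replica that has seen the last committed manifest.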

Alternatively, we could add some kind of read-consistency option to the collection and scope management REST APIs. Even in that case, though, I think the default should be a quorum read.

I am interested in people's opinions on this topic.

Attachments

n_0.zip
19/Aug/21 12:27 PM
10.32 MB
Dave Finlay
n_2.zip
19/Aug/21 12:27 PM
9.88 MB
Dave Finlay
n_1.zip
19/Aug/21 12:27 PM
6.75 MB
Dave Finlay

Issue Links

relates to

MB-46643 Couchbase fails to create collection index right after the creation of a collection

Reopened

Activity

People

Assignee: Dave Finlay

Reporter: Dave Finlay

Votes: 0

Watchers: 12

Dates

Created: 19/Aug/21 12:26 PM

Updated: 18/Mar/22 10:30 AM

Resolved: 24/Nov/21 7:25 AM

ns_server should do a quorum read on collection manifests before performing checks
