Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 4.6.0
Affects Version/s: 3.1.6, 4.0.0, 4.1.2, 4.5.0, 4.5.1
Component/s: couchbase-bucket
Labels: None
Triage: Untriaged
Is this a Regression?: Unknown

Description

1. Apply the attached patch to ep-engine. (Scheduling may be arbitrary, so I can introduce arbitrary delays into the code and it should still behave correctly.)

2. Create a bucket, upload some data, and restart the server a couple of times to build up some failover history.

$ ~/dev/membase/repo-watson/install/bin/cbstats 127.0.0.1:12000 failovers 7
 vb_7:0:id:        215814349521305
 vb_7:0:seq:       160
 vb_7:1:id:        155329856619529
 vb_7:1:seq:       160
 vb_7:2:id:        91747243693536
 vb_7:2:seq:       160
 vb_7:3:id:        128962012805783
 vb_7:3:seq:       160
 vb_7:4:id:        275970465686046
 vb_7:4:seq:       160
 vb_7:5:id:        241099843010628
 vb_7:5:seq:       160
 vb_7:6:id:        50511930730683
 vb_7:6:seq:       160
 vb_7:7:id:        280675982653774
 vb_7:7:seq:       0
 vb_7:num_entries: 8
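
To compare failover histories between steps without reading the output by eye, the `cbstats ... failovers` text can be turned into a list of (id, seq) pairs. The following is a minimal sketch that assumes the output keeps the `vb_<N>:<index>:id` / `vb_<N>:<index>:seq` layout shown above; `parse_failovers` is a hypothetical helper name, not part of cbstats.

```python
import re

# Matches lines such as " vb_7:0:id:        215814349521305".
_ENTRY = re.compile(r"^\s*vb_(\d+):(\d+):(id|seq):\s+(\d+)\s*$")


def parse_failovers(text):
    """Return {vbucket: [(id, seq), ...]} ordered by entry index."""
    raw = {}
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if not m:
            continue  # skips blank lines, the shell prompt line, and num_entries
        vb, idx, field, value = int(m[1]), int(m[2]), m[3], int(m[4])
        raw.setdefault(vb, {}).setdefault(idx, {})[field] = value
    return {
        vb: [(e["id"], e["seq"]) for _, e in sorted(entries.items())]
        for vb, entries in raw.items()
    }
```

Saving the parsed result from this step makes it easy to check later whether the same list reappears after the vbucket has been deleted and recreated.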

3. Run the following via /diag/eval.

V = 7, ns_memcached:set_vbucket("default", V, dead),  ns_memcached:sync_delete_vbucket("default", V).

This is what happens during bucket flush for all vbuckets.
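
For scripting step 3, the expression can be sent as the body of an HTTP POST to ns_server's `/diag/eval` endpoint with admin credentials. The sketch below only builds the request; the base URL, user name, and password are placeholders you must supply (nothing in this report states them), and `diag_eval_request` is a hypothetical helper name.

```python
import base64
import urllib.request

# The Erlang expression from step 3, unchanged.
EXPR = (
    'V = 7, ns_memcached:set_vbucket("default", V, dead), '
    'ns_memcached:sync_delete_vbucket("default", V).'
)


def diag_eval_request(expr, base_url, user, password):
    """Build (but do not send) a POST to <base_url>/diag/eval with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/diag/eval",
        data=expr.encode(),
        method="POST",
        headers={"Authorization": "Basic " + token},
    )

# To actually run it against a test node (placeholders, adjust to your setup):
#   req = diag_eval_request(EXPR, "http://<host>:<rest-port>", "<user>", "<password>")
#   print(urllib.request.urlopen(req).read().decode())
```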

4. Let ns_server recreate the vbucket, then wait for the 10-second sleep to expire.

You can watch the local docs to see when it happens.

$ ../install/bin/couch_dbdump --local --json data/n_0/data/default/7.couch.1
Dumping "data/n_0/data/default/7.couch.1":
{"id":"_local/vbstate",
 "value":"{\"state\": \"active\",
           \"checkpoint_id\": \"0\",
           \"max_deleted_seqno\": \"0\",
           \"failover_table\": [{\"id\":199363149644834,\"seq\":0}],
           \"snap_start\": \"0\",\"snap_end\": \"0\",
           \"max_cas\": \"0\",
           \"drift_counter\": \"-140737488355328\"}"}
Total docs: 1
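
The `_local/vbstate` document stores its value as JSON text inside a JSON string, so it has to be decoded twice. A minimal sketch follows, assuming the dump text looks like the output above (a `Dumping ...` banner, then one JSON object whose string value may contain raw line breaks); `parse_vbstate` is a hypothetical helper name.

```python
import json


def parse_vbstate(dump_text):
    """Extract the _local/vbstate value from `couch_dbdump --local --json` text."""
    # strict=False tolerates raw newlines inside the quoted "value" string.
    decoder = json.JSONDecoder(strict=False)
    start = dump_text.index("{")  # skip the 'Dumping "...":' banner
    doc, _ = decoder.raw_decode(dump_text, start)
    if doc.get("id") != "_local/vbstate":
        raise ValueError("first local document is not _local/vbstate")
    return json.loads(doc["value"])  # the value is itself JSON text
```

Polling this in a loop and printing `state` and `failover_table` is one way to watch for the moment the on-disk state changes.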

Then the vbucket goes back to the dead state on disk, and the old failover table is written back with it:

$ ../install/bin/couch_dbdump --local --json data/n_0/data/default/7.couch.1
Dumping "data/n_0/data/default/7.couch.1":
{"id":"_local/vbstate",
 "value":"{\"state\": \"dead\",
          \"checkpoint_id\": \"0\",
          \"max_deleted_seqno\": \"0\",
          \"failover_table\": [{\"id\":215814349521305,\"seq\":160},
                               {\"id\":155329856619529,\"seq\":160},
                               {\"id\":91747243693536,\"seq\":160},
                               {\"id\":128962012805783,\"seq\":160},
                               {\"id\":275970465686046,\"seq\":160},
                               {\"id\":241099843010628,\"seq\":160},
                               {\"id\":50511930730683,\"seq\":160},
                               {\"id\":280675982653774,\"seq\":0}],
          \"snap_start\": \"160\",\"snap_end\": \"160\",
          \"max_cas\": \"1473396979655770112\",
          \"drift_counter\": \"-140737488355328\"}"}
Total docs: 1

5. At this point, stats show the fresh failover history. Restart the server and observe that the failover history of the deleted vbucket is resurrected.

$ ~/dev/membase/repo-watson/install/bin/cbstats 127.0.0.1:12000 failovers 7
 vb_7:0:id:        215814349521305
 vb_7:0:seq:       160
 vb_7:1:id:        155329856619529
 vb_7:1:seq:       160
 vb_7:2:id:        91747243693536
 vb_7:2:seq:       160
 vb_7:3:id:        128962012805783
 vb_7:3:seq:       160
 vb_7:4:id:        275970465686046
 vb_7:4:seq:       160
 vb_7:5:id:        241099843010628
 vb_7:5:seq:       160
 vb_7:6:id:        50511930730683
 vb_7:6:seq:       160
 vb_7:7:id:        280675982653774
 vb_7:7:seq:       0
 vb_7:num_entries: 8
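
The sequence above is consistent with a delayed write of the old "dead" state landing on disk after the vbucket has been deleted and recreated. The toy model below illustrates that ordering problem only; it is not ep-engine code, and the incarnation check is an illustrative guard rather than the fix that was merged. `ToyStore`, `reproduce`, and the abbreviated `OLD` table are hypothetical names and data chosen for the sketch.

```python
from collections import deque

OLD = [(215814349521305, 160), (280675982653774, 0)]  # abbreviated old history
FRESH = [(199363149644834, 0)]                         # history after recreation


class ToyStore:
    """Stand-in for on-disk vbstate plus a queue of delayed persist tasks."""

    def __init__(self, guarded):
        self.guarded = guarded
        self.disk = {}          # vbid -> (state, failover_table)
        self.incarnation = {}   # vbid -> counter bumped on every delete
        self.pending = deque()  # persist tasks that have not run yet

    def set_state(self, vbid, state, table):
        # The snapshot is captured now but only written when the task runs.
        inc = self.incarnation.get(vbid, 0)
        self.pending.append((vbid, inc, state, list(table)))

    def delete_and_recreate(self, vbid, table):
        # Deleting starts a new incarnation; recreation writes a fresh state.
        self.incarnation[vbid] = self.incarnation.get(vbid, 0) + 1
        self.disk[vbid] = ("active", list(table))

    def run_pending(self):
        while self.pending:
            vbid, inc, state, table = self.pending.popleft()
            if self.guarded and inc != self.incarnation.get(vbid, 0):
                continue  # stale task from a deleted incarnation: drop it
            self.disk[vbid] = (state, table)


def reproduce(guarded):
    store = ToyStore(guarded)
    store.set_state(7, "dead", OLD)       # step 3: persist is delayed
    store.delete_and_recreate(7, FRESH)   # sync delete, then recreation
    store.run_pending()                   # the delayed persist finally runs
    return store.disk[7]                  # what a restart would read back
```

With `guarded=False` the model ends with the dead state and the old table on disk, matching what the restart shows here; with `guarded=True` the fresh state survives.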

Attachments

0001-race.patch (0.9 kB, 08/Sep/16 10:45 PM)

Issue Links

relates to:

- MB-20822 Erase diverged branch correctly from Failover table (Closed)
- MB-21650 Changes for MB-20852 (set_vbucket is raceful) caused ~30% drop in KV throughput (Closed)

People

Assignee: Dave Rigby (Inactive)

Reporter: Aliaksey Artamonau (Inactive)

Votes: 0

Watchers: 13

Dates

Created: 08/Sep/16 10:53 PM

Updated: 21/Jun/17 6:15 AM

Resolved: 01/Nov/16 2:45 AM

Gerrit Reviews

There are no open Gerrit changes

There are 29 closed Gerrit changes:

MB-20852: Remove unused 'force' parameter from scheduleVBStatePersist()

MB-20852 [11/N]: Move persistenceCheckpoint id to VBucket

MB-20852: ep_test_apis: report final value in wait_for_stat...

MB-20852: ep_unit_tests_main: Show DEBUG logs with -v

MB-20852 [12/N]: Add VBucket::getVBucketState method, use vector for VBuckets in Map

MB-20852 [6/N]: Simplify {start,stop}Flusher & flushVBucket, move to C++11

MB-20852: Serialize VB state changes

Merge remote-tracking branch 'couchbase/watson'

Merge remote-tracking branch 'couchbase/watson'

MB-20852 [2/N]: Convert queue_operation to scoped enum

MB-20852 [9/N]: Explicilty handle all queue_op uses

MB-20852 [10/N]: Return by value from VBucket::getPersistedSnapshot

MB-20852 [1/N]: Update tests to facilitate set_vbucket_state changes

MB-20852 [3/N]: checkpoint_test enhancements

MB-20852 [4/N]: Use named struct when moving cursors between checkpoints

MB-20852 [5/N]: Checkpoint: C++11-ification

MB-20852 [7/N]: CheckpointManager::queueDirty: Pass vb by reference

MB-20852 [8/N]: Improve documentation of putCursorsInCollapsedChk

MB-20852 [13/N]: Checkpoint: Add getNumMetaItems() method

MB-20852 [13/N]: Improve debug printing of CheckpointManager objects

MB-20852 [15/N]: Accurately track meta items within checkpoints

MB-20852 [14/N]: Improve debug/logging in CheckpointManager

MB-20852 [16/N]: Add queue_op::set_vbucket_state meta-item

MB-20852 [17/N]: Serialize VB state changes

MB-20852 [18/N]: Remove now dead VBucket persist Tasks

Merge remote-tracking branch 'couchbase/watson'

MB-21650: Prevent false sharing of frequently modified memory stats

MB-21925: Fix queue_fill stat when persitenceCursor on queue_op::empty

MB-25060: Remove bias of 1 from getNumOpenChkItems

set_vbucket is raceful with other set_vbucket invocations and with sync vbucket deletions (and likely with lots of other stuff)
