Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 2.5.0
Affects Version/s: 2.0, 2.0.1
Component/s: ns_server
Security Level: Public
Labels:
- ns_server-story

Is this a Regression?:
Yes
Sprint:
02/Sep/2013 - 20/Sep/2013

Description

Failover is not quick when any node (including the one being failed over) is not responding.

This happens because janitor_agent can be stuck waiting for:

*) a tap connection "ping" (which we do in order to discover and clean up dead connections)

*) a vbucket filter change request (which is sent to the "other" side, i.e. a non-local memcached)

The corresponding ebucketmigrator can get stuck in the same places.
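The blocking behaviour described above can be illustrated with a small sketch. This is not ns_server code: `ping`, `janitor_run`, the peer names and the timeout value are all hypothetical, and the deadline shown here is only one generic way to bound such a wait, not necessarily the approach taken in the fix.

```python
import concurrent.futures
import threading


def ping(peer, unresponsive, release):
    """Hypothetical stand-in for a blocking "ping" of one connection."""
    if peer in unresponsive:
        # Simulate a dead peer: block until the caller gives up on us.
        release.wait()
        return False
    return True


def janitor_run(peers, unresponsive, timeout_s):
    """Ping every peer, but never wait longer than timeout_s for one reply."""
    release = threading.Event()
    results = {}
    workers = max(1, len(peers))
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {p: pool.submit(ping, p, unresponsive, release) for p in peers}
        for peer, future in futures.items():
            try:
                results[peer] = future.result(timeout=timeout_s)
            except concurrent.futures.TimeoutError:
                # Treat a silent peer as dead instead of stalling the run.
                results[peer] = False
        # Unblock the simulated dead peers so the pool can shut down.
        release.set()
    return results
```

Without the `timeout` argument, `future.result()` would wait indefinitely on the unresponsive peer, which is the kind of stall this issue describes.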

So one unresponsive node can cause this critical component to be stuck on all other nodes. We cannot activate any vbuckets without first stopping replication into them, and that requires:

*) the janitor agent not being stuck

*) the corresponding ebucketmigrators not being stuck
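The ordering constraint above (stop replication first, then activate) can be sketched as follows. The `VBucket` class, the `failover` function and the `stop_replication` callback are hypothetical names used only for illustration; they do not correspond to actual ns_server or ep-engine interfaces.

```python
class VBucket:
    """Hypothetical model of one vbucket on the node taking over."""

    def __init__(self, vbucket_id):
        self.vbucket_id = vbucket_id
        self.state = "replica"
        self.replication_running = True


def failover(vbuckets, stop_replication):
    """Activate vbuckets only after replication into them has stopped.

    stop_replication is a caller-supplied callable (an assumption of this
    sketch) that raises TimeoutError when the component responsible for
    stopping replication is stuck.
    """
    activated = []
    for vb in vbuckets:
        try:
            stop_replication(vb)
        except TimeoutError:
            # Replication may still be running, so the vbucket stays a replica.
            continue
        if not vb.replication_running:
            vb.state = "active"
            activated.append(vb.vbucket_id)
    return activated
```

In this model, a stuck component means `stop_replication` never succeeds, so no vbucket can be activated and failover makes no progress.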

I've just revisited this problem. Ideally the fix would be made with support from the ep-engine side, which could be done as part of the UPR work.

Without ep-engine support, the fix will require significant changes in ns_server, which are harder to do right now, particularly because of 1.8.x backwards compatibility support. It is doable but would take at least several days of work.

Attachments

Gerrit Reviews

For Gerrit Dashboard: MB-8039
#	Subject	Branch	Project	Status	CR	V
29051,3	MB-8039: don't ping tap connections during janitor runs	master	ns_server	Status: MERGED	+2	+1

Activity

People

Assignee: Andrei Baranouski

Reporter: Aleksey Kondratenko (Inactive)

Votes: 0

Watchers: 3

Dates

Created: 08/Apr/13 5:55 PM

Updated: 17/Aug/15 6:43 AM

Resolved: 19/Sep/13 12:42 PM


failover is not quick when any node (including being failed over) is not responding
