Details

Type: Bug
Resolution: Won't Fix
Priority: Major
Fix Version/s: 5.0.0
Affects Version/s: 5.0.0
Component/s: ns_server
Labels:
Environment: CentOS 7

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

https://s3.amazonaws.com/bugdb/jira/MB-24986/MB-24986/172.23.98.79-20170621-1630-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-24986/MB-24986/172.23.98.79-diag.txt.gz
https://s3.amazonaws.com/bugdb/jira/MB-24986/MB-24986/172.23.98.80-20170621-1632-diag.zip (Memcached failure is introduced on this node)
https://s3.amazonaws.com/bugdb/jira/MB-24986/MB-24986/172.23.98.80-diag.txt.gz
https://s3.amazonaws.com/bugdb/jira/MB-24986/MB-24986/172.23.98.81-20170621-1633-diag.zip
https://s3.amazonaws.com/bugdb/jira/MB-24986/MB-24986/172.23.98.81-diag.txt.gz

Is this a Regression?: Yes

Description

1. Create a cluster with 3 nodes and at least 1 bucket.
2. Enable autofailover and set the timeout to 5 seconds.
3. On any one node, stop the memcached process (the tests do this by sending SIGSTOP to the memcached process). Note the time when the failure was injected.
4. Wait for the autofailover of the node to complete. Note the time when autofailover was initiated.
We expect the failover to be initiated within 8 seconds (5 seconds is ideal, but we allow a 3-second buffer for the initiation). Instead, the failover is initiated after around 9-10 seconds.
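The timing expectation can be sketched as a small check. This is only an illustration, not the code used by the test suite: the function name `failover_within_deadline`, its parameters, and the timestamps in the usage lines are hypothetical; only the 5-second timeout and 3-second buffer come from this report.

```python
from datetime import datetime, timedelta


def failover_within_deadline(failure_time, failover_time, timeout_s=5, buffer_s=3):
    """Return (ok, delay_s) for one failure/failover pair.

    ok is True when autofailover was initiated no later than
    timeout_s + buffer_s seconds after the failure was injected.
    """
    delay_s = (failover_time - failure_time).total_seconds()
    return delay_s <= timeout_s + buffer_s, delay_s


# Made-up timestamps, for illustration only (not taken from the attached logs).
injected = datetime(2017, 6, 21, 16, 0, 0)
initiated = injected + timedelta(seconds=9.5)
ok, delay_s = failover_within_deadline(injected, initiated)
print("within deadline:", ok, "delay (s):", delay_s)
```

With the defaults, any initiation later than 8 seconds after the injected failure is reported as a miss, which matches the 9-10 second behaviour described here.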
This is a regression compared to last week's build. The memcached-failure tests were passing up to last week's build (5.0.0-3088) but now fail because autofailover is initiated later than expected.
The tests can be found here: http://qa.sc.couchbase.com/view/nserver/job/cen006-nserv-autofailover-memcached/35/consoleFull
test_1, test_3, test_10, test_12 and test_13 all failed due to this issue.
The issue can be reproduced by running the following test:
./testrunner -i <ini file> -t failover.AutoFailoverTests.AutoFailoverTests.test_autofailover,timeout=5,num_node_failures=1,failover_action=stop_memcached,nodes_init=3

Attaching the cluster logs for test_1 from the run mentioned above.

People

Assignee:: Bharath G P

Reporter:: Bharath G P

Votes:: 0

Watchers:: 4

Dates

Created:: 22/Jun/17 3:02 AM

Updated:: 05/Jul/17 1:44 AM

Resolved:: 05/Jul/17 1:44 AM

Autofailover of node is taking more than 8 secs when failure type is due to memcached failure.
