Details
- Bug
- Resolution: Fixed
- Test Blocker
- 3.0
- Security Level: Public
- None
- CentOS 6.x, 8*8 clusters, 2 uni-XDCRs; each node has 15 GB RAM, 4 cores
- Untriaged
- Unknown
- June 30 - July 18
Description
Build
--------
3.0.0-786 (xdcr on upr, internal replication on upr)
Clusters
-----------
Source : http://172.23.105.44:8091/
Destination : http://172.23.105.54:8091/
The clusters are available for investigation; there is no urgency to reclaim them. Please let me know if you need me to collect logs.
Steps
--------
1. Load data on both clusters until vb_active_resident_items_ratio < 30.
2. Run an access phase (98% gets, 2% sets) for 3 hours.
3. Rebalance out 1 node at cluster1 with the workload running [high DGM, ~4%].
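For reference, the DGM threshold in step 1 is a plain percentage; a minimal sketch of the check (the helper name is mine, not part of any test framework):

```python
def resident_ratio(items_in_memory: int, total_active_items: int) -> float:
    """Percentage of active items held in memory.

    Mirrors ep-engine's vb_active_resident_items_ratio stat: the load
    phase above runs until this drops below 30 (heavy DGM).
    """
    if total_active_items == 0:
        return 100.0  # an empty bucket is fully resident
    return 100.0 * items_in_memory / total_active_items
```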
Every attempt to rebalance out one node fails; the last attempt left 3 nodes in a pending state.
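For anyone reproducing this, the rebalance-out is driven through ns_server's REST endpoint POST /controller/rebalance; a minimal sketch (credentials and node lists here are illustrative, and the helper names are mine):

```python
import base64
import urllib.parse
import urllib.request

def rebalance_out_body(known_nodes, eject_nodes):
    """Form body for POST /controller/rebalance (otpNode names)."""
    return urllib.parse.urlencode({
        "knownNodes": ",".join(known_nodes),
        "ejectedNodes": ",".join(eject_nodes),
    })

def start_rebalance_out(host, user, password, known_nodes, eject_nodes):
    """Ask the orchestrator to rebalance eject_nodes out of the cluster."""
    auth = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"http://{host}:8091/controller/rebalance",
        data=rebalance_out_body(known_nodes, eject_nodes).encode(),
        headers={"Authorization": f"Basic {auth}"},
        method="POST",
    )
    # Returns 200 when accepted; progress is then polled separately.
    return urllib.request.urlopen(req)
```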
First rebalance-out failed with error:
-----------------------------------------------
Many messages like the following:
Control connection to memcached on 'ns_1@172.23.105.49' disconnected:
{{badmatch, {error, timeout}},
 [{mc_client_binary, stats_recv, 4, [...]},
  {mc_client_binary, stats, 4,
   [{file, "src/mc_client_binary.erl"}, {line, 411}]},
  {ns_memcached, handle_info, 2, [...]},
  {gen_server, handle_msg, 5, [{file, "gen_server.erl"}, {line, 604}]},
  {ns_memcached, init, 1, [{file, "src/ns_memcached.erl"}, {line, 170}]},
  {gen_server, init_it, 6, [..., {line, 304}]},
  {proc_lib, init_p_do_apply, 3, [..., {line, 239}]}]}
Subsequent rebalance-out attempts
-------------------------------------------------
Control connection to memcached on 'ns_1@172.23.105.52' disconnected: {badmatch, {error, timeout}}   ns_memcached000   ns_1@172.23.105.52   14:20:19 - Fri Jun 6, 2014
Control connection to memcached on 'ns_1@172.23.105.48' disconnected: {badmatch, {error, timeout}}   ns_memcached000   ns_1@172.23.105.48   14:20:19 - Fri Jun 6, 2014
Control connection to memcached on 'ns_1@172.23.105.45' disconnected: {badmatch, {error, timeout}}   ns_memcached000   ns_1@172.23.105.45   14:20:19 - Fri Jun 6, 2014
Rebalance exited with reason [reason truncated]   ns_orchestrator002   ns_1@172.23.105.44   14:17:19 - Fri Jun 6, 2014
Bucket "saslbucket" loaded on node 'ns_1@172.23.105.52' in 0 seconds.   ns_memcached000   ns_1@172.23.105.52   14:16:32 - Fri Jun 6, 2014
Bucket "saslbucket" loaded on node 'ns_1@172.23.105.45' in 0 seconds.   ns_memcached000   ns_1@172.23.105.45   14:16:32 - Fri Jun 6, 2014
Control connection to memcached on 'ns_1@172.23.105.45' disconnected: {badmatch, {error, timeout}}   ns_memcached000   ns_1@172.23.105.45   14:16:32 - Fri Jun 6, 2014
Control connection to memcached on 'ns_1@172.23.105.52' disconnected: {badmatch, {error, timeout}}   ns_memcached000   ns_1@172.23.105.52   14:16:32 - Fri Jun 6, 2014
Started rebalancing bucket standardbucket1
Starting rebalance, KeepNodes = ['ns_1@172.23.105.44','ns_1@172.23.105.45',
'ns_1@172.23.105.48','ns_1@172.23.105.49',
'ns_1@172.23.105.50','ns_1@172.23.105.51',
'ns_1@172.23.105.52'], EjectNodes = ['ns_1@172.23.105.47'], Failed over and being ejected nodes = []; no delta recovery nodes
Rebalance exited with reason {not_all_nodes_are_ready_yet, ['ns_1@172.23.105.50']}
Started rebalancing bucket standardbucket1
Starting rebalance, KeepNodes = ['ns_1@172.23.105.44','ns_1@172.23.105.45',
'ns_1@172.23.105.48','ns_1@172.23.105.49',
'ns_1@172.23.105.50','ns_1@172.23.105.51',
'ns_1@172.23.105.52'], EjectNodes = ['ns_1@172.23.105.47'], Failed over and being ejected nodes = []; no delta recovery nodes
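The {not_all_nodes_are_ready_yet, ['ns_1@172.23.105.50']} exit above means the orchestrator gave up waiting for node .50 to report ready before starting the rebalance. A pre-rebalance readiness check against /pools/default can be sketched like this (the helper is hypothetical; the field names follow the REST response):

```python
def nodes_not_ready(pool_default: dict) -> list:
    """Return otpNode names from /pools/default that look unready to rebalance."""
    return [
        node["otpNode"]
        for node in pool_default.get("nodes", [])
        if node.get("status") != "healthy"
        or node.get("clusterMembership") != "active"
    ]
```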
Please feel free to close this if another similar issue is still open.
Attachments
Issue Links
- relates to MB-11351: "ns_server's ns_heart and janitor_agent may get totally stuck if some upr stuff inside ep-engine gets stuck" (Closed)