Details
Description
Setup
1. Set up a 2-node cluster with 2 buckets: bucket1 and bucket2.
2. Load 25M items into each bucket. Enable auto-failover with a 30s timeout.
3. The cluster is in DGM (disk greater than memory), with a 70 percent active resident ratio on each bucket.
4. Continue loading/mutating data to create high fragmentation.
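For anyone reproducing step 2, here is a minimal sketch of how the 30s auto-failover setting could be applied through the REST endpoint `/settings/autoFailover`. The host is taken from this report; the credentials and the helper name `build_autofailover_request` are placeholders, not values from the original setup. The request is only built here, not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_autofailover_request(host, user, password, timeout_s=30):
    """Build (but do not send) a POST that enables auto-failover.

    Assumption: the node exposes /settings/autoFailover on port 8091 and
    accepts form-encoded 'enabled' and 'timeout' parameters.
    """
    url = "http://%s:8091/settings/autoFailover" % host
    body = urllib.parse.urlencode({"enabled": "true", "timeout": timeout_s})
    req = urllib.request.Request(url, data=body.encode("ascii"), method="POST")
    token = base64.b64encode(("%s:%s" % (user, password)).encode("ascii"))
    req.add_header("Authorization", "Basic " + token.decode("ascii"))
    return req


# Placeholder credentials; replace with the cluster's real admin login.
req = build_autofailover_request("10.3.2.94", "Administrator", "password")
# To actually apply the setting: urllib.request.urlopen(req)
```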
Error/Output
1. Server errors ("web request failed") appear in the web logs.
2. This was followed by an attempt to auto-failover one node, 95 (non-master). The attempt failed because the cluster was too small.
3. Non-master node 95 was made the master node.
4. The memcached connection on node 94 was lost.
5. Both nodes are in pending state. Items from node 95 are lost, since it was auto-failed over.
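The refusal in item 2 follows from the size rule stated in the log message quoted later in this report ("you need at least 2 other nodes"). A minimal sketch of that rule, inferred only from the message text and not taken from the ns_server source; the function and constant names are illustrative:

```python
# Inferred from the log message "you need at least 2 other nodes".
MIN_OTHER_NODES = 2


def can_auto_failover(cluster_nodes, failed_node):
    """Return True if enough other nodes remain to allow auto-failover."""
    others = [n for n in cluster_nodes if n != failed_node]
    return len(others) >= MIN_OTHER_NODES


two_node_cluster = ["ns_1@10.3.2.94", "ns_1@10.3.2.95"]
allowed = can_auto_failover(two_node_cluster, "ns_1@10.3.2.95")
```

With the 2-node cluster from this report only one other node remains, so `allowed` is `False`, which matches the logged refusal.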
Error messages from the web logs
1. Server error during processing: ["web request failed",
,
{type,exit},
{what,
{timeout,
}},
{trace,
[
,
{ns_bucket,json_map_from_config,2},
{menelaus_web_buckets, '-handle_sasl_buckets_streaming/2-fun-1-', 3},
{lists,map,2},
{menelaus_web_buckets, '-handle_sasl_buckets_streaming/2-fun-2-', 2},
{menelaus_web,streaming_inner,3},
{menelaus_web,handle_streaming,4},
{menelaus_web,loop,3}]}]
2. Could not auto-failover node ('ns_1@10.3.2.95'). Cluster was too small, you need at least 2 other nodes.
3. Control connection to memcached on 'ns_1@10.3.2.94' disconnected: {{badmatch,
{error,
timeout}},
[
,
{mc_client_binary, select_bucket, 2},
{ns_memcached, ensure_bucket, 2},
{ns_memcached, handle_info, 2},
{gen_server, handle_msg, 5},
{proc_lib, init_p_do_apply, 3}]} ns_memcached004 ns_1@10.3.2.94 20:42:43 - Mon Jun 18, 2012
Attached is a screenshot from the cluster.
Attached are the logs from nodes 94 and 95: bug2.tar.
*Both nodes can be accessed and pinged. I don't see any core dumps on either node.
Live cluster can be found at http://10.3.2.94:8091/index.html#sec=analytics&statsBucket=/pools/default/buckets/bucket1
[error_logger:error] [2012-06-18 20:38:29] [ns_1@10.3.2.94:error_logger:ale_error_logger_handler:log_report:72]
=========================CRASH REPORT=========================
crasher:
initial call: ns_heart:init/1
pid: <0.236.0>
registered_name: ns_heart
exception exit: {timeout,{gen_server,call,[disksup,get_disk_data,5000]}}
in function gen_server:terminate/6
ancestors: [ns_server_sup,ns_server_cluster_sup,<0.60.0>]
messages: [do_expensive_checks,beat,beat,beat,do_expensive_checks,beat,
beat,force_beat,force_beat]
links: [<0.196.0>,<0.237.0>,<0.57.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 121393
stack_size: 24
reductions: 432952171
neighbours:
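The crash report above shows ns_heart exiting because a `gen_server` call to `disksup` (`get_disk_data`) did not return within 5000 ms. When triaging many such reports, a small parser can pull out those fields. The following is a rough sketch that assumes the "key: value" layout seen above; the function names are illustrative, and the sample text is copied from this report.

```python
import re

CRASH_REPORT = """\
crasher:
initial call: ns_heart:init/1
pid: <0.236.0>
registered_name: ns_heart
exception exit: {timeout,{gen_server,call,[disksup,get_disk_data,5000]}}
in function gen_server:terminate/6
ancestors: [ns_server_sup,ns_server_cluster_sup,<0.60.0>]
"""

# A key ends at a colon that is followed by whitespace or end of line,
# so "gen_server:terminate/6" is not mistaken for a key.
KEY_LINE = re.compile(r"^([A-Za-z_][\w ]*?):(?:\s+(.*))?$")
TIMEOUT = re.compile(r"\{timeout,\{gen_server,call,\[(\w+),(\w+),(\d+)\]\}\}")


def parse_crash_report(text):
    """Map each 'key: value' line to a dict; fold continuation lines in."""
    fields, last_key = {}, None
    for line in text.splitlines():
        m = KEY_LINE.match(line.strip())
        if m:
            last_key = m.group(1)
            fields[last_key] = m.group(2) or ""
        elif last_key is not None:
            fields[last_key] += " " + line.strip()
    return fields


def timed_out_call(fields):
    """Return (server, request, timeout_ms) for a gen_server call timeout."""
    m = TIMEOUT.search(fields.get("exception exit", ""))
    return (m.group(1), m.group(2), int(m.group(3))) if m else None


fields = parse_crash_report(CRASH_REPORT)
call = timed_out_call(fields)
```

For the report above, `call` identifies `disksup` / `get_disk_data` with a 5000 ms limit, which points at slow disk statistics collection on node 94 rather than at memcached itself.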
[ns_server:debug] [2012-06-18 20:38:29] [ns_1@10.3.2.94:<0.21840.29>:menelaus_web:handle_streaming:673] Starting streaming for 10.1.3.103:35053 path /pools/bucket2/saslBucketsStreaming
[ns_server:debug] [2012-06-18 20:38:31] [ns_1@10.3.2.94:<0.267.0>:ns_pubsub:do_subscribe_link:132] Parent process exited with reason shutdown
[ns_server:debug] [2012-06-18 20:38:31] [ns_1@10.3.2.94:<0.263.0>:ns_pubsub:do_subscribe_link:132] Parent process exited with reason killed
[ns_doctor:debug] [2012-06-18 20:38:29] [ns_1@10.3.2.94:ns_doctor:ns_doctor:handle_info:93] Current node statuses:
[{'ns_1@10.3.2.94',
[{last_heard,{1340,76981,427276}},
,
,
{replication,[
,
{"bucket1",0.0}]},
{memory,
[
,
,
,
,
,
,
,
,
]},
{system_stats,
[
,
,
]},
{interesting_stats,
[
,
,
]},
,
{version,
[
,
,
,
,
,
,
,
]},
,
,
{memory_data,{26397753344,26260025344,
}},
{disk_data,
[
,
,
,
,
,
,
]},