Description
Build: 1409
Nodes used: 10.5.2.11, 10.5.2.13, 10.5.2.14, 10.5.2.15
Long-running test with the following steps (the exact steps are not so important):
1. Start with one node, 10.5.2.11.
2. Upload data using DocumentGenerator: rebalance.rebalancein.RebalanceInTests.incremental_rebalance_in_with_queries,blob_generator=False,items=10000000
3. While the data is uploading, incrementally rebalance in 10.5.2.13 and 10.5.2.15.
4. Rebalance out 10.5.2.15 and stop the rebalance at ~40% progress (data is still loading).
5. Restart the rebalance after 5 min.
6. Rebalance in 2 nodes: 10.5.2.14 and 10.5.2.15.
7. Data loading stopped at 5336207 docs (my host hung).
8. Continue loading data for keys 6000000-10000000 (9336207 docs in total).
9. Create 5 views in a ddoc.
10. Run doc ops for about 2 hours (update, get via test scripts).
11. Rebalance out 10.5.2.15.
Result: the rebalance is stuck with the following progress:
{"status":"running","ns_1@10.5.2.11":
,"ns_1@10.5.2.13":
{"progress":0.5490196078431373},"ns_1@10.5.2.14":
{"progress":0.08949416342412453},"ns_1@10.5.2.15":{"progress":0.31640625}}
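A minimal sketch of how a "stuck" rebalance could be detected from two progress snapshots like the one above. The helper names (overall_progress, is_stuck) and the min_delta threshold are hypothetical and are not part of the test framework; the sample omits ns_1@10.5.2.11 because its value is missing from the pasted output.

```python
import json

def overall_progress(status):
    """Average the per-node "progress" values in one status snapshot."""
    values = [v["progress"] for v in status.values()
              if isinstance(v, dict) and "progress" in v]
    return sum(values) / len(values) if values else 0.0

def is_stuck(samples, min_delta=1e-6):
    """Return True if the rebalance is still running but no node advanced.

    samples: status snapshots (dicts) taken over some time window,
    oldest first. min_delta is an assumed noise threshold.
    """
    if len(samples) < 2:
        return False
    first, last = samples[0], samples[-1]
    if last.get("status") != "running":
        return False
    for node, info in last.items():
        if not isinstance(info, dict):
            continue  # skips the top-level "status" string
        before = first.get(node, {})
        before_progress = before.get("progress", 0.0) if isinstance(before, dict) else 0.0
        if info.get("progress", 0.0) - before_progress > min_delta:
            return False  # at least one node made progress
    return True

# Snapshot from this report (node 10.5.2.11 omitted, its value is missing).
SAMPLE = json.loads(
    '{"status":"running",'
    '"ns_1@10.5.2.13":{"progress":0.5490196078431373},'
    '"ns_1@10.5.2.14":{"progress":0.08949416342412453},'
    '"ns_1@10.5.2.15":{"progress":0.31640625}}'
)

if __name__ == "__main__":
    print(is_stuck([SAMPLE, SAMPLE]))
    print(overall_progress(SAMPLE))
```

Passing the same snapshot twice models the situation here: the status stays "running" while no per-node value changes between polls.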
Memory is almost fully used:
top - 14:32:04 up 24 days, 5:14, 1 user, load average: 3.38, 3.38, 3.00
Tasks: 123 total, 1 running, 122 sleeping, 0 stopped, 0 zombie
Cpu(s): 19.9%us, 0.7%sy, 0.2%ni, 33.6%id, 45.2%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 2058744k total, 2047312k used, 11432k free, 4488k buffers
Swap: 5996536k total, 11260k used, 5985276k free, 425112k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8033 couchbas 15 0 1078m 902m 3164 S 34.3 44.9 84:31.12 memcached
7999 couchbas 25 0 1500m 554m 8948 S 7.0 27.6 351:07.27 beam.smp
7722 root 34 19 282m 25m 8168 D 0.3 1.3 0:01.61 yum-updatesd-he
7796 jenkins 15 0 12760 1112 828 R 0.3 0.1 0:00.39 top
1 root 15 0 10368 684 572 S 0.0 0.0 0:00.79 init
2 root RT -5 0 0 0 S 0.0 0.0 0:02.93 migration/0
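To put a number on the memory pressure, the Mem and Swap summary lines above can be parsed as in the sketch below. parse_kb_fields is a hypothetical helper. It also assumes the usual layout of this older top output, where the "cached" figure printed on the Swap line is the page cache of main memory, so the second percentage is only an estimate of memory that is not reclaimable cache.

```python
import re

def parse_kb_fields(line):
    """Extract '<number>k <name>' pairs from a top summary line as kB ints."""
    return {name: int(kb) for kb, name in re.findall(r"(\d+)k (\w+)", line)}

# Summary lines copied from the top output in this report.
MEM_LINE = "Mem: 2058744k total, 2047312k used, 11432k free, 4488k buffers"
SWAP_LINE = "Swap: 5996536k total, 11260k used, 5985276k free, 425112k cached"

mem = parse_kb_fields(MEM_LINE)
swap = parse_kb_fields(SWAP_LINE)

# Raw usage as reported by top.
used_pct = 100.0 * mem["used"] / mem["total"]

# Assumption: buffers and page cache are reclaimable, so subtract them.
non_cache_kb = mem["used"] - mem["buffers"] - swap["cached"]
non_cache_pct = 100.0 * non_cache_kb / mem["total"]

if __name__ == "__main__":
    print(f"used: {used_pct:.1f}% of RAM")
    print(f"used excluding buffers/cache: {non_cache_pct:.1f}% of RAM")
```

With these figures the raw usage is about 99.4% of RAM, and about 78.6% once buffers and cache are subtracted. Most of that is accounted for by memcached (44.9%) and beam.smp (27.6%) in the process list.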
The rebalance hung for almost 1 hour and then failed:
Server error during processing: ["web request failed",
{path,"/pools/default"},
{type,exit},
{what,
{timeout,
}},
{trace,
[
,
{menelaus_web,build_nodes_info_fun,3},
{menelaus_web,build_pool_info,4},
{menelaus_web,handle_pool_info_wait,6},
{menelaus_web,check_and_handle_pool_info,2},
{menelaus_web,loop,3},
{mochiweb_http,headers,5},
{proc_lib,init_p_do_apply,3}]}]
Haven't heard from a higher priority node or a master, so I'm taking over.
See also the Web Console log.
A possible cause is the memory leak mentioned in http://www.couchbase.com/issues/browse/MB-5806