Details
- Bug
- Resolution: Duplicate
- Critical
- None
- 4.5.0
- Untriaged
- Unknown
Description
Two and a half days into the test, I ran out of disk space because of the core dumps generated each time couchdb was killed by the OOM killer. I'm not sure why the OOM killer is kicking in, because the test has a cleanup phase to prevent a hard out-of-memory condition on the bucket.
I expected couchdb to come back after being killed this way, but it keeps getting killed after each restart:
[ns_server:info,2016-05-29T04:36:20.836-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.22870.351>:ns_port_server:log:210]ns_couchdb<0.22870.351>: Apache CouchDB (LogLevel=info) is starting.
[ns_server:info,2016-05-29T04:36:21.175-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.22870.351>:ns_port_server:log:210]ns_couchdb<0.22870.351>: Apache CouchDB has started. Time to relax.
[ns_server:info,2016-05-29T04:38:18.170-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.28610.351>:ns_port_server:log:210]ns_couchdb<0.28610.351>: Apache CouchDB (LogLevel=info) is starting.
[ns_server:info,2016-05-29T04:38:18.516-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.28610.351>:ns_port_server:log:210]ns_couchdb<0.28610.351>: Apache CouchDB has started. Time to relax.
[ns_server:info,2016-05-29T04:40:01.897-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.1110.352>:ns_port_server:log:210]ns_couchdb<0.1110.352>: Apache CouchDB (LogLevel=info) is starting.
ns_couchdb<0.1110.352>: Apache CouchDB has started. Time to relax.
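To put a number on how quickly couchdb is cycling, here is a minimal sketch that pulls the "is starting" timestamps out of the three entries quoted above and prints the gap between consecutive starts. The regular expression and the helper names (`start_times`, `restart_gaps`) are my own and only assume the log line layout visible in this report.

```python
import re
from datetime import datetime

# The three "is starting" entries, copied from the ns_server log excerpt above.
LOG = """\
[ns_server:info,2016-05-29T04:36:20.836-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.22870.351>:ns_port_server:log:210]ns_couchdb<0.22870.351>: Apache CouchDB (LogLevel=info) is starting.
[ns_server:info,2016-05-29T04:38:18.170-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.28610.351>:ns_port_server:log:210]ns_couchdb<0.28610.351>: Apache CouchDB (LogLevel=info) is starting.
[ns_server:info,2016-05-29T04:40:01.897-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.1110.352>:ns_port_server:log:210]ns_couchdb<0.1110.352>: Apache CouchDB (LogLevel=info) is starting.
"""

# Hypothetical pattern: the timestamp is the second comma-separated field
# of the bracketed header, and the message ends with "is starting.".
START_RE = re.compile(r"^\[ns_server:info,([^,]+),.*is starting\.$")


def start_times(text):
    """Return the timestamp of every 'is starting' entry, in log order."""
    times = []
    for line in text.splitlines():
        match = START_RE.match(line)
        if match:
            times.append(datetime.fromisoformat(match.group(1)))
    return times


def restart_gaps(times):
    """Return the seconds elapsed between each pair of consecutive starts."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in restart_gaps(start_times(LOG)):
        print("restarted after %.3f s" % gap)
```

On these entries the gaps come out just under two minutes each, so couchdb is not staying up for long after any restart.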
Tracing the series of events, ns_server first got a {badrpc,nodedown} error from couchdb:
[ns_server:error,2016-05-29T04:29:58.801-07:00,ns_1@172.23.105.61:<0.6395.342>:menelaus_web:loop:189]Server error during processing: ["web request failed",
 {path,"/pools/default/buckets/WAREHOUSE/ddocs"},
 {method,'GET'},
 {type,exit},
 {what,{error,{badrpc,nodedown}}},
 {trace,
  [{ns_couchdb_api,rpc_couchdb_node,4,
    [{file,"src/ns_couchdb_api.erl"},{line,162}]},
   {capi_utils,full_live_ddocs,3,
    [{file,"src/capi_utils.erl"},{line,172}]},
   ...
I couldn't find a reason in the logs for couchdb being down, other than this error printed just before:
[couchdb:error,2016-05-29T04:27:34.465-07:00,couchdb_ns_1@127.0.0.1:<0.335.0>:couch_log:error:44]Cleanup process <0.22776.195> for set view `ORDER_LINE`, replica (prod) group `_design/all`, died with reason: stopped
Here couchdb is restarted after an eheap_alloc error:
[ns_server:info,2016-05-29T04:35:47.762-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.8218.32>:ns_port_server:log:210]ns_couchdb<0.8218.32>:
ns_couchdb<0.8218.32>: Crash dump was written to: erl_crash.dump.1464283581.8932.ns_couchdb
ns_couchdb<0.8218.32>: eheap_alloc: Cannot allocate 8162366936 bytes of memory (of type "old_heap").
[ns_server:error,2016-05-29T04:36:18.098-07:00,ns_1@172.23.105.61:wait_link_to_couchdb_node<0.22631.351>:ns_server_nodes_sup:do_wait_link_to_couchdb_node:163]ns_couchdb_port(<0.8218.32>) died with reason {abnormal,134}
[ns_server:info,2016-05-29T04:36:20.836-07:00,ns_1@172.23.105.61:ns_couchdb_port<0.22870.351>:ns_port_server:log:210]ns_couchdb<0.22870.351>: Apache CouchDB (LogLevel=info) is starting.
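Two numbers in the excerpt above are easier to read once converted: the size of the failed allocation and the 134 in {abnormal,134}. The sketch below assumes the port exit status follows the common shell convention of 128 plus the signal number for a process terminated by a signal. I have not confirmed that against ns_server itself, so treat the signal name as a likely reading rather than a fact. The helper names are mine.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status.

    Assumption: values above 128 mean "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            return signal.Signals(signum).name
        except ValueError:
            return "unknown signal %d" % signum
    return "exit code %d" % status


def to_gib(num_bytes):
    """Convert a byte count to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Values taken from the log excerpt above.
    print("failed allocation: %.2f GiB" % to_gib(8162366936))
    print("exit status 134 ->", describe_exit_status(134))
```

Under that assumption, 134 corresponds to signal 6 (SIGABRT on Linux), which would be consistent with the Erlang VM aborting itself after the failed old_heap allocation of roughly 7.6 GiB, rather than being killed from outside at this point.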
This happens a few times, and then couchdb is killed by the OOM killer (core attached):
...
May 29 04:43:30 kvm-s63705 kernel: [5679446.330644] [ 1585] 1000 1585 7632490 5334196 10554 0 0 beam.smp
May 29 04:43:30 kvm-s63705 kernel: [5679446.330646] [ 1622] 1000 1622 1462 147 8 0 0 goport
May 29 04:43:30 kvm-s63705 kernel: [5679446.330647] [ 1627] 1000 1627 109769 21401 69 0 0 goxdcr
May 29 04:43:30 kvm-s63705 kernel: [5679446.330649] [ 1636] 1000 1636 1113 175 7 0 0 sh
May 29 04:43:30 kvm-s63705 kernel: [5679446.330650] [ 1638] 1000 1638 1084 351 8 0 0 memsup
May 29 04:43:30 kvm-s63705 kernel: [5679446.330652] [ 1639] 1000 1639 1084 183 8 0 0 cpu_sup
May 29 04:43:30 kvm-s63705 kernel: [5679446.330653] [ 1645] 1000 1645 2516 1210 10 0 0 godu
May 29 04:43:30 kvm-s63705 kernel: [5679446.330654] [ 1646] 1000 1646 1112 166 7 0 0 sh
May 29 04:43:30 kvm-s63705 kernel: [5679446.330656] [ 1647] 1000 1647 1330 105 8 0 0 godu
May 29 04:43:30 kvm-s63705 kernel: [5679446.330657] [ 1659] 1000 1659 43359 2573 37 0 0 moxi
May 29 04:43:30 kvm-s63705 kernel: [5679446.330659] [ 1660] 1000 1660 2155 381 10 0 0 sigar_port
May 29 04:43:30 kvm-s63705 kernel: [5679446.330660] [ 1661] 1000 1661 1867 220 9 0 0 inet_gethost
May 29 04:43:30 kvm-s63705 kernel: [5679446.330662] [ 1662] 1000 1662 2391 393 10 0 0 inet_gethost
May 29 04:43:30 kvm-s63705 kernel: [5679446.330663] Out of memory: Kill process 1585 (beam.smp) score 692 or sacrifice child
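For a sense of scale, the process table in the kernel dump above can be converted from pages to bytes. The sketch below makes two assumptions that are not visible in the excerpt: that the columns follow the usual OOM-killer layout (pid, uid, tgid, total_vm, rss, nr_ptes, swapents, oom_score_adj, name; the header row is in the elided part of the log) and that the page size is 4 KiB. Only three of the rows are reproduced; `parse_oom_rows` is a name of my own.

```python
import re

# Assumption: 4 KiB pages, the usual value on x86-64 Linux.
PAGE_SIZE = 4096

# Three rows copied from the kernel log excerpt above.
DUMP = """\
May 29 04:43:30 kvm-s63705 kernel: [5679446.330644] [ 1585] 1000 1585 7632490 5334196 10554 0 0 beam.smp
May 29 04:43:30 kvm-s63705 kernel: [5679446.330647] [ 1627] 1000 1627 109769 21401 69 0 0 goxdcr
May 29 04:43:30 kvm-s63705 kernel: [5679446.330657] [ 1659] 1000 1659 43359 2573 37 0 0 moxi
"""

# Assumed column order: [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW_RE = re.compile(
    r"\[\s*(\d+)\]\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(\S+)\s*$"
)


def parse_oom_rows(text):
    """Return one dict per process row, with memory converted to bytes."""
    rows = []
    for line in text.splitlines():
        match = ROW_RE.search(line)
        if not match:
            continue
        rows.append({
            "pid": int(match.group(1)),
            "name": match.group(9),
            "total_vm_bytes": int(match.group(4)) * PAGE_SIZE,
            "rss_bytes": int(match.group(5)) * PAGE_SIZE,
        })
    # Largest resident set first, which is usually the process of interest.
    return sorted(rows, key=lambda row: row["rss_bytes"], reverse=True)


if __name__ == "__main__":
    for row in parse_oom_rows(DUMP):
        print("%-10s pid %-5d rss %6.2f GiB  vm %6.2f GiB" % (
            row["name"], row["pid"],
            row["rss_bytes"] / 2**30, row["total_vm_bytes"] / 2**30))
```

If those assumptions hold, beam.smp (pid 1585) was holding roughly 20 GiB resident when it was killed, far more than any other process in the table.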
At this time the test was rebalancing in 2 nodes:
ok 261 - [2016-05-29T04:19:06-07:00, 1419c28:57f2a1] server-add -c 172.23.106.14 --server-add 172.23.105.83 -u Administrator -p password --server-add-username Administrator --server-add-password password
ok 262 - [2016-05-29T04:19:12-07:00, 1419c28:22ba5c] server-add -c 172.23.106.14 --server-add 172.23.105.63 -u Administrator -p password --server-add-username Administrator --server-add-password password
*not ok* 263 - [2016-05-29T04:29:31-07:00, 1419c28:c4557f] rebalance -c 172.23.106.14 -u Administrator -p password
With cores enabled, I ran out of disk space because beam was creating 15GB cores.
Issue Links
- duplicates MB-19221 Duplicated partition versions (Resolved)