Details
Bug
Resolution: Incomplete
Major
2.0
Security Level: Public
build 1808
Description
While running 300 queries/sec against 2 buckets in parallel during a rebalance, the rebalance fails. The following appears in the logs, along with Mnesia core dumps at the time of the crash.
=========================CRASH REPORT=========================
crasher:
initial call: mochiweb_acceptor:init/3
pid: <0.32223.614>
registered_name: []
exception exit:
in function mochiweb_acceptor:init/3
ancestors: [couch_httpd,couch_secondary_services,couch_server_sup,
cb_couch_sup,ns_server_cluster_sup,<0.59.0>]
messages: []
links: [<0.6398.0>]
dictionary: []
trap_exit: false
status: running
heap_size: 377
stack_size: 24
reductions: 218
neighbours:
[couchdb:error,2012-10-09T18:29:03.298,ns_1@10.6.2.68:<0.32453.614>:couch_log:error:42]Set view `default`, main group `_design/d1`, doc loader error
error: {case_clause,{error,emfile}}
stacktrace: [
{couch_set_view_updater,'-load_changes/7-fun-2-',6},
{lists,foldl,3},
{couch_set_view_updater,load_changes,7},
{couch_set_view_updater,'-update/7-fun-2-',10}]
[couchdb:error,2012-10-09T18:29:03.299,ns_1@10.6.2.68:<0.32498.614>:couch_log:error:42]Set view `saslbucket`, main group `_design/d11`, doc loader error
error: {case_clause,{error,emfile}}
stacktrace: [{couch_db,fast_reads,2}
,
,
,
,
]
[couchdb:error,2012-10-09T18:29:03.303,ns_1@10.6.2.68:<0.6803.0>:couch_log:error:42]Set view `default`, main group `_design/d1`, received error from updater: {case_clause,
{error,
emfile}}
.....
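The `emfile` in the errors above is the POSIX EMFILE condition: the process has hit its limit on open file descriptors. A minimal sketch for comparing a process's open descriptors against its soft limit, assuming a Linux host with `/proc` mounted (the helper name `fd_usage` is hypothetical, not part of Couchbase or Erlang):

```python
import os


def fd_usage(pid="self"):
    """Return (open_fds, soft_limit) for a process, read from Linux /proc.

    soft_limit is None when the limit is reported as "unlimited".
    """
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))

    soft_limit = None
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", soft limit, hard limit, units
                value = line.split()[3]
                soft_limit = int(value) if value.isdigit() else None
                break
    return open_fds, soft_limit


if __name__ == "__main__":
    used, limit = fd_usage()
    print(f"open fds: {used}, soft limit: {limit}")
```

To inspect the Erlang VM instead, pass its pid (for example `fd_usage(21800)` for the beam.smp pid in the atop output of this report). Reading another process's `/proc/<pid>/fd` requires running as the same user or as root.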
atop output at the time of the crash shows beam at 5.0G (attached):
atop -m -r /var/log/atop/atop_20121009 -b 18:30 -e 18:30
ATOP - pine-11803 2012/10/09 18:30:02 ------ 10m0s elapsed
PRC | sys 12m08s | user 17m59s | #proc 136 | #zombie 0 | #exit 378 |
CPU | sys 81% | user 148% | irq 20% | idle 108% | wait 7% |
cpu | sys 23% | user 28% | irq 17% | idle 20% | cpu000 w 2% |
cpu | sys 20% | user 40% | irq 1% | idle 29% | cpu002 w 2% |
cpu | sys 20% | user 40% | irq 1% | idle 29% | cpu001 w 2% |
cpu | sys 20% | user 40% | irq 1% | idle 30% | cpu003 w 2% |
CPL | avg1 1.79 | avg5 2.06 | avg15 2.56 | csw 29981103 | intr 13722e3 |
MEM | tot 31.0G | free 152.2M | cache 6.3G | buff 132.7M | slab 614.4M |
SWP | tot 2.0G | free 2.0G | | vmcom 25.2G | vmlim 17.5G |
PAG | scan 1469e3 | stall 0 | | swin 0 | swout 0 |
LVM | Group02-Data | busy 15% | read 18653 | write 393987 | avio 0.27 ms |
LVM | roup01-Index | busy 1% | read 3048 | write 34554 | avio 0.10 ms |
LVM | roup-lv_root | busy 0% | read 281 | write 59417 | avio 0.05 ms |
DSK | xvdc | busy 15% | read 18653 | write 390125 | avio 0.27 ms |
DSK | xvdb | busy 1% | read 3048 | write 5084 | avio 0.47 ms |
DSK | xvda | busy 0% | read 281 | write 11485 | avio 0.25 ms |
NET | transport | tcpi 4961710 | tcpo 5569073 | udpi 0 | udpo 0 |
NET | network | ipi 4961710 | ipo 5687386 | ipfrw 0 | deliv 4962e3 |
NET | eth0 ---- | pcki 4397953 | pcko 5122937 | si 28 Mbps | so 52 Mbps |
NET | lo ---- | pcki 564448 | pcko 564448 | si 5227 Kbps | so 5227 Kbps |
PID MINFLT MAJFLT VSTEXT VSIZE RSIZE VGROW RGROW MEM CMD
21836 1405 5 132K 18.3G 18.0G 4112K 4692K 58% memcached
21800 3124e3 12 1876K 8.2G 5.4G 5.0G 4.8G 17% beam.smp
21835 399 2 423K 412.2M 6264K 0K 36K 0% moxi
25706 608 0 148K 17720K 5440K 0K 0K 0% atop
1253 0 0 139K 12268K 2380K 0K 0K 0% udevd
1255 0 0 139K 12268K 2372K 0K 0K 0% udevd
9447 0 0 216K 78704K 1816K 0K 0K 0% pickup
976 553 0 845K 11560K 1732K 0K 0K 0% xe-daemon
1210 0 0 141K 78624K 1724K 0K 0K 0% master
1219 0 0 286K 78876K 1688K 0K 0K 0% qmgr
21827 214 0 845K 103.6M 1328K 0K 4K 0% sh
955 2 1 307K 249.1M 1288K 0K 12K 0% rsyslogd
1232 56 0 52K 114.4M 1288K 0K 0K 0% crond
21831 0 0 53K 38968K 1132K 0K 0K 0% ssl_esock
370 0 0 139K 11084K 1032K 0K 0K 0% udevd
1 0 0 131K 19216K 980K 0K 0K 0% init
1130 0 0 503K 64024K 968K 0K 0K 0% sshd
939 0 0 160K 27632K 768K 0K 0K 0% auditd
887 0 0 543K 9112K 756K 0K 0K 0% dhclient
1246 0 0 15K 4068K 616K 0K 0K 0% agetty
21846 0 0 2K 6260K 576K 0K 0K 0% sigar_port
1249 0 0 11K 4056K 572K 0K 0K 0% mingetty
netstat shows the other services still listening, but not the web acceptor on port 8091:
[root@pine-11803 couchbase]# netstat -t -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:21100 *:* LISTEN
tcp 0 0 *:epmd *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 localhost:smtp *:* LISTEN
tcp 0 0 *:8092 *:* LISTEN
tcp 0 0 localhost:44318 *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 localhost:smtp *:* LISTEN
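
The same check can be scripted when reproducing the failure. This is a rough sketch that only tests whether a TCP connection to a port is accepted; the function name `port_is_listening` and the timeout value are assumptions for illustration:

```python
import socket


def port_is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) is accepted."""
    try:
        # create_connection raises OSError on refusal, timeout, or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 8091 is the web acceptor port and 8092 the port still listed above.
    for port in (8091, 8092):
        state = "listening" if port_is_listening("127.0.0.1", port) else "not reachable"
        print(f"port {port}: {state}")
```

A refused connection only shows that nothing is accepting on that port at that moment; it does not explain why the acceptor went away.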