Details
-------
- Type: Bug
- Resolution: Fixed
- Priority: Blocker
- Affects Version: 4.0.0
- Security Level: Public
- Environment: CentOS 6.x, 15GB RAM, 4 cores
Description
-----------
Build
------
4.0.0-1867
Following up on MB-14457 (observed on 172.23.105.58 during system test).
goxdcr uses more than 2X the memory (4GB) of memcached (2GB). With no swap space left, the Linux OOM killer first kills goxdcr and then goes after memcached. Since this causes memcached to be killed, I'm marking this as a blocker.
.58 is a VM with 15GB of RAM and 4 CPU cores.
Attaching logs from .58.
From /var/log/messages:
Apr 16 01:33:44 guinep-s10505 kernel: Free swap = 0kB
Apr 16 01:33:44 guinep-s10505 kernel: Total swap = 10727416kB
Apr 16 01:33:44 guinep-s10505 kernel: 4019199 pages RAM
Apr 16 01:33:44 guinep-s10505 kernel: 123935 pages reserved
Apr 16 01:33:44 guinep-s10505 kernel: 14733 pages shared
Apr 16 01:33:44 guinep-s10505 kernel: 3867796 pages non-shared
Apr 16 01:33:44 guinep-s10505 kernel: [ 2899] 0 2899 4033 963 0 0 0 atop
Apr 16 01:33:44 guinep-s10505 kernel: [ 3760] 498 3760 314477 2356 3 0 0 beam.smp
Apr 16 01:33:44 guinep-s10505 kernel: [ 3799] 498 3799 506818 119852 3 0 0 beam.smp
Apr 16 01:33:44 guinep-s10505 kernel: [ 3839] 498 3839 26517 200 0 0 0 sh
Apr 16 01:33:44 guinep-s10505 kernel: [ 3840] 498 3840 1014 88 3 0 0 cpu_sup
Apr 16 01:33:44 guinep-s10505 kernel: [ 3841] 498 3841 1014 138 3 0 0 memsup
Apr 16 01:33:44 guinep-s10505 kernel: [ 3854] 498 3854 2698 104 2 0 0 inet_gethost
Apr 16 01:33:44 guinep-s10505 kernel: [ 3855] 498 3855 2698 82 0 0 0 inet_gethost
Apr 16 01:33:44 guinep-s10505 kernel: [ 3856] 498 3856 251734 6421 1 0 0 beam.smp
Apr 16 01:33:44 guinep-s10505 kernel: [ 3905] 498 3905 26518 200 3 0 0 sh
Apr 16 01:33:44 guinep-s10505 kernel: [ 3906] 498 3906 1015 135 0 0 0 memsup
Apr 16 01:33:44 guinep-s10505 kernel: [ 3907] 498 3907 1015 88 0 0 0 cpu_sup
Apr 16 01:33:44 guinep-s10505 kernel: [ 3910] 498 3910 1492 194 0 0 0 godu
Apr 16 01:33:44 guinep-s10505 kernel: [ 3911] 498 3911 26517 200 3 0 0 sh
Apr 16 01:33:44 guinep-s10505 kernel: [ 3912] 498 3912 1491 95 0 0 0 godu
Apr 16 01:33:44 guinep-s10505 kernel: [ 3920] 498 3920 50176 225 1 0 0 moxi
Apr 16 01:33:44 guinep-s10505 kernel: [ 3925] 498 3925 2098 166 0 0 0 sigar_port
Apr 16 01:33:44 guinep-s10505 kernel: [ 3929] 498 3929 1620 123 0 0 0 goport
Apr 16 01:33:44 guinep-s10505 kernel: [ 3930] 498 3930 46938 237 1 0 0 saslauthd-port
Apr 16 01:33:44 guinep-s10505 kernel: [ 3932] 498 3932 1620 124 1 0 0 goport
Apr 16 01:33:44 guinep-s10505 kernel: [ 3933] 498 3933 115706 577 2 0 0 moxi
Apr 16 01:33:44 guinep-s10505 kernel: [ 3934] 498 3934 115739 601 1 0 0 moxi
Apr 16 01:33:44 guinep-s10505 kernel: [ 3935] 498 3935 232491 568 3 0 0 beam.smp
Apr 16 01:33:44 guinep-s10505 kernel: [ 3940] 498 3940 4397871 2749351 3 0 0 goxdcr
Apr 16 01:33:44 guinep-s10505 kernel: [ 3942] 498 3942 109292 517 0 0 0 projector
Apr 16 01:33:44 guinep-s10505 kernel: [ 3993] 498 3993 2071616 836127 0 0 0 memcached
Apr 16 01:33:44 guinep-s10505 kernel: [ 4011] 498 4011 2699 104 3 0 0 inet_gethost
Apr 16 01:33:44 guinep-s10505 kernel: [ 4012] 498 4012 2699 82 3 0 0 inet_gethost
Apr 16 01:33:44 guinep-s10505 kernel: [16341] 498 16341 2698 78 2 0 0 inet_gethost
Apr 16 01:33:44 guinep-s10505 kernel: [20783] 0 20783 4034 964 0 0 0 atop
Apr 16 01:33:44 guinep-s10505 kernel: [24016] 89 24016 19688 401 0 0 0 pickup
Apr 16 01:33:44 guinep-s10505 kernel: [25028] 0 25028 1017 128 0 0 0 sleep
Apr 16 01:33:44 guinep-s10505 kernel: Out of memory: Kill process 3940 (goxdcr) score 646 or sacrifice child
Apr 16 01:33:44 guinep-s10505 kernel: Killed process 3940, UID 498, (goxdcr) total-vm:17591484kB, anon-rss:10996992kB, file-rss:412kB
:
:
Apr 16 01:33:44 guinep-s10505 kernel: Out of memory: Kill process 3993 (memcached) score 302 or sacrifice child
Apr 16 01:33:44 guinep-s10505 kernel: Killed process 3993, UID 498, (memcached) total-vm:8286464kB, anon-rss:3343632kB, file-rss:860kB
[user:info,2015-04-16T1:33:47.426,ns_1@172.23.105.58:<0.211.0>:ns_log:crash_consumption_loop:70]Port server goxdcr on node 'babysitter_of_ns_1@127.0.0.1' exited with status 1. Restarting. Messages: PipelineManager 2015-04-16T01:33:42.973-07:00 [INFO] Replication Status = map[3754c145e93cbcddccbda66d99e2163a/standardbucket/standardbucket:name=
{3754c145e93cbcddccbda66d99e2163a/standardbucket/standardbucket}, status=
{Pending}, errors={[
{"time":"2015-04-16T01:33:30.112118921-07:00","errMsg":"map[xmem_3754c145e93cbcddccbda66d99e2163a/standardbucket/standardbucket_172.23.105.48:11210_1:Xmem is stuck]"},
debug.log:[user:info,2015-04-16T1:33:45.863,ns_1@172.23.105.58:<0.211.0>:ns_log:crash_consumption_loop:70]Port server memcached on node 'babysitter_of_ns_1@127.0.0.1' exited with status 137. Restarting. Messages: 2015-04-16T01:32:12.785397-07:00 WARNING (standardbucket) DCP (Producer) eq_dcpq:xdcr:dcp_3754c145e93cbcddccbda66d99e2163a/standardbucket/standardbucket_172.23.105.58:11210_1 - (vb 511) Sending disk snapshot with start seqno 0 and end seqno 32519