Details
Bug
Resolution: Fixed
Major
4.6.0
Untriaged
No
Description
About 1.5 days into testing, a node went down and has not come back online. This looks like the same behavior seen in 3.1.3, but my understanding is that MB-14068 fixes this in 4.0+. Need confirmation on whether this is indeed a duplicate issue.
At the time of the OOM, the duplicate partition version errors are logged on 172.23.105.62:
[couchdb:error,2016-11-04T04:48:28.957-07:00,couchdb_ns_1@127.0.0.1:<0.1258.0>:couch_log:error:44]set view `default`, mapreduce_view main (prod) group `_design/scale` have the duplicate partition versions [{35,
About 3 minutes later we hit the OOM:
Nov 4 04:51:47 kvm-s63704 kernel: [19417698.447312] Out of memory: Kill process 9098 (beam.smp) score 674 or sacrifice child
Nov 4 04:51:47 kvm-s63704 kernel: [19417698.453765] Killed process 9175 (godu) total-vm:11632kB, anon-rss:4588kB, file-rss:0kB
Nov 4 04:51:47 kvm-s63704 kernel: [19417698.463027] beam.smp invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
Nov 4 04:51:47 kvm-s63704 kernel: [19417698.463032] beam.smp cpuset=/ mems_allowed=0
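For anyone triaging similar logs, here is a minimal sketch of how the OOM-killer lines above could be pulled out of a kernel log to see which process was selected and which was actually killed. The function name `parse_oom_events` and both regular expressions are illustrative assumptions based only on the message wording quoted in this ticket; other kernel versions may format these lines differently.

```python
import re

# Assumed patterns, derived only from the kernel messages quoted in this ticket.
KILL_RE = re.compile(r"Out of memory: Kill process (\d+) \((.+?)\) score (\d+)")
KILLED_RE = re.compile(r"Killed process (\d+) \((.+?)\)")


def parse_oom_events(lines):
    """Return (selected, killed): processes the OOM killer chose and ones it killed."""
    selected, killed = [], []
    for line in lines:
        m = KILL_RE.search(line)
        if m:
            # (pid, process name, oom score)
            selected.append((int(m.group(1)), m.group(2), int(m.group(3))))
            continue
        m = KILLED_RE.search(line)
        if m:
            # (pid, process name)
            killed.append((int(m.group(1)), m.group(2)))
    return selected, killed


# The two relevant lines from the excerpt above.
sample = [
    "Nov 4 04:51:47 kvm-s63704 kernel: [19417698.447312] Out of memory: "
    "Kill process 9098 (beam.smp) score 674 or sacrifice child",
    "Nov 4 04:51:47 kvm-s63704 kernel: [19417698.453765] Killed process 9175 "
    "(godu) total-vm:11632kB, anon-rss:4588kB, file-rss:0kB",
]

selected, killed = parse_oom_events(sample)
print("selected:", selected)
print("killed:", killed)
```

On this excerpt the selected process is beam.smp (pid 9098) while the process actually killed is godu (pid 9175), which matches the "or sacrifice child" wording in the first line.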
Also, the memcached process is no longer running on this node:
root@kvm-s63704:/opt/couchbase/var/lib/couchbase/logs# ps aux | grep [m]emcached
root@kvm-s63704:/opt/couchbase/var/lib/couchbase/logs#
The last log from memcached was a bucket shutdown 2 hrs ago:
2016-11-04T09:44:47.291209-07:00 NOTICE Shutting down OpenSSL
2016-11-04T09:44:47.797552-07:00 NOTICE Shutting down libevent
2016-11-04T09:44:47.797644-07:00 NOTICE Shutdown complete.
So it doesn't look like mcd is being targeted by the OOM killer, just couchdb.
Subsequent attempts to restart couchdb also result in OOM.
For Gerrit Dashboard: MB-22063

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
71175,2 | MB-22063[BP MB-21594] Remove old partition versions during rebalance | watson | couchdb | MERGED | +2 | +1 |