Details

Type: Bug
Resolution: Cannot Reproduce
Priority: Critical
Fix Version/s: 5.0.0
Affects Version/s: 5.0.0
Component/s: 3rd-party
Labels:
- functional-test

Triage:
Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2016-07-19T223130-ns_1%40172.23.105.216.zip
https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2016-07-19T223130-ns_1%40172.23.105.224.zip
https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2016-07-19T223130-ns_1%40172.23.106.120.zip

Is this a Regression?:
Unknown

Description

Build
4.5.0-0888

Testcase (new)
./testrunner -i INI_FILE.ini -p get-cbcollect-info=True,get-coredumps=True,get-logs=False,stop-on-failure=False,cluster=D:F:F,GROUP=DGM -t fts.stable_topology_fts.StableTopFTS.create_simple_default_index,cluster=D,F,D+F,dgm_run=1,active_resident_ratio=10,eviction_policy=fullEviction,moss_compact_threshold=20,GROUP=DGM

Everything goes wrong after 9:18:33 PM, when each of the nodes starts hitting tick timeouts. One by one, the nodes repeatedly take over as master until all of them go down, and memcached and the compactor crash on .120. Couchbase then restarts repeatedly on .120 until the node goes into a kernel panic; I had to have IT bring the node back up. Attaching cbcollect from all nodes.

.120 ran kv + fts.
.224 and .216 ran fts only.

Events, most recent first:

Control connection to memcached on 'ns_1@172.23.106.120' disconnected: {{badmatch,{error,timeout}},
    [{mc_client_binary,cmd_vocal_recv,5,
      [{file,"src/mc_client_binary.erl"},{line,156}]},
     {mc_client_binary, ...	ns_memcached 000	ns_1@172.23.106.120	9:45:10 PM Mon Jul 18, 2016

Compactor for database `default` (pid [{type,database},
    {important,true},
    {name,<<"default">>},
    {fa,{#Fun<compaction_new_daemon.4.92023696>,
         [<<"default">>,
          {config,{30,18446744073709551616},
                  {30,18446744073709551616},
                  undefined,false,false,
                  {daemon_config,30,131072,20971520}},
          false,
          {[{type,bucket}]}]}}]) terminated unexpectedly: {timeout,
    {gen_server,call,
     [{'ns_memcached-default','ns_1@172.23.106.120'},
      {raw_stats,<<"diskinfo">>,
       #Fun<compaction_new_daemon.18.92023696>,
       {<<"0">>,<<"0">>}},
      180000]}}	compaction_new_daemon 000	ns_1@172.23.106.120	9:45:10 PM Mon Jul 18, 2016

Couchbase Server has started on web port 8091 on node 'ns_1@172.23.105.216'. Version: "4.7.0-888-enterprise".	menelaus_sup 001	ns_1@172.23.105.216	9:43:58 PM Mon Jul 18, 2016

Node 'ns_1@172.23.105.216' saw that node 'ns_1@172.23.105.224' went down. Details: [{nodedown_reason,connection_closed}]	ns_node_disco 005	ns_1@172.23.105.216	9:43:52 PM Mon Jul 18, 2016

Node 'ns_1@172.23.105.216' saw that node 'ns_1@172.23.106.120' went down. Details: [{nodedown_reason,connection_closed}]	ns_node_disco 005	ns_1@172.23.105.216	9:43:52 PM Mon Jul 18, 2016

Node 'ns_1@172.23.105.224' saw that node 'ns_1@172.23.106.120' went down. Details: [{nodedown_reason,net_tick_timeout}]	ns_node_disco 005	ns_1@172.23.105.224	9:42:05 PM Mon Jul 18, 2016

Node 'ns_1@172.23.106.120' saw that node 'ns_1@172.23.105.216' went down. Details: [{nodedown_reason,net_tick_timeout}]	ns_node_disco 005	ns_1@172.23.106.120	9:39:35 PM Mon Jul 18, 2016

Node 'ns_1@172.23.105.224' saw that node 'ns_1@172.23.105.216' went down. Details: [{nodedown_reason,net_tick_timeout}]	ns_node_disco 005	ns_1@172.23.105.224	9:39:32 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.105.224	9:38:34 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.105.224	9:36:34 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.105.224	9:31:38 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.105.224	9:29:34 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.106.120	9:28:44 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.105.224	9:25:32 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.105.224	9:21:34 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.105.216	9:19:33 PM Mon Jul 18, 2016

Haven't heard from a higher priority node or a master, so I'm taking over.	mb_master 000	ns_1@172.23.105.224	9:18:33 PM Mon Jul 18, 2016
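
Because the entries above are listed newest first, it can help to re-sort them chronologically and count master takeovers per node when reading the timeline. The following is only a rough sketch, not part of the test or of ns_server: it assumes each entry ends with three tab-separated fields (module and code, node, timestamp) as they appear in this ticket, and the names `parse_event` and `SAMPLE` are made up for illustration. The timestamp format string also assumes an English locale.

```python
from collections import Counter
from datetime import datetime

MSG = "Haven't heard from a higher priority node or a master, so I'm taking over."

# The nine mb_master rows from the event log above, newest first.
# Assumed layout: message <TAB> "module code" <TAB> node <TAB> timestamp.
SAMPLE = "\n".join([
    MSG + "\tmb_master 000\tns_1@172.23.105.224\t9:38:34 PM Mon Jul 18, 2016",
    MSG + "\tmb_master 000\tns_1@172.23.105.224\t9:36:34 PM Mon Jul 18, 2016",
    MSG + "\tmb_master 000\tns_1@172.23.105.224\t9:31:38 PM Mon Jul 18, 2016",
    MSG + "\tmb_master 000\tns_1@172.23.105.224\t9:29:34 PM Mon Jul 18, 2016",
    MSG + "\tmb_master 000\tns_1@172.23.106.120\t9:28:44 PM Mon Jul 18, 2016",
    MSG + "\tmb_master 000\tns_1@172.23.105.224\t9:25:32 PM Mon Jul 18, 2016",
    MSG + "\tmb_master 000\tns_1@172.23.105.224\t9:21:34 PM Mon Jul 18, 2016",
    MSG + "\tmb_master 000\tns_1@172.23.105.216\t9:19:33 PM Mon Jul 18, 2016",
    MSG + "\tmb_master 000\tns_1@172.23.105.224\t9:18:33 PM Mon Jul 18, 2016",
])


def parse_event(line):
    """Split one log row into (timestamp, node, module, message)."""
    # Split from the right so tabs inside the message cannot shift the fields.
    message, module, node, stamp = line.rsplit("\t", 3)
    when = datetime.strptime(stamp.strip(), "%I:%M:%S %p %a %b %d, %Y")
    return when, node, module.split()[0], message


# Oldest event first, then takeovers counted per node.
events = sorted(parse_event(row) for row in SAMPLE.splitlines() if row.strip())
takeovers = Counter(node for _, node, module, _ in events if module == "mb_master")

if __name__ == "__main__":
    for when, node, module, _ in events:
        print(when.strftime("%H:%M:%S"), node, module)
    print(dict(takeovers))
```

On the nine takeover rows above, this attributes seven takeovers to .224 and one each to .216 and .120, with the first at 9:18:33 PM on .224.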

Attachments

Screen Shot 2016-07-19 at 3.43.14 PM.png
468 kB
19/Jul/16 4:02 PM
Screen Shot 2016-07-19 at 3.43.39 PM.png
222 kB
19/Jul/16 4:02 PM

Compactor and Memcached terminate unexpectedly, node goes down due to kernel panic