Couchbase Server

MB-18934: [FTS] MCP: Rebalance fails due to cbft getting killed by OOM killer


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Affects Version: 4.5.0
    • Fix Version: 4.5.0
    • Component: cbft
    • Triage: Untriaged
    • Is this a Regression?: Unknown

    Description

      Build
      4.5.0-1960

      Testcase
      ./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,index_retry=10,GROUP=P0 -t fts.moving_topology_fts.MovingTopFTS.rebalance_out_during_index_building,items=30000,cluster=D,F,F,index_replicas=1,standard_buckets=2,sasl_buckets=2,GROUP=P0
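
      For context, the index_replicas=1 parameter in the testcase corresponds to planParams.numReplicas in the FTS index definition. The sketch below is a hedged, direct equivalent of the index creation against the FTS REST port (8094); the node, credentials, bucket, and index name are assumptions based on this run, not the harness's actual code path:

      import requests

      # Hypothetical direct equivalent of the test's "create default index"
      # with index_replicas=1: PUT an index definition whose planParams
      # requests one replica.
      index_def = {
          "type": "fulltext-index",
          "sourceType": "couchbase",
          "sourceName": "default",           # bucket to index
          "planParams": {"numReplicas": 1},  # index_replicas=1 in the testcase
      }
      resp = requests.put(
          "http://172.23.106.175:8094/api/index/default_index_1",
          auth=("Administrator", "password"),
          json=index_def,
      )
      resp.raise_for_status()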

      All nodes have 4 cores, 4 GB RAM, and SSDs.

      Steps
      1. Create a D,F,F cluster with 5 buckets of 30K docs each.
      2. With index_replicas=1, create a default index on each bucket.
      3. While the indexes are building, rebalance out an FTS node.
      4. The rebalance fails with a message indicating cbft was killed. The "messages" file in the cbcollect syslog shows cbft was killed twice by the OOM killer:

      2016-03-28 15:45:17 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index building has begun...
      2016-03-28 15:45:20 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for default_index_1: 6335
      2016-03-28 15:45:22 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for sasl_bucket_1_index_1: 5529
      2016-03-28 15:45:25 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for sasl_bucket_2_index_1: 3183
      2016-03-28 15:45:27 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for standard_bucket_1_index_1: 1608
      2016-03-28 15:45:30 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for standard_bucket_2_index_1: 787
      2016-03-28 15:45:30 | INFO | MainProcess | test_thread | [fts_base.__async_rebalance_out] Starting rebalance-out nodes:[ip:172.23.106.176 port:8091 ssh_username:root] at C1 cluster 172.23.106.139
      2016-03-28 15:45:30 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=ns_1%40172.23.106.176&user=Administrator&knownNodes=ns_1%40172.23.106.176%2Cns_1%40172.23.106.175%2Cns_1%40172.23.106.139
      2016-03-28 15:45:30 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance operation started
      2016-03-28 15:45:30 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 0.00 %
      2016-03-28 15:45:41 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 16.67 %
      2016-03-28 15:45:51 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 16.67 %
      2016-03-28 15:46:01 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 18.27 %
      2016-03-28 15:46:11 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 23.08 %
      2016-03-28 15:46:21 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 23.08 %
      2016-03-28 15:46:31 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 26.28 %
      2016-03-28 15:46:41 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 26.28 %
      2016-03-28 15:46:51 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 29.49 %
      2016-03-28 15:47:01 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 29.49 %
      2016-03-28 15:47:11 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 31.09 %
      2016-03-28 15:47:22 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 32.69 %
      2016-03-28 15:47:32 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 32.69 %
      2016-03-28 15:47:42 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 39.10 %
      2016-03-28 15:47:52 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 45.51 %
      2016-03-28 15:48:02 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 51.92 %
      2016-03-28 15:48:12 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 61.54 %
      2016-03-28 15:48:22 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 61.54 %
      2016-03-28 15:48:32 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 61.54 %
      2016-03-28 15:48:42 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 61.54 %
      2016-03-28 15:48:52 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 63.14 %
      2016-03-28 15:49:02 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 66.35 %
      2016-03-28 15:49:12 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 66.35 %
      2016-03-28 15:49:22 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 66.35 %
      2016-03-28 15:49:33 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 71.15 %
      2016-03-28 15:49:43 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 71.15 %
      2016-03-28 15:49:53 | ERROR | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
      2016-03-28 15:49:56 | INFO | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] Latest logs from UI on 172.23.106.139:
      2016-03-28 15:49:56 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.106.139', u'code': 2, u'text': u'Rebalance exited with reason {service_rebalance_failed,fts,\n                                 {lost_connection,shutdown}}\n', u'shortText': u'message', u'serverTime': u'2016-03-28T15:49:52.944Z', u'module': u'ns_orchestrator', u'tstamp': 1459205392944, u'type': u'info'}
      2016-03-28 15:49:56 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.106.175', u'code': 0, u'text': u"Service 'fts' exited with status 1. Restarting. Messages: 2016-03-28T15:49:34.116-07:00 [INFO] janitor: feeds to remove: 0\n2016-03-28T15:49:34.116-07:00 [INFO] janitor: feeds to add: 0\n2016-03-28T15:49:37.621-07:00 [INFO] moss_herder: persistence progess, waiting: 27\n2016-03-28T15:49:48.875-07:00 [INFO] moss_herder: persistence progess, waiting: 7\n[goport] 2016/03/28 15:49:52 /opt/couchbase/bin/cbft terminated: signal: killed", u'shortText': u'message', u'serverTime': u'2016-03-28T15:49:52.608Z', u'module': u'ns_log', u'tstamp': 1459205392608, u'type': u'info'}
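
      For reference, the rebalance the harness starts at 15:45:30 is a plain ns_server REST call. The sketch below reconstructs it in Python from the "rebalance params" log line above; the host, credentials, and node names are the ones from this run:

      import requests

      # POST /controller/rebalance with the same knownNodes/ejectedNodes the
      # test logged; .176 is the FTS node being rebalanced out.
      resp = requests.post(
          "http://172.23.106.139:8091/controller/rebalance",
          auth=("Administrator", "password"),
          data={
              "knownNodes": "ns_1@172.23.106.176,ns_1@172.23.106.175,ns_1@172.23.106.139",
              "ejectedNodes": "ns_1@172.23.106.176",
          },
      )
      resp.raise_for_status()
      # The harness then polls GET /pools/default/rebalanceProgress for the
      # percentages shown in the log.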
      

      On .175:

      Mar 28 15:49:51 localhost kernel: Out of memory: Kill process 17132 (cbft) score 426 or sacrifice child
      :
      :
      Mar 28 15:54:32 localhost kernel: Out of memory: Kill process 17894 (cbft) score 425 or sacrifice child
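
      To confirm the kills independently, the collected syslog can be scanned for OOM-killer entries. A minimal sketch, assuming the syslog has been extracted from the cbcollect as a local file named "messages":

      import re

      # Match kernel OOM-killer lines like the two above; keep only cbft kills.
      oom = re.compile(r"Out of memory: Kill process (\d+) \((\S+)\) score (\d+)")
      with open("messages") as f:
          for line in f:
              m = oom.search(line)
              if m and m.group(2) == "cbft":
                  print(line.rstrip())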
      

      Attaching logs from:
      .139 --> kv
      .175 --> fts
      .176 --> fts (the node that was rebalanced out)
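
      On a repro run, cbft's memory growth could be correlated with the rebalance timeline using a small monitor left running on the FTS nodes. This is a hypothetical diagnostic helper, assuming pgrep and /proc are available on the test VMs (Linux):

      import subprocess
      import time

      def cbft_rss_kb():
          # Sum VmRSS across all cbft processes (goport may restart cbft
          # mid-run, as the "Restarting" message above shows).
          pids = subprocess.run(["pgrep", "-x", "cbft"],
                                capture_output=True, text=True).stdout.split()
          total_kb = 0
          for pid in pids:
              try:
                  with open("/proc/%s/status" % pid) as f:
                      for line in f:
                          if line.startswith("VmRSS:"):
                              total_kb += int(line.split()[1])  # value is in kB
              except FileNotFoundError:
                  pass  # process exited between pgrep and the read
          return total_kb

      for _ in range(600):  # sample once a second for ~10 minutes
          print(time.strftime("%Y-%m-%d %H:%M:%S"), cbft_rss_kb(), "kB")
          time.sleep(1)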


            People

      Assignee: apiravi Aruna Piravi (Inactive)
      Reporter: apiravi Aruna Piravi (Inactive)
      Votes: 0
      Watchers: 2


                Gerrit Reviews

                  There are no open Gerrit changes
