Details
-
Bug
-
Resolution: Fixed
-
Critical
-
4.5.0
-
Untriaged
-
Unknown
Description
Build
4.5.0-1960
Testcase
./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,index_retry=10,GROUP=P0 -t fts.moving_topology_fts.MovingTopFTS.rebalance_out_during_index_building,items=30000,cluster=D,F,F,index_replicas=1,standard_buckets=2,sasl_buckets=2,GROUP=P0
All nodes have 4cores, 4GB ram, SSDs.
Steps
1. D,F,F cluster, 5 buckets having 30K docs each.
2. with index_replica=1, create default index.
3. During index building, rebalance out an fts node.
4. Rebalance fails with message indicating cbft was killed. cbcollect syslog contains "messages" file that indicates cbft was killed twice -
2016-03-28 15:45:17 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index building has begun...
|
2016-03-28 15:45:20 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for default_index_1: 6335
|
2016-03-28 15:45:22 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for sasl_bucket_1_index_1: 5529
|
2016-03-28 15:45:25 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for sasl_bucket_2_index_1: 3183
|
2016-03-28 15:45:27 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for standard_bucket_1_index_1: 1608
|
2016-03-28 15:45:30 | INFO | MainProcess | test_thread | [moving_topology_fts.rebalance_out_during_index_building] Index count for standard_bucket_2_index_1: 787
|
2016-03-28 15:45:30 | INFO | MainProcess | test_thread | [fts_base.__async_rebalance_out] Starting rebalance-out nodes:[ip:172.23.106.176 port:8091 ssh_username:root] at C1 cluster 172.23.106.139
|
2016-03-28 15:45:30 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=ns_1%40172.23.106.176&user=Administrator&knownNodes=ns_1%40172.23.106.176%2Cns_1%40172.23.106.175%2Cns_1%40172.23.106.139
|
2016-03-28 15:45:30 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance operation started
|
2016-03-28 15:45:30 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 0.00 %
|
2016-03-28 15:45:41 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 16.67 %
|
2016-03-28 15:45:51 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 16.67 %
|
2016-03-28 15:46:01 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 18.27 %
|
2016-03-28 15:46:11 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 23.08 %
|
2016-03-28 15:46:21 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 23.08 %
|
2016-03-28 15:46:31 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 26.28 %
|
2016-03-28 15:46:41 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 26.28 %
|
2016-03-28 15:46:51 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 29.49 %
|
2016-03-28 15:47:01 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 29.49 %
|
2016-03-28 15:47:11 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 31.09 %
|
2016-03-28 15:47:22 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 32.69 %
|
2016-03-28 15:47:32 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 32.69 %
|
2016-03-28 15:47:42 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 39.10 %
|
2016-03-28 15:47:52 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 45.51 %
|
2016-03-28 15:48:02 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 51.92 %
|
2016-03-28 15:48:12 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 61.54 %
|
2016-03-28 15:48:22 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 61.54 %
|
2016-03-28 15:48:32 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 61.54 %
|
2016-03-28 15:48:42 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 61.54 %
|
2016-03-28 15:48:52 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 63.14 %
|
2016-03-28 15:49:02 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 66.35 %
|
2016-03-28 15:49:12 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 66.35 %
|
2016-03-28 15:49:22 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 66.35 %
|
2016-03-28 15:49:33 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 71.15 %
|
2016-03-28 15:49:43 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 71.15 %
|
2016-03-28 15:49:53 | ERROR | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
|
2016-03-28 15:49:56 | INFO | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] Latest logs from UI on 172.23.106.139:
|
2016-03-28 15:49:56 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.106.139', u'code': 2, u'text': u'Rebalance exited with reason {service_rebalance_failed,fts,\n {lost_connection,shutdown}}\n', u'shortText': u'message', u'serverTime': u'2016-03-28T15:49:52.944Z', u'module': u'ns_orchestrator', u'tstamp': 1459205392944, u'type': u'info'}
|
2016-03-28 15:49:56 | ERROR | MainProcess | Cluster_Thread | [rest_client.print_UI_logs] {u'node': u'ns_1@172.23.106.175', u'code': 0, u'text': u"Service 'fts' exited with status 1. Restarting. Messages: 2016-03-28T15:49:34.116-07:00 [INFO] janitor: feeds to remove: 0\n2016-03-28T15:49:34.116-07:00 [INFO] janitor: feeds to add: 0\n2016-03-28T15:49:37.621-07:00 [INFO] moss_herder: persistence progess, waiting: 27\n2016-03-28T15:49:48.875-07:00 [INFO] moss_herder: persistence progess, waiting: 7\n[goport] 2016/03/28 15:49:52 /opt/couchbase/bin/cbft terminated: signal: killed", u'shortText': u'message', u'serverTime': u'2016-03-28T15:49:52.608Z', u'module': u'ns_log', u'tstamp': 1459205392608, u'type': u'info'}
|
On .175 -
Mar 28 15:49:51 localhost kernel: Out of memory: Kill process 17132 (cbft) score 426 or sacrifice child
|
:
|
:
|
Mar 28 15:54:32 localhost kernel: Out of memory: Kill process 17894 (cbft) score 425 or sacrifice child
|
Attaching logs from
.139 --> kv
.175 --> fts.
.176 --> fts, rebalanced out node
Attachments
Issue Links
- relates to
-
MB-18965 [FTS] rebalance: MCP rebalance stuck when cbft cant connect to memcached
- Closed