Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Fixed
Priority: Major
Fix Version/s: 7.0.0
Affects Version/s: 5.0.0
Component/s: fts
Labels:
- functional-test
- known_issue

Triage:
Triaged
Is this a Regression?:
No

Description

Build

5.0.0-3317

Testcase

Pls see ~~MB-25272~~, rebalance now failed in the same test.

"This test has never failed in previous runs but I do not think it's a regression. Probably timing-related or a race in index deletion.

1. Cluster: D, D+F, D+F
2. Test indexes 200k docs, in the middle of indexing, fails over one fts node and updates the index definition to use 512 pindexes(oops! note to self: change to 8 or 16)
3. Indexing completes.
4. In the teardown phase, we start by deleting the index, then the buckets and then try to de-cluster the nodes when rebalance gets stuck at 75% "..and then fails with the following error.

./testrunner -i INI_FILE.ini get-cbcollect-info=True,get-logs=False,stop-on-failure=False,level_compaction=False,GROUP=P1 -t fts.moving_topology_fts.MovingTopFTS.update_index_during_failover,items=200000,cluster=D+F+F,GROUP=P1

[2017-07-16 00:21:42,820] - [fts_base:1709] INFO - removing nodes from cluster ...
[2017-07-16 00:21:42,835] - [fts_base:1711] INFO - cleanup [ip:172.23.105.96 port:8091 ssh_username:root, ip:172.23.105.190 port:8091 ssh_username:root, ip:172.23.106.66 port:8091 ssh_username:root]
[2017-07-16 00:21:42,878] - [bucket_helper:142] INFO - deleting existing buckets [u'default'] on 172.23.105.96
[2017-07-16 00:21:42,878] - [bucket_helper:144] INFO - remove bucket default ...
[2017-07-16 00:21:45,725] - [bucket_helper:158] INFO - deleted bucket : default from 172.23.105.96
[2017-07-16 00:21:45,726] - [bucket_helper:234] INFO - waiting for bucket deletion to complete....
[2017-07-16 00:21:45,730] - [rest_client:133] INFO - node 172.23.105.96 existing buckets : []
[2017-07-16 00:21:45,767] - [cluster_helper:254] INFO - rebalancing all nodes in order to remove nodes
[2017-07-16 00:21:45,773] - [rest_client:1395] INFO - rebalance params : {'password': 'password', 'ejectedNodes': u'ns_1@172.23.106.66,ns_1@172.23.105.190', 'user': 'Administrator', 'knownNodes': u'ns_1@172.23.106.66,ns_1@172.23.105.96,ns_1@172.23.105.190'}
[2017-07-16 00:21:45,786] - [rest_client:1400] INFO - rebalance operation started
[2017-07-16 00:21:45,793] - [rest_client:1548] INFO - rebalance percentage : 0.00 %
[2017-07-16 00:21:55,810] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:22:05,826] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:22:15,842] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:22:25,859] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:22:35,869] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:22:45,885] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:22:55,902] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:23:05,915] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:23:15,930] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:23:25,950] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:23:35,968] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:23:45,983] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:23:55,999] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:24:06,018] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:24:16,038] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:24:26,053] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:24:36,059] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:24:46,067] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:24:56,081] - [rest_client:1548] INFO - rebalance percentage : 75.00 %
[2017-07-16 00:25:06,096] - [rest_client:1531] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try again.'} - rebalance failed
[2017-07-16 00:25:06,122] - [rest_client:2955] INFO - Latest logs from UI on 172.23.105.96:
[2017-07-16 00:25:06,122] - [rest_client:2956] ERROR - {u'node': u'ns_1@172.23.105.190', u'code': 0, u'text': u"Service 'fts' exited with status 137. Restarting. Messages:\n2017-07-16T00:25:00.892-07:00 [INFO] feed_dcp: OnError, name: default_index_1_3e2f9755ca7458f5_fb9bdf16: bucketName: default, bucketUUID: , err: worker auth, AuthenticateMemcachedConn, server: 172.23.105.190:11210, err: MCResponse status=EACCESS, opcode=0x89, opaque=0, msg: \n2017-07-16T00:25:00.892-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: worker auth, AuthenticateMemcachedConn, server: 172.23.105.190:11210, err: MCResponse status=EACCESS, opcode=0x89, opaque=0, msg: \n2017-07-16T00:25:00.940-07:00 [INFO] Using plain authentication for user @fts \n2017-07-16T00:25:00.941-07:00 [INFO] Using plain authentication for user @fts \n2017-07-16T00:25:01.004-07:00 [INFO] feed_dcp: OnError, name: default_index_1_3e2f9755ca7458f5_d75fa3fd: bucketName: default, bucketUUID: , err: worker auth, AuthenticateMemcachedConn, server: 172.23.105.190:11210, err: MCResponse status=EACCESS, opcode=0x89, opaque=0, msg: \n2017-07-16T00:25:01.008-07:00 [INFO] main: meh.OnFeedError, srcType: couchbase, err: worker auth, AuthenticateMemcachedConn, server: 172.23.105.190:11210, err: MCResponse status=EACCESS, opcode=0x89, opaque=0, msg: \n[goport(/opt/couchbase/bin/cbft)] 2017/07/16 00:25:03 child process exited with status 137\n", u'shortText': u'message', u'serverTime': u'2017-07-16T00:25:03.445Z', u'module': u'ns_log', u'tstamp': 1500189903445, u'type': u'info'}
[2017-07-16 00:25:06,123] - [rest_client:2956] ERROR - {u'node': u'ns_1@172.23.105.96', u'code': 0, u'text': u'Rebalance exited with reason {service_rebalance_failed,fts,\n {lost_connection,shutdown}}', u'shortText': u'message', u'serverTime': u'2017-07-16T00:25:03.293Z', u'module': u'ns_orchestrator', u'tstamp': 1500189903293, u'type': u'critical'}

Pls note: subsequent attempts at rebalance pass, so marking it major.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

172.23.106.66-20170716-0030-diag.zip
11.76 MB
20/Jul/17 9:48 PM
172.23.105.96-20170716-0027-diag.zip
23.95 MB
20/Jul/17 9:49 PM
172.23.105.190-20170716-0028-diag.zip
37.78 MB
20/Jul/17 9:49 PM

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews
- Show All Issues
- Show Open Issues

No reviews matched the request. Check your Options in the drop-down menu of this sections header.

Activity

People

Assignee:: Evgeny Makarenko (Inactive)

Reporter:: Aruna Piravi (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 8 Start watching this issue

Dates

Created:: 20/Jul/17 9:50 PM

Updated:: 17/Jun/21 3:46 PM

Resolved:: 08/Feb/21 8:19 AM

Gerrit Reviews

There are no open Gerrit changes

Show There is 1 closed Gerrit change

Hide There is 1 closed Gerrit change

Additional automation for https://issues.couchbase.com/browse/MB-25354: Gerrit Review:

[FTS] Rebalance fails from OOM killer, when many pindexes (> 256)

Details

Description

Attachments

Attachments

Gerrit Reviews

Activity

People

Dates

Gerrit Reviews

PagerDuty