Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 4.5.0
Affects Version/s: 4.5.0
Component/s: cbft
Labels: None

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

https://s3.amazonaws.com/cb-customers/Aruna/collectinfo-2016-03-21T221331-ns_1%40172.23.106.139.zip - kv
https://s3.amazonaws.com/cb-customers/Aruna/collectinfo-2016-03-21T221331-ns_1%40172.23.106.175.zip - fts

Removed nodes:
https://s3.amazonaws.com/bugdb/jira/MB-18814/176.zip
https://s3.amazonaws.com/bugdb/jira/MB-18814/110.zip

Is this a Regression?: Unknown

Description

Build
4.5.0-1883

Testcase
./testrunner -i INI_FILE.ini -p skip-cleanup=True,get-cbcollect-info=False,get-logs=False,stop-on-failure=False -t fts.moving_topology_fts.MovingTopFTS.rebalance_out_during_querying,items=10000,cluster=D,F,D+F,F,fail-on-errors=False,num_queries=100,GROUP=P0,num_rebalance=2,compare_es=True

Elasticsearch (ES) needs to be configured for the above test.

Steps:
1. Cluster: D,F,D+F,F (4 nodes, 3 have fts enabled)
2. Load 10K docs and build an index.
3. Run 100 queries, compare results to ES. All passed.
4. Now trigger rebalance out of 2 fts nodes.
5. In parallel, fire the same 100 queries with ES validation. 12-15 queries failed with some docs missing from FTS.
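The ES validation in steps 3 and 5 amounts to comparing the set of doc IDs returned by FTS against the set returned by ES for the same query. A minimal sketch of that comparison follows; the function name `diff_hits` and the doc IDs are hypothetical and are not taken from testrunner.

```python
def diff_hits(fts_ids, es_ids):
    """Return (missing, extra): IDs ES returned but FTS did not, and vice versa."""
    fts, es = set(fts_ids), set(es_ids)
    return sorted(es - fts), sorted(fts - es)


# Hypothetical doc IDs, for illustration only.
es_ids = ["emp1", "emp2", "emp3", "emp4"]
fts_ids = ["emp1", "emp3"]

missing, extra = diff_hits(fts_ids, es_ids)
print("missing from FTS:", missing)
print("unexpected in FTS:", extra)
```

A query counts as failed in the sense of this report when `missing` is non-empty, that is, when FTS returns fewer docs than ES.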

Notably, the failed queries [36, 37, 39, 40, 43, 45, 47, 48, 49, 57, 58, 59] are almost consecutive, indicating a short window (<1 min) during which something goes wrong.

2016-03-21 15:04:28 | INFO | MainProcess | Cluster_Thread | [task.execute] ------------------------------------------------------------------ Query # 36 -----------------------------------------------------------------
2016-03-21 15:04:28 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"from": 0, "indexName": "default_index", "fields": [], "explain": false, "ctl": {"timeout": 0, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "manages.reports", "match": "Keelia Kallie Lilith Devi"}, "size": 10000000} on node: 172.23.106.175
:
:
2016-03-21 15:05:02 | INFO | MainProcess | Cluster_Thread | [task.execute] ------------------------------------------------------------------ Query # 58 -----------------------------------------------------------------
2016-03-21 15:05:02 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"from": 0, "indexName": "default_index", "fields": [], "explain": false, "ctl": {"timeout": 0, "consistency": {"vectors": {}, "level": ""}}, "query": {"field": "dept", "match": "Finance"}, "size": 10000000} on node: 172.23.106.175
2016-03-21 15:05:02 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query: {"field": "dept", "match": "Finance"} is 869 (took 43.308229ms)

However, some queries in this phase also succeed.
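The size of the failure window can be checked with a little arithmetic. The sketch below uses only the failed query numbers and the two timestamps from the log excerpt in this report (queries 36 and 58); the variable names are illustrative.

```python
from datetime import datetime

failed = [36, 37, 39, 40, 43, 45, 47, 48, 49, 57, 58, 59]

# Start times copied from the testrunner log excerpt in this report.
fmt = "%Y-%m-%d %H:%M:%S"
started = {
    36: datetime.strptime("2016-03-21 15:04:28", fmt),
    58: datetime.strptime("2016-03-21 15:05:02", fmt),
}

span = failed[-1] - failed[0] + 1      # queries between first and last failure
passed_in_span = span - len(failed)    # queries in that range that still passed
elapsed = (started[58] - started[36]).total_seconds()

print(f"{len(failed)} failed, {passed_in_span} passed within a span of {span} queries")
print(f"query 36 to query 58: {elapsed:.0f} s")
```

Half of the queries in the affected range still pass, and the two logged timestamps are 34 seconds apart, which is consistent with a window of under a minute.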

Full testrunner log - https://gist.github.com/arunapiravi/f88e825a81471512955c
Attaching cbcollect from all nodes.

Also confirmed that the same failing queries returned the expected results after the rebalance phase completed.

Attachments

Issue Links

relates to

MB-19292 [FTS] MCP: Querying during swap rebalance does not return correct results for some queries

Closed


Activity

People

Assignee: Aruna Piravi (Inactive)

Reporter: Aruna Piravi (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 21/Mar/16 3:20 PM

Updated: 04/May/16 5:21 PM

Resolved: 07/Apr/16 4:14 PM

Gerrit Reviews

There are no open Gerrit changes

There are 2 closed Gerrit changes:

MB-18814 - refactored out a assignPIndexLOCKED helper func: Gerrit Review:

MB-18814 - assign primary by first going through replica promotion: Gerrit Review:
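The second change title suggests that, when a node leaves, a surviving replica is promoted to primary before any other assignment is considered. The toy model below illustrates that idea only. It is not the cbgt planner code, and the plan layout, node names, and the function `reassign` are all assumptions made for illustration.

```python
def reassign(plan, removed):
    """Toy model. plan maps partition -> {"primary": node, "replicas": [nodes]}.

    Returns a new plan in which partitions whose primary was removed are
    served by a surviving replica where one exists.
    """
    new_plan = {}
    for part, assignment in plan.items():
        primary = assignment["primary"]
        replicas = [n for n in assignment["replicas"] if n not in removed]
        if primary in removed:
            # Promote a surviving replica first; None means no replica survived
            # and a fresh node would have to be assigned separately.
            primary = replicas.pop(0) if replicas else None
        new_plan[part] = {"primary": primary, "replicas": replicas}
    return new_plan


# Hypothetical three-node layout.
plan = {
    "p0": {"primary": "n1", "replicas": ["n2"]},
    "p1": {"primary": "n2", "replicas": ["n3"]},
}
print(reassign(plan, removed={"n1"}))
```

In this model a promoted replica already holds a copy of the partition, so queries need not wait for a new node to index it from scratch.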

[FTS] MCP: Incorrect query results returned during rebalance of fts nodes
