Description
Build
4.5.0-1883
Testcase
./testrunner -i INI_FILE.ini -p skip-cleanup=True,get-cbcollect-info=False,get-logs=False,stop-on-failure=False -t fts.moving_topology_fts.MovingTopFTS.rebalance_out_during_querying,items=10000,cluster=D,F,D+F,F,fail-on-errors=False,num_queries=100,GROUP=P0,num_rebalance=2,compare_es=True
ES needs to be configured for the above test.
Steps:
1. Cluster: D,F,D+F,F (4 nodes, 3 have fts enabled)
2. Load 10K docs and build an index.
3. Run 100 queries, compare results to ES. All passed.
4. Now trigger rebalance out of 2 fts nodes.
5. In parallel, fire the same 100 queries with ES validation. 12-15 queries failed with some docs missing from FTS.
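For reference, the ES validation in steps 3 and 5 boils down to comparing the doc IDs returned by FTS with those returned by ES for the same query. The following is only a minimal sketch of that comparison. `diff_hits` and its arguments are hypothetical names used for illustration and are not taken from testrunner.

```python
def diff_hits(fts_doc_ids, es_doc_ids):
    """Compare the doc IDs returned by FTS and ES for one query.

    Hypothetical helper for illustration only; not the testrunner code.
    """
    fts, es = set(fts_doc_ids), set(es_doc_ids)
    return {
        # Docs ES returned but FTS did not: the failure mode seen in step 5.
        "missing_from_fts": sorted(es - fts),
        # Docs FTS returned but ES did not.
        "extra_in_fts": sorted(fts - es),
    }


def query_passed(fts_doc_ids, es_doc_ids):
    """A query passes only when both result sets contain the same doc IDs."""
    diff = diff_hits(fts_doc_ids, es_doc_ids)
    return not diff["missing_from_fts"] and not diff["extra_in_fts"]
```

Under this sketch, a query run during the rebalance would be reported as failed whenever `missing_from_fts` is non-empty.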
It's interesting to note that the failed queries [36, 37, 39, 40, 43, 45, 47, 48, 49, 57, 58, 59] are almost consecutive, indicating a short window (<1 min) during which something goes wrong.
2016-03-21 15:04:28 | INFO | MainProcess | Cluster_Thread | [task.execute] ------------------------------------------------------------------ Query # 36 -----------------------------------------------------------------
2016-03-21 15:04:28 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"from": 0, "indexName": "default_index", "fields": [], "explain": false, "ctl": {"timeout": 0, "consistency": {"vectors": {}, "level": ""}}, "query":
, "size": 10000000} on node: 172.23.106.175
:
:
2016-03-21 15:05:02 | INFO | MainProcess | Cluster_Thread | [task.execute] ------------------------------------------------------------------ Query # 58 -----------------------------------------------------------------
2016-03-21 15:05:02 | INFO | MainProcess | Cluster_Thread | [fts_base.run_fts_query] Running query {"from": 0, "indexName": "default_index", "fields": [], "explain": false, "ctl": {"timeout": 0, "consistency": {"vectors": {}, "level": ""}}, "query":
, "size": 10000000} on node: 172.23.106.175
2016-03-21 15:05:02 | INFO | MainProcess | Cluster_Thread | [task.execute] FTS hits for query:
is 869 (took 43.308229ms)
However, some queries in this phase also succeed.
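A rough way to check the "short window" observation is to group the failed query numbers into consecutive runs and measure the time between the two timestamps shown in the log excerpt above (query 36 and query 58; the timestamp for query 59 is not in the excerpt). This is just an illustrative sketch, and the helper names are hypothetical.

```python
from datetime import datetime

FAILED_QUERIES = [36, 37, 39, 40, 43, 45, 47, 48, 49, 57, 58, 59]


def consecutive_runs(numbers):
    """Group integers into runs of consecutive values."""
    runs = []
    for n in sorted(numbers):
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)  # extends the current run
        else:
            runs.append([n])    # starts a new run
    return runs


def seconds_between(start, end, fmt="%Y-%m-%d %H:%M:%S"):
    """Elapsed seconds between two testrunner-style log timestamps."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


runs = consecutive_runs(FAILED_QUERIES)
# Timestamps copied from the log lines for query 36 and query 58.
window = seconds_between("2016-03-21 15:04:28", "2016-03-21 15:05:02")
print(runs)
print(window)
```

The gaps between runs correspond to the queries that passed inside the same window.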
Full testrunner log - https://gist.github.com/arunapiravi/f88e825a81471512955c
Attaching cbcollect from all nodes.
Also confirmed that the same failing queries returned the expected results once the rebalance phase was over.
Issue Links
- relates to MB-19292 [FTS] MCP: Querying during swap rebalance does not return correct results for some queries (Closed)