Description
This bug is tough to pin down: when the queries marked as failed are rerun in the exact environment in which they failed, they pass. I suspect the issue occurs only when multiple join queries run at the same time. Whenever the failure is observed, the N1QL result count differs by one from the SQL (MySQL) result count. I have left an open environment on VM 172.23.105.209, and running this command should reproduce the error consistently:
./testrunner -i /root/tuqvm.ini -p gsi_type=plasma -t rqg.test_rqg.RQGTests.test_rqg_concurrent,test_file_path=b/resources/rqg/multiple_table_db/query_test_using_templates/queries_joins_50000.txt.zip,database=multiple_table_db,reset_database=True,concurreny_count=10,index_quota_percent=30,record_failure=False,password=password,use_mysql=True,replicas=0,total_queries=200,failure_record_path=/tmp,skip_cleanup=false,subquery=False,ansi_joins=True,create_secondary_meta_indexes=True,create_secondary_indexes=True
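To test the concurrency theory outside of RQG, a minimal sketch like the following could fire the same statement many times in parallel against the Query service and report every result count that differs from the expected one. It assumes the Query REST endpoint on its default port (8093, path /query/service) with Basic auth; the user name and the helpers n1ql_result_count and count_mismatches are my own illustrative choices, not part of testrunner. It replays only one statement, whereas the RQG run mixes many different join queries, so it may not trigger the problem on its own.

```python
import base64
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def n1ql_result_count(host: str, statement: str, user: str, password: str) -> int:
    """POST one statement to the Query service and return metrics.resultCount."""
    # Assumption: Query service on its default port 8093, Basic auth.
    url = f"http://{host}:8093/query/service"
    data = urllib.parse.urlencode({"statement": statement}).encode()
    req = urllib.request.Request(url, data=data)  # supplying data makes this a POST
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["metrics"]["resultCount"]


def count_mismatches(run_query: Callable[[str], int], statement: str,
                     expected: int, concurrency: int = 10,
                     rounds: int = 50) -> List[int]:
    """Run `statement` `rounds` times on `concurrency` threads and
    return every result count that differs from `expected`."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        counts = list(pool.map(lambda _: run_query(statement), range(rounds)))
    return [c for c in counts if c != expected]


# Example (not executed here; "Administrator" is a hypothetical user name):
# bad = count_mismatches(
#     lambda s: n1ql_result_count("172.23.105.209", s, "Administrator", "password"),
#     failing_query, expected=873)
```

Taking run_query as a parameter keeps the threading logic separate from the HTTP call, so the same loop could also wrap the MySQL side for comparison.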
Here is an example of a failing query (it passes when run by itself in the same environment):
SELECT t_2.int_field1 , t_2.decimal_field1 , t_1.primary_key_id , t_1.varchar_field1 , t_1.char_field1 , t_4.bool_field1 FROM multiple_table_db_1815_simple_table_10 t_4 LEFT JOIN multiple_table_db_1815_simple_table_3 t_5 ON ( t_4.primary_key_id = t_5.primary_key_id ) LEFT JOIN multiple_table_db_1815_simple_table_2 t_1 ON ( t_5.primary_key_id = t_1.primary_key_id ) INNER JOIN multiple_table_db_1815_simple_table_1 t_3 ON ( t_1.primary_key_id = t_3.primary_key_id ) LEFT JOIN multiple_table_db_1815_simple_table_4 t_2 ON ( t_3.primary_key_id = t_2.primary_key_id ) INNER JOIN multiple_table_db_1815_simple_table_4 t_2BodvZKCwzI ON ( t_1.primary_key_id = t_2BodvZKCwzI.primary_key_id ) INNER JOIN multiple_table_db_1815_simple_table_4 t_2OVdKJPpnph ON ( t_2.primary_key_id = t_2OVdKJPpnph.primary_key_id ) LEFT JOIN multiple_table_db_1815_simple_table_4 t_2cJFiCeQfRo ON ( t_5.primary_key_id = t_2cJFiCeQfRo.primary_key_id ) WHERE ((NOT (t_5.bool_field1) OR t_2.decimal_field1 > 4942)) OR (NOT (t_2.bool_field1))
The Jenkins job reports that this query returns 872 results instead of 873, but when it is run through the UI it returns 873. I will attach the logs; could you check whether this is a bug?
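Since the only symptom so far is a count difference, it might also help to diff the actual rows and see which row N1QL drops. Below is a minimal sketch, assuming both result sets have already been fetched as lists of column-name-to-value dicts with directly comparable scalar values. MySQL decimals and booleans would probably need normalizing to match the JSON values N1QL returns. diff_rows is a hypothetical helper, not something RQG provides.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

Row = Dict[str, Any]


def _key(row: Row) -> Tuple[Tuple[str, Any], ...]:
    # Hashable, column-order-independent form of a row (assumes scalar values).
    return tuple(sorted(row.items()))


def diff_rows(n1ql_rows: Iterable[Row],
              sql_rows: Iterable[Row]) -> Tuple[List[Row], List[Row]]:
    """Return (rows only in the SQL result, rows only in the N1QL result),
    counting duplicates, since join results can legitimately repeat rows."""
    n1ql = Counter(_key(r) for r in n1ql_rows)
    sql = Counter(_key(r) for r in sql_rows)
    missing = [dict(k) for k, n in (sql - n1ql).items() for _ in range(n)]
    extra = [dict(k) for k, n in (n1ql - sql).items() for _ in range(n)]
    return missing, extra
```

Using Counter instead of a set matters here: a plain set difference would hide a dropped duplicate row, which is exactly what an off-by-one could be.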
I don't think it's an RQG issue: it is only seen in ANSI JOIN runs, the normal JOIN runs do not have it, and the current ANSI job is just the normal job using ON instead of ON KEYS.