Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-25713

Query limit does not help reducing index scans when a distinct index is defined on the array used by the query's equal predicate.

    XMLWordPrintable

Details

    • Improvement
    • Resolution: Unresolved
    • Major
    • None
    • 5.0.0
    • query
    • None
    • YCSB SOE workloadsj (ArrayScan) test.

    Description

      Regarding SOE ArrayScan test, we are running the following query with the following index created.

      Query: SELECT RAW meta().id FROM `bucket-1`WHERE ANY v IN devices SATISFIES v = "FF-012" END ORDER BY META().id LIMIT 10

      Index: CREATE INDEX ix ON `bucket-1`((DISTINCT devices))

       

      In our 2.5 million customer SOE data set, the query (without ‘order by meta().id limit 10’) returns 400 rows. And I see 400 index scans during the query run. This is expected.

       

      When I ran the same query with ‘order by meta().id limit 10’, it returns 10 rows, and I see 400 index scans during the query run. I understand that is because the order by meta().id forces the scan of the index because the index does not contain meta().id.

       

      But when I ran the same query with only ‘LIMIT 10’ (without order by clause), I still see 400 index scans. This is not what I have expected.  I expect such a query should cause only 10 index scans instead of 400. 

       

      I also tried to create another index with META().id added like the following.

      CREATE INDEX {name} ON `{bucket}`((DISTINCT devices), META().id)

       

      Then when I ran the same query above. I am still seeing 400 index scans when the query is ran with 'LIMIT 10' or ‘order by META().id LIMIT 10’.  Since the index contains META().id now, I expect the query causes less index scans.

       

      With this issue, it means queries with array value predicates as well as limit clause will be more expensive than needed.

      Assuming we have a customer table and each customer document has an array of visited cities in the document. And we defined a distinct array index on the visited cities array.

      A query to find out 10 customer that visited 'newyork' will have to scan the the index for all the customers ever visited 'newyork'. 

       

       

       

       

       

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            keshav Keshav Murthy
            hui.wang Hui Wang (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty