Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-56390

[CX] Incorrect selectivity for unnests and predicates on array indexes

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • Morpheus
    • 7.2.0
    • analytics
    • Untriaged
    • 0
    • Unknown
    • Analytics CBO Sprint 18, Analytics CBO Sprint 19, Analytics CBO Sprint 20, Analytics CBO Sprint 21, Analytics Sprint 24

    Description

      For the following CH2 query fragment:

      SELECT count(*) as revenue
      FROM   orders o, o.o_orderline ol
      WHERE  ol.ol_delivery_d  >= '2016-01-01 00:00:00.000000'
        AND  ol.ol_delivery_d < '2017-01-01 00:00:00.000000';
      

      Cardinality of orders is 300K, there are an average of 10 orderlines per orders, so the number of orderlines is ~3M.

      This query returns 320378 orderlines, so the selectivity (actual, not estimated) of the predicate is 320378/3M = 0.106

      When we estimate the selectivity using samples, the sample query returns 922 docs, but since we create the sample on orders (whose sample size is 1063), we compute the selectivity as 922/1063 = 0.867, which is clearly incorrect. Since the selectivity is too high, we do not choose an index scan, which leads to poor performance.

      The selectivity should be computed against the cardinality of the orderlines in the sample which is 10630, so estimated selectivity should be 922/10630 = 0.0867. When the query has an UNNEST operator, we need to account for an "unnesting factor" in the sample size used for selectivity estimation.

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              vijay.sarathy Vijay Sarathy
              vijay.sarathy Vijay Sarathy
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - 120h
                  120h
                  Remaining:
                  Remaining Estimate - 120h
                  120h
                  Logged:
                  Time Spent - Not Specified
                  Not Specified

                  Gerrit Reviews

                    There are no open Gerrit changes

                    PagerDuty