Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-36461

[ARRAY RQG] query returning extra results that are missing the requested field

    XMLWordPrintable

    Details

    • Triage:
      Untriaged
    • Is this a Regression?:
      Unknown

      Description

      I have a cluster with a kv,n1ql,index node and a kv,analytics node

      default bucket with data set defined by a yaml file and 50k docs
      dataverse called default

      CREATE INDEX `adv_type` ON `default`(`type`)

      SELECT id, owner, type FROM `default` WHERE EVERY p IN idents.accts.profids SATISFIES p.type = "shopfc" END AND type = 'links' ORDER BY id DESC, owner, type DESC

      analytics query:
      SELECT idents.accts FROM `default` d WHERE (EVERY p IN d.idents.accts.profids SATISFIES p.`type` = "shopfc" END) AND `type` = 'links' ORDER BY id DESC, owner, `type` DESC

      n1ql query returns 195 results

      matching analytics query returns 3598 results

      Logs and backups attached

      backup: https://www.dropbox.com/s/dcrfil3ne6volww/analyticsbug.tar.gz?dl=0

      However the analytics query is returns documents that DON'T match the criteria:

      looking at the accts field of these documents we see that most of the returned results do not even contain the profids field.

      {
      "accts": {
      "mems": [

      { "memid": 264, "progid": "VW", "validated": true }

      ]
      }
      }

        Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

          Hide
          ajay.bhullar Ajay Bhullar added a comment -

          Keshav Murthy According to what Dmitry has said above, that would make this a n1ql issue in which documents where the field is missing are being filtered out. Please check

          Show
          ajay.bhullar Ajay Bhullar added a comment - Keshav Murthy According to what Dmitry has said above, that would make this a n1ql issue in which documents where the field is missing are being filtered out. Please check
          Hide
          dmitry.lychagin Dmitry Lychagin added a comment -

          I'm still researching this case that's why I removed my earlier comment. Sorry for the confusion.

          The issue is what happens when the input to IN subclause is MISSING.

          SELECT EVERY x IN MISSING SATISFIES x = 1 END
          

          returns TRUE in Analytics, but MISSING in N1QL

          SELECT EVERY x IN NULL SATISFIES x = 1 END
          

          returns TRUE in Analytics, but NULL in N1QL

          SELECT EVERY x IN [] SATISFIES x = 1 END
          

          returns TRUE in both engines

          Show
          dmitry.lychagin Dmitry Lychagin added a comment - I'm still researching this case that's why I removed my earlier comment. Sorry for the confusion. The issue is what happens when the input to IN subclause is MISSING. SELECT EVERY x IN MISSING SATISFIES x = 1 END returns TRUE in Analytics, but MISSING in N1QL SELECT EVERY x IN NULL SATISFIES x = 1 END returns TRUE in Analytics, but NULL in N1QL SELECT EVERY x IN [] SATISFIES x = 1 END returns TRUE in both engines
          Hide
          dmitry.lychagin Dmitry Lychagin added a comment -

          We decided to defer this issue to the next release. The fix is not trivial and and we need to have a broader discussion on the semantics of the iteration over NULL/MISSING values. 
          The workaround is to use IS KNOWN predicate to test whether the IN value is not NULL/MISSING:  WHERE (x IS KNOWN) AND (EVERY y IN x SATISFIES ... END)
          The original query needs to be rewritten as:

          SELECT idents.accts FROM `default` d WHERE (d.idents.accts.profids IS KNOWN) AND (EVERY p IN d.idents.accts.profids SATISFIES p.`type` = "shopfc" END) AND `type` = 'links' ORDER BY id DESC, owner, `type` DESC
          

          We also need to add a release note documenting this discrepancy between the two engines.

          Also note, if the IN value is an empty array then EVERY expression would return TRUE in both engines. If user wants to avoid this then EXISTS could be used to check for both conditions at once (known and not empty): WHERE (EXISTS x) AND (EVERY y IN x SATISFIES ... END)

           

          Show
          dmitry.lychagin Dmitry Lychagin added a comment - We decided to defer this issue to the next release. The fix is not trivial and and we need to have a broader discussion on the semantics of the iteration over NULL/MISSING values.  The workaround is to use IS KNOWN predicate to test whether the IN value is not NULL/MISSING:  WHERE (x IS KNOWN) AND (EVERY y IN x SATISFIES ... END) The original query needs to be rewritten as: SELECT idents.accts FROM `default` d WHERE (d.idents.accts.profids IS KNOWN) AND (EVERY p IN d.idents.accts.profids SATISFIES p.`type` = "shopfc" END) AND `type` = 'links' ORDER BY id DESC, owner, `type` DESC We also need to add a release note documenting this discrepancy between the two engines. Also note, if the IN value is an empty array then EVERY expression would return TRUE in both engines. If user wants to avoid this then EXISTS could be used to check for both conditions at once (known and not empty): WHERE (EXISTS x) AND (EVERY y IN x SATISFIES ... END)  
          Hide
          amarantha.kulkarni Amarantha Kulkarni added a comment -

          Description for release notes:

          Summary: In cases where the input to IN subclause with EVERY quantifier is MISSING or NULL, Analytics and Query engines differ in behavior. The Analytics service treats MISSING or NULL input values(in this case) as equivalent to an empty array, which results in the whole `EVERY … IN …` expression returning TRUE, while the Query service returns MISSING if the input is MISSING (or NULL if the input is NULL).

          Workaround: Use the IS KNOWN predicate to test whether the IN value is not NULL/MISSING.

          `WHERE (x IS KNOWN) AND (EVERY y IN x SATISFIES ... END)`

          Show
          amarantha.kulkarni Amarantha Kulkarni added a comment - Description for release notes: Summary: In cases where the input to IN subclause with EVERY quantifier is MISSING or NULL, Analytics and Query engines differ in behavior. The Analytics service treats MISSING or NULL input values(in this case) as equivalent to an empty array, which results in the whole `EVERY … IN …` expression returning TRUE, while the Query service returns MISSING if the input is MISSING (or NULL if the input is NULL). Workaround: Use the IS KNOWN predicate to test whether the IN value is not NULL/MISSING. `WHERE (x IS KNOWN) AND (EVERY y IN x SATISFIES ... END)`

            People

            Assignee:
            dmitry.lychagin Dmitry Lychagin
            Reporter:
            ajay.bhullar Ajay Bhullar
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated:

                Gerrit Reviews

                There are no open Gerrit changes

                  PagerDuty