Couchbase Server / MB-59769

Internal error when length() is called on an invalid Unicode sequence

Details

    • Untriaged
    • 0
    • Unknown
    • Analytics Sprint 31, Analytics Sprint 32

    Description

      An internal error occurs when the length() function is called on a string that contains an invalid UTF-8 sequence. For example, the following query returns an internal error:

      select string_length("xxxxxxxxxx x??\uDEAD"); 

      The logs show the decoding error:

      Caused by: java.lang.IllegalArgumentException: Decoding error: got a low surrogate without a leading high surrogate
          at org.apache.hyracks.util.string.UTF8StringUtil.codePointSize(UTF8StringUtil.java:127) ~[hyracks-util-7.2.2-6401.jar:7.2.2-6401]
          at org.apache.hyracks.util.string.UTF8StringUtil.getNumCodePoint(UTF8StringUtil.java:214) ~[hyracks-util-7.2.2-6401.jar:7.2.2-6401]
          at org.apache.asterix.runtime.evaluators.functions.StringLengthDescriptor$1$1.evaluate(StringLengthDescriptor.java:88) ~
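
      The error message matches the query literal: \uDEAD lies in the low-surrogate range (U+DC00 to U+DFFF) and no high surrogate precedes it. The sketch below checks this with standard JDK calls only. It works on a Java UTF-16 String rather than on the byte form Hyracks decodes, and the class and method names are hypothetical, so treat it as an illustration of the condition and not as the code path in the trace.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hyracks or AsterixDB.
final class LoneSurrogateCheck {
    /** True when a strict UTF-8 encoder accepts the whole string. */
    static boolean isEncodableAsUtf8(String s) {
        // canEncode reports false for malformed input such as an unpaired surrogate.
        return StandardCharsets.UTF_8.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Same literal as the query in this ticket.
        String input = "xxxxxxxxxx x??\uDEAD";
        char last = input.charAt(input.length() - 1);

        System.out.println("last char is a low surrogate: " + Character.isLowSurrogate(last));
        System.out.println("whole input encodable as UTF-8: " + isEncodableAsUtf8(input));
        System.out.println("input without last char encodable: "
                + isEncodableAsUtf8(input.substring(0, input.length() - 1)));
    }
}
```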

      For invalid UTF-8 strings, length() should return null instead of raising an error.
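
      One way to picture the requested behaviour is a code-point count that signals "unknown" rather than throwing. The following is a minimal sketch under stated assumptions: it operates on a Java String instead of the Hyracks byte representation, LenientStringLength and codePointLengthOrNull are hypothetical names, and a Java null stands in for the SQL++ null. It is not the actual fix for this issue.

```java
// Hypothetical sketch, not the AsterixDB StringLengthDescriptor implementation.
final class LenientStringLength {
    /**
     * Counts Unicode code points in s.
     * Returns null if s contains a surrogate that is not part of a
     * valid high+low pair, instead of throwing an exception.
     */
    static Integer codePointLengthOrNull(String s) {
        int count = 0;
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return null;
                }
                i += 2; // a valid pair is one code point
            } else if (Character.isLowSurrogate(c)) {
                // Low surrogate without a leading high surrogate: the case in this ticket.
                return null;
            } else {
                i++;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(codePointLengthOrNull("abc"));
        System.out.println(codePointLengthOrNull("a\uD83D\uDE00"));
        System.out.println(codePointLengthOrNull("xxxxxxxxxx x??\uDEAD"));
    }
}
```

      With this shape, a valid surrogate pair counts as a single code point, and the string from the query above yields null because its last char is an unpaired low surrogate.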

            People

              abhay.aggrawal Abhay Aggrawal
              peeyush.gupta Peeyush Gupta
              Votes: 0
              Watchers: 4
