Details
- Bug
- Resolution: Fixed
- Major
- 7.2.2
- Untriaged
- 0
- Unknown
- Analytics Sprint 31, Analytics Sprint 32
Description
An internal error occurs when the length() function is called on a string that contains an invalid UTF-8 sequence. For example, the following query returns an internal error:
select string_length("xxxxxxxxxx x??\uDEAD");
The logs show the decoding error:
Caused by: java.lang.IllegalArgumentException: Decoding error: got a low surrogate without a leading high surrogate
    at org.apache.hyracks.util.string.UTF8StringUtil.codePointSize(UTF8StringUtil.java:127) ~[hyracks-util-7.2.2-6401.jar:7.2.2-6401]
    at org.apache.hyracks.util.string.UTF8StringUtil.getNumCodePoint(UTF8StringUtil.java:214) ~[hyracks-util-7.2.2-6401.jar:7.2.2-6401]
    at org.apache.asterix.runtime.evaluators.functions.StringLengthDescriptor$1$1.evaluate(StringLengthDescriptor.java:88) ~
For invalid UTF-8 strings, length() should return null instead of raising an error.
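The requested behavior can be illustrated with a minimal Java sketch. This is not the merged fix and does not use the Hyracks `UTF8StringUtil` API. The class `SafeStringLength` and the method `codePointLengthOrNull` are hypothetical names, and the sketch works on a Java `String` (UTF-16) rather than on the engine's encoded byte representation. It counts code points and returns null as soon as it finds an unpaired surrogate, instead of throwing.

```java
// Hypothetical illustration only: not the cbas-core / Hyracks implementation.
final class SafeStringLength {

    private SafeStringLength() {
    }

    /**
     * Returns the number of Unicode code points in s, or null if s contains
     * an unpaired (lone) high or low surrogate.
     */
    static Integer codePointLengthOrNull(String s) {
        int count = 0;
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed immediately by a low surrogate.
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i += 2;
                } else {
                    return null;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate without a leading high surrogate is invalid.
                return null;
            } else {
                i++;
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The string from the failing query: ends with a lone low surrogate.
        System.out.println(codePointLengthOrNull("xxxxxxxxxx x??\uDEAD"));
        // A well-formed string containing one surrogate pair.
        System.out.println(codePointLengthOrNull("a\uD83D\uDE00"));
    }
}
```

With this approach the caller decides how to surface the null, for example by also emitting a warning, rather than letting a decoding exception turn into an internal error.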
Issue Links
- links to
For Gerrit Dashboard: MB-59769

# | Subject | Branch | Project | Status | CR | V |
---|---|---|---|---|---|---|
201744,6 | MB-59769: Return null and warn for string functions for invalid unicode sequence | neo | cbas-core | MERGED | +2 | +1 |
201956,3 | MB-59769: Move string utils to hyracks-api to avoid introducing new exceptions | neo | cbas-core | MERGED | +2 | +1 |