Description
What's the issue?
Running an N1QL query against a bucket that contains valid JSON documents (i.e. documents that have the JSON bit set in their data type) may incorrectly report some of those documents as binary data.
Steps to reproduce
1) Download the attached JSON list of documents
2) Create a cluster with the data server, indexing service and the query service
3) Create a bucket
4) Run cbimport using the following command 'cbimport json -c $HOST -u $USER -p $PASSWORD -b $BUCKET -d file://testList.json -f list -g %name%'
5) Create a primary index on the created bucket (using the query workbench)
6) Run the “select * from bucket” query
7) Inspect the results to determine whether any documents are reported as binary data
Please see the attached forum issue for more information; this is a summary of the steps I used to reproduce the issue.
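Step 7 can be automated once the query results are saved as JSON. The following is only a sketch: the helper name `find_binary_rows` is hypothetical, and it assumes the binary placeholder appears as a string of the form `<binary (28 b)>`, as in the output shown in this ticket.

```python
import re

# Placeholder format as it appears in this ticket's query output, e.g. "<binary (28 b)>".
# Assumption: other tools or versions may render binary values differently.
BINARY_PLACEHOLDER = re.compile(r"^<binary \(\d+ b\)>$")


def find_binary_rows(rows, keyspace):
    """Return the indexes of result rows whose value under `keyspace` is a binary placeholder."""
    flagged = []
    for index, row in enumerate(rows):
        value = row.get(keyspace)
        if isinstance(value, str) and BINARY_PLACEHOLDER.match(value):
            flagged.append(index)
    return flagged


# Rows copied from the query result reported in this ticket.
rows = [
    {"default": {"age": "39", "name": "a"}},
    {"default": "<binary (28 b)>"},
]
print(find_binary_rows(rows, "default"))
```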
Observations
1) If I back up the cluster and examine the documents, I see that they are valid JSON (and have the JSON data type bit set).
2) If I pretty-print the 'testList.json' file using 'jq', redirect the output into a new file and then import that file, the query correctly returns the documents as JSON instead of binary.
3) I have tested back to 6.6.0, but prior versions may also be affected.
4) As pointed out by Donald Haggart, this may be caused by an unexpected Windows-style line ending (carriage return plus line feed) at the start of the document value, which is not being skipped correctly by the query JSON validator.
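The suspected failure mode in observation 4 can be illustrated as follows. This is not the query service's code: `looks_like_json` is a hypothetical stand-in for a validator that skips leading whitespace before checking the first byte, and the sample document is made up for illustration. JSON itself (RFC 8259) treats space, tab, line feed and carriage return as insignificant whitespace, so a check that omits any one of them would misclassify a value that starts with CRLF.

```python
import json

# The four insignificant whitespace characters defined by RFC 8259.
JSON_WHITESPACE = b" \t\n\r"


def looks_like_json(doc: bytes, whitespace: bytes = JSON_WHITESPACE) -> bool:
    """Hypothetical check: skip leading whitespace, then expect '{' or '['."""
    stripped = doc.lstrip(whitespace)
    return stripped[:1] in (b"{", b"[")


# Illustrative value with a leading CRLF and two spaces (not taken from testList.json).
doc = b'\r\n  {"name": "example", "age": "1"}'

print(looks_like_json(doc))              # full whitespace set
print(looks_like_json(doc, b" \t\n"))    # set that omits carriage return
print(looks_like_json(doc, b" \t\r"))    # set that omits line feed
print(json.loads(doc))                   # a standard parser accepts the value
```

With the full whitespace set the check passes; with either incomplete set it fails, even though the document is valid JSON.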
Importing testList.json
[
  {
    "default": {
      "age": "39",
      "name": "a"
    }
  },
  {
    "default": "<binary (28 b)>"
  }
]
Importing jq formatted testList.json
[
  {
    "default": {
      "age": "39",
      "name": "a"
    }
  },
  {
    "default": {
      "age": "22",
      "name": "c"
    }
  }
]
Imported document which query reports as binary data
$ cbc cat xe -u Administrator -P password -U couchbase://192.168.2.22/test | xxd
xe CAS=0x1664d3b005930000, Flags=0x0, Size=31, Datatype=0x01(JSON)
00000000: 0d0a 2020 7b22 6e61 6d65 223a 2022 7865  ..  {"name": "xe
00000010: 222c 2022 6167 6522 3a20 2234 3422 7d    ", "age": "44"}
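As a cross-check of the dump above, the bytes can be decoded and parsed with a standard JSON parser. This is a minimal sketch using only the hex values shown in the xxd output; the variable names are arbitrary.

```python
import json

# Hex bytes copied from the xxd output above.
hex_dump = (
    "0d0a 2020 7b22 6e61 6d65 223a 2022 7865 "
    "222c 2022 6167 6522 3a20 2234 3422 7d"
)

raw = bytes.fromhex(hex_dump)  # fromhex ignores the spaces between groups

print(len(raw))         # should match Size=31 reported by cbc
print(raw[:4])          # leading CR, LF and two spaces
print(json.loads(raw))  # the stored value parses as a JSON object
```

The value begins with the bytes 0x0d 0x0a (CRLF) followed by two spaces, and it parses cleanly, which is consistent with the document being valid JSON that is only misreported by the query.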
Issue Links
- is triggered by: MB-44424 cbimport json list should correctly remove whitespace from the beginning of document values (Closed)
- links to
For Gerrit Dashboard: MB-44423
# | Subject | Branch | Project | Status | CR | V
---|---|---|---|---|---|---
146472,3 | MB-44423 Add LF to whitespace identification | master | query | MERGED | +2 | +1