Details
Bug
Resolution: Won't Fix
Blocker
2.0-developer-preview-4
Security Level: Public
None
Description
The JSON encoder/decoder used in Erlang (inside the couchdb component), ejson, is currently broken. It encodes strings containing invalid UTF-8 when it should throw, raise, or return an error. Its decoder, on the other hand, correctly refuses to decode such strings, so ejson can produce output that it cannot read back. Mochijson2, another Erlang-based JSON encoder/decoder, behaves correctly and raises an error when a string contains invalid UTF-8. See the example below in an Erlang shell:
1> Result = ejson:encode(<<255>>).
<<"\"ÿ\"">>
2> ejson:decode(Result).
** exception throw: {invalid_json,{{error,{2,
                     "lexical error: invalid bytes in UTF8 string.\n"}},
                     <<"\"ÿ\"">>}}
     in function  ejson:nif_decode/1 (ejson.erl, line 57)
     in call from ejson:decode/1 (ejson.erl, line 38)
3> mochijson2:encode(<<255>>).
** exception exit: {ucs,{bad_utf8_character_code}}
     in function  xmerl_ucs:from_utf8/1 (xmerl_ucs.erl, line 185)
     in call from mochijson2:json_encode_string/2 (mochijson2.erl, line 186)
In Python, encoding a string with byte 255 (invalid UTF-8) is also not allowed (which is correct):
$ python
Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps("\xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 195, in encode
    return encode_basestring_ascii(o)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
>>>
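For illustration, the behaviour expected from the encoder can be sketched as follows. This is only a rough sketch in Python 3 (the session above is Python 2.7), not ejson code; the function name encode_json_string is hypothetical. The idea is to validate the bytes as UTF-8 before encoding, so that anything the encoder emits can also be decoded again:

```python
import json


def encode_json_string(raw: bytes) -> str:
    """Encode raw bytes as a JSON string, rejecting invalid UTF-8.

    Hypothetical helper for illustration only; it is not part of ejson
    or mochijson2.
    """
    try:
        # bytes.decode is strict by default and raises on invalid UTF-8.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            "invalid UTF-8 at byte offset %d" % exc.start
        ) from exc
    return json.dumps(text)


if __name__ == "__main__":
    # Valid UTF-8 encodes and decodes back to the same text.
    encoded = encode_json_string(b"caf\xc3\xa9")
    print(json.loads(encoded) == "caf\u00e9")

    # Byte 255 is rejected at encode time instead of producing
    # output that a strict decoder would later refuse.
    try:
        encode_json_string(b"\xff")
    except ValueError as err:
        print("rejected:", err)
```

With a check like this in the encoder, the failure shows up at encode time, where the bad input originates, rather than later when the stored document is decoded.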