Couchbase Server / MB-5925

Broken ejson encoder


Details

    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Blocker
    • 2.0-beta
    • 2.0-developer-preview-4
    • Component/s: storage-engine
    • Security Level: Public
    • None

    Description

      ejson, the JSON encoder/decoder used in Erlang inside the couchdb component, is currently broken. It encodes strings containing invalid UTF-8 when it should throw, raise, or return an error, yet it refuses to decode that same output (which is correct). Mochijson2, another Erlang JSON encoder/decoder, behaves correctly and raises an error when a string contains invalid UTF-8. See the following example in an Erlang shell:

      1> Result = ejson:encode(<<255>>).
      <<"\"ÿ\"">>
      2> ejson:decode(Result).
      ** exception throw: {invalid_json,{{error,{2,
                             "lexical error: invalid bytes in UTF8 string.\n"}},
                           <<"\"ÿ\"">>}}
           in function  ejson:nif_decode/1 (ejson.erl, line 57)
           in call from ejson:decode/1 (ejson.erl, line 38)
      3>
      3> mochijson2:encode(<<255>>).
      ** exception exit: {ucs,{bad_utf8_character_code}}
           in function  xmerl_ucs:from_utf8/1 (xmerl_ucs.erl, line 185)
           in call from mochijson2:json_encode_string/2 (mochijson2.erl, line 186)
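      For comparison, the behaviour this report asks of the encoder can be sketched in Python: check that the binary is well-formed UTF-8 before emitting a JSON string, and fail otherwise. This is a minimal sketch, not ejson's actual code; `strict_encode_binary` and `InvalidUtf8Error` are hypothetical names. Byte 0xFF is rejected because it can never occur anywhere in well-formed UTF-8.

```python
import json


class InvalidUtf8Error(ValueError):
    """Hypothetical error raised when a binary is not valid UTF-8."""


def strict_encode_binary(data: bytes) -> bytes:
    """Encode a binary as a JSON string, refusing invalid UTF-8.

    Hypothetical helper mirroring what the issue expects of
    ejson:encode/1: validate first, then emit JSON.
    """
    try:
        # "strict" error handling is the default: any malformed byte raises.
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise InvalidUtf8Error(f"invalid UTF-8 at byte {exc.start}") from exc
    # ensure_ascii=False keeps non-ASCII characters as UTF-8 in the output.
    return json.dumps(text, ensure_ascii=False).encode("utf-8")
```

      With this check, `strict_encode_binary(b"\xff")` raises `InvalidUtf8Error` instead of producing output that the decoder later rejects, while valid input such as `"é".encode("utf-8")` is encoded normally.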

      Python's json module likewise refuses to encode a string containing byte 0xFF (invalid UTF-8), which is also correct:

      $ python
      Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05)
      [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import json
      >>> json.dumps("\xff")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 231, in dumps
          return _default_encoder.encode(obj)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 195, in encode
          return encode_basestring_ascii(o)
      UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
      >>>
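      The transcript above is Python 2. On Python 3, `json.dumps(b"\xff")` fails even earlier with a `TypeError`, because `bytes` objects are not JSON-serializable at all, so the UTF-8 check happens wherever the bytes are decoded to `str`. The asymmetry reported for ejson can be stated as a round-trip property: whatever the encoder emits, the decoder must accept. The sketch below assumes Python 3.6+, where `json.loads` accepts `bytes` and detects their encoding (UTF-8 here); `lenient_encode_binary` is a hypothetical stand-in that imitates the reported ejson behaviour by copying bytes into the output without validation.

```python
import json


def lenient_encode_binary(data: bytes) -> bytes:
    """Hypothetical imitation of the reported ejson behaviour.

    Bytes are copied into a JSON string without any UTF-8 validation.
    Only backslash and double quote are escaped, for brevity.
    """
    escaped = data.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + escaped + b'"'


def round_trips(encoded: bytes) -> bool:
    """Return True if json.loads accepts what the encoder produced."""
    try:
        json.loads(encoded)
    except ValueError:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        return False
    return True
```

      `round_trips(lenient_encode_binary(b"abc"))` holds, but for `b"\xff"` it does not: the same encode-succeeds, decode-fails pattern as the Erlang session. A strict encoder restores the property by rejecting such input up front.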


          People

            damien (Inactive)
            Filipe Manana (Inactive)
            Votes: 0
            Watchers: 3
