Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Won't Fix
    • Affects Version/s: 2.0-developer-preview-4
    • Fix Version/s: 2.0-beta
    • Component/s: storage-engine
    • Security Level: Public
    • Labels:
      None

      Description

      The JSON encoder/decoder used in Erlang (inside the couchdb component), ejson, is currently broken. It encodes strings containing invalid UTF-8 when it should throw/raise/return an error. On the other hand, it is unable to decode the resulting invalid UTF-8 strings (which is correct). Mochijson2, another Erlang-based JSON encoder/decoder, behaves correctly and raises an error if a string contains invalid UTF-8. See the example below, in an Erlang shell:

      1> Result = ejson:encode(<<255>>).
      <<"\"ÿ\"">>
      2> ejson:decode(Result).
      ** exception throw: {invalid_json,{{error,{2,
                                                 "lexical error: invalid bytes in UTF8 string.\n"}},
                                         <<"\"ÿ\"">>}}
           in function ejson:nif_decode/1 (ejson.erl, line 57)
           in call from ejson:decode/1 (ejson.erl, line 38)
      3>
      3> mochijson2:encode(<<255>>).
      ** exception exit: {ucs,{bad_utf8_character_code}}
           in function xmerl_ucs:from_utf8/1 (xmerl_ucs.erl, line 185)
           in call from mochijson2:json_encode_string/2 (mochijson2.erl, line 186)
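      For comparison, the requested behaviour can be sketched in Python: validate the input first, and raise an error instead of emitting invalid UTF-8, as mochijson2 does. This is a minimal illustration and not ejson's or mochijson2's code. The helper name `strict_encode` is hypothetical. Input strings are assumed to arrive as raw bytes, like Erlang binaries.

```python
import json


def strict_encode(raw: bytes) -> str:
    """Encode a raw byte string as a JSON string, rejecting invalid UTF-8.

    Hypothetical helper: it raises an error instead of emitting bytes
    that a strict decoder would refuse, mirroring mochijson2's behaviour.
    """
    # bytes.decode raises UnicodeDecodeError (a ValueError subclass)
    # for byte sequences that are not valid UTF-8, such as a lone 0xFF.
    text = raw.decode("utf-8")
    return json.dumps(text)


print(strict_encode("caf\u00e9".encode("utf-8")))  # valid UTF-8 is encoded normally
try:
    strict_encode(b"\xff")  # same input as ejson:encode(<<255>>)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

      With validation at encode time, a value that cannot be decoded is never produced. The error therefore surfaces where the bad data enters, not later when the encoded JSON is read back.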

      Python's json module likewise refuses to encode a string containing byte 255 (invalid UTF-8), which is correct:

      $ python
      Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05)
      [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import json
      >>> json.dumps("\xff")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 231, in dumps
          return _default_encoder.encode(obj)
        File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py", line 195, in encode
          return encode_basestring_ascii(o)
      UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
      >>>


        Activity

        Filipe Manana (Inactive) added a comment -

        At the very least, there's the inconsistency of being able to encode such a string but being unable to decode the resulting output.
        This can break the view engine, cross data center replication, etc.
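        The inconsistency can be stated as a round-trip property: anything the encoder accepts should decode back to the same value. The sketch below models the reported ejson behaviour in Python. It assumes, based on the `<<"\"ÿ\"">>` output in the description, that the encoder copies the bytes between quotes without validating them. `lenient_encode` and `round_trips` are hypothetical names, and this is not ejson's implementation.

```python
import json


def lenient_encode(raw: bytes) -> bytes:
    """Hypothetical model of the reported ejson behaviour.

    The bytes are copied between quotes with no UTF-8 validation.
    Escaping of quotes, backslashes and control characters is
    omitted for brevity.
    """
    return b'"' + raw + b'"'


def round_trips(raw: bytes) -> bool:
    """Return True if raw survives encode followed by decode unchanged."""
    encoded = lenient_encode(raw)
    try:
        # Decoding requires valid UTF-8, as ejson:decode/1 does.
        decoded = json.loads(encoded.decode("utf-8"))
    except ValueError:  # UnicodeDecodeError and JSONDecodeError both subclass ValueError
        return False
    return decoded.encode("utf-8") == raw


print(round_trips(b"hello"))  # True
print(round_trips(b"\xff"))   # False: it encodes, but cannot be decoded
```

        A binary such as `<<255>>` fails this check. That is the kind of value which can break any component that stores encoded JSON and decodes it later.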

        Alex Leverington added a comment -

        I think this is related to http://www.couchbase.com/issues/browse/MB-6138, which has additional client code to reproduce the problem.

        damien added a comment -

        We will be dropping non-UTF-8 keys per CBD-453.
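        CBD-453 is not quoted here, so its exact rules are not known from this issue. Below is a minimal Python sketch of the filtering step. It assumes keys arrive as raw bytes and that "dropping" means skipping the entry entirely. `drop_non_utf8_keys` is a hypothetical helper, not Couchbase code.

```python
def drop_non_utf8_keys(entries: dict) -> dict:
    """Return a copy of entries without keys that are not valid UTF-8.

    Hypothetical helper: keys are assumed to be raw bytes, and an invalid
    key is skipped entirely rather than repaired or escaped.
    """
    kept = {}
    for key, value in entries.items():
        try:
            key.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
        except UnicodeDecodeError:
            continue  # drop the entry instead of passing it to the encoder
        kept[key] = value
    return kept


print(drop_non_utf8_keys({b"doc1": 1, b"\xff": 2}))  # {b'doc1': 1}
```

        Dropping invalid keys, rather than replacing their bytes, means the JSON encoder never receives a key it could turn into undecodable output.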


          People

          • Assignee:
            damien
          • Reporter:
            Filipe Manana (Inactive)
          • Votes:
            0
          • Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
