  1. Couchbase Python Client Library
  2. PYCBC-153

Appending strings results in ValueFormatError on get

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 1.0.0
    • Fix Version/s: 1.1.0
    • Component/s: library
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Couchbase 2.0.1 Community Edition, couchbase Python client 1.0.0 (not beta)

      Description

      Storing a string with the `set` command and then using `append` to append another string to it results in a ValueFormatError on the next `get`. It appears that string values are converted to JSON instead of being stored as plain strings, so the appended string lands after the closing quotation mark. Appending "baz" to "bar" results in "bar"baz in CB, which can no longer be deserialized.

      I understand why the error is thrown, and I would expect it from JSON-serialized objects, but not from standard strings. Other memcached libraries, and the previous version of the Python CB client, worked appropriately. Of course, the code workaround is to use format=couchbase.FMT_BYTES on the set command, but this shouldn't be necessary when flags are already used when storing an object to determine the format. If it is a Python string, then mark it as such in the flags; adding additional escaping breaks other commands that should "just work", especially when the goal is to make switching to CB easier.

      Example:
      Python 2.7.3 (default, Aug 1 2012, 05:14:39)
      [GCC 4.6.3] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import couchbase
      >>> c = couchbase.Couchbase.connect(bucket='default', host='192.168.56.70')
      >>> c.set("foo", "bar")
      OperationResult<RC=0x0, Key=foo, CAS=0x8bac42ea89010000>
      >>> c.append("foo", "baz")
      OperationResult<RC=0x0, Key=foo, CAS=0x3ae494c68b010000>
      >>> c.get("foo")
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/opt/[redacted]/usr/local/lib/python2.7/site-packages/couchbase/connection.py", line 325, in get
          return _Base.get(self, key, ttl, quiet)
        File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
          return _default_decoder.decode(s)
        File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
          raise ValueError(errmsg("Extra data", s, end, len(s)))
      couchbase.exceptions.ValueFormatError: <Failed to decode bytes, Results=1, inner_cause=Extra data: line 1 column 5 - line 1 column 8
      (char 5 - 8), C Source=(src/convert.c,215), OBJ='"bar"baz'>
      >>>
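
The failure in the session above can be reproduced without a server. The following is a minimal sketch using only the standard library, assuming (as the description suggests) that the client JSON-encodes the string on `set` and that `append` concatenates raw bytes onto the stored value. The variable names are illustrative and not part of the Couchbase client.

```python
import json

# Simulate the stored value: the string is JSON-encoded on set,
# then append concatenates raw bytes onto whatever is stored.
stored = json.dumps("bar").encode("utf-8")  # quoted JSON string
stored += b"baz"                            # append lands after the closing quote

# Decoding the result as JSON now fails with "Extra data".
try:
    json.loads(stored.decode("utf-8"))
    decode_failed = False
except ValueError:
    decode_failed = True

# Storing the string as plain bytes instead keeps append meaningful.
raw = "bar".encode("utf-8")
raw += "baz".encode("utf-8")
result = raw.decode("utf-8")
```

With the real client, the equivalent of the second half is the workaround named in the description: pass format=couchbase.FMT_BYTES to the set command.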


          Activity

          mnunberg Mark Nunberg added a comment -

          There is no auto-detection magic when storing serialized objects. If the format is JSON then the value will always be stored as JSON. If the format is Pickle then it will always be stored as pickle.

          The flags do not express the native python types, but rather the storage format of the value on the server.

          Of course there are conflicting semantics of "Just Work". When using views on strings for example, if they're just stored as plain strings, then they won't just work – so it's really a question of what preferences you have.

          In the future we may wish to add a pseudo FMT_AUTO to "do the right thing" depending on the type: if it's a dict/list/boolean/None, use JSON; if it's a string type (i.e. basestring), use FMT_UTF8; if it's a 'bytes' object, use FMT_BYTES; if it's something else, use FMT_PICKLE.

          I believe this is what you are hinting at.
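
The type-based selection described above could look roughly like the sketch below. It is an illustration only, not the client's implementation: the `FMT_*` values here are placeholder strings rather than the library's real flag constants, `choose_format` is a hypothetical helper, and `str` stands in for Python 2's `basestring` so the sketch runs on Python 3.

```python
# Placeholder constants for illustration; not the library's flag values.
FMT_JSON, FMT_UTF8, FMT_BYTES, FMT_PICKLE = "json", "utf8", "bytes", "pickle"


def choose_format(value):
    """Pick a storage format from the Python type of value (hypothetical helper)."""
    # dict/list/boolean/None map to JSON, following the comment's rule.
    if value is None or isinstance(value, (dict, list, bool)):
        return FMT_JSON
    # Text strings are stored as UTF-8, so append/prepend stay usable.
    if isinstance(value, str):
        return FMT_UTF8
    # Raw byte strings are stored untouched.
    if isinstance(value, (bytes, bytearray)):
        return FMT_BYTES
    # Anything else falls back to pickle.
    return FMT_PICKLE
```

Under this rule a plain string such as "bar" would be stored as UTF-8 rather than JSON, so a later append would produce a value that still decodes.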

          leonexis Leo Tindle added a comment -

          FMT_AUTO might be a good option. My primary concern was trying to figure out why append wasn't working when I was porting some code to use the new CB client and expected it to work like the MC clients (like the old CB client did). Perhaps a better explanation in the append/prepend documentation would help, highlighting the fact that, by default and unlike memcached clients, strings are also stored as FMT_JSON and must be explicitly stored with FMT_UTF8 (or FMT_BYTES) for data changed by append/prepend to be formatted correctly for retrieval. A similar message exists in the API documentation, but it was not clear that strings are stored this way by default as well (differing from other MC clients and the older CB client).

          As a side note, it would be nice if there were an option that could be specified in get() to get the raw data, instead of having to change a flag for the entire connection and then make sure nothing else uses it before switching back. This way, code that requires the raw output could receive it without affecting other code that needs the converted output and uses the same connection object.

          But if this is "how it should be," then I guess this ticket can be closed.
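
To make the side note concrete, here is a small in-memory sketch of a per-call raw option. `ToyStore` and its `raw` parameter are hypothetical stand-ins invented for illustration; they are not the Couchbase client API, and the class only mimics the JSON-by-default behavior discussed in this ticket.

```python
import json


class ToyStore:
    """In-memory stand-in for a bucket; not the Couchbase client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # Mimic the default discussed here: values are JSON-encoded.
        self._data[key] = json.dumps(value).encode("utf-8")

    def append(self, key, suffix):
        # Append works on the stored bytes, ignoring the format.
        self._data[key] += suffix.encode("utf-8")

    def get(self, key, raw=False):
        blob = self._data[key]
        if raw:
            # Per-call opt-out: return the stored bytes without decoding.
            return blob
        return json.loads(blob.decode("utf-8"))


store = ToyStore()
store.set("foo", "bar")
store.append("foo", "baz")
raw_value = store.get("foo", raw=True)
```

Because the flag is passed per call, other code sharing the same object still gets decoded values from a plain `get(key)`.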

          mnunberg Mark Nunberg added a comment -

          The behavior described here isn't a bug per se, so there is nothing to fix. PYCBC-157 tracks the request for "FMT_AUTO".

          mnunberg Mark Nunberg added a comment -

          PYCBC-158 filed as well


            People

            • Assignee:
              mnunberg Mark Nunberg
              Reporter:
              leonexis Leo Tindle
            • Votes:
              0
              Watchers:
              2
