Details
-
Improvement
-
Resolution: Unresolved
-
Major
-
2.5.1
-
Security Level: Public
-
None
Description
Background
With the APPEND/PREPEND opcodes, and with the Sub-Document API in 4.5, it is possible for clients to read/update only part of a document in a more efficient manner. Consider a client wishing to perform an update which is small compared to the current document size - for example, adding a 100B field to a 10KB JSON document.
Prior to Subdoc, a client would have to:
- GET() the existing value (10KB server -> client).
- Modify the value locally to add the 100B field.
- SET() a complete, new value (10.1KB client -> server).
With subdoc however, this can be achieved much more efficiently - the client just sends:
- SUBDOC_DICT_ADD() specifying the path to add and the new path value (100B client -> server).
Subdoc has two advantages - first we reduce from 2 round-trips to one; and secondly we transmit significantly less data - 100B compared to 20.1KB (including both directions). This both reduces operation latency and increases network throughput.
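The comparison above can be sketched with a toy in-memory server that only tallies payload bytes and round-trips. This is an illustrative model, not the real client or protocol: `CountingServer`, `dict_add`, and the key and field names are hypothetical, and protocol headers are ignored.

```python
import json


class CountingServer:
    """Toy in-memory KV server that tallies payload bytes on the wire."""

    def __init__(self):
        self.docs = {}
        self.bytes_to_client = 0
        self.bytes_to_server = 0
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        value = self.docs[key]
        self.bytes_to_client += len(value)
        return value

    def set(self, key, value):
        self.round_trips += 1
        self.bytes_to_server += len(value)
        self.docs[key] = value

    def dict_add(self, key, path, fragment):
        # Stand-in for SUBDOC_DICT_ADD: only the path and fragment are sent.
        self.round_trips += 1
        self.bytes_to_server += len(path.encode()) + len(fragment)
        doc = json.loads(self.docs[key])
        if path in doc:
            raise KeyError(f"path already exists: {path}")
        doc[path] = json.loads(fragment)
        self.docs[key] = json.dumps(doc).encode()


def update_full_document(server, key, field, value):
    """Pre-Subdoc flow: GET the whole value, modify locally, SET it back."""
    doc = json.loads(server.get(key))
    doc[field] = value
    server.set(key, json.dumps(doc).encode())


def update_subdoc(server, key, field, value):
    """Subdoc flow: send only the path and the new value."""
    server.dict_add(key, field, json.dumps(value).encode())


def make_server():
    server = CountingServer()
    # Seed a roughly 10KB document directly, without counting it as traffic.
    server.docs["user::1"] = json.dumps({"bio": "x" * 10_000}).encode()
    return server


full = make_server()
update_full_document(full, "user::1", "nickname", "y" * 90)

sub = make_server()
update_subdoc(sub, "user::1", "nickname", "y" * 90)

print("full document:", full.round_trips, "round-trips,",
      full.bytes_to_client + full.bytes_to_server, "bytes")
print("subdoc:       ", sub.round_trips, "round-trip(s),",
      sub.bytes_to_client + sub.bytes_to_server, "bytes")
```

Both flows leave the same document on the server; only the traffic needed to get there differs.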
(The same is true of APPEND/PREPEND - the client sends only the fragment of data they wish to append/prepend to the existing value, and the server handles adding that value to the document).
However, Subdoc and Append/Prepend are currently (5.5) implemented in KV-Engine in the front-end - to actually execute the SUBDOC_DICT_ADD example above, KV-Engine still manipulates whole documents:
- Frontend thread fetches existing (10KB) value from ep-engine (which if non-resident means reading 10KB from disk).
- Frontend thread modifies the value by adding in the user's new field.
- Frontend thread stores a complete, new value (10.1KB) into ep-engine (which will require asynchronously writing 10.1KB to disk, and replicating the complete 10.1KB value to any DCP consumers).
As such, even though we have significantly reduced the client -> server cost of this operation, we haven't improved things further in the stack - in the storage layer, and to DCP consumers.
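A minimal sketch of the front-end flow described above, assuming a toy engine that counts disk and DCP bytes. `ToyEngine` and `frontend_dict_add` are hypothetical names for illustration and do not reflect the actual KV-Engine interfaces.

```python
import json


class ToyEngine:
    """Hypothetical stand-in for the engine below the front-end; counts bytes only."""

    def __init__(self):
        self.disk = {}         # key -> bytes; every item is treated as non-resident
        self.disk_read = 0
        self.disk_written = 0
        self.dcp_sent = 0

    def fetch(self, key):
        value = self.disk[key]
        self.disk_read += len(value)
        return value

    def store(self, key, value):
        self.disk[key] = value
        self.disk_written += len(value)  # the whole value is persisted
        self.dcp_sent += len(value)      # and the whole value is replicated


def frontend_dict_add(engine, key, path, fragment):
    """Front-end implementation: read, modify and rewrite the whole document."""
    doc = json.loads(engine.fetch(key))
    doc[path] = json.loads(fragment)
    engine.store(key, json.dumps(doc).encode())


engine = ToyEngine()
# Seed a roughly 10KB document directly, without counting it as I/O.
engine.disk["user::1"] = json.dumps({"bio": "x" * 10_000}).encode()

fragment = json.dumps("y" * 90).encode()
frontend_dict_add(engine, "user::1", "nickname", fragment)

print("client sent:", len(fragment), "bytes of new value")
print("disk read:", engine.disk_read, "disk written:", engine.disk_written,
      "DCP sent:", engine.dcp_sent)
```

In this model the client's small fragment still turns into a full-document read, a full-document write, and a full-document DCP message.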
Request
It would significantly improve performance (in a number of metrics) if we "pushed down" sub-document operations to lower in the stack.
There are a few options on how "deep" we go, but for example if we pushed the DICT_ADD example above down to the storage engine, we could reduce the cost for an update to a non-resident item from "read 10KB, write 10.1KB" to "write 100B". This assumes a storage engine which supports such operations (for example, RocksDB supports this via the Merge Operator).
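The idea of a merge-style push-down can be sketched with a toy store that records only the delta at update time and folds it into the base value on read. This is a conceptual model under stated assumptions, not the RocksDB API: `ToyMergeStore` and its methods are hypothetical.

```python
import json


class ToyMergeStore:
    """Toy store with merge-operator-like semantics; counts bytes only."""

    def __init__(self):
        self.base = {}       # key -> full value bytes
        self.operands = {}   # key -> list of pending (path, fragment) deltas
        self.bytes_read = 0
        self.bytes_written = 0

    def put(self, key, value):
        self.base[key] = value
        self.operands[key] = []
        self.bytes_written += len(value)

    def merge(self, key, path, fragment):
        # Record only the delta; the base value is neither read nor rewritten.
        self.operands.setdefault(key, []).append((path, fragment))
        self.bytes_written += len(path.encode()) + len(fragment)

    def get(self, key):
        # The read path pays the cost of folding pending deltas into the base.
        raw = self.base[key]
        self.bytes_read += len(raw)
        doc = json.loads(raw)
        for path, fragment in self.operands.get(key, []):
            # DICT_ADD-like: an existing path is left unchanged.
            doc.setdefault(path, json.loads(fragment))
        return json.dumps(doc).encode()


store = ToyMergeStore()
store.put("user::1", json.dumps({"bio": "x" * 10_000}).encode())

written_before = store.bytes_written
store.merge("user::1", "nickname", json.dumps("y" * 90).encode())
update_cost = store.bytes_written - written_before
read_during_update = store.bytes_read

merged = json.loads(store.get("user::1"))
print("update wrote", update_cost, "bytes and read", read_during_update, "bytes")
```

One design caveat in this sketch: because the update never reads the base value, it cannot report at update time that the path already exists, so the conflict is only resolved when the deltas are folded in.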
We could also extend this to DCP - only send over the part which has changed, instead of the complete 10.1KB.