Couchbase Server / MB-60961

XDCR - Decrease HLV size via CAS delta computation

    Description

      Current HLV Scenario
      The centralized HLV definition is here: https://docs.google.com/document/d/1nskp5BvmH5vgjSjYX-eWNLygwlDNBipDsQdh0eOEjUw/edit#heading=h.gudkwthdx9bj

      An example of it is:

      "_vv": {
           "cvCAS": "0x00007dacae059a16",
           "src": "5pRi8Piv1yLcLJ1iVNJIsA",
           "ver": "0x00007dacae059a16",
           "mv": {
              "NqiIe0LekFPLeX4JvTO6Iw": "0x00008cd6ac059a16",
              "LhRPsa7CpjEvP5zeXTXEBA": "0x00008cd6ad159a16",
              "Xefjbisodifion34asfkiorgjiA": "0x00008cd5ad159a16"
           },
           "pv": {
              "YZvBpEaztom9z5V/hDoeIw": "0x000045d6ac059a16"
           }
      }
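As a quick way to inspect the structure above, the following sketch loads the example as strict JSON and counts the MV and PV entries. The function name summarize_hlv and the returned field names are hypothetical, and the compact byte count is only the length of Python's compact JSON encoding, which is not necessarily the size as stored on the server.

```python
import json

# The example from above, written as strict JSON (wrapped in an outer object).
HLV_EXAMPLE = """
{
  "_vv": {
    "cvCAS": "0x00007dacae059a16",
    "src": "5pRi8Piv1yLcLJ1iVNJIsA",
    "ver": "0x00007dacae059a16",
    "mv": {
      "NqiIe0LekFPLeX4JvTO6Iw": "0x00008cd6ac059a16",
      "LhRPsa7CpjEvP5zeXTXEBA": "0x00008cd6ad159a16",
      "Xefjbisodifion34asfkiorgjiA": "0x00008cd5ad159a16"
    },
    "pv": {
      "YZvBpEaztom9z5V/hDoeIw": "0x000045d6ac059a16"
    }
  }
}
"""


def summarize_hlv(doc_json):
    """Return entry counts and the compact JSON size of the "_vv" object.

    Hypothetical helper for illustration only.
    """
    vv = json.loads(doc_json)["_vv"]
    # Compact encoding: no spaces after separators.
    compact = json.dumps(vv, separators=(",", ":"))
    return {
        "src": vv["src"],
        "mv_entries": len(vv.get("mv", {})),
        "pv_entries": len(vv.get("pv", {})),
        "compact_bytes": len(compact.encode("utf-8")),
    }


summary = summarize_hlv(HLV_EXAMPLE)
print(summary)
```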
      

      The HLV size calculation is documented here: https://docs.google.com/document/d/1S0WStgbqRGlmwbdNyHn_9qqEqvshPv9WyG8skauJDEc/edit#heading=h.7qibb1egmwzb
      (That document may change, so the calculation is copied here for future reference.)

      Total space requirement

      • If HLV contains only CV: 94 + 1 = 95 bytes
      • If HLV contains CV+PV: 95 + 6 + n * (25 + 21) = 101 + 46n bytes
      • If HLV contains CV+PV+MV: 101 + 46n + 6 + m * (25 + 21) = 107 + 46n + 46m bytes

      Here n is the number of entries in PV and m is the number of entries in MV.
      The calculation is based on the Couchbase Server cluster ID and CAS sizes. The sizes of Mobile IDs and versions are unknown.
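The arithmetic above can be written as a small function. This is a minimal sketch that uses only the constants quoted in this ticket; the name hlv_size_bytes is hypothetical, and the case of MV without PV is not stated in the source, so treating it symmetrically with PV is an assumption.

```python
CV_ONLY_BYTES = 94 + 1      # HLV that contains only CV
SECTION_OVERHEAD_BYTES = 6  # added once when PV (or MV) is present
ENTRY_BYTES = 25 + 21       # per PV or MV entry, as quoted above


def hlv_size_bytes(pv_entries=0, mv_entries=0):
    """Estimate the HLV size in bytes from the number of PV and MV entries."""
    if pv_entries < 0 or mv_entries < 0:
        raise ValueError("entry counts must be non-negative")
    size = CV_ONLY_BYTES
    if pv_entries:
        size += SECTION_OVERHEAD_BYTES + ENTRY_BYTES * pv_entries
    if mv_entries:
        # Assumption: MV costs the same overhead even when PV is absent.
        size += SECTION_OVERHEAD_BYTES + ENTRY_BYTES * mv_entries
    return size


# CV only, CV+PV with n = 1, and CV+PV+MV with n = 2, m = 3.
for n, m in [(0, 0), (1, 0), (2, 3)]:
    print(n, m, hlv_size_bytes(n, m))
```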

      For an HLV that has CV and PV only, the formula is:

      Size in bytes of the version vector per document: 101 + 46n
      n: number of entries in the version vector (or number of clusters in the replication topology)
      

      Issue
      For customers whose documents are small but are updated by a large number of actors, the HLV can end up taking the majority of the body size.
      XDCR also has another project (MB-58989) that plans to use HLV, so customers may raise concerns about the HLV size; hence this discussion.
      Also, since HLV has not shipped yet, this is the ideal time to think about size optimization.
      One optimization that both the Mobile and XDCR teams talked about was using "deltas" to calculate values. I couldn't find the conversation, and I think the idea fell off the radar.

      This ticket is created to revive that conversation and describe a rudimentary “compression” algorithm to save HLV space.
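To put the size concern in numbers, the sketch below estimates what fraction of a stored document the HLV would occupy, using the CV+PV estimate of 101 + 46n bytes. The function name hlv_share, the 200-byte body, and the five PV entries are hypothetical values chosen only for illustration, and counting the total as body plus HLV is an assumption.

```python
def hlv_share(body_bytes, pv_entries):
    """Fraction of (body + HLV) bytes taken by a CV+PV HLV."""
    if body_bytes < 0 or pv_entries < 1:
        raise ValueError("need a non-negative body size and at least one PV entry")
    hlv_bytes = 101 + 46 * pv_entries  # CV+PV estimate quoted in this ticket
    return hlv_bytes / (hlv_bytes + body_bytes)


# Hypothetical case: a 200-byte document body with 5 PV entries.
share = hlv_share(body_bytes=200, pv_entries=5)
print(f"HLV share of stored bytes: {share:.0%}")
```

In this hypothetical case the HLV estimate (331 bytes) is already larger than the body itself.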

      Proposed Solution
      See https://docs.google.com/document/d/1Zhta5468hXS2XKU26bWAn7OJix0IkVEi8CiL6jojIAI/edit for the proposed solution.
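The actual algorithm is defined in the linked document and is not reproduced here. Purely to illustrate the general idea of delta encoding, the sketch below stores the smallest CAS in full and every other CAS as a hex difference from the previous one. Everything in it is an assumption made for illustration: the function names, the list-of-pairs output format, and the interpretation of the "0x..." strings as little-endian encodings of a 64-bit CAS.

```python
def parse_cas(hex_str):
    """Parse "0x..." as a 64-bit integer (assumed little-endian byte order)."""
    return int.from_bytes(bytes.fromhex(hex_str[2:]), "little")


def format_cas(value):
    """Inverse of parse_cas: 8 little-endian bytes as a "0x..." hex string."""
    return "0x" + value.to_bytes(8, "little").hex()


def delta_encode(entries):
    """Encode {source: cas} as [(source, text), ...] sorted by CAS.

    The first text is the full CAS; each later text is the hex delta
    from the previous CAS, without leading zeros.
    """
    ordered = sorted(entries.items(), key=lambda kv: parse_cas(kv[1]))
    encoded = []
    previous = None
    for source, cas in ordered:
        value = parse_cas(cas)
        if previous is None:
            encoded.append((source, cas))
        else:
            encoded.append((source, "0x" + format(value - previous, "x")))
        previous = value
    return encoded


def delta_decode(encoded):
    """Rebuild the original {source: cas} mapping from delta_encode output."""
    entries = {}
    previous = None
    for source, text in encoded:
        if previous is None:
            value = parse_cas(text)
        else:
            value = previous + int(text, 16)
        entries[source] = format_cas(value)
        previous = value
    return entries


# The MV entries from the example in this ticket.
mv = {
    "NqiIe0LekFPLeX4JvTO6Iw": "0x00008cd6ac059a16",
    "LhRPsa7CpjEvP5zeXTXEBA": "0x00008cd6ad159a16",
    "Xefjbisodifion34asfkiorgjiA": "0x00008cd5ad159a16",
}
encoded_mv = delta_encode(mv)
print(encoded_mv)
```

Sorting by CAS keeps every delta non-negative, at the cost of not preserving the original entry order; whether that trade-off is acceptable depends on the actual proposal.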

            People

              sumukh.bhat Sumukh Bhat
              neil.huang Neil Huang