
CM-169: Support Request: End-to-end replication tracing



    Description

      Background

      Currently (ca. Iridium), we've got a really nice way of tracing BLIP replications in Sync Gateway with the use of contexts in logging, e.g.:

      $ kubectl logs sync-gateway-import-7b57bcb47c-4mtmk | grep 76c87c91
       
      2019-05-28T15:59:52.122Z [INF] HTTP+: #009:     --> 101 [76c87c91] Upgraded to BLIP+WebSocket protocol. User:accounts.google.com_1.  (0.0 ms)
      2019-05-28T15:59:52.122Z [INF] WS: c:[76c87c91] Start BLIP/Websocket handler
      2019-05-28T15:59:52.140Z [INF] SyncMsg: c:[76c87c91] #1: Type:getCheckpoint Client:cp-GYqFozjdo+MV/RS/WAULxl8XGpw= User:accounts.google.com_1
      2019-05-28T15:59:52.155Z [INF] SyncMsg: c:[76c87c91] #2: Type:subChanges Since:391334 DocIDs:[foo bar]  User:accounts.google.com_1
      2019-05-28T15:59:52.155Z [INF] Sync: c:[76c87c91] Sending changes since 391334. User:accounts.google.com_1
      2019-05-28T15:59:52.346Z [INF] SyncMsg: c:[76c87c91] #3: Type:proposeChanges #Changes: 0 User:accounts.google.com_1
      2019-05-28T15:59:52.390Z [INF] SyncMsg: c:[76c87c91] #4: Type:setCheckpoint Client:cp-GYqFozjdo+MV/RS/WAULxl8XGpw= Rev:0-8  User:accounts.google.com_1
      2019-05-28T15:59:52.390Z [INF] Sync: c:[76c87c91] Sent 2 changes to client, from seq 393945.  User:accounts.google.com_1
      2019-05-28T15:59:52.390Z [INF] Sync: c:[76c87c91] Sent all changes to client. User:accounts.google.com_1
      2019-05-28T15:59:52.487Z [INF] SyncMsg: c:[76c87c91] #5: Type:setCheckpoint Client:cp-GYqFozjdo+MV/RS/WAULxl8XGpw= Rev:0-9  User:accounts.google.com_1
      2019-05-28T15:59:52.501Z [INF] HTTP: c:[76c87c91] #009:    --> BLIP+WebSocket connection closed
      

      Obviously, we expect that SG will generally be participating in far more concurrent replications than any individual CBL instance, so differentiating multiple replications is a simpler problem on the CBL side.

      Problem

      We do also have a reasonable ability to differentiate concurrent replications on the Couchbase Lite side; however, there are scenarios that aren't covered.

      Take the use case of multiple corporate devices connecting to SG with the same username to pull largely the same data (e.g. an in-store POS/catalogue use case). Whether or not a shared username is the preferred approach here, it's definitely a pattern we see. In this case, the best option we have for matching the logs on either side is to look for the checkpoint ID that CBL logs, and then find the matching context on the SG side. This does feel a little hacky, but generally works; however, when it's the client's first replication, CBL won't ask SG for the checkpoint at all, so we can only match by timestamp (and pray that the clocks are in sync to the millisecond).
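      As a concrete illustration of that workaround (a sketch only: the pod name and checkpoint ID are reused from the excerpt above, and step 1 assumes the checkpoint client ID has already been pulled out of the CBL log):

      $ # Step 1: find the SG context that handled this client's checkpoint.
      $ kubectl logs sync-gateway-import-7b57bcb47c-4mtmk | grep 'cp-GYqFozjdo+MV/RS/WAULxl8XGpw='
       
      2019-05-28T15:59:52.140Z [INF] SyncMsg: c:[76c87c91] #1: Type:getCheckpoint Client:cp-GYqFozjdo+MV/RS/WAULxl8XGpw= User:accounts.google.com_1
       
      $ # Step 2: grep for that context to get the full replication trace, as in
      $ # the Background example.
      $ kubectl logs sync-gateway-import-7b57bcb47c-4mtmk | grep 76c87c91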

      Proposal

      At a high level, I'd propose a shared ID/key that is passed as part of the initial connection and then logged on both sides, so that we have a clean reference even when the username is shared, etc.

      Ideally, it might be nice to share the context used on the SG side, to avoid adding further fields etc. to be tracked. I suspect that it wouldn't be too difficult to do this with BLIP, whether as an automatic part of the existing handshake (i.e. an extra metadata field that older clients could ignore) or as an additional getContext-type command that newer CBL clients could call as part of their initial connection.
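      To make the first option concrete, here's a wire-level sketch. The traceID property name is invented for illustration, and the layout below is a simplified rendering of a BLIP message's properties rather than actual framing:

      # Today, the first message of a replication identifies only the checkpoint:
      getCheckpoint
          client: cp-GYqFozjdo+MV/RS/WAULxl8XGpw=
       
      # With the proposal, one extra optional property; SG would fold it into its
      # logging context, and older peers would simply ignore it:
      getCheckpoint
          client: cp-GYqFozjdo+MV/RS/WAULxl8XGpw=
          traceID: 76c87c91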

      Going even further, if we could match SG's logging style in CBL (simply because SG got there first), it would be really nice to have the context logged on every relevant line on the CBL side.
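      The end state might then look something like this (hypothetical: the cbl-device.log line and both file names are invented for illustration; the SG line is taken from the excerpt above), where a single grep across both logs recovers the whole replication:

      $ grep 76c87c91 cbl-device.log sync-gateway.log
      cbl-device.log:   15:59:52.1| [Sync] c:[76c87c91] Replicator starting: pull from wss://sg.example.com/db
      sync-gateway.log: 2019-05-28T15:59:52.122Z [INF] WS: c:[76c87c91] Start BLIP/Websocket handler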
