Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-60861

KV: Structured Logging

    XMLWordPrintable

Details

    • Task
    • Resolution: Unresolved
    • Major
    • Morpheus
    • None
    • memcached
    • None
    • 0

    Description

      Our current logging is unstructured free-form text. The logs often contain invaluable debugging information, but since that information is encoded in an unstructured format, it is difficult to extract by tooling (whether for ourselves or for support/observability) and difficult to enforce consistency.

      This is a proposal to change the way memcached logs are formatted, by keeping a separation between the log message and the contextual data, in a machine-readable way.

      A structured logging approach will allow us to:

      • Standardise specific KV messages to be used for real-time analytics (such as HELO and Slow messages).
        • There has been a request (MB-60263) to expose the list of connected clients for example. Connected clients are visible in the HELO logs, but these are not formally stable. A structured log can initially contain the same information as the current HELO message, but has the benefit that it can be extended over time with additional context without fearing breaking some regex somewhere.
      • Transform logs to other structured formats like JSON
        • jq (a command line tool for processing JSON) can query JSON similar to (awk/sed) and filter (similar to grep). A JSON log stream can then be queries directly, for example, "all logs for replication stream for specific vBucket", or "all logs for this bucket and eventing service, but discard other services". These are possible with grep/rg, but less straightforward.
      • Enhance existing logging without breaking tooling in unpredictable ways
        • Tooling should not be sensitive to an extra key-value pair in a structured log, but in an unstructured log where a regex is used, a simple change can break the regex.
        • An unstructured log message changing format has unpredictable side-effects. For example, an automated check for "rollback to 0" using a regex will quietly stop alerting in some or all cases, but in the case of a structured log, the tooling can alert with "missing key: purge_seqno" or similar when the format change has broken it.
      • We can easily add truly unique connection/request IDs to operations
        • Currently, requires appending/prepending text to every log line, but in a structured log, it is another key-value pair.

      Summary

      The goal is to have a format on-disk that is readily convertable to JSON, to allow for better tooling and real-time problem detection, as well as to allow KV to improve logs without fearing breaking that tooling.

      An extension of this can be to change the on-disk log format itself to JSON, but that is not strictly required.

      Format

      TBD

      For the goal of just achieving structured logging, we don't require the logs to be stored as JSON. We just need machine-readability. One simple format which is close to the current one and is machine readable is:

      <TIMESTAMP> <SEVERITY> <MESSAGE> {<JSON>} 

      We have the message context be a JSON object attached to the message. The message would contain fixed text with no data embedded in it. The JSON object part can be any custom format, including key=value, or JSON5 for readability, but I think the case for JSON is that everything can already read it.

      The above is also trivial to convert to pure JSON via a simple Python script, but preserves the familiar look of the logs. Additionally, once KV changes are done to support the above format, we can trivially convert the on-disk logs to pure JSON trivially, subject to approval and discussion.

      2024-01-31T15:15:02.441980+00:00 WARNING 552: (appdata) DCP (Producer) eq_dcpq:eventing:1949123:eventing:1949123 - (vb:742) Stream request requires rollback to seqno:0 because purge seqno (398805) is greater than start seqno - could miss purged deletions. Client requested seqnos:{198293,18446744073709551615} snapshot:{198293,198293} uuid:50966892929477 

      2024-01-31T15:15:02.441980+00:00 WARNING Stream request requires rollback {"fd": 552, "bucket": "appdata", "type": "Producer", "name": "eq_dcpq:eventing:1949123:eventing:1949123", "rollback_seqno": "0", "purge_seqno": "398805", "requested_seqnos": ["198293","18446744073709551615"], "snapshot": ["198293","198293"], "uuid": "50966892929477", "reason": "purge seqno is greater than start seqno - could miss purged deletions"}

      Approach

      We will need to update our logger formatter, from formatting a single string with values, to accepting a (fixed) message and key-value pairs. Once support for that is in-place, we can work on incrementally updating our logging code to use the key-value pairs.

      logger->warn(
              "({}) Stream request requires rollback to seqno:{} "
              "because {}. Client requested seqnos:{{{},{}}} "
              "snapshot:{{{},{}}} uuid:{}",
              vbucket,
              *rollback_seqno,
              needsRollback->rollbackReason,
              req.start_seqno,
              req.end_seqno,
              req.snap_start_seqno,
              req.snap_end_seqno,
              req.vbucket_uuid);
      

      logger->warn(
              "Stream request requires rollback",
              {"vb", vbucket},
              {"rollback_senqo", *rollback_seqno},
              {"reason", needsRollback->rollbackReason},
              {"seqnos", {req.start_seqno, req.end_seqno}},
              {"snapshot", {req.snap_start_seqno, req.snap_end_seqno}}
              {"uuid", req.vbucket_uuid});
      

      These can be formatted into any format we decide on, including as a JSON object or the format of "<TS> <SEV> <MSG> <JSON>" proposed above, which can then be piped into a script to convert into JSON and query or ingest into tooling.

      Attachments

        For Gerrit Dashboard: MB-60861
        # Subject Branch Project Status CR V

        Activity

          People

            vesko.karaganev Vesko Karaganev
            vesko.karaganev Vesko Karaganev
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:

              PagerDuty