Details

    • Improvement
    • Resolution: Unresolved
    • Minor
    • not-targeted
    • None
    • cluster-monitor
    • None

    Description

      From the Intuit presentation at KubeCon 2021: consider deploying Fluentd as an aggregator that Fluent Bit forwards to. The Fluent Bit configuration then stays simple: it always, or optionally, forwards to the aggregator.

      The aggregator can be dynamically updated and scaled independently (including down to 0) as required. It also makes the configuration of the logging back end independent of the sidecars.

      There is also the potential to reduce the work done in the sidecar by pushing it to the aggregator. In particular, multiline handling and enrichment can be done in batches rather than per log line.
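To illustrate the batching idea, here is a minimal Python sketch of what an aggregator-side step could look like: continuation lines are joined into records, then metadata is attached to the whole batch in one pass. This is not Fluentd or Fluent Bit code. The record-start pattern, the function names and the metadata keys (`pod`, `namespace`) are all hypothetical and chosen only for the example.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical rule: a line starting with an ISO-style date begins a new record;
# anything else is treated as a continuation (e.g. a stack trace line).
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2}")


def join_multiline(lines: Iterable[str]) -> List[str]:
    """Merge continuation lines into the record that precedes them."""
    records: List[str] = []
    for line in lines:
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            records[-1] += "\n" + line
    return records


def enrich_batch(records: List[str], metadata: Dict[str, str]) -> List[Dict[str, str]]:
    """Attach the same metadata to every record in the batch in one pass."""
    return [{"log": record, **metadata} for record in records]


# Illustrative input only: four raw lines that form two logical records.
batch = [
    "2021-10-13 12:00:00 ERROR request failed",
    "Traceback (most recent call last):",
    '  File "app.py", line 1, in <module>',
    "2021-10-13 12:00:01 INFO retrying",
]
events = enrich_batch(join_multiline(batch), {"pod": "server-0", "namespace": "default"})
```

The sidecar would then only need to forward raw lines, while the joining and enrichment cost is paid once per batch in the aggregator.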

      Covered in my previous proposal.

      Now that logs are being shipped out, we need a good mechanism to collect and aggregate them to provide a simple system overview. This would also simplify log collection for support, since it could be done from a single point.

      A quick summary of the 12-factor recommendation is in order here:
      In staging or production deploys, each process’ stream will be captured by the execution environment, collated together with all other streams from the app, and routed to one or more final destinations for viewing and long-term archival. These archival destinations are not visible to or configurable by the app, and instead are completely managed by the execution environment. Open-source log routers (such as Logplex and Fluentd) are available for this purpose.

      The event stream for an app can be routed to a file, or watched via realtime tail in a terminal. Most significantly, the stream can be sent to a log indexing and analysis system such as Splunk, or a general-purpose data warehousing system such as Hadoop/Hive. These systems allow for great power and flexibility for introspecting an app’s behavior over time

      At the very least, we should aggregate the information from the server pods, the operator pods and the DAC pods. We should also extend this with Kubernetes events information (e.g. pod scheduling, eviction, node failure) and anything else that might be useful. This will give us a much fuller understanding of the system at any point in time.
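As a rough illustration of the single system view described above, the following Python sketch merges several already-ordered streams into one timeline. The `Event` type, the source labels and the sample messages are hypothetical. A real aggregator would receive these records over the network rather than from in-memory lists.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds since epoch (illustrative values below)
    source: str       # hypothetical labels: "server", "operator", "k8s-event"
    message: str


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Lazily merge streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda event: event.timestamp)


# Illustrative data only: one stream per source.
server = [Event(1.0, "server", "node started"), Event(4.0, "server", "rebalance begun")]
operator = [Event(2.0, "operator", "reconciling cluster")]
k8s = [Event(3.0, "k8s-event", "pod evicted")]

timeline = list(merge_streams(server, operator, k8s))
```

Interleaving Kubernetes events with pod logs in this way is what would let a pod eviction be read alongside the server messages that surround it.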
       

            People

              patrick.stephens Patrick Stephens (Inactive)