Couchbase Kubernetes / K8S-2108

Fluent Bit improvements


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: 2.3.0
    • Component/s: logging, operator
    • Labels: None
    • 1

    Description

      Various improvements found that may be useful during testing:

      1. Include cluster name when enriching the log data
      2. Reduce the tail refresh interval (currently 60 seconds) so that logs are picked up sooner. The container starts quickly, but if the log directory or rebalance logs are not yet present, it can take a while after the server starts before they are tailed, so we may lose logs on a quick failure.
      3. Provide full integration tests with CI
      4. Default Loki output: make sure there is no impact on customer usage. Ideally provide a simple way to enable it during testing, but with a managed config. Relates to K8S-2112.
      5. Investigate providing counters for various errors and/or (optionally) Prometheus metrics. This is coming soon in Fluent Bit, but also see https://github.com/neiman-marcus/fluent-bit-out-prometheus-metrics
      6. Add a Docker Compose stack as an example for local usage.
      7. Reduce cyclomatic complexity and refactor the watcher to simplify it; standardise logging to match the Operator too.
      8. Add unit tests for the watcher functionality. It is currently all covered by integration tests, so shift left where possible.
      9. Document the GKE setup: there were issues with Autopilot and Promtail, and with the Loki input stalling. It now seems to work without a PV: https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke
      10. Rewrite the 4-letter level names in Java logs. This is also a good example of normalising case: https://github.com/sassoftware/viya4-monitoring-kubernetes/blob/eaaf0498f835cbabbcf9f55715ddeafae2d68ca5/logging/fb/fluent-bit_config.configmap_open.yaml#L731
      11. Ensure we test mount path changes, i.e. that the config is mounted in and picked up from there, and that we can watch it for changes: K8S-2324.
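Items 1 and 10 are both per-record transforms: add the cluster name, then map abbreviated level names to a consistent form regardless of case. The Python sketch below only illustrates the idea; in a deployment this logic would more likely be expressed as a Fluent Bit filter. The function `enrich_record`, the field names `level` and `cluster`, and the abbreviation table are all hypothetical, and the real 4-letter level names should be checked against actual Java log output.

```python
from typing import Any, Dict

# Hypothetical table: abbreviated (4-letter) level names mapped to canonical
# lowercase names. Confirm the real abbreviations against actual Java logs.
LEVEL_ALIASES: Dict[str, str] = {
    "TRAC": "trace",
    "DEBU": "debug",
    "INFO": "info",
    "WARN": "warning",
    "ERRO": "error",
    "FATA": "fatal",
}


def enrich_record(record: Dict[str, Any], cluster_name: str) -> Dict[str, Any]:
    """Return a copy of `record` with the cluster name added and the level
    normalised. The "level" and "cluster" field names are assumptions."""
    enriched = dict(record)  # shallow copy so the caller's record is untouched
    enriched["cluster"] = cluster_name

    level = enriched.get("level")
    if isinstance(level, str):
        key = level.strip().upper()  # case-insensitive lookup: "Warn" -> "WARN"
        # Unknown levels are lowercased so that case is at least consistent.
        enriched["level"] = LEVEL_ALIASES.get(key, level.strip().lower())
    return enriched


if __name__ == "__main__":
    print(enrich_record({"level": "Warn", "message": "rebalance slow"}, "cb-example"))
```

Normalising case before the lookup means the table only needs one entry per level, which is the same "sort the case out" concern raised in item 10.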


          Activity

            Eric Schneider (Inactive) added a comment:

            It would be nice to be able to move certain bits of the user documentation to the couchbase-fluent-bit repo. This would be Couchbase-specific documentation (i.e., not directed at the open source audience) that would be pulled into the Operator docs. It will likely be mostly reference-type documentation, such as compatibility tables, marked-up snippets of config files, etc.

            Our docs site generator, Antora, supports using tags instead of branches for version control of the docs. So if we want to move some doc content to the Fluent Bit repo, we need to make sure that the docs are considered when tagging a release. We can control how the semver appears in the docs, but docs updates must go in before the release is tagged: there will eventually be releases that are incompatible with older or newer versions of the Operator, and we need to be able to pull the relevant docs into each version of the Operator docs.

            Patrick Stephens (Inactive) added a comment:

            It would be good to provide a parser configuration that works for the DaemonSet as well (or Promtail). This should be more straightforward, as we're just capturing the JSON output from stdout.
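A minimal sketch of what a DaemonSet-side parser has to do, assuming the container runtime writes CRI-format log lines (`<timestamp> <stream> <P|F> <message>`, as containerd does) and that each complete message on stdout is one JSON object. `parse_cri_lines` and the fallback `message` key are hypothetical; this is not the parser used by the Operator or by Promtail.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def parse_cri_lines(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one record per complete container log message.

    Each input line is assumed to be in CRI format:
        <timestamp> <stream> <P|F> <message>
    "P" marks a partial fragment that continues on a later line for the same
    stream; "F" marks the final (or only) fragment.
    """
    partials: Dict[str, List[str]] = {}  # pending fragments, keyed by stream
    for line in lines:
        parts = line.rstrip("\n").split(" ", 3)
        if len(parts) < 4:
            continue  # malformed line: skipped in this sketch
        timestamp, stream, tag, message = parts
        fragments = partials.setdefault(stream, [])
        fragments.append(message)
        if tag == "P":
            continue  # wait for the rest of this message
        full = "".join(fragments)
        del partials[stream]
        try:
            record = json.loads(full)
        except json.JSONDecodeError:
            record = None
        if not isinstance(record, dict):
            # Non-JSON (or non-object) output is kept under a hypothetical key.
            record = {"message": full}
        # Fields already present in the JSON take precedence over runtime metadata.
        record.setdefault("time", timestamp)
        record.setdefault("stream", stream)
        yield record
```

Partial fragments are buffered per stream so that interleaved stdout and stderr output does not get spliced together.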

            Patrick Stephens (Inactive) added a comment:

            A possible option is to mangle timestamps to work around out-of-order issues with Loki: https://github.com/fluent/fluent-bit/issues/2015#issuecomment-627547843
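One possible way to mangle timestamps, sketched in Python under stated assumptions: remember the last timestamp sent for each label set and nudge any equal-or-older timestamp forward by 1 ns, so every stream stays strictly increasing. This is not necessarily the approach in the linked comment; the class `MonotonicTimestamper`, the use of a label dictionary as the stream identity, and integer nanosecond timestamps are all assumptions.

```python
from typing import Dict, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class MonotonicTimestamper:
    """Rewrite timestamps so that, per stream, each one is strictly greater
    than the previous one. Timestamps are integer nanoseconds (assumed)."""

    def __init__(self) -> None:
        self._last: Dict[LabelKey, int] = {}  # last timestamp emitted per stream

    def adjust(self, labels: Dict[str, str], ts_ns: int) -> int:
        # Sort the labels so the same label set always maps to the same key.
        key: LabelKey = tuple(sorted(labels.items()))
        last = self._last.get(key)
        if last is not None and ts_ns <= last:
            ts_ns = last + 1  # nudge forward by 1 ns to preserve ordering
        self._last[key] = ts_ns
        return ts_ns
```

The trade-off is that adjusted entries no longer carry their true event time; keeping the original timestamp in a separate field of the record would preserve it for debugging.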

            Patrick Stephens (Inactive) added a comment:

            Fluent Bit 1.8.2 now stops retrying a chunk as soon as it gets an out-of-order response from Loki.

            Hopefully this will be resolved at some point on the Loki side: https://github.com/grafana/loki/issues/1544

            People

              Roo Thorp
              Patrick Stephens (Inactive)
              Votes: 0
              Watchers: 2
