K8S-2108

Fluent Bit improvements

Details

    • Improvement
    • Status: Closed
    • Minor
    • Resolution: Done
    • None
    • 2.3.0
    • logging, operator
    • None
    • 1

    Description

      Various improvements found that may be useful during testing:

      1. Include cluster name when enriching the log data
      2. Reduce the tail refresh interval (currently 60 seconds) so that logs are picked up sooner. The container starts quickly, but if the log directory or rebalance is not present yet, it can take a while to pick them up once the server starts, so we may lose logs on a quick failure.
      3. Provide full integration tests with CI
      4. Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
      5. Look to see if we can provide counters for various errors and/or prometheus metrics (optionally): coming soon in FB but also see https://github.com/neiman-marcus/fluent-bit-out-prometheus-metrics 
      6. Add Docker-compose stack as an example for local usage.
      7. Reduce cyclomatic complexity and refactor the watcher to simplify it; standardise logging as per the operator too.
      8. Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.
      9. Document GKE setup - issues with Autopilot and Promtail, and stalling of the Loki input. It now seems to work without a PV: https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke
      10. Rewrite the four-letter level names for Java logs. A good example that also sorts out letter case: https://github.com/sassoftware/viya4-monitoring-kubernetes/blob/eaaf0498f835cbabbcf9f55715ddeafae2d68ca5/logging/fb/fluent-bit_config.configmap_open.yaml#L731
      11. Ensure we test mount path changes, i.e. that we pick up the config from there/mount it in and can watch for changes: K8S-2324
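
      As an illustration of item 10, here is a minimal Python sketch of normalising short level names to one consistent lower-case form. The alias table (DEBU, ERRO, EROR and so on) and the names LEVEL_ALIASES, normalise_level and normalise_record are assumptions made for this example, not taken from the couchbase-fluent-bit repository. In practice this would more likely live in a Fluent Bit filter than in Python.

```python
# Hypothetical alias table: short or mixed-case level names mapped to one
# consistent lower-case spelling. The entries are illustrative assumptions.
LEVEL_ALIASES = {
    "DEBU": "debug",
    "DEBUG": "debug",
    "INFO": "info",
    "WARN": "warn",
    "WARNING": "warn",
    "ERRO": "error",
    "EROR": "error",
    "ERROR": "error",
}


def normalise_level(raw: str) -> str:
    """Return a consistent lower-case level name for a raw level string."""
    key = raw.strip().upper()
    # Unknown levels are passed through, only lower-cased.
    return LEVEL_ALIASES.get(key, key.lower())


def normalise_record(record: dict, field: str = "level") -> dict:
    """Return a copy of the log record with its level field normalised."""
    out = dict(record)
    if field in out:
        out[field] = normalise_level(str(out[field]))
    return out
```

      Unknown level names are passed through in lower case rather than dropped, so a record never loses its level field.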

          Activity

            patrick.stephens Patrick Stephens (Inactive) created issue -
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Field Original Value New Value
            Fix Version/s not-targeted [ 16613 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
            1. Include cluster name when enriching the log data
            2. Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
            3. Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
            4. Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
            1. Include cluster name when enriching the log data
            2. Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
            3. Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
            4. Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config.
            5. Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
            1. Include cluster name when enriching the log data
            2. Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
            3. Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
            4. Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config.
            5. Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
            6. Add Docker-compose stack as an example for local usage.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
            1. Include cluster name when enriching the log data
            2. Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
            3. Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
            4. Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config.
            5. Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
            6. Add Docker-compose stack as an example for local usage.
            7. Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config.
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # Add Docker-compose stack as an example for local usage.
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # Add Docker-compose stack as an example for local usage.
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Link This issue relates to K8S-2112 [ K8S-2112 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # Add Docker-compose stack as an example for local usage.
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # Reduce cyclomatic complexity and refactor watcher to simplify - standardise logging as per operator too.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # Add Docker-compose stack as an example for local usage.
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # Reduce cyclomatic complexity and refactor watcher to simplify - standardise logging as per operator too.
             # Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # Add Docker-compose stack as an example for local usage.
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify- - standardise logging as per operator too.
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # -Add Docker-compose stack as an example for local usage.-
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify- -- standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # -Add Docker-compose stack as an example for local usage.-
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify- – standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # Document GKE set up - issues with Autopilot and Promtail. Stalling of Loki input.

            eric.schneider Eric Schneider (Inactive) added a comment - It would be nice to be able to move certain bits of the user documentation to the couchbase-fluent-bit repo. This would be Couchbase-specific documentation (i.e., not directed at the open source audience) that would be pulled into the Operator docs. It will likely be mostly reference-type documentation, such as compatibility tables, marked-up snippets of config files, etc. Our docs site generator, Antora, supports using tags instead of branches for version control of the docs. So if we want to move some doc content to the Fluent Bit repo, we need to make sure that the docs are considered when tagging a release. We can control how the semver appears in the docs, but we need to make sure that docs updates go in before the release is tagged, since there will eventually be releases that are incompatible with older or newer versions of the Operator, and we need to be able to pull the relevant docs into each version of the Operator docs.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # -Add Docker-compose stack as an example for local usage.-
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify- – standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # Document GKE set up - issues with Autopilot and Promtail. Stalling of loki input. Working now it seems without PV: https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke
             # Rewrite the 4 letter level names for java logs
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.
             # Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally)
             # -Add Docker-compose stack as an example for local usage.-
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify- – standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # Document GKE set up - issues with Autopilot and Promtail. Stalling of loki input. Working now it seems without PV: https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke
             # Rewrite the 4 letter level names for java logs.
             # Ensure we test mount path changes, i.e. that we pick up the config from there/mount it in and can watch for changes.
            simon.murray Simon Murray made changes -
            Rank Ranked higher
            simon.murray Simon Murray made changes -
            Rank Ranked higher
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # -Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.-
             # -Provide full integration tests with CI - already present locally but unable to run until https://issues.couchbase.com/browse/CBD-4018-
             # Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally): coming soon in FB but also see https://github.com/neiman-marcus/fluent-bit-out-prometheus-metrics
             # -Add Docker-compose stack as an example for local usage.-
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify- – standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # Document GKE set up - issues with Autopilot and Promtail. Stalling of loki input. Working now it seems without PV: https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke
             # Rewrite the 4 letter level names for java logs. Good example of sorting case out as well: https://github.com/sassoftware/viya4-monitoring-kubernetes/blob/eaaf0498f835cbabbcf9f55715ddeafae2d68ca5/logging/fb/fluent-bit_config.configmap_open.yaml#L731
             # Ensure we test mount path changes, i.e. that we pick up the config from there/mount it in and can watch for changes.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Link This issue relates to K8S-2147 [ K8S-2147 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # Include cluster name when enriching the log data
             # Documentation on fd limits, etc. for inotify and the like
             # -Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.-
             # -Provide full integration tests with CI-
             # -Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112-
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally): coming soon in FB but also see https://github.com/neiman-marcus/fluent-bit-out-prometheus-metrics
             # -Add Docker-compose stack as an example for local usage.-
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify, standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # Document GKE set up - issues with Autopilot and Promtail. Stalling of loki input. Working now it seems without PV: https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke
             # -Rewrite the 4 letter level names for java logs. Good example of sorting case out as well: [https://github.com/sassoftware/viya4-monitoring-kubernetes/blob/eaaf0498f835cbabbcf9f55715ddeafae2d68ca5/logging/fb/fluent-bit_config.configmap_open.yaml#L731]- 
             # Ensure we test mount path changes, i.e. that we pick up the config from there/mount it in and can watch for changes.

            patrick.stephens Patrick Stephens (Inactive) added a comment - It would be good to provide a parser configuration that also works for the daemonset (or Promtail). This should be more straightforward as we're just capturing the JSON output from stdout.
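
To illustrate why the stdout case is simpler, here is a minimal Python sketch of the parsing step only. It is not a Fluent Bit or Promtail parser configuration. It assumes each stdout line is a single JSON object, and the `message` and `cluster` field names and the `parse_stdout_line` helper are hypothetical names chosen for the example.

```python
import json


def parse_stdout_line(line, cluster=None):
    """Parse one line of container stdout, assumed to hold one JSON object.

    Lines that are not a JSON object are kept as plain text under a
    hypothetical "message" key rather than being dropped.
    """
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        record = None
    if not isinstance(record, dict):
        # Fallback for plain-text lines or bare JSON scalars.
        record = {"message": line.rstrip("\n")}
    if cluster is not None:
        # Enrich with a cluster name (hypothetical field) without
        # overwriting a value the application already set.
        record.setdefault("cluster", cluster)
    return record


parsed = parse_stdout_line('{"level": "info", "message": "ok"}', cluster="demo")
plain = parse_stdout_line("not json\n")
```

Because the structure is already in the JSON, no per-file regular expressions are needed. The only extra work is the fallback for lines that are not JSON.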
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # -Include cluster name when enriching the log data-
             # Documentation on fd limits, etc. for inotify and the like
             # -Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.-
             # -Provide full integration tests with CI-
             # -Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112-
             # Look to see if we can provide counters for various errors and/or prometheus metrics (optionally): coming soon in FB but also see https://github.com/neiman-marcus/fluent-bit-out-prometheus-metrics
             # -Add Docker-compose stack as an example for local usage.-
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify, standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # Document GKE set up - issues with Autopilot and Promtail. Stalling of loki input. Working now it seems without PV: https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke
             # -Rewrite the 4 letter level names for java logs. Good example of sorting case out as well: [https://github.com/sassoftware/viya4-monitoring-kubernetes/blob/eaaf0498f835cbabbcf9f55715ddeafae2d68ca5/logging/fb/fluent-bit_config.configmap_open.yaml#L731]- 
             # Ensure we test mount path changes, i.e. that we pick up the config from there/mount it in and can watch for changes.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # -Include cluster name when enriching the log data-
             # Documentation on fd limits, etc. for inotify and the like
             # -Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.-
             # -Provide full integration tests with CI-
             # -Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112-
             # -Look to see if we can provide counters for various errors and/or prometheus metrics (optionally): coming soon in FB but also see [https://github.com/neiman-marcus/fluent-bit-out-prometheus-metrics]- 
             # -Add Docker-compose stack as an example for local usage.-
             # Use bats-test possibly to provide a framework to run various tests better than the current single script (and output in different formats, etc.)
             # -Reduce cyclomatic complexity and refactor watcher to simplify, standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # Document GKE set up - issues with Autopilot and Promtail. Stalling of loki input. Working now it seems without PV: https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke
             # -Rewrite the 4 letter level names for java logs. Good example of sorting case out as well: [https://github.com/sassoftware/viya4-monitoring-kubernetes/blob/eaaf0498f835cbabbcf9f55715ddeafae2d68ca5/logging/fb/fluent-bit_config.configmap_open.yaml#L731]- 
             # Ensure we test mount path changes, i.e. that we pick up the config from there/mount it in and can watch for changes.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Component/s logging [ 16330 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Link This issue relates to K8S-2171 [ K8S-2171 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Link This issue relates to K8S-2172 [ K8S-2172 ]

            patrick.stephens Patrick Stephens (Inactive) added a comment - Possible option to mangle timestamps to work around out-of-order issues with Loki: https://github.com/fluent/fluent-bit/issues/2015#issuecomment-627547843
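
As a rough illustration of what "mangling" could mean here, the Python sketch below bumps any timestamp that would not move forward so that a stream never goes backwards. It does not reproduce the approach in the linked comment. The `make_strictly_increasing` helper, the nanosecond units and the one-nanosecond step are all assumptions made for the example.

```python
def make_strictly_increasing(records, step_ns=1):
    """Yield (timestamp_ns, line) pairs with strictly increasing timestamps.

    Any record whose timestamp is not later than the previous one emitted
    is moved to just after it. Order of arrival is preserved, so the
    original timestamp of a late record is lost.
    """
    last = None
    for ts, line in records:
        if last is not None and ts <= last:
            ts = last + step_ns
        last = ts
        yield ts, line


records = [(1000, "a"), (1000, "b"), (990, "c"), (2000, "d")]
fixed = list(make_strictly_increasing(records))
```

The trade-off is accuracy for acceptance: a late record gets a slightly wrong timestamp instead of being rejected as out of order.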
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Assignee Patrick Stephens [ JIRAUSER25332 ] Roo Thorp [ JIRAUSER25108 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Fix Version/s 2.3.0 [ 17600 ]
            Fix Version/s not-targeted [ 16613 ]

            patrick.stephens Patrick Stephens (Inactive) added a comment - As of Fluent Bit 1.8.2, a chunk is no longer retried once Loki returns an out-of-order response. Hopefully this will be resolved at some point on the Loki side: https://github.com/grafana/loki/issues/1544
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # -Include cluster name when enriching the log data-
             # -Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.-
             # -Provide full integration tests with CI-
             # -Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112-
             # -Look to see if we can provide counters for various errors and/or prometheus metrics (optionally): coming soon in FB but also see [https://github.com/neiman-marcus/fluent-bit-out-prometheus-metrics]- 
             # -Add Docker-compose stack as an example for local usage.-
             # -Reduce cyclomatic complexity and refactor watcher to simplify, standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # -Document GKE set up - issues with Autopilot and Promtail. Stalling of loki input. Working now it seems without PV: [https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke]- 
             # -Rewrite the 4 letter level names for java logs. Good example of sorting case out as well: [https://github.com/sassoftware/viya4-monitoring-kubernetes/blob/eaaf0498f835cbabbcf9f55715ddeafae2d68ca5/logging/fb/fluent-bit_config.configmap_open.yaml#L731]- 
             # Ensure we test mount path changes, i.e. that we pick up the config from there/mount it in and can watch for changes.
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Link This issue relates to K8S-2324 [ K8S-2324 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Resolution Done [ 6 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            patrick.stephens Patrick Stephens (Inactive) made changes -
            Description
            Various improvements found that may be useful during testing:
             # -Include cluster name when enriching the log data-
             # -Reduce the _tail_ refresh interval (currently 60 seconds) so it picks up logs sooner - the container starts quickly but if the log directory or rebalance is not present then can take a while once server starts so we may lose logs on a quick failure.-
             # -Provide full integration tests with CI-
             # -Default Loki output - need to make sure no impact on customer usage, ideally a simple method to enable during testing but with a managed config. Relates to K8S-2112-
             # -Look to see if we can provide counters for various errors and/or prometheus metrics (optionally): coming soon in FB but also see [https://github.com/neiman-marcus/fluent-bit-out-prometheus-metrics]- 
             # -Add Docker-compose stack as an example for local usage.-
             # -Reduce cyclomatic complexity and refactor watcher to simplify, standardise logging as per operator too.-
             # -Add unit tests for watcher functionality - all covered by integration tests currently so shift left if possible.-
             # -Document GKE set up - issues with Autopilot and Promtail. Stalling of loki input. Working now it seems without PV: [https://github.com/patrick-stephens/couchbase-gitops/tree/main/gke]- 
             # -Rewrite the 4 letter level names for java logs. Good example of sorting case out as well: [https://github.com/sassoftware/viya4-monitoring-kubernetes/blob/eaaf0498f835cbabbcf9f55715ddeafae2d68ca5/logging/fb/fluent-bit_config.configmap_open.yaml#L731]- 
             # Ensure we test mount path changes, i.e. that we pick up the config from there/mount it in and can watch for changes: K8S-2324
            roo.thorp Roo Thorp made changes -
            Status Resolved [ 5 ] Closed [ 6 ]

            People

              roo.thorp Roo Thorp
              patrick.stephens Patrick Stephens (Inactive)