  1. Couchbase Kubernetes
  2. K8S-2023

Attempt to capture logs from failed nodes.

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • None
    • not-targeted
    • operator, supportability
    • None
    • 1

    Description

      To make a complete RCA determination we need logs from the culprit: the node that encountered or triggered the issue. When the Operator behaves correctly, it removes the culprit and replaces it with a clean, healthy node. In that case the logs from the removed node are unobtainable, which limits our ability to fully understand and explain the sequence of events.

      Is there a method (perhaps a feature enhancement or a best-practice configuration) that would allow end users to capture these logs so that we can perform a full RCA in these circumstances?

      Clearly there are circumstances where the node was removed because it truly vanished, but there are others where a node may be considered unhealthy enough to remove yet still be able to perform a cbcollect.
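
      As a rough illustration of the "unhealthy but still reachable" case, the sketch below runs a collection inside a pod and copies the archive out before the pod is deleted. It is not an Operator feature or an agreed design. The pod name, namespace and archive paths are hypothetical, the `cbcollect_info` path assumes the default Couchbase Server install location, and `kubectl cp` assumes `tar` is available in the container.

```python
import subprocess
from typing import List, Optional

# Assumption: default install path of cbcollect_info in the Couchbase Server image.
CBCOLLECT = "/opt/couchbase/bin/cbcollect_info"


def collect_cmd(pod: str, namespace: str, remote_zip: str) -> List[str]:
    """Build the kubectl command that runs cbcollect_info inside the pod."""
    return ["kubectl", "exec", "-n", namespace, pod, "--", CBCOLLECT, remote_zip]


def copy_cmd(pod: str, namespace: str, remote_zip: str, local_zip: str) -> List[str]:
    """Build the kubectl command that copies the archive out of the pod."""
    return ["kubectl", "cp", f"{namespace}/{pod}:{remote_zip}", local_zip]


def capture(pod: str, namespace: str,
            remote_zip: str = "/tmp/collect.zip",   # hypothetical path in the pod
            local_zip: Optional[str] = None,
            timeout: int = 1800) -> bool:
    """Best-effort capture: returns False if the node is unreachable or too slow."""
    local_zip = local_zip or f"{pod}-collect.zip"
    try:
        subprocess.run(collect_cmd(pod, namespace, remote_zip),
                       check=True, timeout=timeout)
        subprocess.run(copy_cmd(pod, namespace, remote_zip, local_zip),
                       check=True, timeout=timeout)
    except (subprocess.SubprocessError, OSError):
        # Covers non-zero exit, timeout, and kubectl not being installed.
        return False
    return True
```

      A caller would attempt `capture("cb-example-0000", "default")` (hypothetical names) before removing the pod, and carry on with the replacement regardless of the result, so that a node which has truly vanished does not block recovery beyond the timeout.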

        Activity

          dhaikney David Haikney added a comment - Just to be clear, cbcollect != couchbase logs (apologies I didn't make that clear in the title). Lots of the debugging / RCA we do is based on the commands that are run at collection time (especially stats commands and system commands) rather than the logs. This is about being able to run `cbcollect` on a node that has been ejected but is still reachable - streaming our log files won't cut it.

          People

            Assignee: simon.murray Simon Murray
            Reporter: dhaikney David Haikney
            Votes: 0
            Watchers: 5
