Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-18518

[FTS] supportability: dump stats periodically into fts.log

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.5.0
    • 4.5.0
    • cbft
    • None
    • Untriaged
    • Unknown

    Description

      • Currently there's no info/logging smts in fts.log that clearly narrate the complete series of events or what happened when with respect to indexing (say indexing is stuck, we have no idea from when, how long, what may have caused it, did dcp return all mutations requested, what seq nos were requested, was there an error returned by dcp etc...)
      • Stats are available and can be polled regularly for tests but history of stats are not recorded for anywhere to construct the "story of events".
      • Currently test logs provide sufficient info that's useful in diagnosing issues(when what happened) but that info is never going to be available from the field unless it is logged.

      I suggest dumping the stats shown on UI also to the log so it can reliably capture the fts progress every 'n' secs so it's easy to correlate it to any external cluster event.

      goxdcr does this, it dumps stats every 10s. We can do it probably less often. Attaching a sample goxdcr.log.

      StatisticsManager 2015-06-30T11:46:01.283-07:00 [INFO] Stats for pipeline 39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1-54357147 {"CkptMgr": {"num_checkpoints": 0, "num_failedckpts": 0, "time_committing": {"count": 0, "max": 0, "mean": 0, "min": 0}}, "Errors": "[]", "Overview": {"bandwidth_usage": 0, "changes_left": 0, "data_replicated": 488029, "dcp_datach_length": 0, "dcp_dispatch_time": 20, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "deletion_filtered": 0, "deletion_received_from_dcp": 0, "docs_checked": 0, "docs_failed_cr_source": 1649, "docs_filtered": 0, "docs_opt_repd": 0, "docs_processed": 2500, "docs_received_from_dcp": 2500, "docs_rep_queue": 0, "docs_written": 851, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "expiry_filtered": 0, "expiry_received_from_dcp": 0, "num_checkpoints": 0, "num_failedckpts": 0, "rate_doc_checks": 0, "rate_doc_opt_repd": 0, "rate_received_from_dcp": 0, "rate_replicated": 0, "set_docs_written": 851, "set_failed_cr_source": 1649, "set_filtered": 0, "set_received_from_dcp": 2500, "size_rep_queue": 0, "time_committing": 0, "wtavg_docs_latency": 0, "wtavg_meta_latency": 13}, "Progress": "Pipeline is running", "Router_dcp_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.16:11210_0": {"deletion_filtered": 0, "docs_filtered": 0, "expiry_filtered": 0, "set_filtered": 0}, "Router_dcp_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.16:11210_1": {"deletion_filtered": 0, "docs_filtered": 0, "expiry_filtered": 0, "set_filtered": 0}, "Status": "Replicating", "dcp_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.16:11210_0": {"dcp_datach_length": 0, "dcp_dispatch_time": {"count": 1252, "max": 710, "mean": 19.416, "min": 8}, "deletion_received_from_dcp": 0, "docs_received_from_dcp": 1252, "expiry_received_from_dcp": 0, "set_received_from_dcp": 1252}, "dcp_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.16:11210_1": {"dcp_datach_length": 0, "dcp_dispatch_time": {"count": 1248, "max": 1190, "mean": 22.144, "min": 8}, "deletion_received_from_dcp": 0, "docs_received_from_dcp": 1248, "expiry_received_from_dcp": 0, "set_received_from_dcp": 1248}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_0": {"data_replicated": 10894, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 25, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 19, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 19, "set_failed_cr_source": 25, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 19, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 44, "max": 46, "mean": 9.931818181818182, "min": 1}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_1": {"data_replicated": 6307, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 23, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 11, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 11, "set_failed_cr_source": 23, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 11, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 34, "max": 27, "mean": 7.5588235294117645, "min": 4}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_10": {"data_replicated": 6312, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 33, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 11, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 11, "set_failed_cr_source": 33, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 11, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 44, "max": 49, "mean": 7.931818181818182, "min": 0}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_11": {"data_replicated": 4018, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 25, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 7, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 7, "set_failed_cr_source": 25, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 7, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 32, "max": 18, "mean": 3.40625, "min": 0}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_12": {"data_replicated": 4588, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 25, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 8, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 8, "set_failed_cr_source": 25, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 8, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 33, "max": 20, "mean": 6.575757575757576, "min": 4}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_13": {"data_replicated": 10894, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 24, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 19, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 19, "set_failed_cr_source": 24, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 19, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 43, "max": 25, "mean": 10.69767441860465, "min": 0}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_14": {"data_replicated": 5161, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 25, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 9, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 9, "set_failed_cr_source": 25, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 9, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 34, "max": 18, "mean": 5.911764705882353, "min": 1}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_15": {"data_replicated": 8602, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 29, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 15, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 15, "set_failed_cr_source": 29, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 15, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 44, "max": 20, "mean": 6.318181818181818, "min": 1}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_16": {"data_replicated": 8029, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 25, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 14, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 14, "set_failed_cr_source": 25, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 14, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 39, "max": 34, "mean": 13.23076923076923, "min": 0}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_17": {"data_replicated": 8604, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 27, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 15, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 15, "set_failed_cr_source": 27, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 15, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 42, "max": 42, "mean": 17.976190476190474, "min": 3}}, "xmem_39e729c88e82e50b50a6279d443ad16d/sasl_bucket_1/sasl_bucket_1_10.1.2.20:11210_18": {"data_replicated": 6308, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "docs_failed_cr_source": 27, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 11, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "set_docs_written": 11, "set_failed_cr_source": 27, "size_rep_queue": 0, "wtavg_docs_latency": {"count": 11, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 38, "max": 21, "mean": 6.868
      

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            apiravi Aruna Piravi (Inactive)
            apiravi Aruna Piravi (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty