Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-49481

Investigate timeout observed in longevity with p2p enabled - RuntimeCtx:Execution timed out - Pipeline did not start in a timely manner, possibly due to busy source or target. Will try again...

    XMLWordPrintable

Details

    • Task
    • Resolution: Fixed
    • Major
    • 7.1.0
    • 7.1.0
    • XDCR
    • 1

    Description

      7.1.0-1677 with p2p enabled

      Test:
      -test tests/integration/neo/test_neo_couchstore_milestone3.yml -scope tests/integration/neo/scope_couchstore.yml
      Scale 3
      Iteration 2

      Following errors observed in goxdcr logs of .120.77:

      2021-11-10T09:17:03.328-08:00 INFO GOXDCR.RuntimeCtx: Executing Action timed out
      2021-11-10T09:17:03.328-08:00 INFO GOXDCR.RuntimeCtx: ****************************
      2021-11-10T09:17:03.328-08:00 ERRO GOXDCR.RuntimeCtx: 790274de382206ee50d3b2d252bc235e/bucket4/bucket4 error starting pipeline context. err=Execution timed out
      2021-11-10T09:17:03.328-08:00 ERRO GOXDCR.GenericPipeline: 790274de382206ee50d3b2d252bc235e/bucket4/bucket4-446897690 failed to start, err=map[genericPipeline.context.Start:Execution timed out]
      2021-11-10T09:17:03.328-08:00 ERRO GOXDCR.PipelineMgr: Failed to start the pipeline 790274de382206ee50d3b2d252bc235e/bucket4/bucket4-446897690
      2021-11-10T09:17:03.328-08:00 ERRO GOXDCR.PipelineMgr: Failed to update pipeline 790274de382206ee50d3b2d252bc235e/bucket4/bucket4, err=genericPipeline.context.Start : Execution timed out
      2021-11-10T09:17:03.328-08:00 ERRO GOXDCR.PipelineMgr: Update of pipeline 790274de382206ee50d3b2d252bc235e/bucket4/bucket4 failed with errors=genericPipeline.context.Start : Execution timed out
      2021-11-10T09:17:03.329-08:00 INFO GOXDCR.PipelineMgr: Replication 790274de382206ee50d3b2d252bc235e/bucket4/bucket4 update experienced error(s): genericPipeline.context.Start : Execution timed out. Scheduling a redo.
      2021-11-10T09:17:03.329-08:00 INFO GOXDCR.PipelineMgr: Replication status is updated with error(s) genericPipeline.context.Start : Execution timed out, current status=name={790274de382206ee50d3b2d252bc235e/bucket4/bucket4}, status={Pending}, errors={[{"time":"2021-11-10T09:17:03.328988484-08:00","errMsg":"genericPipeline.context.Start:Execution timed out"},{"time":"2021-11-10T09:15:13.369268597-08:00","errMsg":"RuntimeCtx:Execution timed out"},{"time":"2021-11-10T09:14:50.054942879-08:00","errMsg":"genericPipeline.context.Start:Execution timed out"},{"time":"2021-11-10T09:12:54.907260126-08:00","errMsg":"RuntimeCtx:Execution timed out"},{"time":"2021-11-10T09:12:34.474400407-08:00","errMsg":"CheckpointMgr : key not found"},{"time":"2021-11-10T09:10:52.273511113-08:00","errMsg":"genericPipeline.context.Start:Execution timed out"},{"time":"2021-11-10T09:09:04.96817513-08:00","errMsg":"CheckpointMgr : key not found"},{"time":"2021-11-10T09:07:24.323262444-08:00","errMsg":"CheckpointMgr : key not found"}]}, oldProgress={Done PeerToPeer communication and metadata merging}, progress={Pipeline failed to start, err=map[genericPipeline.context.Start:Execution timed out]}, oldBackfillProgress={Source nozzles have been closed}, backfillProgress={Pipeline has been stopped}
      2021-11-10T09:17:03.329-08:00 INFO GOXDCR.PipelineMgr: Replication 790274de382206ee50d3b2d252bc235e/bucket4/bucket4's status experienced changes or errors (<nil>). However, last update resulted in failure, so will reschedule a future update
      2021-11-10T09:17:03.329-08:00 INFO GOXDCR.PipelineMgr: Pipeline updater scheduled to update in 10s
      2021-11-10T09:17:03.329-08:00 INFO GOXDCR.StatsMgr: Stats for pipeline 790274de382206ee50d3b2d252bc235e/bucket4/bucket4-446897690 {"Backfill Old Progress": "Source nozzles have been closed", "Backfill Progress": "Pipeline has been stopped", "CkptMgr": {"num_checkpoints": 0, "num_failedckpts": 0, "time_committing": {"count": 0, "max": 0, "mean": 0, "min": 0}}, "Errors": "[{\"time\":\"2021-11-10T09:17:03.328988484-08:00\",\"errMsg\":\"genericPipeline.context.Start:Execution timed out\"},{\"time\":\"2021-11-10T09:15:13.369268597-08:00\",\"errMsg\":\"RuntimeCtx:Execution timed out\"},{\"time\":\"2021-11-10T09:14:50.054942879-08:00\",\"errMsg\":\"genericPipeline.context.Start:Execution timed out\"},{\"time\":\"2021-11-10T09:12:54.907260126-08:00\",\"errMsg\":\"RuntimeCtx:Execution timed out\"},{\"time\":\"2021-11-10T09:12:34.474400407-08:00\",\"errMsg\":\"CheckpointMgr : key not found\"},{\"time\":\"2021-11-10T09:10:52.273511113-08:00\",\"errMsg\":\"genericPipeline.context.Start:Execution timed out\"},{\"time\":\"2021-11-10T09:09:04.96817513-08:00\",\"errMsg\":\"CheckpointMgr : key not found\"},{\"time\":\"2021-11-10T09:07:24.323262444-08:00\",\"errMsg\":\"CheckpointMgr : key not found\"}]", "OldProgress": "Done PeerToPeer communication and metadata merging", "Overview": {"CurrentTime": 1636564503366905744, "add_docs_written": 0, "bandwidth_usage": 0, "changes_left": 1074187, "data_merged": 0, "data_replicated": 0, "datapool_failed_gets": 0, "dcp_datach_length": 0, "dcp_dispatch_time": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "deletion_filtered": 0, "deletion_received_from_dcp": 0, "docs_checked": 0, "docs_cloned": 0, "docs_failed_cr_source": 0, "docs_filtered": 0, "docs_merged": 0, "docs_opt_repd": 0, "docs_processed": 0, "docs_received_from_dcp": 0, "docs_rep_queue": 0, "docs_unable_to_filter": 0, "docs_written": 0, "expiry_docs_merged": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "expiry_filtered": 0, "expiry_received_from_dcp": 0, "expiry_stripped": 0, "num_checkpoints": 0, "num_failedckpts": 0, "rate_doc_checks": 0, "rate_doc_opt_repd": 0, "rate_received_from_dcp": 0, "rate_replicated": 0, "resp_wait_time": 0, "set_docs_written": 0, "set_failed_cr_source": 0, "set_filtered": 0, "set_received_from_dcp": 0, "size_rep_queue": 0, "target_docs_skipped": 0, "throttle_latency": 0, "throughput_throttle_latency": 0, "time_committing": 0, "wtavg_docs_latency": 0, "wtavg_get_doc_latency": 0, "wtavg_merge_latency": 0, "wtavg_meta_latency": 0}, "Progress": "Pipeline failed to start, err=map[genericPipeline.context.Start:Execution timed out]", "Router_dcp_790274de382206ee50d3b2d252bc235e/bucket4/bucket4_172.23.120.77:11210_0": {"datapool_failed_gets": 0, "deletion_filtered": 0, "docs_cloned": 0, "docs_filtered": 0, "docs_unable_to_filter": 0, "expiry_filtered": 0, "expiry_stripped": 0, "set_filtered": 0, "throughput_throttle_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}}, "Router_dcp_790274de382206ee50d3b2d252bc235e/bucket4/bucket4_172.23.120.77:11210_1": {"datapool_failed_gets": 0, "deletion_filtered": 0, "docs_cloned": 0, "docs_filtered": 0, "docs_unable_to_filter": 0, "expiry_filtered": 0, "expiry_stripped": 0, "set_filtered": 0, "throughput_throttle_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}}, "Status": "Pending", "dcp_790274de382206ee50d3b2d252bc235e/bucket4/bucket4_172.23.120.77:11210_0": {"dcp_datach_length": 0, "dcp_dispatch_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "deletion_received_from_dcp": 0, "docs_received_from_dcp": 0, "expiry_received_from_dcp": 0, "set_received_from_dcp": 0}, "dcp_790274de382206ee50d3b2d252bc235e/bucket4/bucket4_172.23.120.77:11210_1": {"dcp_datach_length": 0, "dcp_dispatch_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "deletion_received_from_dcp": 0, "docs_received_from_dcp": 0, "expiry_received_from_dcp": 0, "set_received_from_dcp": 0}, "xmem_790274de382206ee50d3b2d252bc235e/bucket4/bucket4_172.23.106.138:11210_0": {"add_docs_cas_changed": 0, "add_docs_written": 0, "data_replicated": 0, "datapool_failed_gets": 0, "deletion_docs_cas_changed": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "deletion_target_docs_skipped": 0, "docs_failed_cr_source": 0, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "expiry_target_docs_skipped": 0, "resp_wait_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "set_docs_cas_changed": 0, "set_docs_written": 0, "set_failed_cr_source": 0, "set_target_docs_skipped": 0, "size_rep_queue": 0, "target_docs_skipped": 0, "throttle_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_docs_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_get_doc_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}}, "xmem_790274de382206ee50d3b2d252bc235e/bucket4/bucket4_172.23.106.138:11210_1": {"add_docs_cas_changed": 0, "add_docs_written": 0, "data_replicated": 0, "datapool_failed_gets": 0, "deletion_docs_cas_changed": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "deletion_target_docs_skipped": 0, "docs_failed_cr_source": 0, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "expiry_target_docs_skipped": 0, "resp_wait_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "set_docs_cas_changed": 0, "set_docs_written": 0, "set_failed_cr_source": 0, "set_target_docs_skipped": 0, "size_rep_queue": 0, "target_docs_skipped": 0, "throttle_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_docs_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_get_doc_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}}, "xmem_790274de382206ee50d3b2d252bc235e/bucket4/bucket4_172.23.121.118:11210_0": {"add_docs_cas_changed": 0, "add_docs_written": 0, "data_replicated": 0, "datapool_failed_gets": 0, "deletion_docs_cas_changed": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "deletion_target_docs_skipped": 0, "docs_failed_cr_source": 0, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "expiry_target_docs_skipped": 0, "resp_wait_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "set_docs_cas_changed": 0, "set_docs_written": 0, "set_failed_cr_source": 0, "set_target_docs_skipped": 0, "size_rep_queue": 0, "target_docs_skipped": 0, "throttle_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_docs_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_get_doc_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}}, "xmem_790274de382206ee50d3b2d252bc235e/bucket4/bucket4_172.23.121.118:11210_1": {"add_docs_cas_changed": 0, "add_docs_written": 0, "data_replicated": 0, "datapool_failed_gets": 0, "deletion_docs_cas_changed": 0, "deletion_docs_written": 0, "deletion_failed_cr_source": 0, "deletion_target_docs_skipped": 0, "docs_failed_cr_source": 0, "docs_opt_repd": 0, "docs_rep_queue": 0, "docs_written": 0, "expiry_docs_written": 0, "expiry_failed_cr_source": 0, "expiry_target_docs_skipped": 0, "resp_wait_time": {"count": 0, "max": 0, "mean": 0, "min": 0}, "set_docs_cas_changed": 0, "set_docs_written": 0, "set_failed_cr_source": 0, "set_target_docs_skipped": 0, "size_rep_queue": 0, "target_docs_skipped": 0, "throttle_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_docs_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_get_doc_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}, "wtavg_meta_latency": {"count": 0, "max": 0, "mean": 0, "min": 0}}}
      

      Errors observed in UI as well - screenshot attached

      Logs:
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.106.134.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.106.136.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.120.74.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.120.77.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.120.81.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.120.86.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.121.77.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.123.24.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.123.25.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.123.26.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.123.31.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.123.32.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.123.33.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.96.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.96.14.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.96.243.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.96.254.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.96.48.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.105.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.110.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.112.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.148.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.149.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.150.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.151.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.241.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636570260/collectinfo-2021-11-10T185101-ns_1%40172.23.97.74.zip

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              neil.huang Neil Huang
              arunkumar Arunkumar Senthilnathan (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty