Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-48850

[System Test] XDCR OOM killed

    XMLWordPrintable

Details

    • Bug
    • Resolution: Duplicate
    • Critical
    • 7.1.0
    • 7.1.0
    • XDCR

    Description

      Build : 7.1.0-1461
      Test : -test tests/integration/neo/test_neo_magma_wo_gsi_n1ql.yml -scope tests/integration/neo/scope_neo_magma_wo_gsi_n1ql.yml
      Scale : 3
      Iteration : 2nd

      This is a new longevity system test with Magma storage backend on all buckets and are having between 40-70% Resident ratio at all times in the test. The only participating services in this test are KV, Backup and XDCR.

      Seeing XDCR OOM killed on 172.23.123.26 at 2021-10-11T09:56:07

      172.23.123.26 : crash
      [user:info,2021-10-11T09:56:07.644-07:00,ns_1@172.23.123.26:<0.13948.0>:ns_log:crash_consumption_loop:72]Service 'goxdcr' exited with status 137. Restarting. Messages
      

      From the XDCR log on 172.23.123.26 around the same time :

      2021-10-11T09:55:44.958-07:00 ERRO GOXDCR.Adminport: Unable to generate req or resp http: invalid Read on closed Body
      2021-10-11T09:55:44.959-07:00 ERRO GOXDCR.Adminport: Unable to generate req or resp http: invalid Read on closed Body
      2021/10/11 09:55:45 http: superfluous response.WriteHeader call from github.com/couchbase/goxdcr/adminport.(*httpServer).systemHandler.func2 (admin_httpd.go:148)
      2021/10/11 09:55:45 http: superfluous response.WriteHeader call from github.com/couchbase/goxdcr/adminport.(*httpServer).systemHandler.func2 (admin_httpd.go:148)
      2021-10-11T09:55:45.275-07:00 WARN GOXDCR.XmemNozzle: xmem_4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4_172.23.120.58:11210_0 Received temporary error in setMeta response. Response status=TMPFAIL, err = <nil>, response=<ud>MCResponse status=TMPFAIL, opcode=0xa2, opaque=197377, msg: </ud>
      2021-10-11T09:55:45.275-07:00 WARN GOXDCR.XmemNozzle: xmem_4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4_172.23.120.58:11210_1 Received temporary error in setMeta response. Response status=TMPFAIL, err = <nil>, response=<ud>MCResponse status=TMPFAIL, opcode=0xa2, opaque=458785, msg: </ud>
      2021-10-11T09:55:45.331-07:00 WARN GOXDCR.Utils: Get(ckpt/4c1b34dca75c61d454790b89688f66b8/bucket9/bucket9/558) took 3.101889527s
      2021-10-11T09:55:45.331-07:00 ERRO GOXDCR.Adminport: Unable to generate req or resp http: invalid Read on closed Body
      2021-10-11T09:55:43.342-07:00 INFO GOXDCR.DcpNozzle: dcp_4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4_172.23.123.26:11210_1 incrementing counter for inactive streams to 10
      2021-10-11T09:55:45.415-07:00 WARN GOXDCR.XmemNozzle: xmem_4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4_172.23.120.58:11210_0 Received temporary error in setMeta response. Response status=TMPFAIL, err = <nil>, response=<ud>MCResponse status=TMPFAIL, opcode=0xa2, opaque=197378, msg: </ud>
      2021-10-11T09:55:45.415-07:00 WARN GOXDCR.XmemNozzle: xmem_4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4_172.23.120.58:11210_1 Received temporary error in setMeta response. Response status=TMPFAIL, err = <nil>, response=<ud>MCResponse status=TMPFAIL, opcode=0xa2, opaque=458786, msg: </ud>
      2021/10/11 09:55:45 http: superfluous response.WriteHeader call from github.com/couchbase/goxdcr/adminport.(*httpServer).systemHandler.func2 (admin_httpd.go:148)
      2021-10-11T09:55:45.424-07:00 WARN GOXDCR.Utils: top_svc.getNodeList() took 17.769128292s
      2021-10-11T09:55:45.846-07:00 INFO GOXDCR.StatsMgr: 4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4-425085966 expvar=Stats for pipeline 4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4-425085966 {"Backfill Old Progress": "Secondary Pipeline has been constructed", "Backfill Progress": "Backfill Pipeline is constructed", "Errors": "[{\"time\":\"2021-10-11T09:55:18.411814156-07:00\",\"errMsg\":\"TopoChangeDet : Restarting pipeline due to source topology change...\"}]", "OldProgress": "Pipeline is running", "Overview": {"CurrentTime": 1633971240339083295, "add_docs_written": 0, "bandwidth_usage": 0, "changes_left": 755544, "data_merged": 0, "data_replicated": 36638061, "datapool_failed_gets": 0, "dcp_datach_length": 1374, "dcp_dispatch_time": 14714, "deletion_docs_written": 710, "deletion_failed_cr_source": 0, "deletion_filtered": 0, "deletion_received_from_dcp": 825, "docs_checked": 3769011, "docs_cloned": 0, "docs_failed_cr_source": 11533, "docs_filtered": 0, "docs_merged": 0, "docs_opt_repd": 710, "docs_processed": 4689496, "docs_received_from_dcp": 66406, "docs_rep_queue": 0, "docs_unable_to_filter": 0, "docs_written": 47835, "expiry_docs_merged": 0, "expiry_docs_written": 47835, "expiry_failed_cr_source": 3394, "expiry_filtered": 0, "expiry_received_from_dcp": 58267, "expiry_stripped": 0, "num_checkpoints": 0, "num_failedckpts": 0, "rate_doc_checks": 0, "rate_doc_opt_repd": 0, "rate_received_from_dcp": 0, "rate_replicated": 0, "resp_wait_time": 1469, "set_docs_written": 47125, "set_failed_cr_source": 11533, "set_filtered": 0, "set_received_from_dcp": 65581, "size_rep_queue": 0, "target_docs_skipped": 0, "throttle_latency": 0, "throughput_throttle_latency": 0, "time_committing": 0, "wtavg_docs_latency": 0, "wtavg_get_doc_latency": 0, "wtavg_merge_latency": 0, "wtavg_meta_latency": 0}, "Progress": "Received error report : Restarting pipeline due to source topology change...", "Status": "Pending"}
       
      2021-10-11T09:55:45.846-07:00 WARN GOXDCR.StatsMgr: 4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4-425085966 updateStatsOnce encountered error = Pipeline is no longer running, exit.
      2021-10-11T09:55:45.846-07:00 INFO GOXDCR.StatsMgr: 4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4-425085966 updateStats exited
      2021-10-11T09:55:43.426-07:00 INFO GOXDCR.CheckpointMgr: MainPipeline 4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4 Pipeline is no longer running, exit.
      2021-10-11T09:55:45.852-07:00 INFO GOXDCR.CheckpointMgr: MainPipeline 4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4 Exits checkpointing routine.
      2021-10-11T09:55:44.441-07:00 INFO GOXDCR.StatsMgr: Stats for pipeline 4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4-425085966 {"Backfill Old Progress": "Secondary Pipeline has been constructed", "Backfill Progress": "Backfill Pipeline is constructed", "Errors": "[{\"time\":\"2021-10-11T09:55:18.411814156-07:00\",\"errMsg\":\"TopoChangeDet : Restarting pipeline due to source topology change...\"}]", "OldProgress": "Pipeline is running", "Overview": {"CurrentTime": 1633971240339083295, "add_docs_written": 0, "bandwidth_usage": 0, "changes_left": 755544, "data_merged": 0, "data_replicated": 36638061, "datapool_failed_gets": 0, "dcp_datach_length": 1374, "dcp_dispatch_time": 14714, "deletion_docs_written": 710, "deletion_failed_cr_source": 0, "deletion_filtered": 0, "deletion_received_from_dcp": 825, "docs_checked": 3769011, "docs_cloned": 0, "docs_failed_cr_source": 11533, "docs_filtered": 0, "docs_merged": 0, "docs_opt_repd": 710, "docs_processed": 4689496, "docs_received_from_dcp": 66406, "docs_rep_queue": 0, "docs_unable_to_filter": 0, "docs_written": 47835, "expiry_docs_merged": 0, "expiry_docs_written": 47835, "expiry_failed_cr_source": 3394, "expiry_filtered": 0, "expiry_received_from_dcp": 58267, "expiry_stripped": 0, "num_checkpoints": 0, "num_failedckpts": 0, "rate_doc_checks": 0, "rate_doc_opt_repd": 0, "rate_received_from_dcp": 0, "rate_replicated": 0, "resp_wait_time": 1469, "set_docs_written": 47125, "set_failed_cr_source": 11533, "set_filtered": 0, "set_received_from_dcp": 65581, "size_rep_queue": 0, "target_docs_skipped": 0, "throttle_latency": 0, "throughput_throttle_latency": 0, "time_committing": 0, "wtavg_docs_latency": 0, "wtavg_get_doc_latency": 0, "wtavg_merge_latency": 0, "wtavg_meta_latency": 0}, "Progress": "Received error report : Restarting pipeline due to source topology change...", "Status": "Pending"}
       
      2021-10-11T09:55:53.326-07:00 INFO GOXDCR.TopoChangeDet: TopologyChangeDetectorSvc for pipeline 4c1b34dca75c61d454790b89688f66b8/bucket8/bucket8 handleTargetTopologyChange completed
      2021-10-11T09:55:53.404-07:00 WARN GOXDCR.XmemNozzle: xmem_4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4_172.23.120.58:11210_1 Received temporary error in setMeta response. Response status=TMPFAIL, err = <nil>, response=<ud>MCResponse status=TMPFAIL, opcode=0xa2, opaque=458787, msg: </ud>
      2021-10-11T09:55:54.003-07:00 WARN GOXDCR.XmemNozzle: xmem_4c1b34dca75c61d454790b89688f66b8/bucket4/bucket4_172.23.120.58:11210_1 Received temporary error in setMeta response. Response status=TMPFAIL, err = <nil>, response=<ud>MCResponse status=TMPFAIL, opcode=0xa2, opaque=458788, msg: </ud>
      2021-10-11T09:56:13.316-07:00 INFO GOXDCR.ReplMgr: GOMAXPROCS=4
      2021-10-11T09:56:13.318-07:00 INFO GOXDCR.SecuritySvc: Security service started. Waiting for security context to be initialized
      2021-10-11T09:56:13.319-07:00 INFO GOXDCR.SecuritySvc: Received security change notification. code 7
      

      Not sure if this is the same reason as MB-48849 or different.

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              mihir.kamdar Mihir Kamdar (Inactive)
              mihir.kamdar Mihir Kamdar (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty