Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-51438

[System Test][CBM] backup task failed with error - failed to transfer key value data: DCP stream closed unexpectedly

    XMLWordPrintable

Details

    • Bug
    • Resolution: Not a Bug
    • Critical
    • None
    • 7.1.0
    • tools

    Description

      Build : 7.1.0-2475 (RC1)
      Test : -test tests/integration/neo/test_neo_magma_milestone4.yml -scope tests/integration/neo/scope_neo_magma.yml
      Scale : 3
      Iteration : 4th

      A backup task that started at 2022-03-13T20:00:56-07:00 failed due to the following error :
      failed to execute cluster operations: failed to execute bucket operation for bucket 'bucket6': failed to transfer bucket data for bucket 'bucket6': failed to transfer key value data: failed to transfer key value data: DCP stream closed unexpectedly, check the logs for more information

      Task information:

      {
        "task_name": "backup-1",
        "status": "failed",
        "start": "2022-03-13T20:00:56.621231943-07:00",
        "end": "2022-03-13T20:07:10.289787634-07:00",
        "node_runs": [
          {
            "node_id": "7b91337236f62ce6fb4c1b1d0dee3643",
            "status": "failed",
            "start": "2022-03-13T20:00:56.808365675-07:00",
            "end": "2022-03-13T20:07:10.12409355-07:00",
            "error": "exit status 1: failed to execute cluster operations: failed to execute bucket operation for bucket 'bucket6': failed to transfer bucket data for bucket 'bucket6': failed to transfer key value data: failed to transfer key value data: DCP stream closed unexpectedly, check the logs for more information",
            "progress": 0,
            "stats": {
              "error": "failed to execute cluster operations: failed to execute bucket operation for bucket 'bucket6': failed to transfer bucket data for bucket 'bucket6': failed to transfer key value data: failed to transfer key value data: DCP stream closed unexpectedly, check the logs for more information",
              "stats": {
                "started_at": 1647226857442320100,
                "buckets": {
                  "ITEM": {
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 40416,
                    "failover_logs_received": 1024,
                    "started_at": 1647226862450999000,
                    "finished_at": 1647226864024624000,
                    "complete": true
                  },
                  "N1QL_SYSTEM_BUCKET": {
                    "estimated_total_items": 10149,
                    "total_items": 934,
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 652805,
                    "snapshot_markers_received": 1024,
                    "failover_logs_received": 1024,
                    "mutations_received": 271,
                    "deletions_received": 663,
                    "started_at": 1647226867542167300,
                    "finished_at": 1647226871399142100,
                    "complete": true
                  },
                  "NEW_ORDER": {
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 40368,
                    "failover_logs_received": 1024,
                    "started_at": 1647226874263119000,
                    "finished_at": 1647226875357299500,
                    "complete": true
                  },
                  "WAREHOUSE": {
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 40400,
                    "failover_logs_received": 1024,
                    "started_at": 1647226879732026000,
                    "finished_at": 1647226881040402700,
                    "complete": true
                  },
                  "bucket4": {
                    "estimated_total_items": 200882329,
                    "total_items": 6599746,
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 5201541475,
                    "snapshot_markers_received": 1024,
                    "failover_logs_received": 1024,
                    "mutations_received": 6599746,
                    "deletions_received": 481044,
                    "started_at": 1647226883928414500,
                    "finished_at": 1647226926178169600,
                    "complete": true
                  },
                  "bucket5": {
                    "estimated_total_items": 134802494,
                    "total_items": 6607521,
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 5188742822,
                    "snapshot_markers_received": 1024,
                    "failover_logs_received": 1024,
                    "mutations_received": 6607521,
                    "deletions_received": 114379,
                    "started_at": 1647226929640775200,
                    "finished_at": 1647226965228502800,
                    "complete": true
                  },
                  "bucket6": {
                    "estimated_total_items": 67882518,
                    "total_items": 13671160,
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 709,
                    "bytes_received": 8395126216,
                    "snapshot_markers_received": 1024,
                    "failover_logs_received": 1024,
                    "mutations_received": 10663228,
                    "deletions_received": 454922,
                    "started_at": 1647227165636273400
                  },
                  "bucket7": {
                    "estimated_total_items": 253976334,
                    "total_items": 2252734,
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 1778401010,
                    "snapshot_markers_received": 1024,
                    "failover_logs_received": 1024,
                    "mutations_received": 2264819,
                    "deletions_received": 24204,
                    "started_at": 1647226968673550800,
                    "finished_at": 1647227027369205000,
                    "complete": true
                  },
                  "bucket8": {
                    "estimated_total_items": 1386936,
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 85424,
                    "snapshot_markers_received": 1024,
                    "failover_logs_received": 1024,
                    "started_at": 1647227030751241000,
                    "finished_at": 1647227046184171000,
                    "complete": true
                  },
                  "default": {
                    "estimated_total_items": 716565849,
                    "total_items": 1012641,
                    "total_vbuckets": 1024,
                    "vbuckets_complete": 1024,
                    "bytes_received": 8678680501,
                    "snapshot_markers_received": 1024,
                    "failover_logs_received": 1024,
                    "mutations_received": 1017441,
                    "deletions_received": 5263967,
                    "started_at": 1647227049313686500,
                    "finished_at": 1647227161924542200,
                    "complete": true
                  }
                }
              }
            },
            "error_code": 2
          }
        ],
        "error": "exit status 1: failed to execute cluster operations: failed to execute bucket operation for bucket 'bucket6': failed to transfer bucket data for bucket 'bucket6': failed to transfer key value data: failed to transfer key value data: DCP stream closed unexpectedly, check the logs for more information",
        "error_code": 2,
        "type": "BACKUP",
        "show": true
      }

      From the backup logs on 172.23.108.144 (attached):

      2022-03-13T20:06:56.687-07:00 (DCP) (bucket6) (vb 16) Stream closed because all items were streamed | {"uuid":186116183045141,"snap_start":0,"snap_end":76767,"last_seqno":76767,"retries":0}
      2022-03-13T20:06:56.721-07:00 (DCP) (bucket6) (vb 828) Stream closed because all items were streamed | {"uuid":94444265413278,"snap_start":0,"snap_end":82980,"last_seqno":82980,"retries":0}
      2022-03-13T20:06:56.942-07:00 (DCP) (bucket6) (vb 441) Stream closed because all items were streamed | {"uuid":65054041517834,"snap_start":0,"snap_end":61397,"last_seqno":61397,"retries":0}
      2022-03-13T20:06:57.024-07:00 (DCP) (bucket6) (vb 143) Stream closed because all items were streamed | {"uuid":254728410754358,"snap_start":0,"snap_end":85519,"last_seqno":85519,"retries":0}
      2022-03-13T20:06:57.072-07:00 (DCP) (bucket6) (vb 574) Stream closed because all items were streamed | {"uuid":46686512727618,"snap_start":0,"snap_end":72721,"last_seqno":72721,"retries":0}
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 499) Stream closed due to EOF being received while streaming, this could happen for a number of different reasons (e.g. non-graceful failover) | {"uuid":245483932323630,"snap_start":0,"snap_end":65938,"last_seqno":5026,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:538
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 499) Received an unexpected error whilst streaming, beginning teardown: DCP stream closed unexpectedly, check the logs for more information -- couchbase.(*DCPAsyncWorker).handleDCPError() at dcp_async_worker.go:620
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 380) Stream closed by backup client | {"uuid":35681504557486,"snap_start":0,"snap_end":63935,"last_seqno":0,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:535
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 75) Stream closed by backup client | {"uuid":84247502724580,"snap_start":0,"snap_end":76092,"last_seqno":0,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:535
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 705) Stream closed by backup client | {"uuid":198071661100768,"snap_start":0,"snap_end":84569,"last_seqno":13275,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:535
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 669) Stream closed by backup client | {"uuid":46835564232924,"snap_start":0,"snap_end":73764,"last_seqno":8327,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:535
      2022-03-13T20:07:06.234-07:00 (Cmd) Error backing up cluster: failed to execute cluster operations: failed to execute bucket operation for bucket 'bucket6': failed to transfer bucket data for bucket 'bucket6': failed to transfer key value data: failed to transfer key value data: DCP stream closed unexpectedly, check the logs for more information
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 109) Stream closed by backup client | {"uuid":147204617841209,"snap_start":0,"snap_end":38042,"last_seqno":0,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:535
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 57) Stream closed by backup client | {"uuid":93662387071940,"snap_start":0,"snap_end":46770,"last_seqno":0,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:535
      2022-03-13T20:07:06.234-07:00 WARN: (DCP) (bucket6) (vb 55) Stream closed by backup client | {"uuid":5817735385231,"snap_start":0,"snap_end":83259,"last_seqno":0,"retries":0} -- couchbase.(*DCPAsyncWorker).End() at dcp_async_worker.go:535
      

      The test did a hard failover for one KV node - 172.23.108.141 while the backup was going on :

      [2022-03-13T20:07:03-07:00, sequoiatools/couchbase-cli:7.1:d31969] failover -c 172.23.108.139:8091 --server-failover 172.23.108.141:8091 -u Administrator -p password --hard
      [2022-03-13T20:07:13-07:00, sequoiatools/couchbase-cli:7.1:c1f009] rebalance -c 172.23.108.139:8091 -u Administrator -p password
      

      Marking this as a regression since it looks similar to MB-50950 which was fixed recently.

      Attachments

        1. backup-0.log
          48.63 MB
          Mihir Kamdar
        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            mihir.kamdar Mihir Kamdar (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty