Couchbase Server / MB-49379

[System Test][Backup Service] merge task failed with error "exit status 2" - panic observed in backup service logs

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 7.1.0
    • 7.1.0
    • tools
    • Untriaged
    • 1
    • No
    • Tools 2021 Nov

    Description

      Build: 7.1.0-1601

      Test:
      -test tests/integration/neo/test_neo_couchstore_milestone2.yml -scope tests/integration/neo/scope_couchstore.yml
      Scale 3
      Iteration 3

      The merge task failed with the following error:

      {
        "task_name": "merge",
        "status": "failed",
        "start": "2021-11-04T12:00:06.811226974-07:00",
        "end": "2021-11-04T12:00:08.252187104-07:00",
        "node_runs": [
          {
            "node_id": "d5bf3aa717103eb071e09e2d86f86a50",
            "status": "failed",
            "start": "2021-11-04T12:00:06.91662584-07:00",
            "end": "2021-11-04T12:00:08.182840868-07:00",
            "error": "exit status 2: ",
            "progress": 0.11029411764705883,
            "stats": {
              "id": "85c96cd3-dad0-450f-991d-c87ea45a20d9",
              "current_transfer": 1,
              "total_transfers": 5,
              "transfers": [
                {
                  "description": "(1/5) Merging backup 2021-11-03T11_00_09.840354141-07_00",
                  "stats": {
                    "started_at": 1636052406975149300,
                    "buckets": {
                      "bucket1": {
                        "total_items": 6409198,
                        "started_at": 1636052407350514700,
                        "finished_at": 1636052407365567500,
                        "complete": true
                      },
                      "bucket2": {
                        "total_items": 6444642,
                        "started_at": 1636052407534402800,
                        "finished_at": 1636052407548489200,
                        "complete": true
                      },
                      "bucket6": {
                        "total_items": 1838703,
                        "started_at": 1636052407723123700,
                        "finished_at": 1636052407761463300,
                        "complete": true
                      },
                      "bucket7": {
                        "total_items": 651000,
                        "started_at": 1636052407973302300,
                        "finished_at": 1636052408012203300,
                        "complete": true
                      },
                      "default": {
                        "total_items": 8959410,
                        "started_at": 1636052407201981400,
                        "finished_at": 1636052407218421200,
                        "complete": true
                      }
                    }
                  },
                  "progress": 0.5514705882352942,
                  "eta": "2021-11-04T12:03:26.067601569-07:00"
                }
              ],
              "progress": 0.11029411764705883,
              "eta": "2021-11-04T12:00:09.067601569-07:00"
            },
            "error_code": 2
          }
        ],
        "error": "exit status 2: ",
        "error_code": 2,
        "type": "MERGE",
        "show": true
      }
      

      The following panic appears in the log of the backup node (172.23.123.28):

      panic: runtime error: index out of range [-1]
       
      goroutine 1 [running]:
      github.com/couchbase/backup/archive.(*repoDir).shouldRollback(0xc0009f6078, 0xc000d76348, 0xc0005f99b0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/archive/dir_repo.go:753 +0xed
      github.com/couchbase/backup/archive.(*repoDir).updateSourceRanges(_, {0xc000a78858, 0xc00096e0c0, 0xc0007561f8, 0xc000850438, 0xc00096ea98, 0xc00096ec30, 0xc000210990, 0xc000913998, 0xc000756cc0, ...}, ...)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/archive/dir_repo.go:743 +0x7a
      github.com/couchbase/backup/archive.(*repoDir).getBackupSeqNoRange(_, {_, _}, {_, _}, {_, _})
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/archive/dir_repo.go:702 +0x4ba
      github.com/couchbase/backup/archive.(*Source).GetDataRanges(_, {_, _}, {_, _})
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/archive/source.go:161 +0x85
      github.com/couchbase/backup/plan/services/data.(*dataRange).Execute(0xc00027ab90, {0x7fd1840d5c58, 0xc00031e1e0}, {0x7fd1840d5ca0, 0xc000234500}, 0xc0011f0650)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/plan/services/data/range.go:44 +0x169
      github.com/couchbase/backup/plan/services/data.(*Data).Execute(0xc00012eea0, {0x7fd1840d5c58, 0xc00031e1e0}, {0x7fd1840d5ca0, 0xc000234500}, 0xc00111e650)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/plan/services/data/data.go:53 +0xca
      github.com/couchbase/backup/plan/cluster/bucket.(*Bucket).Execute(0xc00030e000, {0x7fd1841c6440, 0xc00031e1e0}, {0x7fd1841c64d8, 0xc000234500}, 0x0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/plan/cluster/bucket/bucket.go:195 +0xab4
      github.com/couchbase/backup/plan/cluster.(*Cluster).Execute.func1(0xc0)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/plan/cluster/cluster.go:91 +0x130
      github.com/couchbase/backup/plan/cluster.(*Cluster).Execute(0xc00027a4b0, {0x7fd184134218, 0xc00031e1e0}, {0x7fd1841342c0, 0xc000234500}, 0xc0005f8650)
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/plan/cluster/cluster.go:130 +0x48c
      github.com/couchbase/backup/plan.(*Plan).Execute(0xc000729d40, {0x1aaca60, 0xc00031e1e0}, {0x1ab3998, 0xc000234500}, {0x1a8e690, 0xc00003d340})
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/plan/plan.go:76 +0x156
      github.com/couchbase/backup/archive.(*Archive).MergeIncrementalBackups(0xc00007eae0, {{0x7ffead2f46f3, 0x24}, {0x7ffead2f4770, 0xa}, {0x7ffead2f477b, 0xa}, 0x3, {0x1a8e690, 0xc00003d340}, ...})
              /home/couchbase/jenkins/workspace/couchbase-server-unix/backup/archive/archive.go:1040 +0xc25
      main.(*MergeContext).Run(0xc00028c500)
              backup/cmd/cbbackupmgr/merge.go:150 +0xb85
      github.com/couchbase/cbflag.(*Command).parseFlags(0xc00035fc00, 0xc0006bf218, {0xc000142020, 0x9, 0x10000c0000102e8})
              /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/cbflag@v0.0.0-20210923160146-4b1144509806/command.go:251 +0x1227
      github.com/couchbase/cbflag.(*Command).parse(0xc00035fc00, 0xc0006bf218, {0xc000142020, 0x10, 0x9})
              /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/cbflag@v0.0.0-20210923160146-4b1144509806/command.go:102 +0x114
      github.com/couchbase/cbflag.(*Command).parseCommands(0xc00035fd50, 0xc0006bf218, {0xc000142010, 0xa, 0xa})
              /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/cbflag@v0.0.0-20210923160146-4b1144509806/command.go:114 +0x1ac
      github.com/couchbase/cbflag.(*Command).parse(0xc00035fd50, 0xc0006bf218, {0xc000142010, 0x0, 0xa})
              /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/cbflag@v0.0.0-20210923160146-4b1144509806/command.go:100 +0x125
      github.com/couchbase/cbflag.(*CLI).Parse(0xc0000ac120, {0xc000142000, 0xb, 0xb})
              /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/cbflag@v0.0.0-20210923160146-4b1144509806/cli.go:69 +0x19c
      main.main()
              backup/cmd/cbbackupmgr/main.go:4066 +0xda65
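
      The trace shows an index of -1 being used inside `shouldRollback`. One common way to reach that state in Go is to take the last element of a slice with `s[len(s)-1]` when the slice is empty. The sketch below illustrates that failure mode and a guard against it. It is an assumption about the class of bug, not the actual code in `dir_repo.go`; `seqnoRange`, `lastRange` and the other names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// seqnoRange is a hypothetical stand-in for one entry of a persisted data range.
type seqnoRange struct {
	Start, End uint64
}

var errNoRanges = errors.New("no persisted ranges")

// lastRangeUnsafe reproduces the failure mode: for an empty slice the index
// evaluates to -1 and the runtime panics.
func lastRangeUnsafe(ranges []seqnoRange) seqnoRange {
	return ranges[len(ranges)-1]
}

// lastRange performs the same lookup but returns an error for empty input.
func lastRange(ranges []seqnoRange) (seqnoRange, error) {
	if len(ranges) == 0 {
		return seqnoRange{}, errNoRanges
	}
	return ranges[len(ranges)-1], nil
}

// panicMessage runs fn and returns the text of any panic it raised,
// or an empty string if fn returned normally.
func panicMessage(fn func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	fn()
	return ""
}

func main() {
	var empty []seqnoRange

	fmt.Println("unsafe:", panicMessage(func() { lastRangeUnsafe(empty) }))

	if _, err := lastRange(empty); err != nil {
		fmt.Println("guarded:", err)
	}
}
```

      Returning an error instead of panicking would let the merge fail with a descriptive message rather than a bare "exit status 2".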
      

      Cluster config:

      ########## Cluster config ##################
      ######  fts : 3 ===== > [172.23.104.155:8091 172.23.96.148:8091 172.23.97.122:8091]  ###########
      ######  kv : 9 ===== > [172.23.104.157:8091 172.23.104.70:8091 172.23.106.100:8091 172.23.108.103:8091 172.23.97.119:8091 172.23.97.121:8091 172.23.97.239:8091 172.23.99.21:8091 172.23.99.25:8091]  ###########
      ######  eventing : 3 ===== > [172.23.104.5:8091 172.23.123.27:8091 172.23.98.135:8091]  ###########
      ######  cbas : 3 ===== > [172.23.105.107:8091 172.23.106.188:8091 172.23.99.20:8091]  ###########
      ######  backup : 1 ===== > [172.23.123.28:8091]  ###########
      ######  n1ql : 2 ===== > [172.23.97.242:8091 172.23.99.11:8091]  ###########
      ######  index : 8 ===== > [172.23.104.137:8091 172.23.105.111:8091 172.23.120.245:8091 172.23.121.117:8091 172.23.121.3:8091 172.23.96.251:8091 172.23.96.252:8091 172.23.96.253:8091]  ###########
      

      During this time, the test was doing a rebalance:

      [2021-11-04T11:17:48-07:00, sequoiatools/couchbase-cli:7.1:d9c3e7] rebalance -c 172.23.108.103:8091 -u Administrator -p password
      [2021-11-04T12:07:45-07:00, sequoiatools/cmd:420973] 60
      

      Steps to reproduce
      1) Spin up a cluster with KV
      2) Create some collections
      3) Run a backup
      4) Run another backup
      5) Merge the backups
      6) Run a backup
      7) Merge the backups

      We should see a panic due to an incorrectly persisted data range file.
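
      Steps 3 to 7 above can be sketched as a sequence of `cbbackupmgr` invocations. The program below only builds and prints the command lines; it does not run them. The archive path, repository name and cluster address are hypothetical, the repository is assumed to have been created already with `cbbackupmgr config`, and the `oldest`/`latest` values for `--start` and `--end` are an assumption about how the merge range was selected.

```go
package main

import (
	"fmt"
	"strings"
)

// repro holds the hypothetical settings used to build the command lines.
type repro struct {
	archive, repo, cluster, user, password string
}

// backup returns the argv for one cbbackupmgr backup run.
func (r repro) backup() []string {
	return []string{"cbbackupmgr", "backup",
		"--archive", r.archive, "--repo", r.repo,
		"--cluster", r.cluster,
		"--username", r.user, "--password", r.password}
}

// merge returns the argv for merging every backup currently in the repository.
// The "oldest" and "latest" values are an assumption, see the note above.
func (r repro) merge() []string {
	return []string{"cbbackupmgr", "merge",
		"--archive", r.archive, "--repo", r.repo,
		"--start", "oldest", "--end", "latest"}
}

// steps returns steps 3 to 7 of the reproduction in order:
// backup, backup, merge, backup, merge.
func (r repro) steps() [][]string {
	return [][]string{r.backup(), r.backup(), r.merge(), r.backup(), r.merge()}
}

func main() {
	r := repro{
		archive:  "/data/backups",          // hypothetical path
		repo:     "repro",                  // hypothetical repository name
		cluster:  "couchbase://127.0.0.1",  // hypothetical cluster address
		user:     "Administrator",
		password: "password",
	}
	for i, argv := range r.steps() {
		// Numbering starts at 3 to match the step list in this report.
		fmt.Printf("step %d: %s\n", i+3, strings.Join(argv, " "))
	}
}
```

      Steps 1 and 2 (creating the cluster and the collections) are left out because they depend on the test environment.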

      Logs:
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.104.137.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.104.155.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.104.157.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.104.5.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.104.67.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.104.70.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.105.107.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.105.111.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.106.188.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.108.103.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.120.245.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.121.117.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.121.3.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.123.27.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.123.28.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.96.148.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.96.251.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.96.252.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.96.253.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.97.119.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.97.121.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.97.122.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.97.239.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.97.242.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.98.135.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.99.11.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1636054666/collectinfo-2021-11-04T193749-ns_1%40172.23.99.20.zip

            People

              arunkumar Arunkumar Senthilnathan (Inactive)
              arunkumar Arunkumar Senthilnathan (Inactive)