Couchbase Server / MB-31318

CLONE - [System Test] Indexer node crashed during rebalance out of a data node

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • 6.0.0
    • 5.5.0
    • secondary-index
    • centos2 cluster
    • Untriaged
    • No

    Description

      This bug is a backport of MB-30899.

      Build : 5.5.0-2958 (RC4)
      Test : -test tests/2i/test_idx_rebalance_replica_vulcan_kv_opt.yml -scope tests/2i/scope_idx_rebalance_replica_vulcan_new.yml
      Iteration : 2 (after ~15 hrs of test run)

      The test has a step to rebalance out a data node. After this step had been in progress for ~3.5 hrs, one of the indexer nodes (172.23.96.251) crashed with the following panic, causing the rebalance operation to fail.

       
      2018-07-10T13:02:41.527-07:00 [Info] TK StreamBegin MAINT_STREAM other-1 919 19213254259634 97974
      2018-07-10T13:02:41.534-07:00 [Warn] StreamState::updateHWT Received Partial Last Snapshot in HWT Bucket other-1 StreamId MAINT_STREAM vbucket 919 Snapshot 97961-103258 Seqno 97974 Vbuuid 19213254259634 lastSnap 97961-101599 lastSnapSeqno 97974
      2018-07-10T13:02:41.860-07:00 [Info] ServiceMgr::rebalanceJanitor Running Periodic Cleanup
      2018-07-10T13:02:41.886-07:00 [Info] default/default_claims/Backstore#10628194403243651532:0 Plasma: logCleaner: starting... frag 31, data: 137915921, used: 199971086 log:(6332853364 - 6533533696)
      2018-07-10T13:02:41.887-07:00 [Info] default/default_claims/Mainstore#10628194403243651532:0 Plasma: logCleaner: starting... frag 31, data: 123121413, used: 178482601 log:(3336769976 - 3515736064)
      2018-07-10T13:02:41.891-07:00 [Info] default/default_claims/Backstore#10628194403243651532:0 Plasma: logCleaner: completed... frag 30, data: 137915161, used: 199868050, relocated: 2768332, retries: 17556, skipped: 507466 log:(6332853364 - 6533533696)
      2018-07-10T13:02:41.924-07:00 [Info] default/default_claims/Mainstore#10628194403243651532:0 Plasma: logCleaner: completed... frag 30, data: 123411143, used: 178788761, relocated: 692358, retries: 95, skipped: 173654 log:(3337993855 - 3516784640)
      2018-07-10T13:02:42.154-07:00 [Info] KVSender::sendMutationTopicRequest Success Projector 172.23.96.214:9999 Topic MAINT_STREAM_TOPIC_b339bb28442503bd16bbb1d072b01f16 other-1 InstanceIds [4374274174613210241 15666765576583350329 10628194403243651532 2110540235627860952 3957993379324274668 9967365399957918943 5639115208808567798 13111269798958670197]
      panic: runtime error: index out of range
       
      goroutine 366965823 [running]:
      panic(0xe3c4c0, 0xc4200160c0)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:500 +0x1a1 fp=0xc45ed3b4e8 sp=0xc45ed3b458
      runtime.panicindex()
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:27 +0x6d fp=0xc45ed3b518 sp=0xc45ed3b4e8
      github.com/couchbase/indexing/secondary/protobuf/projector.(*TsVbuuid).Swap(0xc435bec870, 0x2d3, 0x3c3)
              goproj/src/github.com/couchbase/indexing/secondary/protobuf/projector/common.go:404 +0x14a fp=0xc45ed3b550 sp=0xc45ed3b518
      sort.medianOfThree(0x1832d20, 0xc435bec870, 0x3c3, 0x34b, 0x2d3)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/sort/sort.go:79 +0xb9 fp=0xc45ed3b580 sp=0xc45ed3b550
      sort.doPivot(0x1832d20, 0xc435bec870, 0x0, 0x3c4, 0x19a0, 0xc4426d0a00)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/sort/sort.go:101 +0x68e fp=0xc45ed3b610 sp=0xc45ed3b580
      sort.quickSort(0x1832d20, 0xc435bec870, 0x0, 0x3c4, 0x13)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/sort/sort.go:188 +0x83 fp=0xc45ed3b670 sp=0xc45ed3b610
      sort.Sort(0x1832d20, 0xc435bec870)
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/sort/sort.go:222 +0x80 fp=0xc45ed3b6b0 sp=0xc45ed3b670
      github.com/couchbase/indexing/secondary/protobuf/projector.(*TsVbuuid).Union(0xc420140ab0, 0xc435bec7e0, 0xc4383ef673)
              goproj/src/github.com/couchbase/indexing/secondary/protobuf/projector/common.go:251 +0x8c0 fp=0xc45ed3b888 sp=0xc45ed3b6b0
      github.com/couchbase/indexing/secondary/indexer.updateActiveTsFromResponse(0xc4383ef673, 0x7, 0xc420140ab0, 0xc44497c690, 0xc4363ac090)
              goproj/src/github.com/couchbase/indexing/secondary/indexer/kv_sender.go:900 +0xf8 fp=0xc45ed3b8e0 sp=0xc45ed3b888
      github.com/couchbase/indexing/secondary/indexer.(*kvSender).openMutationStream.func1.1()
              goproj/src/github.com/couchbase/indexing/secondary/indexer/kv_sender.go:293 +0x199 fp=0xc45ed3ba90 sp=0xc45ed3b8e0
      github.com/couchbase/indexing/secondary/indexer.execWithStopCh(0xc45ed3bb68, 0xc43eb4a060, 0xc4358a2e00)
              goproj/src/github.com/couchbase/indexing/secondary/indexer/kv_sender.go:1319 +0x53 fp=0xc45ed3bac0 sp=0xc45ed3ba90
      github.com/couchbase/indexing/secondary/indexer.(*kvSender).openMutationStream.func1(0x0, 0x0, 0x0, 0xa0, 0xf1a1c0)
              goproj/src/github.com/couchbase/indexing/secondary/indexer/kv_sender.go:300 +0x1f6 fp=0xc45ed3bbf0 sp=0xc45ed3bac0
      github.com/couchbase/indexing/secondary/common.(*RetryHelper).Run(0xc45ed3be50, 0xc4286f5400, 0xc453f29c82)
              goproj/src/github.com/couchbase/indexing/secondary/common/retry_helper.go:36 +0x52 fp=0xc45ed3bc38 sp=0xc45ed3bbf0
      github.com/couchbase/indexing/secondary/indexer.(*kvSender).openMutationStream(0xc42017ca80, 0xc4211e0001, 0xc4426d8000, 0x1, 0x1, 0xc4403e3730, 0xc43eb4a000, 0xc43eb4a060)
              goproj/src/github.com/couchbase/indexing/secondary/indexer/kv_sender.go:326 +0x96e fp=0xc45ed3bf50 sp=0xc45ed3bc38
      runtime.goexit()
              /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc45ed3bf58 sp=0xc45ed3bf50
      created by github.com/couchbase/indexing/secondary/indexer.(*kvSender).handleOpenStream
              goproj/src/github.com/couchbase/indexing/secondary/indexer/kv_sender.go:147 +0x1ec
      

      Logs on Supportal : http://supportal.couchbase.com/snapshot/dad95d23905d9834b7b373d62c7535cb::0

      Marking this as a regression, as it could be related to the fixes that went in for MB-30327 or MB-30376. Please change this if investigation reveals it is not a regression.

          People

            Mihir Kamdar (Inactive)