Couchbase Server / MB-39811

Crash and hang during GetRangeSplitItems for init right siblings

Details

    • Bug
    • Resolution: Unresolved
    • Critical
    • Morpheus
    • 5.5.3
    • storage-engine
    • Untriaged
    • 1
    • Unknown

    Description

      The related CBSE-8368 involves a problematic index. Recovery of this index goes into a crash-restart loop with the following stack trace:

      2020-03-03T07:33:42.815+00:00 [Info] ServiceMgr::GetCurrentTopology [0 0 0 0 0 0 0 1]
      unexpected fault address 0x0
      fatal error: fault
      [signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x6c1e35]

      goroutine 731 [running]:
      runtime.throw(0xf8d3c4, 0x5)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/panic.go:566 +0x95 fp=0xc4281c3308 sp=0xc4281c32e8
      runtime.sigpanic()
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/sigpanic_unix.go:27 +0x288 fp=0xc4281c3360 sp=0xc4281c3308
      sync/atomic.LoadUint64(0x414e6c696e6622, 0x414e6c696e65736c)
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/sync/atomic/asm_amd64.s:102 +0x5 fp=0xc4281c3368 sp=0xc4281c3360
      github.com/couchbase/nitro/skiplist.(*Node).getNext(0x414e6c696e6573, 0x9, 0x414e6c696e6573, 0x0)
      	goproj/src/github.com/couchbase/nitro/skiplist/node_amd64.go:108 +0x4d fp=0xc4281c3388 sp=0xc4281c3368
      github.com/couchbase/nitro/skiplist.(*Skiplist).GetRangeSplitItems(0xc42011e9c0, 0x1, 0x0, 0x0, 0x1)
      	goproj/src/github.com/couchbase/nitro/skiplist/skiplist.go:419 +0x1ad fp=0xc4281c3428 sp=0xc4281c3388
      github.com/couchbase/plasma.(*Plasma).GetRangePartitions(0xc424de5400, 0x1, 0x0, 0x0, 0x0)
      	goproj/src/github.com/couchbase/plasma/page_visitor.go:89 +0x163 fp=0xc4281c34d8 sp=0xc4281c3428
      github.com/couchbase/plasma.(*Plasma).PageVisitor(0xc424de5400, 0xc4283c2c40, 0x1, 0xc42483f740, 0xc42483c5c0)
      	goproj/src/github.com/couchbase/plasma/page_visitor.go:33 +0x79 fp=0xc4281c3598 sp=0xc4281c34d8
      github.com/couchbase/plasma.(*Plasma).doRecovery(0xc424de5400, 0xc42483f420, 0x21)
      	goproj/src/github.com/couchbase/plasma/plasma.go:982 +0x47d fp=0xc4281c3670 sp=0xc4281c3598
      github.com/couchbase/plasma.New(0x1e, 0x12c, 0x5, 0x4, 0x106c5d8, 0x106c5a0, 0x106c5e0, 0xc424840620, 0xc424840630, 0x106c5a0, ...)
      	goproj/src/github.com/couchbase/plasma/plasma.go:598 +0x11b0 fp=0xc4281c3d80 sp=0xc4281c3670
      github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).initStores.func2(0xc4248405f0, 0xc4236c6d80, 0xc424840610, 0xc423d32a80)
      	goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:310 +0x91 fp=0xc4281c3f80 sp=0xc4281c3d80
      runtime.goexit()
      	/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc4281c3f88 sp=0xc4281c3f80
      created by github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).initStores
      	goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:315 +0x28f1
      

       

       

      This went on for about a month, after which a recovery hung while recovering the same index. The goroutine dump shows that the hang occurs in the same area as the crash:

      1 @ 0xcda020 0xcdc05d 0xa92983 0xa92279 0xa9c6dd 0xa9a6b0 0x63ec61 0x465931
      #	0xcda020	github.com/couchbase/nitro/skiplist.(*Node).getNext+0x0					goproj/src/github.com/couchbase/nitro/skiplist/node_amd64.go:104
      #	0xcdc05c	github.com/couchbase/nitro/skiplist.(*Skiplist).GetRangeSplitItems+0x1ac		goproj/src/github.com/couchbase/nitro/skiplist/skiplist.go:419
      #	0xa92982	github.com/couchbase/plasma.(*Plasma).GetRangePartitions+0x162				goproj/src/github.com/couchbase/plasma/page_visitor.go:89
      #	0xa92278	github.com/couchbase/plasma.(*Plasma).PageVisitor+0x78					goproj/src/github.com/couchbase/plasma/page_visitor.go:33
      #	0xa9c6dc	github.com/couchbase/plasma.(*Plasma).doRecovery+0x47c					goproj/src/github.com/couchbase/plasma/plasma.go:982
      #	0xa9a6af	github.com/couchbase/plasma.New+0x11af							goproj/src/github.com/couchbase/plasma/plasma.go:598
      #	0x63ec60	github.com/couchbase/indexing/secondary/indexer.(*plasmaSlice).initStores.func2+0x90	goproj/src/github.com/couchbase/indexing/secondary/indexer/plasma_slice.go:310
      

       

      Logs show that the issue occurs in the backstore of the problematic index.

       

      Also, this index appears to have very large keys (~145K).
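
      A note on the faulting values: the receiver passed to (*Node).getNext in the crash trace, 0x414e6c696e6573, is not a canonical x86-64 address, which would be consistent with the code=0x80 addr=0x0 fault. Its bytes also happen to be printable ASCII. The sketch below simply renders the values from the trace as the bytes they would occupy in little-endian memory. The helper name asciiView is hypothetical, and reading the result as "a node pointer overwritten by key data" is only a guess, not a confirmed root cause.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// asciiView (hypothetical helper) renders a 64-bit value as the 8 bytes it
// would occupy in memory on little-endian amd64, replacing every
// non-printable byte with '.'.
func asciiView(v uint64) string {
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], v)
	out := make([]byte, len(buf))
	for i, b := range buf {
		if b >= 0x20 && b <= 0x7e {
			out[i] = b
		} else {
			out[i] = '.'
		}
	}
	return string(out)
}

func main() {
	// Values copied verbatim from the getNext and LoadUint64 frames above.
	values := []uint64{
		0x414e6c696e6573,   // receiver of (*Node).getNext
		0x414e6c696e6622,   // address passed to atomic.LoadUint64
		0x414e6c696e65736c, // second word printed in the LoadUint64 frame
	}
	for _, v := range values {
		fmt.Printf("%#x -> %q\n", v, asciiView(v))
	}
}
```

      If the decoded bytes resemble fragments of the index's keys, that would point toward corruption of the skiplist node memory rather than a plain nil dereference. Comparing them against actual key contents from the affected index is left to whoever has access to the data.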

       

          People

            akhil.mundroy Akhil Mundroy