  1. Couchbase Server
  2. MB-29764

Indexer crashes with goroutine stack exceeds 1000000000-byte limit

Details

    • Triaged
    • Unknown
    • Storage-Sprint-End-Jun-1-2018, Storage-Sprint-End-Jun-15-2018

    Description

      The indexer process crashes with a fatal stack overflow: a goroutine stack exceeds the 1000000000-byte limit.

      This is the stack trace:

      StorageMgr::handleCreateSnapshot Added New Snapshot Index: 2954511192241179090 PartitionId: 0 SliceId: 0 Crc64: 3092221115143794419 (SnapshotInfo: count:10889206 committed:false) SnapCreateDur 62.255µs SnapOpenDur 1.078635ms
      runtime: goroutine stack exceeds 1000000000-byte limit
      fatal error: stack overflow
      runtime stack:
      runtime.throw(0xe730bc, 0xe)
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.3/go/src/runtime/panic.go:566 +0x95 fp=0x7f317bffeb88 sp=0x7f317bffeb68
      runtime.newstack()
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.3/go/src/runtime/stack.go:1061 +0x416 fp=0x7f317bffed08 sp=0x7f317bffeb88
      runtime.morestack()
      /home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.3/go/src/runtime/asm_amd64.s:366 +0x7f fp=0x7f317bffed10 sp=0x7f317bffed08
      goroutine 10783 [running]:
      github.com/couchbase/plasma.(*item).getPtrKeyItem(0xc4a6b64003, 0x0)
      /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/plasma/item.go:100 fp=0xc5894cc2b8 sp=0xc5894cc2b0
      github.com/couchbase/plasma.(*item).Size(0xc4a6b64003, 0x0)

       

          Activity

            build-team Couchbase Build Team added a comment:

            Build couchbase-server-6.0.0-1212 contains plasma commit 259ad19 with commit message:
            MB-29764 item: Add corruption check for item data

            srinath.duvuru Srinath Duvuru added a comment:

            This fix only avoids the stack overflow; it does not address the root cause of the memory corruption that triggered the overflow, which is hard to determine. With this fix it is still possible to run into a corrupted item, but instead of a stack overflow, a panic will occur and a stack will be dumped. That should provide more details and will hopefully lead to determining the root cause.

            Also, we have requested information on the document details, such as key types and lengths, and will try a reproduction with synthetic data. Another planned exercise is to look through the potential areas where unsafe memory access is performed.
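
The ticket does not show the actual change in plasma commit 259ad19, so the following is only a minimal sketch of the general idea described above: validate an item before following it, and panic with a descriptive message rather than recursing until the Go runtime aborts with a stack overflow. The `item` layout, the `maxPtrDepth` bound, and the `size`/`mustSize` helpers are all hypothetical and are not plasma's real types or API.

```go
package main

import (
	"errors"
	"fmt"
)

// item is a hypothetical stand-in for a storage item; it is NOT plasma's real layout.
type item struct {
	isPtr bool   // true when the key lives in another item
	ref   *item  // target item when isPtr is set
	key   []byte // inline key bytes when isPtr is not set
}

// maxPtrDepth is an assumed bound on pointer hops: a valid item needs at most one.
const maxPtrDepth = 1

var errCorruptItem = errors.New("item: corrupted pointer chain")

// size returns the key length, following pointer items. It returns
// errCorruptItem instead of looping or recursing without bound when the
// chain hits a nil target or is longer than maxPtrDepth.
func size(it *item) (int, error) {
	for depth := 0; ; depth++ {
		if it == nil || depth > maxPtrDepth {
			return 0, errCorruptItem
		}
		if !it.isPtr {
			return len(it.key), nil
		}
		it = it.ref
	}
}

// mustSize panics on a corrupted item so that the resulting stack dump
// points at the bad item rather than at a runtime stack overflow.
func mustSize(it *item) int {
	n, err := size(it)
	if err != nil {
		panic(fmt.Sprintf("%v: item=%p", err, it))
	}
	return n
}

func main() {
	inline := &item{key: []byte("doc-key")}
	ptr := &item{isPtr: true, ref: inline}
	fmt.Println(mustSize(ptr))

	// A corrupted item that points at itself would never terminate
	// without the check; with it, mustSize panics right away.
	bad := &item{isPtr: true}
	bad.ref = bad
	defer func() { fmt.Println("recovered:", recover() != nil) }()
	mustSize(bad)
}
```

Unlike the runtime's "fatal error: stack overflow", which terminates the process and cannot be recovered, an ordinary panic produces a normal goroutine stack dump and can be intercepted with `recover`, which is what makes the failure easier to diagnose.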

            tai.tran Tai Tran (Inactive) added a comment:

            Adding a due date of 6/11. We'll continue to try to catch it in Vulcan, but because this is a 5.1.2 maintenance ticket, we will determine next week whether this should block Vulcan.

            tai.tran Tai Tran (Inactive) added a comment:

            Sarath Lakshman thought that the various fixes in MB-29800 can remedy this problem; CBSS-74 will try to reproduce the problem.

            tai.tran Tai Tran (Inactive) added a comment:

            Resolving this ticket along with MB-29800 for Vulcan. Srinath will continue to track it down for 5.1.2 via MB-29952 (a clone of this ticket) and with unit tests for ticket CBSS-74.

            People

              srinath.duvuru Srinath Duvuru
              krishna.doddi Krishna Doddi