Couchbase Server / MB-47760

[BP 7.0.2 MB-47749] - handleIndexMergeSnapshot() panic from extraneous s.muSnap.Unlock() calls


Details

    • Untriaged
    • 1
    • Unknown

    Description

      secondary/indexer/storage_manager.go handleIndexMergeSnapshot() does not lock the s.muSnap mutex, and its caller does not hold it (apparently the method used to lock it at the top), yet it still contains rarely entered error-reporting code blocks that unlock this mutex. If one of these error blocks is entered, the Unlock() call triggers a panic: the Go runtime aborts with the unrecoverable fatal error "sync: unlock of unlocked mutex".

      The fix is to delete the s.muSnap.Unlock() calls from this method. There are three of them (one is in commented-out code and two are in live code).
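      The shape of the fix can be illustrated with a minimal, self-contained sketch. The type storageMgr, the snapMap field, the self-merge check, and the mergeSnapMaps helper below are hypothetical stand-ins, not the real indexer code. The buggy Unlock() is kept only as a comment, because executing it would crash the program with the same unrecoverable fatal error. The point shown is that a method which never calls s.muSnap.Lock() must not call s.muSnap.Unlock() on its error paths, while a helper that does need the lock can pair Lock() with defer Unlock().

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"sync"
)

// storageMgr is a hypothetical, stripped-down stand-in for the indexer's
// storage manager. muSnap guards snapMap.
type storageMgr struct {
	muSnap  sync.Mutex
	snapMap map[uint64]string
}

// handleIndexMergeSnapshot mirrors the fixed shape: it does not acquire
// muSnap and its caller does not hold it, so its error paths just log and
// return. Calling s.muSnap.Unlock() here would be a fatal
// "sync: unlock of unlocked mutex" error.
func (s *storageMgr) handleIndexMergeSnapshot(src, tgt uint64) error {
	if src == tgt {
		err := fmt.Errorf("cannot merge index %v into itself", src)
		log.Printf("handleIndexMergeSnapshot: %v", err)
		// s.muSnap.Unlock() // removed: this goroutine never locked muSnap
		return err
	}
	return s.mergeSnapMaps(src, tgt)
}

// mergeSnapMaps is a hypothetical helper that does need muSnap. Pairing
// Lock with defer Unlock keeps every return path balanced.
func (s *storageMgr) mergeSnapMaps(src, tgt uint64) error {
	s.muSnap.Lock()
	defer s.muSnap.Unlock()

	if _, ok := s.snapMap[src]; !ok {
		return errors.New("source snapshot not found")
	}
	// Illustrative "merge": the source snapshot is folded into the target.
	delete(s.snapMap, src)
	return nil
}

func main() {
	s := &storageMgr{snapMap: map[uint64]string{1: "snap-1", 2: "snap-2"}}
	fmt.Println(s.handleIndexMergeSnapshot(1, 1)) // error path, no Unlock
	fmt.Println(s.handleIndexMergeSnapshot(1, 2)) // <nil>
	fmt.Println(s.handleIndexMergeSnapshot(1, 2)) // source snapshot not found
}
```

      With Lock() and defer Unlock() written on adjacent lines, deleting the lock would also delete its matching unlock in the same edit, which is exactly the drift this ticket fixes by hand.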
       

      Seen in a not-yet-delivered new version of set14_rebalance_test.go (TestFailoverAndRebalance):
      http://ci2i-unstable.northscale.in/gsi-04.08.2021-09.40.fail.html

      fatal error: sync: unlock of unlocked mutex
       
      goroutine 166 [running]:
      runtime.throw(0x131592e, 0x1e)
              /home/buildbot/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/panic.go:1117 +0x72 fp=0xc006507ba0 sp=0xc006507b70 pc=0x43fdd2
      sync.throw(0x131592e, 0x1e)
              /home/buildbot/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/panic.go:1103 +0x35 fp=0xc006507bc0 sp=0xc006507ba0 pc=0x474375
      sync.(*Mutex).unlockSlow(0xc004e96610, 0xffffffff)
              /home/buildbot/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/sync/mutex.go:196 +0xd8 fp=0xc006507be8 sp=0xc006507bc0 pc=0x491e18
       
       
      Unlock call that panicked:
       
      sync.(*Mutex).Unlock(...)
              /home/buildbot/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/sync/mutex.go:190
       
       
      Caused by s.muSnap.Unlock() in handleIndexMergeSnapshot (storage_manager.go:1720):
       
      github.com/couchbase/indexing/secondary/indexer.(*storageMgr).handleIndexMergeSnapshot(0xc004e96580, 0x14a4a80, 0xc00cdd0ff0)
              /opt/build/goproj/src/github.com/couchbase/indexing/secondary/indexer/storage_manager.go:1720 +0x699 fp=0xc006507f20 sp=0xc006507be8 pc=0xfb67f9
       
       
      github.com/couchbase/indexing/secondary/indexer.(*storageMgr).handleSupvervisorCommands(0xc004e96580, 0x14a4a80, 0xc00cdd0ff0)
              /opt/build/goproj/src/github.com/couchbase/indexing/secondary/indexer/storage_manager.go:224 +0x1b2 fp=0xc006507f58 sp=0xc006507f20 pc=0xfab8f2
      github.com/couchbase/indexing/secondary/indexer.(*storageMgr).run(0xc004e96580)
              /opt/build/goproj/src/github.com/couchbase/indexing/secondary/indexer/storage_manager.go:182 +0x48 fp=0xc006507fd8 sp=0xc006507f58 pc=0xfab628
      runtime.goexit()
              /home/buildbot/.cbdepscache/exploded/x86_64/go-1.16.5/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc006507fe0 sp=0xc006507fd8 pc=0x4792e1
      created by github.com/couchbase/indexing/secondary/indexer.NewStorageManager
              /opt/build/goproj/src/github.com/couchbase/indexing/secondary/indexer/storage_manager.go:157 +0x2e5
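      The stack trace shows the call chain: NewStorageManager starts a goroutine running run(), which calls handleSupvervisorCommands(), which calls handleIndexMergeSnapshot(). The sketch below models that single-goroutine dispatch shape. The channel fields, command types, response channel, and the spelling handleSupervisorCommands are assumptions for illustration, not the indexer's actual definitions. Because the handler runs on this goroutine with no mutex held by its caller, a stray Unlock() there has nothing to release; and since "sync: unlock of unlocked mutex" is a fatal error rather than an ordinary panic, a recover() in run() could not have caught it.

```go
package main

import "fmt"

// Hypothetical command types standing in for supervisor messages.
type mergeSnapshotCmd struct{ src, tgt uint64 }
type shutdownCmd struct{}

type storageMgr struct {
	supvCmdch chan interface{} // commands from the supervisor
	respch    chan error       // one response per command
}

// newStorageManager starts the manager's single worker goroutine, as
// NewStorageManager does in the stack trace ("created by ...").
func newStorageManager() *storageMgr {
	s := &storageMgr{
		supvCmdch: make(chan interface{}),
		respch:    make(chan error),
	}
	go s.run()
	return s
}

// run receives commands one at a time, so every handler executes on this
// goroutine and none of them is entered with a lock already held.
func (s *storageMgr) run() {
	for cmd := range s.supvCmdch {
		if _, ok := cmd.(shutdownCmd); ok {
			s.respch <- nil
			return
		}
		s.handleSupervisorCommands(cmd)
	}
}

// handleSupervisorCommands dispatches on the command's dynamic type.
func (s *storageMgr) handleSupervisorCommands(cmd interface{}) {
	switch c := cmd.(type) {
	case mergeSnapshotCmd:
		s.respch <- s.handleIndexMergeSnapshot(c)
	default:
		s.respch <- fmt.Errorf("unknown command %T", cmd)
	}
}

// handleIndexMergeSnapshot reports errors by returning them; it takes no
// lock, so it releases none.
func (s *storageMgr) handleIndexMergeSnapshot(c mergeSnapshotCmd) error {
	if c.src == c.tgt {
		return fmt.Errorf("source and target are both %v", c.src)
	}
	return nil
}

func main() {
	s := newStorageManager()
	s.supvCmdch <- mergeSnapshotCmd{src: 1, tgt: 2}
	fmt.Println(<-s.respch) // <nil>
	s.supvCmdch <- mergeSnapshotCmd{src: 3, tgt: 3}
	fmt.Println(<-s.respch) // source and target are both 3
	s.supvCmdch <- shutdownCmd{}
	<-s.respch
}
```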
      


          Activity

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.0.1-5990 contains indexing commit 6e28177 with commit message: MB-47760 Delete s.muSnap.Unlock() calls from handleIndexMergeSnapshot()

            hemant.rajput Hemant Rajput added a comment - Kevin Cherkauer, can you please provide steps to validate this ticket? It doesn't look verifiable using a functional test.

            kevin.cherkauer Kevin Cherkauer added a comment - Hemant Rajput, as luck would have it, the CI bot run

              http://ci2i-unstable.northscale.in/gsi-17.08.2021-10.31.fail.html

            hit the same issue during rebalance, but with this fix already in place, so it returned the following error message instead of panicking on the mutex unlock:

            Duplicate partition 5 found between source 14955238945985669378 and target 17746317167073026776

            I also verified by code inspection that the fix removed the mutex Unlock() call that previously threw the panic.

            So you can close this.
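            The error Kevin quotes suggests the rebalance path now rejects a merge whose source and target index instances share a partition, reporting it as an ordinary error. The ticket does not show the real check, so the function below is a hypothetical sketch: the name checkDuplicatePartitions, its signature, and the partition-slice inputs are assumptions. Only the message format and the two instance IDs come from the log above.

```go
package main

import "fmt"

// checkDuplicatePartitions is a hypothetical validation step: before a
// source index instance is merged into a target, it rejects the merge if
// any partition ID appears in both. It reports the problem by returning an
// error and touches no mutex.
func checkDuplicatePartitions(srcInstID, tgtInstID uint64, srcParts, tgtParts []int) error {
	inTarget := make(map[int]bool, len(tgtParts))
	for _, p := range tgtParts {
		inTarget[p] = true
	}
	// Scan source partitions in order and report the first overlap.
	for _, p := range srcParts {
		if inTarget[p] {
			return fmt.Errorf("Duplicate partition %v found between source %v and target %v",
				p, srcInstID, tgtInstID)
		}
	}
	return nil
}

func main() {
	err := checkDuplicatePartitions(14955238945985669378, 17746317167073026776,
		[]int{5, 6}, []int{1, 5})
	// Prints: Duplicate partition 5 found between source 14955238945985669378 and target 17746317167073026776
	fmt.Println(err)
	fmt.Println(checkDuplicatePartitions(1, 2, []int{3}, []int{4})) // <nil>
}
```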

            hemant.rajput Hemant Rajput added a comment - Closing the ticket based on Kevin's comment.

            People

              kevin.cherkauer Kevin Cherkauer
              jeelan.poola Jeelan Poola
              Votes: 0
              Watchers: 4
