  1. Couchbase Server
  2. MB-59857

BP [7.2.4] - XDCR - Deadlock in MapShaRefCounter while cleaning up

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 7.2.4
    • Affects Version/s: 7.6.0, 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.1.3
    • Component/s: XDCR
    • Triage: Untriaged
    • 0
    • No

    Description

      The cleanup of MapShaRefCounter is stuck at the case statement below: DelAndCleanup holds c.lock and waits to receive a value from the channel c.singleUpsert

      func (c *MapShaRefCounter) DelAndCleanup() error {
          c.lock.Lock()
          defer c.lock.Unlock()
          // Don't allow any upserts to occur concurrently - force hold lock
          select { 
          case <-c.singleUpsert:
              c.metadataSvc.Del(c.metakvOpKey, nil /*revision*/)
          }   
          close(c.singleUpsert)
          return nil
      } 
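
      The blocking receive above is easier to follow if c.singleUpsert is read as a one-token semaphore. The sketch below is a minimal illustration of that pattern, not goxdcr code: it assumes the channel has capacity 1 and starts out holding one token (the ticket does not show how it is created), and newToken, tryTake and wait are hypothetical names. A timed receive is used only so the example terminates; DelAndCleanup uses a bare receive, which would wait indefinitely.

```go
package main

import (
	"fmt"
	"time"
)

// wait is how long tryTake is willing to block (illustrative value).
const wait = 10 * time.Millisecond

// newToken returns a capacity-1 channel preloaded with a single token.
// Assumption: this mirrors how c.singleUpsert is initialised.
func newToken() chan bool {
	ch := make(chan bool, 1)
	ch <- true
	return ch
}

// tryTake waits up to `wait` for the token and reports whether it got it.
func tryTake(ch chan bool) bool {
	select {
	case <-ch:
		return true
	case <-time.After(wait):
		return false
	}
}

func main() {
	token := newToken()
	fmt.Println(tryTake(token)) // token is available
	fmt.Println(tryTake(token)) // token is already taken, so this times out
	token <- true               // the holder hands the token back
	fmt.Println(tryTake(token)) // available again
}
```

      Under this reading, DelAndCleanup can only proceed once whoever took the token has sent it back.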

      The call stack for DelAndCleanup is:

      1 @ 0x43d376 0x40b5ac 0x40afd8 0x92ea05 0x92bff2 0x9385c6 0x46cde1
      #       0x92ea04        github.com/couchbase/goxdcr/metadata_svc.(*MapShaRefCounter).DelAndCleanup+0x84                 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/ShaRefCounterService.go:598
      #       0x92bff1        github.com/couchbase/goxdcr/metadata_svc.(*ShaRefCounterService).CleanupMapping+0x3f1           /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/ShaRefCounterService.go:245
      #       0x9385c5        github.com/couchbase/goxdcr/metadata_svc.(*CheckpointsService).DelCheckpointsDocs.func3+0xa5    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/checkpoints_service.go:199 

      The value that DelAndCleanup is waiting for is sent to c.singleUpsert by the deferred function in upsertMapping(), shown below (upsertCh is a local copy of c.singleUpsert):

      func (c *MapShaRefCounter) upsertMapping(specInternalId string, cleanup bool) error {
          c.lock.RLock()
          needToSync := c.needToSync
          needToSyncRev := c.needToSyncRevision
          upsertCh := c.singleUpsert
          c.lock.RUnlock() 
          ...     
          defer func() {
              if err == nil { 
                  var needToReUpsert bool
                  c.lock.Lock()
                  if c.needToSyncRevision == needToSyncRev {
                      c.setNeedToSyncNoLock(false)
                  } else {
                      // Someone jumped in, will need to re-upsert
                      needToReUpsert = true
                  }
                  c.lock.Unlock()
                  if needToReUpsert {
                      defer c.upsertMapping(specInternalId, cleanup)
                  }
              }
              upsertCh <- true
          }()
      ...
      } 

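      But that deferred function is itself stuck acquiring c.lock.Lock(), which DelAndCleanup already holds, so it never gets a chance to send on upsertCh. Each goroutine is waiting for something only the other can release, which is a deadlock.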
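
      The cycle described above can be reproduced in isolation. The following is a hedged sketch, not the goxdcr implementation: counter is a hypothetical stand-in that keeps only the lock and the channel, it assumes the upsert path takes the token from the channel before its deferred step runs (that part is elided in the snippet above), and the tokenTaken and lockHeld channels exist only to force the problematic interleaving deterministically.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// counter is a hypothetical stand-in for MapShaRefCounter.
type counter struct {
	lock         sync.RWMutex
	singleUpsert chan bool
}

// timeout is how long we wait before declaring the goroutines stuck.
const timeout = 200 * time.Millisecond

// reproduceDeadlock reports true if neither goroutine finishes in time.
func reproduceDeadlock() bool {
	c := &counter{singleUpsert: make(chan bool, 1)}
	c.singleUpsert <- true

	tokenTaken := make(chan struct{}) // upsert path now holds the token
	lockHeld := make(chan struct{})   // cleanup path now holds c.lock

	var wg sync.WaitGroup
	wg.Add(2)

	// Upsert path: holds the token, needs the lock before returning it.
	go func() {
		defer wg.Done()
		<-c.singleUpsert // assumption: upsert takes the token first
		close(tokenTaken)
		<-lockHeld
		c.lock.Lock() // blocks: cleanup holds the lock
		c.lock.Unlock()
		c.singleUpsert <- true // never reached
	}()

	// Cleanup path: holds the lock, needs the token.
	go func() {
		defer wg.Done()
		<-tokenTaken
		c.lock.Lock()
		defer c.lock.Unlock()
		close(lockHeld)
		<-c.singleUpsert // blocks: the token is never sent back
	}()

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()

	select {
	case <-done:
		return false
	case <-time.After(timeout):
		return true
	}
}

func main() {
	fmt.Println("deadlocked:", reproduceDeadlock())
}
```

      The two goroutines acquire the same pair of resources (c.lock and the token) in opposite orders, which is the classic condition for this kind of hang.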

      The call stack for upsertMapping is:

      1 @ 0x43d376 0x44ddd3 0x44ddad 0x468e05 0x483485 0x484a76 0x484a55 0x92e813 0x92e417 0x92ed7c 0x92c188 0x93fab3 0x9c9e82 0x9c8fdc 0x46cde1
      #       0x468e04        sync.runtime_SemacquireMutex+0x24    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.7/go/src/runtime/sema.go:71
      #       0x483484        sync.(*Mutex).lockSlow+0x164    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.7/go/src/sync/mutex.go:162
      #       0x484a75        sync.(*Mutex).Lock+0x35    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.7/go/src/sync/mutex.go:81
      #       0x484a54        sync.(*RWMutex).Lock+0x14    /home/couchbase/.cbdepscache/exploded/x86_64/go-1.18.7/go/src/sync/rwmutex.go:139
      #       0x92e812        github.com/couchbase/goxdcr/metadata_svc.(*MapShaRefCounter).upsertMapping.func1+0x92    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/ShaRefCounterService.go:526
      #       0x92e416        github.com/couchbase/goxdcr/metadata_svc.(*MapShaRefCounter).upsertMapping+0x636    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/ShaRefCounterService.go:587
      #       0x92ed7b        github.com/couchbase/goxdcr/metadata_svc.(*MapShaRefCounter).ReInitUsingMergedMappingDoc+0x27b    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/ShaRefCounterService.go:640
      #       0x92c187        github.com/couchbase/goxdcr/metadata_svc.(*ShaRefCounterService).reInitUsingMergedMappingDoc+0xe7    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/ShaRefCounterService.go:258
      #       0x93fab2        github.com/couchbase/goxdcr/metadata_svc.(*CheckpointsService).UpsertAndReloadCheckpointCompleteSet+0x152    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/checkpoints_service.go:1033
      #       0x9c9e81        github.com/couchbase/goxdcr/pipeline_svc.(*CheckpointManager).mergeAndPersistBrokenMappingDocsAndCkpts+0x3e1    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_svc/checkpoint_manager.go:3272
      #       0x9c8fdb        github.com/couchbase/goxdcr/pipeline_svc.(*CheckpointManager).mergeFinalCkpts.func1+0x39b    /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/pipeline_svc/checkpoint_manager.go:3168
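
      The two stacks show the same pair of resources being acquired in opposite orders. One general way to break such a cycle is to have the cleanup path wait for the token before taking the lock. The sketch below illustrates only that ordering idea; this ticket does not describe the change that actually resolved the issue, and counter, cleanupCompletes, and the tokenTaken channel are hypothetical. It also assumes the channel field is never replaced, so reading it without the lock is safe here.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// counter is a hypothetical stand-in for MapShaRefCounter.
type counter struct {
	lock         sync.RWMutex
	singleUpsert chan bool
}

// timeout bounds how long we wait for both goroutines to finish.
const timeout = time.Second

// cleanupCompletes reports true if both paths finish within the timeout.
func cleanupCompletes() bool {
	c := &counter{singleUpsert: make(chan bool, 1)}
	c.singleUpsert <- true

	tokenTaken := make(chan struct{}) // forces upsert to hold the token first

	var wg sync.WaitGroup
	wg.Add(2)

	// Upsert path: unchanged order (token, then lock, then return token).
	go func() {
		defer wg.Done()
		<-c.singleUpsert // assumption: upsert takes the token first
		close(tokenTaken)
		c.lock.Lock() // succeeds: cleanup is not holding the lock
		c.lock.Unlock()
		c.singleUpsert <- true
	}()

	// Cleanup path: wait for the token BEFORE taking the lock.
	go func() {
		defer wg.Done()
		<-tokenTaken
		<-c.singleUpsert // waits without holding c.lock
		c.lock.Lock()
		defer c.lock.Unlock()
		// The metadata delete would happen here in the real code.
	}()

	done := make(chan struct{})
	go func() { wg.Wait(); close(done) }()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

func main() {
	fmt.Println("completed:", cleanupCompletes())
}
```

      With a consistent order (token first, lock second) on both paths, neither goroutine can hold one resource while waiting for the other.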

            People

              ayush.nayyar Ayush Nayyar
              sudeep.jathar Sudeep Jathar