  1. Couchbase Server
  2. MB-57234

[BP 7.2.1] - XDCR - repeated spec del on non-KV node will hang because handler is not running

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 7.2.1
    • 7.0.0, 7.0.1, 7.0.2, 7.0.3, 7.0.4, 7.0.5, 7.1.0, 7.1.1, 7.1.2, 7.1.3, 7.1.4
    • XDCR
    • Untriaged
    • 0
    • No

    Description

      Issue Resolution
      When a replication spec change reached a node that was not running the Data Service, deleting the replication hung and the node returned an incorrect replication configuration. XDCR now checks whether the node is running the Data Service and handles the spec change accordingly.
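
The kind of check described in the resolution can be sketched as follows. This is a minimal illustration only, not the goxdcr change itself: `changeForwarder`, `isKVNode`, and `onSpecChange` are hypothetical names, and the sketch assumes that the handler draining the queue runs only on Data Service (KV) nodes.

```go
package main

import "fmt"

// changeForwarder is a hypothetical component that queues spec changes for a
// handler goroutine. The assumption here is that the handler only runs on
// nodes hosting the Data Service (KV).
type changeForwarder struct {
	isKVNode bool
	queue    chan string
}

// onSpecChange queues the change only when a handler exists to drain it.
// It reports whether the change was queued.
func (f *changeForwarder) onSpecChange(specID string) bool {
	if !f.isKVNode {
		// No handler is running on this node, so skip the queue entirely
		// rather than risk blocking on a channel that nobody reads.
		return false
	}
	f.queue <- specID
	return true
}

func main() {
	nonKV := &changeForwarder{isKVNode: false, queue: make(chan string, 1)}
	for i := 0; i < 10; i++ {
		nonKV.onSpecChange(fmt.Sprintf("spec-%d", i))
	}
	fmt.Println("queued on non-KV node:", len(nonKV.queue))
}
```

With the guard in place, repeated spec changes on a non-KV node never touch the queue, so the callback cannot block regardless of the queue's capacity.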

       

      1. Create a 2-node source cluster and a 1-node target cluster. One source node runs KV (the Data Service); the other runs the Backup Service (any service other than KV will do).
      2. Create a replication, then delete it. Repeat 6 times, which fills up the channel.
      3. Capture a goroutine dump from the non-KV source node. It shows the lock contention:

      2 @ 0x10003e216 0x10004f3cf 0x10004f3a6 0x10006d9a6 0x10008d6e5 0x1005d950e 0x1005d94ea 0x1005d92b5 0x1005d92b6 0x100b5c0bf 0x100072201
      #       0x10006d9a5     sync.runtime_SemacquireMutex+0x25                                                                       /Users/neil.huang/.cbdepscache/exploded/x86_64/go-1.20/go/src/runtime/sema.go:77
      #       0x10008d6e4     sync.(*Mutex).lockSlow+0x164                                                                            /Users/neil.huang/.cbdepscache/exploded/x86_64/go-1.20/go/src/sync/mutex.go:171
      #       0x1005d950d     sync.(*Mutex).Lock+0x8d                                                                                 /Users/neil.huang/.cbdepscache/exploded/x86_64/go-1.20/go/src/sync/mutex.go:90
      #       0x1005d94e9     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).updateCacheInternal+0x69             /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1164
      #       0x1005d92b4     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).updateCache+0x214                    /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1159
      #       0x1005d92b5     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).ReplicationSpecServiceCallback+0x215 /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1122
      #       0x100b5c0be     github.com/couchbase/goxdcr/replication_manager.(*MetakvChangeListener).metakvCallback_async+0x5e       /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/replication_manager/metakv_change_listener.go:97
      

      1 @ 0x10003e216 0x10000944e 0x100008ffd 0x100673fa6 0x1005daaa3 0x1005da5ea 0x1005d983b 0x1005d92b5 0x1005d92b6 0x100b5c0bf 0x100072201
      #       0x100673fa5     github.com/couchbase/goxdcr/backfill_manager.(*BackfillMgr).ReplicationSpecChangeCallback+0x605         /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/backfill_manager/backfill_manager.go:619
      #       0x1005daaa2     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).executeCallbackWithPriority+0x162    /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1327
      #       0x1005da5e9     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).callMetadataChangeCb+0x329           /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1293
      #       0x1005d983a     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).updateCacheInternal+0x3ba            /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1195
      #       0x1005d92b4     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).updateCache+0x214                    /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1159
      #       0x1005d92b5     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).ReplicationSpecServiceCallback+0x215 /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1122
      #       0x100b5c0be     github.com/couchbase/goxdcr/replication_manager.(*MetakvChangeListener).metakvCallback_async+0x5e       /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/replication_manager/metakv_change_listener.go:97
      

      1 @ 0x10003e216 0x10004f3cf 0x10004f3a6 0x10006d9a6 0x10008d6e5 0x1005dcd8d 0x1005dcd69 0x10065a8b3 0x100662716 0x10008da82 0x100661867 0x100661835 0x100072201
      #       0x10006d9a5     sync.runtime_SemacquireMutex+0x25                                                       /Users/neil.huang/.cbdepscache/exploded/x86_64/go-1.20/go/src/runtime/sema.go:77
      #       0x10008d6e4     sync.(*Mutex).lockSlow+0x164                                                            /Users/neil.huang/.cbdepscache/exploded/x86_64/go-1.20/go/src/sync/mutex.go:171
      #       0x1005dcd8c     sync.(*Mutex).Lock+0x8c                                                                 /Users/neil.huang/.cbdepscache/exploded/x86_64/go-1.20/go/src/sync/mutex.go:90
      #       0x1005dcd68     github.com/couchbase/goxdcr/metadata_svc.(*ReplicationSpecService).SetDerivedObj+0x68   /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/metadata_svc/replication_spec_service.go:1573
      #       0x10065a8b2     github.com/couchbase/goxdcr/pipeline_manager.(*PipelineManager).StopPipeline+0x832      /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/pipeline_manager/pipeline_manager.go:658
      #       0x100662715     github.com/couchbase/goxdcr/pipeline_manager.(*PipelineUpdater).run.func1+0xe75         /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/pipeline_manager/pipeline_manager.go:1635
      #       0x10008da81     sync.(*Once).doSlow+0xc1                                                                /Users/neil.huang/.cbdepscache/exploded/x86_64/go-1.20/go/src/sync/once.go:74
      #       0x100661866     sync.(*Once).Do+0x46                                                                    /Users/neil.huang/.cbdepscache/exploded/x86_64/go-1.20/go/src/sync/once.go:65
      #       0x100661834     github.com/couchbase/goxdcr/pipeline_manager.(*PipelineUpdater).run+0x14                /Users/neil.huang/source/couchbase/goproj/src/github.com/couchbase/goxdcr/pipeline_manager/pipeline_manager.go:1620
      

            People

              ayush.nayyar Ayush Nayyar
              neil.huang Neil Huang