Couchbase Server / MB-36557

TSan: lock-order-inversion between VBucket::stateLock and ConnMap::connsLock

Details

    • Triaged
    • Yes
    • KV-Engine Mad-Hatter GA

    Description

      As seen during the kv_engine-master-post-commit-TSan job for patch http://review.couchbase.org/#/c/116358/ (MB-36372), there is a lock-order-inversion between VBucket::stateLock and ConnMap::connsLock. The two locks are acquired in opposite orders in KVBucket::setVBucketState (stateLock, then connsLock) and DcpConnMap::disconnect (connsLock, then stateLock):

      ThreadSanitizer: lock-order-inversion (potential deadlock) (install/bin/../lib/libtsan.so.0+0x5b63d) AnnotateRWLockAcquired
       
        Cycle in lock order graph: M454999369234753584 (0x000000000000) => M494281 (VBucket::stateLock) => M454999369234753584
       
        Mutex M494281 (VBucket::stateLock) acquired here while holding mutex M454999369234753584 in thread T6:
          #0 AnnotateRWLockAcquired  (libtsan.so.0+0x00000005b63d)
          ...
          #6 ActiveStream::setDead(end_stream_status_t) kv_engine/engines/ep/src/dcp/active_stream.cc:1256 (ep.so+0x0000000ac6b4)
          ...
          #9 DcpProducer::setDisconnect() kv_engine/engines/ep/src/dcp/producer.cc:1581 (ep.so+0x00000010af03)
          #10 DcpConnMap::disconnect(void const*) kv_engine/engines/ep/src/dcp/dcpconnmap.cc:330 (ep.so+0x0000000dc6d1)
          #11 EventuallyPersistentEngine::handleDisconnect(void const*) kv_engine/engines/ep/src/ep_engine.cc:6174 (ep.so+0x00000017562b)
          ...
       
        Mutex M454999369234753584 (ConnMap::connsLock) previously acquired by the same thread here:
          #0 pthread_mutex_lock  (libtsan.so.0+0x00000003876f)
          #1 __gthread_mutex_lock /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748 (memcached+0x00000043b06f)
          #2 std::mutex::lock() /usr/local/include/c++/7.3.0/bits/std_mutex.h:103 (memcached+0x00000043b06f)
          #3 std::lock_guard::lock_guard(std::mutex&) /usr/local/include/c++/7.3.0/bits/std_mutex.h:162 (ep.so+0x0000000dc3b5)
          #4 DcpConnMap::disconnect(void const*) kv_engine/engines/ep/src/dcp/dcpconnmap.cc:316 (ep.so+0x0000000dc3b5)
          #5 EventuallyPersistentEngine::handleDisconnect(void const*) kv_engine/engines/ep/src/ep_engine.cc:6174 (ep.so+0x00000017562b)
          ...
       
        Mutex M454999369234753584 (ConnMap::connsLock) acquired here while holding mutex M494281 in thread T8:
          #0 pthread_mutex_lock  (libtsan.so.0+0x00000003876f)
          ...
          #4 DcpConnMap::vbucketStateChanged(Vbid, vbucket_state_t, bool) kv_engine/engines/ep/src/dcp/dcpconnmap.cc:240 (ep.so+0x0000000d7b7e)
          #5 KVBucket::setVBucketState_UNLOCKED(std::shared_ptr&, vbucket_state_t, nlohmann::basic_json, std::allocator >, bool, long, unsigned long, double, std::allocator, nlohmann::adl_serializer> const&, TransferVB, bool, std::unique_lock&, folly::SharedMutexImpl::WriteHolder&) kv_engine/engines/ep/src/kv_bucket.cc:910 (ep.so+0x0000002200e8)
          #6 KVBucket::setVBucketState(Vbid, vbucket_state_t, nlohmann::basic_json, std::allocator >, bool, long, unsigned long, double, std::allocator, nlohmann::adl_serializer> const&, TransferVB, void const*) kv_engine/engines/ep/src/kv_bucket.cc:857 (ep.so+0x000000220c72)
          #7 EventuallyPersistentEngine::setVBucketState(...) kv_engine/engines/ep/src/ep_engine.cc:6505 (ep.so+0x000000175932)
          ...
       
        Mutex M494281 (VBucket::stateLock) previously acquired by the same thread here:
          #0 AnnotateRWLockAcquired  (libtsan.so.0+0x00000005b63d)
          ...
          #6 KVBucket::setVBucketState(Vbid, vbucket_state_t, nlohmann::basic_json, std::allocator >, bool, long, unsigned long, double, std::allocator, nlohmann::adl_serializer> const&, TransferVB, void const*) kv_engine/engines/ep/src/kv_bucket.cc:856 (ep.so+0x000000220c41)
          #7 EventuallyPersistentEngine::setVBucketState(...) kv_engine/engines/ep/src/ep_engine.cc:6505 (ep.so+0x000000175932)
          ...
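
The cycle in the report above can be reduced to two code paths that take the same pair of locks in opposite orders. The following is a minimal sketch, not kv_engine code: `connsLock`, `stateLock`, `disconnectPath()` and `setStatePath()` are hypothetical stand-ins for the real locks and call chains, and both locks are modelled as plain `std::mutex` for simplicity (the trace shows the real VBucket::stateLock is a reader/writer lock).

```cpp
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two locks named in the TSan report.
std::mutex connsLock;  // models ConnMap::connsLock
std::mutex stateLock;  // models VBucket::stateLock (simplified to a plain mutex)

// Models the thread T6 path (DcpConnMap::disconnect -> ... -> ActiveStream::setDead):
// connsLock is taken first, then stateLock while connsLock is still held.
std::vector<std::string> disconnectPath() {
    std::vector<std::string> order;
    std::lock_guard<std::mutex> conns(connsLock);
    order.push_back("connsLock");
    std::lock_guard<std::mutex> state(stateLock);
    order.push_back("stateLock");
    return order;
}

// Models the thread T8 path (KVBucket::setVBucketState ->
// DcpConnMap::vbucketStateChanged): stateLock first, then connsLock.
std::vector<std::string> setStatePath() {
    std::vector<std::string> order;
    std::lock_guard<std::mutex> state(stateLock);
    order.push_back("stateLock");
    std::lock_guard<std::mutex> conns(connsLock);
    order.push_back("connsLock");
    return order;
}
```

Calling the two functions one after the other completes without hanging, but the acquisition orders they return are reversed. If two threads ran them concurrently, each could end up holding one lock while waiting for the other, which is the potential deadlock the report warns about.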
      

      Link to TSan report: http://cv.jenkins.couchbase.com/job/kv_engine-master-post-commit-TSan/660/ThreadSanitizer/type.2130731106/

      Note that there are other TSan-reported issues there, but they don't seem directly related. Subsequent builds also report the above error, so I'm fairly confident the aforementioned patch is the cause of this problem.
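
One common way to break a cycle like this is to narrow the scope of the first lock so that it is no longer held when the second is taken. The sketch below illustrates that idea under stated assumptions and is not necessarily the change made for this issue: `Producer`, `conns`, and the integer `cookie` are hypothetical simplifications, and both locks are again modelled as plain `std::mutex`.

```cpp
#include <map>
#include <memory>
#include <mutex>

std::mutex connsLock;  // models ConnMap::connsLock
std::mutex stateLock;  // models VBucket::stateLock (simplified to a plain mutex)

// Hypothetical stand-in for a DCP producer connection.
struct Producer {
    bool disconnected = false;

    // Takes stateLock, as the setDisconnect -> setDead chain does in the trace.
    void setDisconnect() {
        std::lock_guard<std::mutex> state(stateLock);
        disconnected = true;
    }
};

// Hypothetical connection map, guarded by connsLock.
std::map<int, std::shared_ptr<Producer>> conns;

// Removes the connection while holding connsLock, then releases connsLock
// before calling into code that needs stateLock. Returns the removed
// connection, or nullptr if the cookie is unknown.
std::shared_ptr<Producer> disconnect(int cookie) {
    std::shared_ptr<Producer> conn;
    {
        std::lock_guard<std::mutex> guard(connsLock);
        auto it = conns.find(cookie);
        if (it == conns.end()) {
            return nullptr;
        }
        conn = it->second;  // keep the connection alive past the erase
        conns.erase(it);
    }  // connsLock released here
    conn->setDisconnect();  // stateLock taken with connsLock no longer held
    return conn;
}
```

With this shape, the disconnect path never holds connsLock and stateLock at the same time, so the only remaining ordering between the two is stateLock then connsLock. Whether this is safe in the real code depends on what else relies on connsLock being held across the disconnect.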

            People

              drigby Dave Rigby (Inactive)