  1. Couchbase Server
  2. MB-36557

TSan: lock-order-inversion between VBucket::stateLock and ConnMap::connsLock


Details

    • Triaged
    • Yes
    • KV-Engine Mad-Hatter GA

    Description

      As seen in the kv_engine-master-post-commit-TSan job for patch http://review.couchbase.org/#/c/116358/ (MB-36372), there is a lock-order inversion between VBucket::stateLock and ConnMap::connsLock. The two locks are acquired in opposite orders: KVBucket::setVBucketState takes stateLock and then connsLock, whereas DcpConnMap::disconnect takes connsLock and then stateLock:

      ThreadSanitizer: lock-order-inversion (potential deadlock) (install/bin/../lib/libtsan.so.0+0x5b63d) AnnotateRWLockAcquired
       
        Cycle in lock order graph: M454999369234753584 (0x000000000000) => M494281 (VBucket::stateLock) => M454999369234753584
       
        Mutex M494281 (VBucket::stateLock) acquired here while holding mutex M454999369234753584 in thread T6:
          #0 AnnotateRWLockAcquired  (libtsan.so.0+0x00000005b63d)
          ...
          #6 ActiveStream::setDead(end_stream_status_t) kv_engine/engines/ep/src/dcp/active_stream.cc:1256 (ep.so+0x0000000ac6b4)
          ...
          #9 DcpProducer::setDisconnect() kv_engine/engines/ep/src/dcp/producer.cc:1581 (ep.so+0x00000010af03)
          #10 DcpConnMap::disconnect(void const*) kv_engine/engines/ep/src/dcp/dcpconnmap.cc:330 (ep.so+0x0000000dc6d1)
          #11 EventuallyPersistentEngine::handleDisconnect(void const*) kv_engine/engines/ep/src/ep_engine.cc:6174 (ep.so+0x00000017562b)
          ...
       
        Mutex M454999369234753584 (ConnMap::connsLock) previously acquired by the same thread here:
          #0 pthread_mutex_lock  (libtsan.so.0+0x00000003876f)
          #1 __gthread_mutex_lock /usr/local/include/c++/7.3.0/x86_64-pc-linux-gnu/bits/gthr-default.h:748 (memcached+0x00000043b06f)
          #2 std::mutex::lock() /usr/local/include/c++/7.3.0/bits/std_mutex.h:103 (memcached+0x00000043b06f)
          #3 std::lock_guard::lock_guard(std::mutex&) /usr/local/include/c++/7.3.0/bits/std_mutex.h:162 (ep.so+0x0000000dc3b5)
          #4 DcpConnMap::disconnect(void const*) kv_engine/engines/ep/src/dcp/dcpconnmap.cc:316 (ep.so+0x0000000dc3b5)
          #5 EventuallyPersistentEngine::handleDisconnect(void const*) kv_engine/engines/ep/src/ep_engine.cc:6174 (ep.so+0x00000017562b)
          ...
       
        Mutex M454999369234753584 (ConnMap::connsLock) acquired here while holding mutex M494281 in thread T8:
          #0 pthread_mutex_lock  (libtsan.so.0+0x00000003876f)
          ...
          #4 DcpConnMap::vbucketStateChanged(Vbid, vbucket_state_t, bool) kv_engine/engines/ep/src/dcp/dcpconnmap.cc:240 (ep.so+0x0000000d7b7e)
          #5 KVBucket::setVBucketState_UNLOCKED(std::shared_ptr&, vbucket_state_t, nlohmann::basic_json, std::allocator >, bool, long, unsigned long, double, std::allocator, nlohmann::adl_serializer> const&, TransferVB, bool, std::unique_lock&, folly::SharedMutexImpl::WriteHolder&) kv_engine/engines/ep/src/kv_bucket.cc:910 (ep.so+0x0000002200e8)
          #6 KVBucket::setVBucketState(Vbid, vbucket_state_t, nlohmann::basic_json, std::allocator >, bool, long, unsigned long, double, std::allocator, nlohmann::adl_serializer> const&, TransferVB, void const*) kv_engine/engines/ep/src/kv_bucket.cc:857 (ep.so+0x000000220c72)
          #7 EventuallyPersistentEngine::setVBucketState(...) kv_engine/engines/ep/src/ep_engine.cc:6505 (ep.so+0x000000175932)
          ...
       
        Mutex M494281 (VBucket::stateLock) previously acquired by the same thread here:
          #0 AnnotateRWLockAcquired  (libtsan.so.0+0x00000005b63d)
          ...
          #6 KVBucket::setVBucketState(Vbid, vbucket_state_t, nlohmann::basic_json, std::allocator >, bool, long, unsigned long, double, std::allocator, nlohmann::adl_serializer> const&, TransferVB, void const*) kv_engine/engines/ep/src/kv_bucket.cc:856 (ep.so+0x000000220c41)
          #7 EventuallyPersistentEngine::setVBucketState(...) kv_engine/engines/ep/src/ep_engine.cc:6505 (ep.so+0x000000175932)
          ...
      

      Link to TSan report: http://cv.jenkins.couchbase.com/job/kv_engine-master-post-commit-TSan/660/ThreadSanitizer/type.2130731106/

      Note that the report lists other TSan issues, but they don't appear to be directly related. Subsequent builds also report the above error, so I'm fairly confident the aforementioned patch is the cause of this problem.
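
      To illustrate the inversion described above, here is a minimal sketch of one common way to break such a cycle: the disconnect-style path releases connsLock before it takes stateLock, so the two locks are never held together on that path. This is an assumption-laden illustration, not the fix applied in kv_engine. Engine, setState, disconnect and runBothPaths are hypothetical names, and both locks are modelled as plain std::mutex even though the trace shows stateLock is a reader/writer lock.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the two locks named in the report. The real
// VBucket::stateLock is a reader/writer lock; a plain mutex keeps this short.
struct Engine {
    std::mutex stateLock;   // stands in for VBucket::stateLock
    std::mutex connsLock;   // stands in for ConnMap::connsLock
    int stateChanges = 0;   // guarded by stateLock
    int streamsClosed = 0;  // guarded by stateLock
    std::vector<int> conns; // guarded by connsLock
};

// Path A, shaped like setVBucketState: stateLock first, then connsLock.
void setState(Engine& e) {
    std::lock_guard<std::mutex> state(e.stateLock);
    ++e.stateChanges;
    std::lock_guard<std::mutex> conns(e.connsLock);
    // ... notify connections of the state change ...
}

// Path B, shaped like disconnect, restructured so that connsLock is released
// before stateLock is taken. The two locks are never held together here, so
// this path adds no connsLock -> stateLock edge to the lock-order graph.
bool disconnect(Engine& e) {
    {
        std::lock_guard<std::mutex> conns(e.connsLock);
        if (e.conns.empty()) {
            return false;
        }
        e.conns.pop_back();
    } // connsLock released here
    std::lock_guard<std::mutex> state(e.stateLock);
    ++e.streamsClosed;
    return true;
}

// Runs both paths concurrently and returns {stateChanges, streamsClosed}.
std::pair<int, int> runBothPaths(int iterations) {
    Engine e;
    e.conns.resize(static_cast<std::size_t>(iterations));
    std::thread a([&] {
        for (int i = 0; i < iterations; ++i) {
            setState(e);
        }
    });
    std::thread b([&] {
        while (disconnect(e)) {
        }
    });
    a.join();
    b.join();
    assert(e.conns.empty()); // every connection was removed by thread b
    return {e.stateChanges, e.streamsClosed};
}
```

      Calling runBothPaths(1000) from a small main() built with -fsanitize=thread is not expected to produce a lock-order-inversion report, because only the stateLock-then-connsLock order remains. The trade-off is that the connection map can change between releasing connsLock and taking stateLock, so real code would need to confirm that this window is harmless.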


            People

              drigby Dave Rigby (Inactive)
              drigby Dave Rigby (Inactive)