Couchbase Server / MB-20759

False positive data race on DcpConnMap::numActiveSnoozingBackfills


Details

    • Untriaged
    • Unknown

    Description

      Seen while investigating MB-20751. ThreadSanitizer reports that numActiveSnoozingBackfills is accessed from two threads without a lock and non-atomically; however, we guard it with our own spin lock:

      memcached<0.76.0>: WARNING: ThreadSanitizer: data race (pid=23569)
      memcached<0.76.0>:   Read of size 2 at 0x7d840000eca2 by thread T8 (mutexes: write M294, read M27095, write M66205, write M102676, write M95235):
      memcached<0.76.0>:     #0 DcpConnMap::canAddBackfillToActiveQ() /home/daver/repos/couchbase/server/ep-engine/src/connmap.cc:1308 (ep.so+0x000000045ac5)
      memcached<0.76.0>:     #1 BackfillManager::schedule(SingleThreadedRCPtr<Stream>, unsigned long, unsigned long) /home/daver/repos/couchbase/server/ep-engine/src/dcp/backfill-manager.cc:142 (ep.so+0x00000005b0eb)
      memcached<0.76.0>:     #2 DcpProducer::scheduleBackfillManager(SingleThreadedRCPtr<Stream>, unsigned long, unsigned long) /home/daver/repos/couchbase/server/ep-engine/src/dcp/producer.cc:702 (ep.so+0x000000078fe3)
      memcached<0.76.0>:     #3 ActiveStream::scheduleBackfill_UNLOCKED(bool) /home/daver/repos/couchbase/server/ep-engine/src/dcp/stream.cc:1016 (ep.so+0x00000008f280)
      memcached<0.76.0>:     #4 ActiveStream::transitionState(stream_state_t) /home/daver/repos/couchbase/server/ep-engine/src/dcp/stream.cc:1145 (ep.so+0x000000090589)
      memcached<0.76.0>:     #5 ActiveStream::setActive() /home/daver/repos/couchbase/server/ep-engine/src/dcp/stream.h:204 (ep.so+0x00000009958e)
      memcached<0.76.0>:     #6 DcpProducer::streamRequest(unsigned int, unsigned int, unsigned short, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long, unsigned long*, ENGINE_ERROR_CODE (*)(vbucket_failover_t*, unsigned long, void const*)) /home/daver/repos/couchbase/server/ep-engine/src/dcp/producer.cc:327 (ep.so+0x00000007f85d)
      memcached<0.76.0>:     #7 EvpDcpStreamReq /home/daver/repos/couchbase/server/ep-engine/src/ep_engine.cc:1573 (ep.so+0x0000000cea78)
      memcached<0.76.0>:     #8 dcp_stream_req_executor /home/daver/repos/couchbase/server/memcached/daemon/mcbp_executors.cc:2272 (memcached+0x00000045925c)
      memcached<0.76.0>:     #9 process_bin_packet /home/daver/repos/couchbase/server/memcached/daemon/mcbp_executors.cc:4650 (memcached+0x00000046481d)
      memcached<0.76.0>:     #10 mcbp_complete_nread(McbpConnection*) /home/daver/repos/couchbase/server/memcached/daemon/mcbp_executors.cc:4759 (memcached+0x00000046481d)
      memcached<0.76.0>:     #11 conn_nread(McbpConnection*) /home/daver/repos/couchbase/server/memcached/daemon/statemachine_mcbp.cc:314 (memcached+0x000000472678)
      memcached<0.76.0>:     #12 McbpStateMachine::execute(McbpConnection&) /home/daver/repos/couchbase/server/memcached/daemon/statemachine_mcbp.h:43 (memcached+0x000000447054)
      memcached<0.76.0>:     #13 McbpConnection::runStateMachinery() /home/daver/repos/couchbase/server/memcached/daemon/connection_mcbp.cc:1003 (memcached+0x000000447054)
      memcached<0.76.0>:     #14 McbpConnection::runEventLoop(short) /home/daver/repos/couchbase/server/memcached/daemon/connection_mcbp.cc:1274 (memcached+0x0000004470dd)
      memcached<0.76.0>:     #15 run_event_loop /home/daver/repos/couchbase/server/memcached/daemon/connections.cc:147 (memcached+0x00000044b9e9)
      memcached<0.76.0>:     #16 event_handler(int, short, void*) /home/daver/repos/couchbase/server/memcached/daemon/memcached.cc:851 (memcached+0x00000041466c)
      memcached<0.76.0>:     #17 event_persist_closure /home/couchbase/serverjenkins/workspace/cbdeps-platform-build/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1319 (libevent_core-2.0.so.5+0x00000000b6b7)
      memcached<0.76.0>:     #18 event_process_active_single_queue /home/couchbase/serverjenkins/workspace/cbdeps-platform-build/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1363 (libevent_core-2.0.so.5+0x00000000b6b7)
      memcached<0.76.0>:     #19 event_process_active /home/couchbase/serverjenkins/workspace/cbdeps-platform-build/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1438 (libevent_core-2.0.so.5+0x00000000b6b7)
      memcached<0.76.0>:     #20 event_base_loop /home/couchbase/serverjenkins/workspace/cbdeps-platform-build/deps/packages/build/libevent/libevent-prefix/src/libevent/event.c:1639 (libevent_core-2.0.so.5+0x00000000b6b7)
      memcached<0.76.0>:     #21 CouchbaseThread::run() /home/daver/repos/couchbase/server/platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057a5)
      memcached<0.76.0>:     #22 platform_thread_wrap /home/daver/repos/couchbase/server/platform/src/cb_pthreads.cc:66 (libplatform.so.0.1.0+0x0000000057a5)
      memcached<0.76.0>: 
      memcached<0.76.0>:   Previous write of size 2 at 0x7d840000eca2 by thread T55:
      memcached<0.76.0>:     #0 DcpConnMap::decrNumActiveSnoozingBackfills() /home/daver/repos/couchbase/server/ep-engine/src/connmap.cc:1319 (ep.so+0x000000045b7b)
      memcached<0.76.0>:     #1 BackfillManager::backfill() /home/daver/repos/couchbase/server/ep-engine/src/dcp/backfill-manager.cc:273 (ep.so+0x00000005a783)
      memcached<0.76.0>:     #2 BackfillManagerTask::run() /home/daver/repos/couchbase/server/ep-engine/src/dcp/backfill-manager.cc:62 (ep.so+0x00000005ac1c)
      memcached<0.76.0>:     #3 ExecutorThread::run() /home/daver/repos/couchbase/server/ep-engine/src/executorthread.cc:115 (ep.so+0x000000108d96)
      memcached<0.76.0>:     #4 launch_executor_thread /home/daver/repos/couchbase/server/ep-engine/src/executorthread.cc:33 (ep.so+0x000000109675)
      memcached<0.76.0>:     #5 CouchbaseThread::run() /home/daver/repos/couchbase/server/platform/src/cb_pthreads.cc:54 (libplatform.so.0.1.0+0x0000000057a5)
      memcached<0.76.0>:     #6 platform_thread_wrap /home/daver/repos/couchbase/server/platform/src/cb_pthreads.cc:66 (libplatform.so.0.1.0+0x0000000057a5)
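For readers unfamiliar with this code, the following is a minimal sketch of the arrangement described above: a 2-byte counter guarded by a home-grown spin lock, read in a check-and-increment (canAddBackfillToActiveQ) and written by decrNumActiveSnoozingBackfills. Only those two function names and numActiveSnoozingBackfills come from the report; the SpinLock and BackfillLimiter classes, the numBackfillsLock and maxActiveSnoozingBackfills members, and the bounded-increment behaviour are hypothetical, not ep-engine's actual code. It also illustrates the "teach ThreadSanitizer" option: the __tsan_mutex_* calls from <sanitizer/tsan_interface.h> mark the lock as a mutex, and they compile to nothing unless the build uses -fsanitize=thread. Because this particular lock is built on std::atomic with acquire/release ordering, TSan could infer the ordering even without them; annotations matter most when a lock's internals are hidden from the sanitizer (for example in uninstrumented code or inline assembly).

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>   // std::lock_guard
#include <thread>  // std::this_thread::yield

// Detect a ThreadSanitizer build: GCC defines __SANITIZE_THREAD__,
// Clang reports __has_feature(thread_sanitizer).
#if defined(__SANITIZE_THREAD__)
#define SKETCH_TSAN 1
#elif defined(__has_feature)
#if __has_feature(thread_sanitizer)
#define SKETCH_TSAN 1
#endif
#endif

#ifdef SKETCH_TSAN
#include <sanitizer/tsan_interface.h>
#define TSAN_ANNOTATE(call) call
#else
#define TSAN_ANNOTATE(call) ((void)0)  // no-op outside TSan builds
#endif

// Hypothetical test-and-set spin lock. lock()/unlock() make it usable with
// std::lock_guard; the annotations ask TSan to model it as a mutex.
class SpinLock {
public:
    SpinLock() { TSAN_ANNOTATE(__tsan_mutex_create(this, __tsan_mutex_not_static)); }
    ~SpinLock() { TSAN_ANNOTATE(__tsan_mutex_destroy(this, __tsan_mutex_not_static)); }
    SpinLock(const SpinLock&) = delete;
    SpinLock& operator=(const SpinLock&) = delete;

    void lock() {
        TSAN_ANNOTATE(__tsan_mutex_pre_lock(this, 0));
        // Spin until this thread flips the flag from false to true.
        while (locked.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        TSAN_ANNOTATE(__tsan_mutex_post_lock(this, 0, 0));
    }

    void unlock() {
        TSAN_ANNOTATE(__tsan_mutex_pre_unlock(this, 0));
        locked.store(false, std::memory_order_release);
        TSAN_ANNOTATE(__tsan_mutex_post_unlock(this, 0));
    }

private:
    std::atomic<bool> locked{false};
};

// Hypothetical stand-in for the DcpConnMap members named in the report.
class BackfillLimiter {
public:
    explicit BackfillLimiter(uint16_t max) : maxActiveSnoozingBackfills(max) {}

    // Assumed semantics: admit a backfill only if a slot is free. The check
    // and the increment happen together while holding the lock.
    bool canAddBackfillToActiveQ() {
        std::lock_guard<SpinLock> lh(numBackfillsLock);
        if (numActiveSnoozingBackfills < maxActiveSnoozingBackfills) {
            ++numActiveSnoozingBackfills;
            return true;
        }
        return false;
    }

    // Assumed semantics: release a slot, never dropping below zero.
    void decrNumActiveSnoozingBackfills() {
        std::lock_guard<SpinLock> lh(numBackfillsLock);
        if (numActiveSnoozingBackfills > 0) {
            --numActiveSnoozingBackfills;
        }
    }

    uint16_t count() {
        std::lock_guard<SpinLock> lh(numBackfillsLock);
        return numActiveSnoozingBackfills;
    }

private:
    SpinLock numBackfillsLock;
    uint16_t numActiveSnoozingBackfills = 0;  // 2 bytes, like the "size 2" accesses
    const uint16_t maxActiveSnoozingBackfills;
};
```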
      

      We need to either switch to atomic increment/decrement (probably preferred) or teach ThreadSanitizer that our SpinLocks are valid mutexes, or both.
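A minimal sketch of the atomic alternative, under the assumption (suggested by its name but not confirmed by the report) that canAddBackfillToActiveQ() compares the counter with an upper limit before incrementing it. Making the counter std::atomic while keeping an `if (n < max) ++n;` shape would silence TSan but reintroduce a check-then-act race that can overshoot the limit, so both operations below use a compare_exchange_weak loop; if there were no limit, plain fetch_add/fetch_sub would do. The AtomicBackfillLimiter class, the maxActiveSnoozingBackfills member and the "never below zero" decrement are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical lock-free stand-in for the counter. There is no separate
// lock, so there is nothing extra for ThreadSanitizer to be taught about.
// Default (sequentially consistent) ordering is used for simplicity.
class AtomicBackfillLimiter {
public:
    explicit AtomicBackfillLimiter(uint16_t max) : maxActiveSnoozingBackfills(max) {}

    // Assumed semantics: admit a backfill only while below the limit.
    // The check and the increment form one atomic step via compare-exchange.
    bool canAddBackfillToActiveQ() {
        uint16_t seen = numActiveSnoozingBackfills.load();
        while (seen < maxActiveSnoozingBackfills) {
            if (numActiveSnoozingBackfills.compare_exchange_weak(
                    seen, static_cast<uint16_t>(seen + 1))) {
                return true;
            }
            // On failure `seen` holds the latest value; re-check the limit.
        }
        return false;
    }

    // Assumed semantics: decrement, but never wrap below zero.
    // Returns false if the counter was already zero.
    bool decrNumActiveSnoozingBackfills() {
        uint16_t seen = numActiveSnoozingBackfills.load();
        while (seen > 0) {
            if (numActiveSnoozingBackfills.compare_exchange_weak(
                    seen, static_cast<uint16_t>(seen - 1))) {
                return true;
            }
        }
        return false;
    }

    uint16_t count() const { return numActiveSnoozingBackfills.load(); }

private:
    std::atomic<uint16_t> numActiveSnoozingBackfills{0};
    const uint16_t maxActiveSnoozingBackfills;
};
```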


          People

            Assignee: Dave Rigby (Inactive)
            Reporter: Dave Rigby (Inactive)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
