Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-34778

Deadlock in bucket logger code [ETA 2019/7/11]

    XMLWordPrintable

Details

    • Triaged
    • Unknown
    • KV-Engine Mad-Hatter Beta

    Description

      The following thread looks suspicious:

      Thread 32 (Thread 0x7f319c97d700 (LWP 15422)):
      #0  0x00007f319eb4951d in __lll_lock_wait () from /lib64/libpthread.so.0
      #1  0x00007f319eb44e1b in _L_lock_812 () from /lib64/libpthread.so.0
      #2  0x00007f319eb44ce8 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3  0x00007f31a2199d41 in cb::logger::unregisterSpdLogger(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /opt/couchbase/bin/../lib/libmemcached_logger.so.1.0.0
      #4  0x00007f3199fd634b in std::_Sp_counted_ptr<BucketLogger*, (__gnu_cxx::_Lock_policy)2>::_M_dispose() () from /opt/couchbase/bin/../lib/../lib/ep.so
      #5  0x00007f31a219fb81 in spdlog::details::registry::flush_all() () from /opt/couchbase/bin/../lib/libmemcached_logger.so.1.0.0
      #6  0x00007f31a21a0405 in _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN6spdlog7details15periodic_workerC4ERKSt8functionIFvvEENSt6chrono8durationIlSt5ratioILl1ELl1EEEEEUlvE_EEEEE6_M_runEv () from /opt/couchbase/bin/../lib/libmemcached_logger.so.1.0.0
      #7  0x00007f319f329dcf in execute_native_thread_routine () from /opt/couchbase/bin/../lib/libstdc++.so.6
      #8  0x00007f319eb42e25 in start_thread () from /lib64/libpthread.so.0
      #9  0x00007f319e86cbad in clone () from /lib64/libc.so.6
      

      The method spdlog::details::registry::flush_all() looks like:

          void flush_all()
          {
              std::lock_guard<std::mutex> lock(logger_map_mutex_);
              for (auto &l : loggers_)
              {
                  auto shared = l.second.lock();
                  if (shared)
                  {
                      shared->flush();
                  }
              }
          }
      

      which in turn calls cb::logger::unregisterSpdLogger which calls details::registry::instance().drop(name) which looks like:

          void drop(const std::string &logger_name)
          {
              std::lock_guard<std::mutex> lock(logger_map_mutex_);
              loggers_.erase(logger_name);
          }
      

      (the mutex is already held by the same thread, which isn't allowed by a std::mutex).

      The problem spreads out to the frontend threads via LWP 15429:

      Thread 25 (Thread 0x7f3194717700 (LWP 15429)):
      #0  0x00007f319eb4951d in __lll_lock_wait () from /lib64/libpthread.so.0
      #1  0x00007f319eb44e1b in _L_lock_812 () from /lib64/libpthread.so.0
      #2  0x00007f319eb44ce8 in pthread_mutex_lock () from /lib64/libpthread.so.0
      #3  0x00007f31a21a4e74 in spdlog::register_logger(std::shared_ptr<spdlog::logger>) () from /opt/couchbase/bin/../lib/libmemcached_logger.so.1.0.0
      #4  0x00007f31a219a615 in cb::logger::registerSpdLogger(std::shared_ptr<spdlog::logger>) () from /opt/couchbase/bin/../lib/libmemcached_logger.so.1.0.0
      #5  0x0000000000437840 in ServerLogApi::register_spdlogger(std::shared_ptr<spdlog::logger>) ()
      #6  0x00007f3199fd601d in BucketLogger::createBucketLogger(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /opt/couchbase/bin/../lib/../lib/ep.so
      #7  0x00007f3199feb75a in ConnHandler::ConnHandler(EventuallyPersistentEngine&, void const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /opt/couchbase/bin/../lib/../lib/ep.so
      #8  0x00007f319a037918 in DcpProducer::DcpProducer(EventuallyPersistentEngine&, void const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, bool) () from /opt/couchbase/bin/../lib/../lib/ep.so
      #9  0x00007f319a019bc1 in DcpConnMap::newProducer(void const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int) () from /opt/couchbase/bin/../lib/../lib/ep.so
      #10 0x00007f319a080af1 in EventuallyPersistentEngine::dcpOpen(void const*, unsigned int, unsigned int, unsigned int, cb::const_char_buffer, cb::const_char_buffer) () from /opt/couchbase/bin/../lib/../lib/ep.so
      #11 0x00007f319a080e80 in non-virtual thunk to EventuallyPersistentEngine::open(gsl::not_null<void const*>, unsigned int, unsigned int, unsigned int, cb::const_char_buffer, cb::const_char_buffer) () from /opt/couchbase/bin/../lib/../lib/ep.so
      #12 0x000000000045ed0a in dcpOpen(Cookie&, unsigned int, unsigned int, unsigned int, cb::const_char_buffer, cb::const_char_buffer) ()
      #13 0x00000000004b8806 in dcp_open_executor(Cookie&) ()
      #14 0x00000000004ad2ef in execute_client_request_packet(Cookie&, cb::mcbp::Request const&) ()
      #15 0x000000000049b8dc in Cookie::execute() ()
      #16 0x00000000004ccf48 in StateMachine::conn_execute() ()
      #17 0x000000000048db7c in Connection::runStateMachinery() ()
      #18 0x000000000048dc1b in Connection::runEventLoop(short) ()
      #19 0x000000000049a7bc in run_event_loop(Connection*, short) ()
      #20 0x000000000042f607 in event_handler(int, short, void*) ()
      #21 0x00007f31a0401107 in event_process_active_single_queue.isra.26 () from /opt/couchbase/bin/../lib/libevent_core.so.2.1.8
      #22 0x00007f31a040167f in event_base_loop () from /opt/couchbase/bin/../lib/libevent_core.so.2.1.8
      #23 0x000000000046ff59 in worker_libevent(void*) ()
      #24 0x00007f31a1113f27 in platform_thread_wrap(void*) () from /opt/couchbase/bin/../lib/libplatform_so.so.0.1.0
      #25 0x00007f319eb42e25 in start_thread () from /lib64/libpthread.so.0
      #26 0x00007f319e86cbad in clone () from /lib64/libc.so.6
      

      We're holding the DCP conns lock, and is trying to get the map-mutex in the spdlogger registry (and then the other frontend threads gets blocked trying to get the dcp manager conn lock to get dcp stats)

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              ben.huddleston Ben Huddleston
              trond Trond Norbye
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty