Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-29108

connection_stats is not thread safe

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Major
    • 5.5.0
    • 5.5.0
    • memcached
    • None
    • Untriaged
    • Unknown

    Description

      As identified by TSAN on multiple occurrences, the implementation of connection_stats within connections.cc is not thread safe when dumping the connection stats.

      Adding some debug code into the method:

      connection_stats

      void connection_stats(ADD_STAT add_stats, const void* cookie, const int64_t fd) {
          std::lock_guard<std::mutex> lock(connections.mutex);
          std::cout << "FD: " << fd << std::endl;
          std::cout << "Thread: " << std::this_thread::get_id() << std::endl;
          for (auto *c : connections.conns) {
              std::cout << "Connection Thread: " << c->getThread()->thread_id << std::endl;
              std::cout << "Connection FD: " << c->getSocketDescriptor() << std::endl;
              if (c->getSocketDescriptor() == fd || fd == -1) {
                  auto stats = c->toJSON();
                  // no key, JSON value contains all properties of the connection.
                  auto stats_str = to_string(stats, false);
                  add_stats(nullptr, 0, stats_str.data(), uint32_t(stats_str.size()),
                            cookie);
              }
          }
      }
      

      yields the following output:

      debug output

      FD: -1
      Thread: 0x70000b43c000
      Connection Thread: 0x70000b3b9000
      Connection FD: 34
      Connection Thread: 0x70000b43c000
      Connection FD: 33
      

      when running the TEST_F(HelloTest, AgentName) test. It seems this behaviour only occurs when the fd variable is -1 hence dumping details for all connections, as this is causing concurrent access from differing threads to the connection itself.

       

      The root cause of the problem is that it tries to look at connection structures owned by a different worker thread than the calling connection. This is because the locking model in the core is in such a way that each connection object use the mutex of the working thread in order to protect it's members. A worker thread is NOT allowed to try to lock another worker thread (as that may lead to deadlocks)

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              tim.bradgate Tim Bradgate (Inactive)
              tim.bradgate Tim Bradgate (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty