Description
As identified by TSAN on multiple occurrences, the implementation of connection_stats within connections.cc is not thread safe when dumping the connection stats.
Adding some debug code into the method:
connection_stats |
void connection_stats(ADD_STAT add_stats, const void* cookie, const int64_t fd) {
|
std::lock_guard<std::mutex> lock(connections.mutex);
|
std::cout << "FD: " << fd << std::endl;
|
std::cout << "Thread: " << std::this_thread::get_id() << std::endl;
|
for (auto *c : connections.conns) {
|
std::cout << "Connection Thread: " << c->getThread()->thread_id << std::endl;
|
std::cout << "Connection FD: " << c->getSocketDescriptor() << std::endl;
|
if (c->getSocketDescriptor() == fd || fd == -1) {
|
auto stats = c->toJSON();
|
// no key, JSON value contains all properties of the connection.
|
auto stats_str = to_string(stats, false);
|
add_stats(nullptr, 0, stats_str.data(), uint32_t(stats_str.size()),
|
cookie);
|
}
|
}
|
}
|
yields the following output:
debug output |
FD: -1
|
Thread: 0x70000b43c000
|
Connection Thread: 0x70000b3b9000
|
Connection FD: 34
|
Connection Thread: 0x70000b43c000
|
Connection FD: 33
|
when running the TEST_F(HelloTest, AgentName) test. It seems this behaviour only occurs when the fd variable is -1 hence dumping details for all connections, as this is causing concurrent access from differing threads to the connection itself.
The root cause of the problem is that it tries to look at connection structures owned by a different worker thread than the calling connection. This is because the locking model in the core is in such a way that each connection object use the mutex of the working thread in order to protect it's members. A worker thread is NOT allowed to try to lock another worker thread (as that may lead to deadlocks)