Spurious segfaults due to query metrics

Description

It appears that under certain circumstances, lcb_QUERY_HANDLE objects are left active after the associated instance is destroyed. This causes the query destructor, which indirectly calls the metrics interface, to fail and segfault. Interestingly, there is a related bug I've seen (but also cannot reliably reproduce) where executed queries never invoke the callback, even though Wireshark shows the full response was received.

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Attachments

1

Activity

CB robot October 8, 2021 at 8:26 AM

Build couchbase-server-7.1.0-1450 contains libcouchbase commit 1d2e0e4 with commit message:
: always invoke callbacks for pending HTTP operations

Sergey Auseyau August 13, 2021 at 10:07 PM

libcouchbase does not invoke the query callback when the instance is destroyed. With external IO (as in couchnode), the IO object lives longer than the lcb_INSTANCE and tries to execute the N1QL timeout handler (75 seconds by default), but at that point the lcb_INSTANCE no longer exists. Before metrics were added, we were lucky not to touch any memory from the lcb_INSTANCE, so the issue did not cause a segfault; metrics made it visible.
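
The lifecycle described above can be sketched with a small self-contained model in C. Every `toy_*` name is hypothetical and none of this is libcouchbase code. It only illustrates the idea in the fix's commit message (always invoke callbacks for pending operations): in-flight queries are drained, their timeout timers are disarmed in the longer-lived IO object, and callers are notified before the instance is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All toy_* names are hypothetical; this is not the libcouchbase API. */
#define TOY_MAX_TIMERS 8

typedef enum { TOY_CANCELED = 1 } toy_status;
typedef void (*toy_callback)(toy_status status, void *cookie);

typedef struct toy_request {
    toy_callback callback;
    void *cookie;
    struct toy_request *next;
} toy_request;

/* Stands in for an external IO loop that outlives the instance. */
typedef struct toy_io {
    toy_request *timers[TOY_MAX_TIMERS]; /* armed timeout timers */
} toy_io;

typedef struct toy_instance {
    toy_io *io;           /* borrowed, not owned */
    toy_request *pending; /* queries still in flight */
} toy_instance;

toy_instance *toy_instance_create(toy_io *io)
{
    toy_instance *inst = calloc(1, sizeof(*inst));
    if (inst != NULL) inst->io = io;
    return inst;
}

/* Starts a query and arms its timeout timer; returns 0 on success. */
int toy_query_start(toy_instance *inst, toy_callback cb, void *cookie)
{
    assert(inst != NULL && cb != NULL);
    for (size_t i = 0; i < TOY_MAX_TIMERS; i++) {
        if (inst->io->timers[i] != NULL) continue; /* slot in use */
        toy_request *req = calloc(1, sizeof(*req));
        if (req == NULL) return -1;
        req->callback = cb;
        req->cookie = cookie;
        req->next = inst->pending;
        inst->pending = req;
        inst->io->timers[i] = req;
        return 0;
    }
    return -1; /* no free timer slot */
}

/* Counts timers still armed; after destroy, each would fire on freed memory. */
int toy_io_armed(const toy_io *io)
{
    int armed = 0;
    for (size_t i = 0; i < TOY_MAX_TIMERS; i++) {
        armed += (io->timers[i] != NULL);
    }
    return armed;
}

/* Drains pending queries before freeing: disarm, notify, release. */
void toy_instance_destroy(toy_instance *inst)
{
    while (inst->pending != NULL) {
        toy_request *req = inst->pending;
        inst->pending = req->next;
        for (size_t i = 0; i < TOY_MAX_TIMERS; i++) {
            if (inst->io->timers[i] == req) inst->io->timers[i] = NULL; /* disarm */
        }
        req->callback(TOY_CANCELED, req->cookie); /* always notify the caller */
        free(req);
    }
    free(inst);
}

/* Example callback: counts cancellations in an int cookie. */
void toy_count_canceled(toy_status status, void *cookie)
{
    if (status == TOY_CANCELED) ++*(int *)cookie;
}

/* Destroys an instance with two queries in flight and reports what happened. */
void toy_demo(int *canceled, int *still_armed)
{
    toy_io io = {{NULL}};
    toy_instance *inst = toy_instance_create(&io);
    *canceled = 0;
    *still_armed = -1;
    if (inst == NULL) return;
    toy_query_start(inst, toy_count_canceled, canceled);
    toy_query_start(inst, toy_count_canceled, canceled);
    toy_instance_destroy(inst);
    *still_armed = toy_io_armed(&io);
}
```

In this model, the failure mode from the comment above corresponds to skipping the loop in `toy_instance_destroy`: the IO object would keep armed timers that point at requests whose instance is already gone, and the callers would never hear back.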

Sergey Auseyau August 13, 2021 at 9:40 PM

I've managed to build a standalone reproduction of the issue by modifying the libuv example.

Matt Ingenthron August 12, 2021 at 11:29 PM

Note, per discussion earlier today, the Ottoman tests regularly trigger this issue. According to , 90% of the time running that test suite will run into this.

Fixed

Created August 12, 2021 at 4:12 PM
Updated October 8, 2021 at 8:26 AM
Resolved August 18, 2021 at 4:49 PM