[System Test] cbq-engine crash "panic: runtime error: invalid memory address or nil pointer dereference" when a KV node was failed over

Description

Build : 7.1.0-2347
Test : -test tests/integration/neo/test_neo_couchstore_milestone4.yml -scope tests/integration/neo/scope_couchstore.yml
Scale : 2
Iteration : 1st

In the longevity test, while KV node 172.23.106.100 was being failed over and a rebalance operation was running, the query service on 172.23.104.157 crashed:

2022-02-19T06:38:31.836-08:00 [WARN] (TXGOCBCORE) memdClient read failure on conn `52a99c9ac2dcd82b/346b54e0c9efe9dc` : read tcp 172.23.104.157:36210->172.23.106.100:11210: use of closed network connection
2022-02-19T06:38:31.836-08:00 [WARN] (TXGOCBCORE) Failed to shut down client connection (close tcp 172.23.104.157:36210->172.23.106.100:11210: use of closed network connection)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x202eaa6]
goroutine 33054796 [running]:
github.com/couchbase/gocbcore/v10.(*memdConnWrap).Release(0xc0017543c0)
    /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/gocbcore/v10@v10.0.8/memdconn.go:120 +0x26
github.com/couchbase/gocbcore/v10.(*memdClient).closeConn(0xc0011d0f00, 0x0, 0x0, 0xdb12c1)
    /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/gocbcore/v10@v10.0.8/memdclient.go:561 +0x6a
github.com/couchbase/gocbcore/v10.(*memdClient).resolveRequest.func1(0xc0011d0f00)
    /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/gocbcore/v10@v10.0.8/memdclient.go:280 +0x34
created by github.com/couchbase/gocbcore/v10.(*memdClient).resolveRequest
    /home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/gocbcore/v10@v10.0.8/memdclient.go:279 +0xadc
2022-02-19T06:38:32.312-08:00 [INFO] Current nofiles rlimit: 200000 (max: 200000)
2022-02-19T06:38:32.314-08:00 [INFO] Initialization of cbauth succeeded

172.23.104.137 was the other query node in the cluster at the time.

Environment

None

Gerrit Reviews

None

Release Notes Description

None

Activity

CB robot February 28, 2022 at 9:13 PM

Build couchbase-server-7.1.0-2402 contains query commit 75750f1 with commit message:
pick https://couchbasecloud.atlassian.net/browse/GOCBC-1248#icft=GOCBC-1248

CB robot February 28, 2022 at 9:13 PM

Build couchbase-server-7.1.0-2402 contains n1fty commit 0568b36 with commit message:
pick https://couchbasecloud.atlassian.net/browse/GOCBC-1248#icft=GOCBC-1248

Sitaram Vemulapalli February 28, 2022 at 7:02 PM

Query picked above commit. Will be part of next build.

Charles Dixon February 28, 2022 at 5:48 PM

This should now be fixed by https://github.com/couchbase/gocbcore/commit/170b8c650e3fa02f0f1a79e51367bd80c196d479. I'm unsure who to reassign this to, but I guess query needs to pick up the change.

Charles Dixon February 28, 2022 at 9:20 AM

I suspect that https://github.com/couchbase/gocbcore/commit/030c41f063019cd3459e443f4e7c622962bedb98 introduced this issue. I will add a nil check in Release, but that treats the symptom rather than the cause (nonetheless a good thing to do). Graceful close has been triggered, and then we see a read failure, which triggers a "hard" close of the memdClient (which closes and releases the connection) while graceful shutdown is still in progress. The linked change alters behavior so that the closed check is (incorrectly) bypassed during graceful shutdown, which leads to this panic after the read failure.
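
For readers following along, the two guards discussed above (a nil check in the release path, plus a closed check that both the graceful and the hard close must pass) can be sketched roughly as below. This is only an illustration under assumptions: `fakeConn`, `connWrap`, `client`, and their fields are hypothetical stand-ins and are not the gocbcore types or the actual fix.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeConn is a hypothetical stand-in for a network connection.
type fakeConn struct{ closed bool }

// Close fails on a second call, like a real closed socket would.
func (c *fakeConn) Close() error {
	if c.closed {
		return errors.New("use of closed network connection")
	}
	c.closed = true
	return nil
}

// connWrap is a hypothetical wrapper (not gocbcore's memdConnWrap).
type connWrap struct{ conn *fakeConn }

// Release drops the underlying connection. The nil checks make a
// repeated call a no-op instead of a nil pointer dereference.
func (w *connWrap) Release() {
	if w == nil || w.conn == nil {
		return
	}
	w.conn = nil
}

// client is a hypothetical owner of the wrapped connection.
type client struct {
	mu     sync.Mutex
	closed bool
	wrap   *connWrap
}

// closeConn closes and releases the connection at most once.
// It reports whether this call performed the teardown.
func (c *client) closeConn() bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.closed {
		// Another path (graceful or hard close) already tore down.
		return false
	}
	c.closed = true
	if c.wrap != nil && c.wrap.conn != nil {
		_ = c.wrap.conn.Close()
	}
	c.wrap.Release()
	return true
}

func main() {
	c := &client{wrap: &connWrap{conn: &fakeConn{}}}

	// Simulate a graceful close and a hard close racing each other.
	var wg sync.WaitGroup
	results := make([]bool, 2)
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = c.closeConn()
		}(i)
	}
	wg.Wait()

	performed := 0
	for _, r := range results {
		if r {
			performed++
		}
	}
	fmt.Println("teardowns performed:", performed)
}
```

In this sketch the closed flag addresses the cause (two close paths both reaching the release), while the nil check in `Release` is the defensive symptom-level guard.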

Fixed

Created February 25, 2022 at 7:01 AM
Updated March 1, 2022 at 9:01 AM
Resolved March 1, 2022 at 9:01 AM