  1. Couchbase Go SDK
  2. GOCBC-1248

[System Test] cbq-engine crash "panic: runtime error: invalid memory address or nil pointer dereference" when a KV node was failed over


Details

    Description

      Build : 7.1.0-2347
      Test : -test tests/integration/neo/test_neo_couchstore_milestone4.yml -scope tests/integration/neo/scope_couchstore.yml
      Scale : 2
      Iteration : 1st

      In the longevity test, while KV node 172.23.106.100 was being failed over and a rebalance operation was running, the query service on 172.23.104.157 crashed:

      2022-02-19T06:38:31.836-08:00 [WARN] (TXGOCBCORE) memdClient read failure on conn `52a99c9ac2dcd82b/346b54e0c9efe9dc` : read tcp 172.23.104.157:36210->172.23.106.100:11210: use of closed network connection
      2022-02-19T06:38:31.836-08:00 [WARN] (TXGOCBCORE) Failed to shut down client connection (close tcp 172.23.104.157:36210->172.23.106.100:11210: use of closed network connection)
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x202eaa6]
       
      goroutine 33054796 [running]:
      github.com/couchbase/gocbcore/v10.(*memdConnWrap).Release(0xc0017543c0)
      	/home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/gocbcore/v10@v10.0.8/memdconn.go:120 +0x26
      github.com/couchbase/gocbcore/v10.(*memdClient).closeConn(0xc0011d0f00, 0x0, 0x0, 0xdb12c1)
      	/home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/gocbcore/v10@v10.0.8/memdclient.go:561 +0x6a
      github.com/couchbase/gocbcore/v10.(*memdClient).resolveRequest.func1(0xc0011d0f00)
      	/home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/gocbcore/v10@v10.0.8/memdclient.go:280 +0x34
      created by github.com/couchbase/gocbcore/v10.(*memdClient).resolveRequest
      	/home/couchbase/.cbdepscache/gomodcache/pkg/mod/github.com/couchbase/gocbcore/v10@v10.0.8/memdclient.go:279 +0xadc
      2022-02-19T06:38:32.312-08:00 [INFO] Current nofiles rlimit: 200000 (max: 200000)
      2022-02-19T06:38:32.314-08:00 [INFO] Initialization of cbauth succeeded
      

      172.23.104.137 is the other query node in the cluster at the time.


        Activity

          Donald.haggart Donald Haggart added a comment - Moved to GOCBCORE as crash therein.  Seems validation of s.baseConn is needed in Release().

          mihir.kamdar Mihir Kamdar (Inactive) added a comment - Adding affects-neo-testing label so that it can be tracked in the Neo dashboard. We need this to be fixed for Neo.

          charles.dixon Charles Dixon added a comment - I suspect that https://github.com/couchbase/gocbcore/commit/030c41f063019cd3459e443f4e7c622962bedb98 introduced this issue. I will add a nil check in Release(), but that treats the symptom rather than the cause (nonetheless a good thing to do). Graceful close was triggered, and then a read failure triggered a "hard" close of the memdClient (which closes and releases the connection) while graceful shutdown was still in progress. The linked change incorrectly bypasses the closed check during graceful shutdown, which leads to this panic after the read failure.
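
          For illustration only, here is a minimal Go sketch of the two guards discussed above: a nil check in Release() and a closed check so that a "hard" close and a graceful shutdown cannot both release the same connection. All type and field names (rawConn, connWrap, client, closed) are hypothetical stand-ins and are not the actual gocbcore code or the contents of the fix commit.

          ```go
          package main

          import (
          	"fmt"
          	"sync/atomic"
          )

          // rawConn is a hypothetical stand-in for the underlying connection.
          type rawConn struct{ released int }

          // Release counts how many times the underlying connection was released.
          func (r *rawConn) Release() { r.released++ }

          // connWrap is a hypothetical wrapper around rawConn.
          type connWrap struct{ base *rawConn }

          // Release is nil-safe: a second call, or a call on a wrapper whose
          // base connection is already gone, is a no-op instead of a panic.
          func (w *connWrap) Release() {
          	if w == nil || w.base == nil {
          		return
          	}
          	w.base.Release()
          	w.base = nil
          }

          // client is a hypothetical owner of the wrapped connection.
          type client struct {
          	conn   *connWrap
          	closed uint32 // 0 = open, 1 = closed
          }

          // closeConn releases the connection at most once. The compare-and-swap
          // is the "closed check": whichever path (hard close or graceful
          // shutdown) gets here first does the work, the other returns false.
          func (c *client) closeConn() bool {
          	if !atomic.CompareAndSwapUint32(&c.closed, 0, 1) {
          		return false
          	}
          	c.conn.Release()
          	return true
          }

          func main() {
          	raw := &rawConn{}
          	c := &client{conn: &connWrap{base: raw}}

          	first := c.closeConn()  // e.g. hard close after a read failure
          	second := c.closeConn() // e.g. graceful shutdown completing later

          	fmt.Println(first, second, raw.released) // prints "true false 1"
          }
          ```

          In this sketch the closed check addresses the cause (two close paths racing), while the nil check in Release() is only a safety net, which mirrors the symptom-versus-cause distinction made in the comment.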

          charles.dixon Charles Dixon added a comment - This should now be fixed by https://github.com/couchbase/gocbcore/commit/170b8c650e3fa02f0f1a79e51367bd80c196d479. I am unsure who to reassign this to, but I guess query needs to pick up the change.

          Sitaram.Vemulapalli Sitaram Vemulapalli added a comment - Query picked above commit. Will be part of next build.

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2402 contains n1fty commit 0568b36 with commit message: pick GOCBC-1248

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2402 contains query commit 75750f1 with commit message: pick GOCBC-1248

          People

            mihir.kamdar Mihir Kamdar (Inactive)
            mihir.kamdar Mihir Kamdar (Inactive)