Couchbase Go SDK / GOCBC-1147

Bootstrap can deadlock if get error map fails


Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • None
    • core-10.0.2, core-9.1.6
    • None
    • None
    • 1

    Description

      We don't return an error when the get error map request fails. As a result the error map channel is nil, and because we don't check for that, we wait on the channel and hang forever.
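
The failure mode described above can be sketched in isolation. This is a minimal illustration, not gocbcore's actual code: `requestErrorMap`, `bootstrapBuggy`, `bootstrapFixed`, and `errMapResult` are hypothetical names, and the timeout in the buggy variant exists only so the demo terminates. In Go, a receive from a nil channel blocks forever, so dropping the error and then waiting on the returned channel hangs.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errMapResult is a hypothetical stand-in for the outcome of a get error map request.
type errMapResult struct {
	data []byte
}

// requestErrorMap is a hypothetical helper (not gocbcore's API). It returns a
// channel that delivers the result, or a nil channel plus an error on failure.
func requestErrorMap(fail bool) (chan errMapResult, error) {
	if fail {
		return nil, errors.New("get error map failed")
	}
	ch := make(chan errMapResult, 1)
	ch <- errMapResult{data: []byte("{}")}
	return ch, nil
}

// bootstrapBuggy drops the error, so on failure it receives from a nil channel.
// The timeout is only here so the demo ends; a bare receive would block forever.
func bootstrapBuggy(fail bool) error {
	ch, _ := requestErrorMap(fail)
	select {
	case <-ch: // never ready when ch is nil
		return nil
	case <-time.After(50 * time.Millisecond):
		return errors.New("hung on nil channel")
	}
}

// bootstrapFixed propagates the error instead of waiting on a nil channel.
func bootstrapFixed(fail bool) error {
	ch, err := requestErrorMap(fail)
	if err != nil {
		return fmt.Errorf("bootstrap: %w", err)
	}
	<-ch
	return nil
}

func main() {
	fmt.Println("buggy:", bootstrapBuggy(true))
	fmt.Println("fixed:", bootstrapFixed(true))
}
```

The fixed variant returns as soon as the request fails, which is the behaviour the description says was missing.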


        Activity

          "I found a deadlock that I was able to trace back to the `Bootstrap` function:

          ```
          goroutine 2470 [chan receive (nil chan), 156 minutes]:
          github.com/couchbase/gocbcore/v9.(*memdClient).Bootstrap(0xc000570c40, 0xc000edb6e0, 0x1baf1d0, 0x7, 0x1bb5d00, 0xb, 0xc000482e10, 0x3, 0x3, 0xc0009f72c0, ...)
              external/com_github_couchbase_gocbcore_v9/memdclient.go:750 +0xeff
          github.com/couchbase/gocbcore/v9.(*memdClientDialerComponent).SlowDialMemdClient(0xc00020f320, 0xc000edb6e0, 0xc0005be870, 0x30, 0xc000efb3d0, 0x3, 0x1, 0x1c04bc9)
              external/com_github_couchbase_gocbcore_v9/memdclientdialer_component.go:98 +0x3e5
          github.com/couchbase/gocbcore/v9.(*kvMux).newKVMuxState.func1(0xc000edb6e0, 0x2, 0x2, 0xc0007703c0)
              external/com_github_couchbase_gocbcore_v9/kvmux.go:545 +0x9d
          github.com/couchbase/gocbcore/v9.(*memdPipelineClient).Run.func1(0xc0000b5d10, 0xc000c58c00, 0xc000c58c60)
              external/com_github_couchbase_gocbcore_v9/memdpipelineclient.go:220 +0x38
          created by github.com/couchbase/gocbcore/v9.(*memdPipelineClient).Run
              external/com_github_couchbase_gocbcore_v9/memdpipelineclient.go:219 +0x28d
          ```

          I found this by profiling the goroutines while investigating the following error (obtained by enabling verbose logging in the `gocb` package with `gocb.SetLogger(gocb.VerboseStdioLogger())`):

          ```
          GOCB 23:14:36.745572 ???:0: Orphaned responses observed:
          {"service":"kv","count":1,"top":[
          {"c":"3c4354d070ecd8c1/8c13f620c6c0a62c","i":"0x5","r":"<<REDACTED>>:11210","d":145,"s":"kv:CMD_SASLSTEP"}
          ]}
          ```

          In summary, I hit this issue in a setup where I have to create a `gocb.Cluster` client multiple times (as I can't take the server for granted); after a long period of time, the deadlock occurs.

          I have tested these changes in that scenario and they resolved it.

          Obs(1): I'm currently using version `v9.1.5` of this package and server `6.5.0` (Enterprise);

          Obs(2): I have an internal ticket that I can share, with all the details that led to this change and the troubleshooting process, including all the behaviors observed;"

          Comment added by Charles Dixon (charles.dixon).

          Build sync_gateway-3.0.0-346 contains gocbcore commit e0632bb with commit message:
          GOCBC-1147: Fix deadlock in kv error map

          Comment added by Couchbase Build Team (build-team).

          Build couchbase-server-7.1.0-2094 contains gocbcore commit 8ab6086 with commit message:
          GOCBC-1147: Fix deadlock in kv error map

          Comment added by Couchbase Build Team (build-team).

          People

            charles.dixon Charles Dixon