  Couchbase Server / MB-32301

During KV fetch ON connection error retry non-bulkGet


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 6.5.0, 6.0.0, 5.5.2
    • Fix Version/s: 6.0.1
    • Component/s: query
    • Labels: None
    • Triage: Untriaged
    • Is this a Regression?: Unknown

    Description

      Fetching more than one key uses doBulkGet(); everything else (fetching a single key, INSERT/UPDATE/DELETE, ...) uses Do(). The connection might have been disconnected by memcached. This can happen while we are using the connection or while it sits idle in the pool. The client needs to handle these transient errors and retry. We already do this in doBulkGet(), but not in Do(). The DoNoDeadline() code is the same as Do() except that the caller takes care of setting the deadline. Combined both by passing a flag and removed DoNoDeadline().
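      A minimal sketch of this pattern, assuming hypothetical names (kvConn, isTransient and doSingle are not the real go-couchbase identifiers): a single helper takes a deadline flag instead of duplicating Do() and DoNoDeadline(), and a transient connection error is classified so the caller can retry on a fresh connection rather than fail immediately.

        // Minimal sketch of the pattern described above, not the real go-couchbase
        // code: kvConn, isTransient and doSingle are hypothetical names.
        package dosketch

        import (
            "io"
            "net"
            "time"
        )

        // kvConn stands in for one pooled memcached connection.
        type kvConn interface {
            SetDeadline(t time.Time) error
        }

        // isTransient reports whether an error looks like a dead connection
        // (reset by peer, closed socket, EOF) and is therefore worth retrying on a
        // fresh connection, as doBulkGet() already does.
        func isTransient(err error) bool {
            if err == io.EOF {
                return true
            }
            _, ok := err.(net.Error)
            return ok
        }

        // doSingle runs one single-key operation (a one-key fetch, INSERT, UPDATE,
        // DELETE, ...). Passing setDeadline=false reproduces the old DoNoDeadline()
        // behaviour, so one function with a flag replaces the two near-identical
        // copies; the caller retries with another connection when isTransient(err)
        // reports a transient failure.
        func doSingle(c kvConn, setDeadline bool, timeout time.Duration, op func(kvConn) error) error {
            if setDeadline {
                if err := c.SetDeadline(time.Now().Add(timeout)); err != nil {
                    return err
                }
            }
            return op(c)
        }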

      Attachments

        Issue Links


          Activity

            build-team Couchbase Build Team added a comment:
            Build couchbase-server-5.5.4-4302 contains go-couchbase commit 3ca3995 with commit message:
            MB-32301. During KV fetch ON connection error retry non-bulkGet

            mihir.kamdar Mihir Kamdar (Inactive) added a comment:
            No regressions seen in the RC2 build (6.0.1-2031) in functional or system testing.

            jacques.rascagneres Jacques Rascagneres added a comment:

            I've produced a partial reproduction showing the difference in behaviour between a single-document fetch and a multi-document fetch.

            I blocked connections between memcached and cbq-engine using the tcpkill tool. This was to simulate an attempt to use a connection which was 'dead'.

            When performing a single document fetch the query instantly failed with the error:

            Error performing bulk get operation - cause: read tcp 10.112.183.102:47222->10.112.183.101:11210: read: connection reset by peer

            Whereas when performing a multi-document fetch the query kept retrying until the connection eventually timed out.

            This is what I had expected to see.

            I then upgraded to a 5.5.4 build, which has this fix implemented, and the following error was returned:

            "Error performing bulk get operation - cause: unable to complete action after 2 attemps"

            It appears the number of attempts is defined by the following code: maxTries := len(b.Nodes()) * 2. Can someone confirm that this will ensure that the connection is successful and will not hit a dead connection on both attempts?

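            The "read: connection reset by peer" failure quoted above can also be provoked locally without tcpkill. The sketch below only illustrates that failure mode; it is not part of the reproduction above or of any Couchbase tooling. A throwaway TCP server closes an accepted connection with SO_LINGER set to 0, so the kernel sends an RST and the client's next read fails with the same error.

              // Self-contained illustration of a "connection reset by peer" read error.
              package main

              import (
                  "fmt"
                  "net"
              )

              func main() {
                  ln, err := net.Listen("tcp", "127.0.0.1:0")
                  if err != nil {
                      panic(err)
                  }
                  defer ln.Close()

                  go func() {
                      c, err := ln.Accept()
                      if err != nil {
                          return
                      }
                      tc := c.(*net.TCPConn)
                      tc.SetLinger(0)           // Close() will send RST instead of FIN
                      buf := make([]byte, 1)
                      tc.Read(buf)              // wait for the client's request byte
                      tc.Close()                // appears as a reset from the client's view
                  }()

                  conn, err := net.Dial("tcp", ln.Addr().String())
                  if err != nil {
                      panic(err)
                  }
                  defer conn.Close()

                  conn.Write([]byte("x"))
                  _, err = conn.Read(make([]byte, 64))
                  fmt.Println(err) // e.g. "read tcp ...: read: connection reset by peer"
              }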

            Sitaram.Vemulapalli Sitaram Vemulapalli added a comment:

            At present there is no way to know until you try a connection from the pool; when it returns an error, we destroy it and retry with another one (this may be an existing connection or a new one). The maximum number of existing connections per host per bucket in the pool is 80.
            In each Fetch, each vBucket uses its own connection.

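            The sketch below restates that answer in code, under assumed names (bucket, connPool and kvConn are illustrative stand-ins, not the real go-couchbase API): the bound quoted above, maxTries := len(b.Nodes()) * 2, only caps how many times a dead connection can be discarded and replaced with another one from the pool; it does not guarantee that the final attempt lands on a live connection.

              // Illustrative only; bucket, connPool and kvConn are hypothetical
              // stand-ins for the real go-couchbase types.
              package retrysketch

              import (
                  "fmt"
                  "time"
              )

              // kvConn stands in for one pooled memcached connection.
              type kvConn interface {
                  SetDeadline(t time.Time) error
              }

              // connPool stands in for the per-node pool (up to 80 connections per
              // host per bucket, per the comment above).
              type connPool interface {
                  Get() (kvConn, error) // an existing idle connection or a new one
                  Return(c kvConn)      // healthy connection goes back to the pool
                  Discard(c kvConn)     // dead connection is destroyed
              }

              // bucket exposes the data nodes and a pool per node.
              type bucket interface {
                  Nodes() []string
                  Pool(node string) connPool
              }

              // retryOp bounds the attempts at twice the node count. There is no way
              // to know whether a pooled connection is alive before using it, so each
              // failure destroys the connection and takes another; only if every
              // attempt hits a dead connection does the "unable to complete action"
              // error surface.
              func retryOp(b bucket, node string, op func(kvConn) error) error {
                  maxTries := len(b.Nodes()) * 2
                  var lastErr error
                  for i := 0; i < maxTries; i++ {
                      c, err := b.Pool(node).Get()
                      if err != nil {
                          return err
                      }
                      if err = op(c); err == nil {
                          b.Pool(node).Return(c)
                          return nil
                      }
                      b.Pool(node).Discard(c) // destroy the dead connection and retry
                      lastErr = err
                  }
                  return fmt.Errorf("unable to complete action after %d attempts: %v", maxTries, lastErr)
              }

            Two attempts per data node means even a single-node cluster gets one retry, which would be consistent with the "unable to complete action after 2 attemps" message quoted above if that cluster had a single data node.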

            build-team Couchbase Build Team added a comment:
            Build sync_gateway-2.7.0-36 contains go-couchbase commit bd8e994 with commit message:
            MB-32301. During KV fetch ON connection error retry non-bulkGet

            People

              Assignee: ajay.bhullar Ajay Bhullar
              Reporter: Sitaram.Vemulapalli Sitaram Vemulapalli
              Votes: 0
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes
