Uploaded image for project: 'Couchbase .NET client library'
  1. Couchbase .NET client library
  2. NCBC-1729

Operation timeout when server(spock) restarts

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 2.7.1
    • None
    • None

    Description

      with commit f1e01cb24eab23fe963186141aae4d1a27f8a11e which is top of master as of 7/9 and candidate for version 2.6.0, when all Spock(5.0) servers are restarting one by one, couple of timeout are detected about 70 seconds after servers are started (servers started means service start couchbase command placed, so it would take some time to actually server is up and running since then)

      Attaching SDK log and graph link is http://sdk-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-5.0.0-xxxx/SvcRestartAll-HYBRID/07-09-18/021208/2ea9aab5aca092bd8b571a0e6e2ef9be-MC.html

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          jaekwon.park Jae Park [X] (Inactive) created issue -
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Field Original Value New Value
          Attachment log.zip [ 54830 ]
          jmorris Jeff Morris added a comment -

          Log file isn't correct; its not logging the SDK - its the SDKD log file I think.

          jmorris Jeff Morris added a comment - Log file isn't correct; its not logging the SDK - its the SDKD log file I think.

          Sorry, I will get log and attach again.

          jaekwon.park Jae Park [X] (Inactive) added a comment - Sorry, I will get log and attach again.
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Attachment log.zip [ 54830 ]
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Attachment log.zip [ 54835 ]
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Description with commit f1e01cb24eab23fe963186141aae4d1a27f8a11e which is top of master as of 7/9 and candidate for version 2.6.0, when all Spock(5.0) servers are restarting one by one, couple of timeout are detected about 70 seconds after servers are started (servers started means service start couchbase command placed, so it would take some time to actually server is up and running since then)

          Attaching SDK log
          with commit f1e01cb24eab23fe963186141aae4d1a27f8a11e which is top of master as of 7/9 and candidate for version 2.6.0, when all Spock(5.0) servers are restarting one by one, couple of timeout are detected about 70 seconds after servers are started (servers started means service start couchbase command placed, so it would take some time to actually server is up and running since then)

          Attaching SDK log and graph link is http://sdk-testresults.couchbase.com.s3.amazonaws.com/SDK-SDK/CB-5.0.0-xxxx/SvcRestartAll-HYBRID/07-09-18/021208/2ea9aab5aca092bd8b571a0e6e2ef9be-MC.html

          attached correct log and linked the graph for easier view of errors

          jaekwon.park Jae Park [X] (Inactive) added a comment - attached correct log and linked the graph for easier view of errors
          jmorris Jeff Morris made changes -
          Link This issue relates to SDKQE-1364 [ SDKQE-1364 ]
          jmorris Jeff Morris made changes -
          Status New [ 10003 ] Open [ 1 ]
          jmorris Jeff Morris made changes -
          Attachment ncbc-1729-op-timeouts.PNG [ 56181 ]
          jmorris Jeff Morris added a comment - - edited

          Opening this as a bug because it needs to be looked into deeper. There is quite a bit of symmetry in the graph - the timeout happen ~10s and at each interval there are 4 timeouts. This is possibly related to the ConfigMonitor Node Unavailable Check which wakes up every 10s and checks the server status if a node is determined to be offline. 

           

          jmorris Jeff Morris added a comment - - edited Opening this as a bug because it needs to be looked into deeper. There is quite a bit of symmetry in the graph - the timeout happen ~10s and at each interval there are 4 timeouts. This is possibly related to the ConfigMonitor Node Unavailable Check  which wakes up every 10s and checks the server status if a node is determined to be offline.   
          jmorris Jeff Morris made changes -
          Fix Version/s 2.6.2 [ 15319 ]
          jmorris Jeff Morris added a comment - - edited

          Each timeout corresponds to a send timeout - the request is sent but the response is not received within the ClientConfiguration.SendTimeout of 15s:

          2018-07-09 22:56:20,978 [68] DEBUG Couchbase.Core.Server - Sending Set`1 with key SimpleKeyREP3930REP3930REP3930REP3930REP3930 using server 172.23.109.150:11210
          2018-07-09 22:56:20,978 [44] DEBUG Couchbase.IO.Services.PooledIOService - Couchbase.IO.SendTimeoutExpiredException: The operation has timed out. {"s":"kv","i":"e8c8","c":"0a1ef0fb9d86f17c/8621a1b4d93e85ae","b":"default","l":"172.23.109.139:51255","r":"172.23.109.142:11210","t":15000000}
           at Couchbase.IO.MultiplexingConnection.Send(Byte[] request)
           at Couchbase.IO.Services.PooledIOService.Execute[T](IOperation`1 operation)
          2018-07-09 22:56:20,978 [44] INFO Couchbase.Core.Server - Checking if node 172.23.109.142:11210 should be down - last: 22:54:28.5243695, current: 22:56:20.9785403, count: 3
          2018-07-09 22:56:20,978 [44] DEBUG Couchbase.Core.Buckets.CouchbaseRequestExecuter - Operation doesn't support retries for key SimpleKeyREP4970REP4970REP4970REP4970REP4970
          2018-07-09 22:56:20,978 [44] DEBUG Couchbase.Core.Buckets.RequestExecuterBase - Operation for key SimpleKeyREP4970REP4970REP4970REP4970REP4970 failed after 1 retries using vb0 from rev43 and opaque59592. Reason: The operation has timed out.

          jmorris Jeff Morris added a comment - - edited Each timeout corresponds to a send timeout - the request is sent but the response is not received within the ClientConfiguration.SendTimeout of 15s: 2018 - 07 - 09 22 : 56 : 20 , 978 [ 68 ] DEBUG Couchbase.Core.Server - Sending Set` 1 with key SimpleKeyREP3930REP3930REP3930REP3930REP3930 using server 172.23 . 109.150 : 11210 2018 - 07 - 09 22 : 56 : 20 , 978 [ 44 ] DEBUG Couchbase.IO.Services.PooledIOService - Couchbase.IO.SendTimeoutExpiredException: The operation has timed out. { "s" : "kv" , "i" : "e8c8" , "c" : "0a1ef0fb9d86f17c/8621a1b4d93e85ae" , "b" : "default" , "l" : "172.23.109.139:51255" , "r" : "172.23.109.142:11210" , "t" : 15000000 } at Couchbase.IO.MultiplexingConnection.Send(Byte[] request) at Couchbase.IO.Services.PooledIOService.Execute[T](IOperation` 1 operation) 2018 - 07 - 09 22 : 56 : 20 , 978 [ 44 ] INFO Couchbase.Core.Server - Checking if node 172.23 . 109.142 : 11210 should be down - last: 22 : 54 : 28.5243695 , current: 22 : 56 : 20.9785403 , count: 3 2018 - 07 - 09 22 : 56 : 20 , 978 [ 44 ] DEBUG Couchbase.Core.Buckets.CouchbaseRequestExecuter - Operation doesn't support retries for key SimpleKeyREP4970REP4970REP4970REP4970REP4970 2018 - 07 - 09 22 : 56 : 20 , 978 [ 44 ] DEBUG Couchbase.Core.Buckets.RequestExecuterBase - Operation for key SimpleKeyREP4970REP4970REP4970REP4970REP4970 failed after 1 retries using vb0 from rev43 and opaque59592. Reason: The operation has timed out.
          jmorris Jeff Morris added a comment - - edited

          Jae Park [X] -

          Based on my last comment, it looks like the server isn't responding in a timely manner and then the client times out the operation. We might want to do a TCP dump and/or correlate with server logs.

          Jeff

          jmorris Jeff Morris added a comment - - edited Jae Park [X] - Based on my last comment, it looks like the server isn't responding in a timely manner and then the client times out the operation. We might want to do a TCP dump and/or correlate with server logs. Jeff
          jmorris Jeff Morris made changes -
          Assignee Jeff Morris [ jmorris ] Jae Park [ jaekwon.park ]
          jmorris Jeff Morris made changes -
          Fix Version/s 2.6.3 [ 15410 ]
          Fix Version/s 2.6.2 [ 15319 ]
          jmorris Jeff Morris added a comment -

          Moving this up a release - need more time and info from QE

          jmorris Jeff Morris added a comment - Moving this up a release - need more time and info from QE
          jmorris Jeff Morris added a comment -

          Jae Park [X] - can you verify that this is resolved?

          jmorris Jeff Morris added a comment - Jae Park [X]  - can you verify that this is resolved?
          jmorris Jeff Morris made changes -
          Link This issue relates to SDKQE-1434 [ SDKQE-1434 ]
          jmorris Jeff Morris made changes -
          Fix Version/s 2.7.1 [ 15435 ]
          Fix Version/s 2.7.0 [ 15410 ]

          verified 2.7.0

          jaekwon.park Jae Park [X] (Inactive) added a comment - verified 2.7.0
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Actual Start 2018-11-01 22:18 (issue has been started)
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Resolution Fixed [ 1 ]
          Status In Progress [ 3 ] Resolved [ 5 ]
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Actual End 2018-11-01 22:18 (issue has been resolved)
          jaekwon.park Jae Park [X] (Inactive) made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

          People

            jaekwon.park Jae Park [X] (Inactive)
            jaekwon.park Jae Park [X] (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                There are no open Gerrit changes

                PagerDuty