Uploaded image for project: 'Couchbase Gateway'
  1. Couchbase Gateway
  2. CBG-650

Intermittent java.net.SocketTimeoutException | readTimeout during user creation

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.7.0
    • None
    • SyncGateway
    • Security Level: Public
    • None
    • Hide
      Mandatory:
       - CBL / SG Version:
         - SG Config:
       - Steps to Reproduce:
       - Actual Result:
       - Expected Result:
       - Logs :
            SGW LOGS: sgcollect info
            CBL LOGS:
            Logcat LOGS: for Android tickets
       - Github link for the code:
       - Jenkins job failure link:
       - Pytest Command
       - What is the last build this test passed:
      Show
      Mandatory:  - CBL / SG Version:    - SG Config:  - Steps to Reproduce:  - Actual Result:  - Expected Result:  - Logs :       SGW LOGS: sgcollect info       CBL LOGS:       Logcat LOGS: for Android tickets  - Github link for the code:  - Jenkins job failure link:  - Pytest Command  - What is the last build this test passed:

    Description

      I am seeing java.net.SocketTimeoutException | readTimeOut exception during user creation 

      This issue is seen only when Mad-Hatter Couchbase build is used and is not seen in any pre MH version .

      Client side readTimeout is set to 5 seconds .
      requestBuilder = requestBuilder.setSocketTimeout(readTimeout);
      private int readTimeout = 5000;
       

      Here is a snippet of timeout error observed in sg_logs :

      "2019-12-23T11:32:52.304-08:00 [INF] Query: N1QL Query("roleAccess") took 1m15.000275961s

      2019-12-23T11:32:52.304-08:00 [WRN] channelComputer.ComputeRolesForUser failed on user sg-user-585: Timeout performing Query – auth.(*Authenticator).rebuildRoles() at auth.go:199

      2019-12-23T11:32:52.304-08:00 [ERR] c:#115 #115: Timeout performing Query – rest.(*handler).writeError() at handler.go:724

      2019-12-23T11:32:52.304-08:00 [INF] HTTP: #115:     --> 503 Timeout performing Query  (75030.1 ms)"

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            Here is the sg_logs for analysis:

            sglogs_socketTimeout_CBG_650.zip

            sharath.sulochana Sharath Sulochana (Inactive) added a comment - - edited Here is the sg_logs for analysis: sglogs_socketTimeout_CBG_650.zip
            ben.brooks Ben Brooks added a comment - - edited

            This seems likely to be a Mad Hatter regression than a Mercury issue.

            The N1QL query responsible for the timeout seems to be:

            SELECT meta(`bucketname`).xattrs._sync.role_access.`bbrks2` as `value` FROM `bucketname` WHERE any op in object_pairs(meta(`bucketname`).xattrs._sync.role_access) satisfies op.name = "bbrks2" end;
            

            ben.brooks Ben Brooks added a comment - - edited This seems likely to be a Mad Hatter regression than a Mercury issue. The N1QL query responsible for the timeout seems to be: SELECT meta(`bucketname`).xattrs._sync.role_access.`bbrks2` as `value` FROM `bucketname` WHERE any op in object_pairs(meta(`bucketname`).xattrs._sync.role_access) satisfies op.name = "bbrks2" end;

            looking at the log - Is it syncgateway calling it a TimeOut after waiting for 75secs for this query ?

            sharath.sulochana Sharath Sulochana (Inactive) added a comment - looking at the log - Is it syncgateway calling it a TimeOut after waiting for 75secs for this query ?

            I saw another failure in init_user phase . This time it's below query .

            I captured cbcollect_info log from couchbase servers . I do see some error related indexer service . 

            log: [s3://cb-customers-secure/sg_log/2019-12-23/sglogs_initi_user_failure.zip]

            sg_info snippet:

            2019-12-23T13:27:07.074-08:00 [INF] Query: N1QL Query("access") took 1m15.000328622s

            2019-12-23T13:27:07.074-08:00 [WRN] QueryAccess returned error: Timeout performing Query – db.(*DatabaseContext).ComputeSequenceChannelsForPrincipal() at crud.go:1711

            2019-12-23T13:27:07.074-08:00 [WRN] channelComputer.ComputeChannelsForPrincipal returned error for &{roleImpl:

            {Name_:sg-user-172 ExplicitChannels_:channel-0:179 Channels_: Sequence_:179 PreviousChannels_: vbNo:<nil> cas:0}

            userImplBody:

            {Email_: Disabled_:false PasswordHash_:[36 50 97 36 49 48 36 106 119 104 110 111 122 106 113 67 54 103 47 114 50 82 118 107 90 117 113 76 101 55 69 106 103 110 85 51 76 46 104 77 106 71 115 79 82 106 101 107 87 117 68 122 55 99 69 65 50 76 67 75] OldPasswordHash_:<nil> ExplicitRoles_: RolesSince_: OldExplicitRoles_:[]}

            auth:<nil> roles:[]}: Timeout performing Query – auth.(*Authenticator).rebuildChannels() at auth.go:174

            2019-12-23T13:27:07.074-08:00 [WRN] RebuildChannels returned error: Timeout performing Query – auth.(*Authenticator).getPrincipal.func1() at auth.go:117

            2019-12-23T13:27:07.074-08:00 [ERR] c:#1001346 #1001346: Timeout performing Query – rest.(*handler).writeError() at handler.go:724

            2019-12-23T13:27:07.074-08:00 [INF] HTTP: #1001346:     --> 503 Timeout performing Query  (75000.8 ms)

             

             

             

            sharath.sulochana Sharath Sulochana (Inactive) added a comment - - edited I saw another failure in init_user phase . This time it's below query . I captured cbcollect_info log from couchbase servers . I do see some error related indexer service .  log:  [s3://cb-customers-secure/sg_log/2019-12-23/sglogs_initi_user_failure.zip] sg_info snippet: 2019-12-23T13:27:07.074-08:00 [INF] Query: N1QL Query("access") took 1m15.000328622s 2019-12-23T13:27:07.074-08:00 [WRN] QueryAccess returned error: Timeout performing Query – db.(*DatabaseContext).ComputeSequenceChannelsForPrincipal() at crud.go:1711 2019-12-23T13:27:07.074-08:00 [WRN] channelComputer.ComputeChannelsForPrincipal returned error for &{roleImpl: {Name_:sg-user-172 ExplicitChannels_:channel-0:179 Channels_: Sequence_:179 PreviousChannels_: vbNo:<nil> cas:0} userImplBody: {Email_: Disabled_:false PasswordHash_:[36 50 97 36 49 48 36 106 119 104 110 111 122 106 113 67 54 103 47 114 50 82 118 107 90 117 113 76 101 55 69 106 103 110 85 51 76 46 104 77 106 71 115 79 82 106 101 107 87 117 68 122 55 99 69 65 50 76 67 75] OldPasswordHash_:<nil> ExplicitRoles_: RolesSince_: OldExplicitRoles_:[]} auth:<nil> roles:[]}: Timeout performing Query – auth.(*Authenticator).rebuildChannels() at auth.go:174 2019-12-23T13:27:07.074-08:00 [WRN] RebuildChannels returned error: Timeout performing Query – auth.(*Authenticator).getPrincipal.func1() at auth.go:117 2019-12-23T13:27:07.074-08:00 [ERR] c:#1001346 #1001346: Timeout performing Query – rest.(*handler).writeError() at handler.go:724 2019-12-23T13:27:07.074-08:00 [INF] HTTP: #1001346:     --> 503 Timeout performing Query  (75000.8 ms)      

             this is due to indexer scan timeout on couchbase cluster . Opened this MB-37340 separately for indexer issue .

            sharath.sulochana Sharath Sulochana (Inactive) added a comment - - edited  this is due to indexer scan timeout on couchbase cluster . Opened this MB-37340  separately for indexer issue .

            snapshot of failed query on couchbase (just for reference from a different example than logs attached ) . Index scan took 1 min 14 seconds and query failed to complete .

            sharath.sulochana Sharath Sulochana (Inactive) added a comment - - edited snapshot of failed query on couchbase (just for reference from a different example than logs attached ) . Index scan took 1 min 14 seconds and query failed to complete .
            ben.brooks Ben Brooks added a comment -

            Untagging Mercury as the fix version. If there are SG fixes required for Mad Hatter, that might end up being in a 2.7.1 - but let's find that out from Query first.

            ben.brooks Ben Brooks added a comment - Untagging Mercury as the fix version. If there are SG fixes required for Mad Hatter, that might end up being in a 2.7.1 - but let's find that out from Query first.

            Issue resolved with MB-37340 in MH . Closing this ticket .

            sharath.sulochana Sharath Sulochana (Inactive) added a comment - Issue resolved with  MB-37340  in MH . Closing this ticket .

            People

              sharath.sulochana Sharath Sulochana (Inactive)
              sharath.sulochana Sharath Sulochana (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty