Couchbase Gateway / CBG-458

cobalt - error count stat does not get incremented


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: 2.6.0
    • Fix Version/s: 2.6.0
    • Component/s: SyncGateway
    • Security Level: Public
    • Labels:

      Description

      SGW Version : All cobalt builds 

      Steps to reproduce :

      1. Set up one server and two Sync Gateways with the sgw provided
      2. Delete the Sync Gateway dbs
      3. Do a push replication from sgw1.DB1 to sgw2.DB2
      4. As the DBs do not exist, the replication should produce errors and the error count should be incremented.
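
Step 3 can be sketched as the request below. This is a minimal illustration, assuming the one-shot replication is started with a POST to /_replicate on sgw1's admin port with a JSON body naming the source and target database URLs. The hosts and database names come from the sgw1 log excerpt in this ticket; the helper name `build_replicate_request` is hypothetical and is not part of mobile-testkit.

```python
import json


def build_replicate_request(source_url, source_db, target_url, target_db):
    """Return (path, JSON body) for a one-shot push replication request.

    Assumption: the body uses "source" and "target" keys holding full
    database URLs. The request is only built here, never sent.
    """
    body = {
        "source": f"{source_url}/{source_db}",
        "target": f"{target_url}/{target_db}",
    }
    return "/_replicate", json.dumps(body, sort_keys=True)


# Hosts and db names as they appear in the sgw1 log excerpt.
path, body = build_replicate_request(
    "http://192.168.33.11:4985", "db1",
    "http://192.168.33.12:4985", "db2",
)
print(path, body)
```

Building the payload separately from sending it keeps the sketch runnable without a live Sync Gateway.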

      Expected :

      The error count stat should be incremented on both sgws when checked through the _expvar API.

      Actual:

      The error count is incremented on sgw1, but not on sgw2.

      An error is thrown and an HTTP exception is raised, but the error count stat is not incremented. The warn count stat is incremented, though.
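
The expected/actual comparison above amounts to reading a counter before and after the replication attempt and checking the difference. A minimal sketch follows, assuming the counters live under `syncgateway.global.resource_utilization` in the _expvar JSON (an assumption, not confirmed by this ticket). The helper names and the sample snapshots are hypothetical and are not taken from the attached logs.

```python
# Assumed dotted paths to the counters inside the _expvar JSON document.
ERROR_COUNT = "syncgateway.global.resource_utilization.error_count"
WARN_COUNT = "syncgateway.global.resource_utilization.warn_count"


def get_stat(expvars, path):
    """Walk a nested dict along a dotted path; return None if any key is missing."""
    node = expvars
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def stat_delta(before, after, path):
    """Difference of one counter between two snapshots, treating missing as 0."""
    return (get_stat(after, path) or 0) - (get_stat(before, path) or 0)


def snapshot(errors, warnings):
    """Build an illustrative _expvar-shaped dict (values are made up)."""
    return {"syncgateway": {"global": {"resource_utilization": {
        "error_count": errors, "warn_count": warnings}}}}


# Illustrative snapshots for sgw2: warn count moves, error count does not.
sgw2_before = snapshot(errors=0, warnings=0)
sgw2_after = snapshot(errors=0, warnings=1)

print(stat_delta(sgw2_before, sgw2_after, ERROR_COUNT))
print(stat_delta(sgw2_before, sgw2_after, WARN_COUNT))
```

Comparing deltas rather than absolute values keeps the check independent of whatever the counters held before the test started.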

       

      SGW test on github: https://github.com/couchbaselabs/mobile-testkit/blob/master/testsuites/syncgateway/functional/topology_specific_tests/multiple_sync_gateways/test_sg_replicate.py#L315

      command to run : 

      pytest -s -rsx --mode=cc --server-version=6.0.1-2037 --sync-gateway-version=2.6.0-117 testsuites/syncgateway/functional/topology_specific_tests/ -k test_sg_replicate_non_existent_db

      SGW logs: attached (errorcountstats.zip)

      Link to Jenkins failure : http://uberjenkins.sc.couchbase.com:8080/job/cen7-sync-gateway-functional-tests-topology-specific-cc/1805/testReport/testsuites.syncgateway.functional.topology_specific_tests.multiple_sync_gateways/test_sg_replicate/test_sg_replicate_non_existent_db/

       


          Activity

          ben.brooks Ben Brooks added a comment -

          Hi Sridevi Saragadam, this looks expected based on the changes in CBG-309.

           

          We only log errors for HTTP statuses 500 or more. In the case of missing database requests returning 404, this will be logged as an info-level log.

           

          It looks like the logs provided are only for sgw1, so I can't see what is incrementing the warn stat, but it shows why you're getting the error count increased in sgw1, and not sgw2.

              2019-07-24T06:00:41.631Z [INF] HTTP:  #002: POST /_replicate (as ADMIN)
              2019-07-24T06:00:41.631Z [INF] Replicate: Creating replication with parameters {LogFn:0x9a3910 ReplicationId:522618004e1c361fe2732db16254b534 Source:http://192.168.33.11:4985 SourceDb:db1 Channels:[] Target:http://192.168.33.12:4985 TargetDb:db2 ChangesFeedLimit:50 Lifecycle:0 Disabled:false Async:false Stats:<nil>}
              2019-07-24T06:00:41.631Z [INF] Replicate: Started one-shot replication: &{Parameters:{LogFn:0x9a3910 ReplicationId:522618004e1c361fe2732db16254b534 Source:http://192.168.33.11:4985 SourceDb:db1 Channels:[] Target:http://192.168.33.12:4985 TargetDb:db2 ChangesFeedLimit:50 Lifecycle:0 Disabled:false Async:false Stats:0xc0005308c0} Stats:0xc0005308c0 EventChan:0xc0007d0180 NotificationChan:0xc0007d0120 FetchedTargetCheckpoint:{LastSequence: Revision: Id:} Changes:{Results:[] LastSequence:<nil>} RevsDiff:map[] Documents:[] PushedBulkDocs:[]}
          --> 2019-07-24T06:00:41.634Z [INF] HTTP:  #003: GET /db1/_changes?feed=normal&limit=50&heartbeat=30000&style=all_docs&since=0 (as ADMIN)
          --> 2019-07-24T06:00:41.634Z [INF] HTTP: #003:     --> 404 no such database "db1"  (0.1 ms)
              2019-07-24T06:00:41.635Z [ERR] Replication Aborted due to error: FETCH_CHECKPOINT_FAILED -- rest.(*handler).writeError() at handler.go:694
              2019-07-24T06:00:41.635Z [INF] HTTP: #002:     --> 500 Internal error: Replication Aborted due to error: FETCH_CHECKPOINT_FAILED  (3.4 ms)

          The requests for the missing databases produce an info-level log for the 404, as expected. The _replicate request itself is the one that logged an error, and it only runs on sgw1.
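
The rule described in this comment can be illustrated with a short sketch. It is not Sync Gateway's actual code: the function name is hypothetical, and the only behaviour assumed is what the comment states, namely that a status of 500 or more is logged as an error and anything lower (such as the 404 for a missing database) is logged at info level. The two requests and statuses are the ones visible in the log excerpt above.

```python
def log_level_for_status(status):
    """Return the log level for a failed request, per the rule in this comment."""
    return "ERR" if status >= 500 else "INF"


# Responses served by sgw1, as seen in the log excerpt.
sgw1_responses = [
    ("GET /db1/_changes", 404),   # missing database
    ("POST /_replicate", 500),    # replication aborted
]

# Only error-level logs bump the error count stat.
error_count = sum(
    1 for _, status in sgw1_responses
    if log_level_for_status(status) == "ERR"
)
print(error_count)
```

Under the same rule, a gateway that only serves 404s for a missing database would log nothing at error level, which matches the unchanged error count reported for sgw2.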

          sridevi.saragadam Sridevi Saragadam added a comment -

          So, I need to make changes to the test, right?

          ben.brooks Ben Brooks added a comment -

          Assuming your test mostly relies on the exception checking of the replicate request, I would suggest limiting your error count stat check to versions >= "2.5.0" and < "2.6.0", or removing the stat check completely.
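
One way to express the suggested version gate is sketched below. The function names are hypothetical, and the sketch assumes version strings of the form used in the pytest command in the description (release number, optionally followed by a dash and a build number). Comparing integer tuples instead of raw strings avoids ordering surprises such as "2.10.0" sorting before "2.5.0".

```python
def parse_version(version):
    """Turn a string like '2.6.0-117' into (2, 6, 0), ignoring the build number."""
    release = version.split("-", 1)[0]
    return tuple(int(part) for part in release.split("."))


def should_check_error_count(sync_gateway_version):
    """True only for versions >= 2.5.0 and < 2.6.0, as suggested above."""
    return (2, 5, 0) <= parse_version(sync_gateway_version) < (2, 6, 0)


# The build from the pytest command in the description falls outside the range.
print(should_check_error_count("2.6.0-117"))
```

In a test, the error count assertion would then be wrapped in `if should_check_error_count(...)`, while the exception check on the replicate request stays unconditional.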


            People

            • Assignee:
              sridevi.saragadam Sridevi Saragadam
              Reporter:
              sridevi.saragadam Sridevi Saragadam
            • Votes:
              0
              Watchers:
              2
