Uploaded image for project: 'Couchbase Gateway'
  1. Couchbase Gateway
  2. CBG-520

Initialization race for channel cache validFrom

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.7.0
    • Component/s: SyncGateway
    • Security Level: Public
    • Labels:
      None
    • Required Mobile Fields:
      Hide
      Mandatory:
       - CBL / SG Version:
         - SG Config:
       - Steps to Reproduce:
       - Actual Result:
       - Expected Result:
       - Logs :
            SGW LOGS: sgcollect info
            CBL LOGS:
            Logcat LOGS: for Android tickets
       - Github link for the code:
       - Jenkins job failure link:
       - Pytest Command
       - What is the last build this test passed:
      Show
      Mandatory:  - CBL / SG Version:    - SG Config:  - Steps to Reproduce:  - Actual Result:  - Expected Result:  - Logs :       SGW LOGS: sgcollect info       CBL LOGS:       Logcat LOGS: for Android tickets  - Github link for the code:  - Jenkins job failure link:  - Pytest Command  - What is the last build this test passed:
    • Sprint:
      CBG Sprint 32, CBG Sprint 33, CBG Sprint 34, CBG Sprint 35
    • Story Points:
      3

      Description

      TestContinuousChangesBackfill is failing intermittently due to doc-1 not being sent over changes.

      http://mobile.jenkins.couchbase.com/job/sgw-unix-build/9909/consoleFull#283334476db2b220c-562c-42fc-9783-a473165a58a0

        Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

          adamf Adam Fraser created issue -
          adamf Adam Fraser made changes -
          Field Original Value New Value
          Rank Ranked higher
          Hide
          adamf Adam Fraser added a comment -

          The test logging suggests that doc-1 (seq #1) is being received over DCP as expected (based on sequence buffering messages), but isn't ending up in the cache (based on Cache: GetCachedChanges logging. Suggests a potential race between per-channel cache initialization (triggered by the changes request), and doc-1 arriving over DCP. SG should handle all variations of timing here - need to identify which scenario isn't covered properly.

          Show
          adamf Adam Fraser added a comment - The test logging suggests that doc-1 (seq #1) is being received over DCP as expected (based on sequence buffering messages), but isn't ending up in the cache (based on Cache: GetCachedChanges logging. Suggests a potential race between per-channel cache initialization (triggered by the changes request), and doc-1 arriving over DCP. SG should handle all variations of timing here - need to identify which scenario isn't covered properly.
          Hide
          adamf Adam Fraser added a comment -

          There's another failure in TestChangesAfterChannelAdded that could be potentially related - needs review.
          http://mobile.jenkins.couchbase.com/view/Sync_Gateway/job/sgw-unix-build/9915/consoleFull#-49449668db2b220c-562c-42fc-9783-a473165a58a0

          Show
          adamf Adam Fraser added a comment - There's another failure in TestChangesAfterChannelAdded that could be potentially related - needs review. http://mobile.jenkins.couchbase.com/view/Sync_Gateway/job/sgw-unix-build/9915/consoleFull#-49449668db2b220c-562c-42fc-9783-a473165a58a0
          adamf Adam Fraser made changes -
          Assignee The One [ the one ] Adam Fraser [ adamf ]
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 32 [ 899 ]
          adamf Adam Fraser made changes -
          Rank Ranked lower
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 32 [ 899 ] CBG Sprint 32, CBG Sprint 33 [ 899, 912 ]
          adamf Adam Fraser made changes -
          Rank Ranked lower
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 32, CBG Sprint 33 [ 899, 912 ] CBG Sprint 32, CBG Sprint 33, CBG Sprint 34 [ 899, 912, 918 ]
          adamf Adam Fraser made changes -
          Rank Ranked lower
          adamf Adam Fraser made changes -
          Sprint CBG Sprint 32, CBG Sprint 33, CBG Sprint 34 [ 899, 912, 918 ] CBG Sprint 32, CBG Sprint 33, CBG Sprint 34, CBG Sprint 35 [ 899, 912, 918, 925 ]
          adamf Adam Fraser made changes -
          Status Open [ 1 ] In Progress [ 3 ]
          adamf Adam Fraser made changes -
          Summary TestContinuousChangesBackfill failure Initialization race for channel cache validFrom
          Hide
          adamf Adam Fraser added a comment -

          There was an initialization race related to validFrom when instantiating a new channel cache, between AddToCache and addChannelCache.

          A. AddToCache:
          1. If any of the mutation's channels are active in the channel cache map, add entry to the cache
          2. Acquire the seqLock, and update highCachedSequence

          B. addChannelCache:
          1. Acquire the seqLock. Add a new entry to the channel cache map, with validFrom = highCachedSequence + 1

          In the case where B1 happens between A1 and A2, the validFrom for the new singleChannelCache will be invalid, as the sequence associated with the mutation being processed is greater than validFrom, but won't be in the cache.

          To address, addToCache will need to hold a mutex across A1 and A2. It's preferred to not use seqLock for this, as it will introduce contention with changes processing that needs to retrieve highCachedSequence (independent of changes to the channel cache map). Instead, a new 'validFromLock' can be used. This shouldn't introduce significant contention on the caching side, as addToCache is already single-threaded (inside processEntry).

          Show
          adamf Adam Fraser added a comment - There was an initialization race related to validFrom when instantiating a new channel cache, between AddToCache and addChannelCache. A. AddToCache: 1. If any of the mutation's channels are active in the channel cache map, add entry to the cache 2. Acquire the seqLock, and update highCachedSequence B. addChannelCache: 1. Acquire the seqLock. Add a new entry to the channel cache map, with validFrom = highCachedSequence + 1 In the case where B1 happens between A1 and A2, the validFrom for the new singleChannelCache will be invalid, as the sequence associated with the mutation being processed is greater than validFrom, but won't be in the cache. To address, addToCache will need to hold a mutex across A1 and A2. It's preferred to not use seqLock for this, as it will introduce contention with changes processing that needs to retrieve highCachedSequence (independent of changes to the channel cache map). Instead, a new 'validFromLock' can be used. This shouldn't introduce significant contention on the caching side, as addToCache is already single-threaded (inside processEntry).
          Automated transition triggered when Adam Fraser created pull request #4346 in GitHub -
          Status In Progress [ 3 ] In Review [ 10107 ]
          Automated transition triggered when Adam Fraser merged pull request #4346 in GitHub -
          Resolution Fixed [ 1 ]
          Status In Review [ 10107 ] Resolved [ 5 ]
          Hide
          build-team Couchbase Build Team added a comment -

          Build sync_gateway-2.7.0-138 contains sync_gateway commit c582819 with commit message:
          CBG-520 Add mutex for validFrom handling (#4346)

          Show
          build-team Couchbase Build Team added a comment - Build sync_gateway-2.7.0-138 contains sync_gateway commit c582819 with commit message: CBG-520 Add mutex for validFrom handling (#4346)
          adamf Adam Fraser made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            Assignee:
            adamf Adam Fraser
            Reporter:
            adamf Adam Fraser
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Gerrit Reviews

                There are no open Gerrit changes

                  PagerDuty