Couchbase Lite / CBL-4445

Replicator is stuck in the busy state when an exception is thrown while applying a delta to create the full Fleece doc


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 3.1.1
    • 3.0.2
    • LiteCore
    • Security Level: Public
    • LiteCore 108
    • 2

    Description

      According to CBSE-14057, the replicator was stuck in the busy state, and the log shows three "Invalid delta" exceptions being thrown, for example:

       

      2023-03-27 09:54:03.747320-0700 InspectQA[86248:5418129] CouchbaseLite Replicator Verbose: {IncomingRev#166} Need to apply delta immediately for 'corrective-assignment-gc-120598123' #3-8f07875796c39d780a151b87ae8895ae ...
      2023-03-27 09:54:03.747316-0700 InspectQA[86248:5418133] CouchbaseLite Network Verbose:     (...sent 136 bytes)
      2023-03-27 09:54:03.747493-0700 InspectQA[86248:5392713] CouchbaseLite Replicator Verbose: {IncomingRev#164} Received revision 'corrective-assignment-gc-122506241' #3-b481f098e7056e8865f4bc1314429f42 (seq '59105725')
      2023-03-27 09:54:04.422017-0700 InspectQA[86248:5418130] CouchbaseLite Network Verbose: {Connection#4} Finished receiving 'changes' REQ #15382 Z
      2023-03-27 09:54:04.422196-0700 InspectQA[86248:5417861] CouchbaseLite Database Verbose: {DB#22} commit transaction
      2023-03-27 09:54:04.422388-0700 InspectQA[86248:5418129] CouchbaseLite Replicator ERROR: {IncomingRev#166} Threw C++ exception: Invalid delta

      Some time later, the log still reports the replicator as busy:

      2023-03-27 10:01:05.079499-0700 InspectQA[86248:5429221] CouchbaseLite Replicator Info: CBLReplicator[<*> URL[wss://mradqa.digicat.cloud.pge.com/sync_gateway/asset360]] is busy, progress 15160/15163, error: (null) 

      Analysis:

      1. This is the line that throws the exception. The log does not reveal the root cause of the "Invalid delta" error.

      https://github.com/couchbase/couchbase-lite-core/blob/release/lithium/Replicator/DBAccess.cc#L384

      2. Given the log message "Need to apply delta immediately for", the code path is as follows:

      https://github.com/couchbase/couchbase-lite-core/blob/release/lithium/Replicator/IncomingRev.cc#L174-L176

      3. To reproduce the issue, rather than trying to make the replicator fail with a delta error, I manually changed the code to throw an exception from inside the IncomingRev::parseAndInsert(alloc_slice jsonBody) function here:

      https://github.com/couchbase/couchbase-lite-core/blob/release/lithium/Replicator/IncomingRev.cc#L159

      After that, running a db-to-db pull replication of a few docs leaves the replicator stuck in the busy state.

      4. If an exception is thrown from `IncomingRev::parseAndInsert(alloc_slice jsonBody)`, it is not handled properly. Normally, when an error (as opposed to an exception) occurs, the error and the pulled revision are handled in failWithError().
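      To illustrate point 4, here is a minimal, self-contained C++ sketch of the failure mode. It is a simplified model, not LiteCore code: `PullTracker`, `revStarted()`, `revFinished()`, `handleRevUnguarded()` and `handleRevGuarded()` are hypothetical names, and `failWithError()` / `parseAndInsert()` are only stand-ins for the real functions mentioned above. The model assumes the replicator reports "busy" while any pulled revision is still in flight. If an exception escapes before the revision is finished, the in-flight count never returns to zero. Under this model, three swallowed exceptions would leave three revisions unfinished, which would be consistent with (but does not prove) the progress stalling at 15160/15163.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the replicator's pull-side bookkeeping.
// NOT LiteCore's API: names and structure are illustrative only.
class PullTracker {
public:
    // Called when a revision arrives and work on it begins.
    void revStarted() { ++pending_; }

    // Called when a revision is fully handled (success or failure).
    void revFinished() {
        assert(pending_ > 0);
        --pending_;
        ++completed_;
    }

    // Analogue of failWithError(): record the error, then finish the rev
    // so it no longer counts as in-flight work.
    void failWithError(const std::string& message) {
        errors_.push_back(message);
        revFinished();
    }

    bool busy() const { return pending_ > 0; }
    int completed() const { return completed_; }
    const std::vector<std::string>& errors() const { return errors_; }

private:
    int pending_ = 0;
    int completed_ = 0;
    std::vector<std::string> errors_;
};

// Buggy shape: the exception is caught and dropped (standing in for it being
// logged as "Threw C++ exception" higher up), but revFinished() is skipped,
// so the tracker never leaves the busy state.
inline void handleRevUnguarded(PullTracker& tracker,
                               const std::function<void()>& parseAndInsert) {
    tracker.revStarted();
    try {
        parseAndInsert();  // may throw, e.g. "Invalid delta"
        tracker.revFinished();
    } catch (...) {
        // Swallowed without finishing the revision.
    }
}

// Guarded shape: convert any exception into an error and route it through
// failWithError(), so every started revision is eventually finished.
inline void handleRevGuarded(PullTracker& tracker,
                             const std::function<void()>& parseAndInsert) {
    tracker.revStarted();
    try {
        parseAndInsert();
    } catch (const std::exception& x) {
        tracker.failWithError(x.what());
        return;
    } catch (...) {
        tracker.failWithError("unknown exception");
        return;
    }
    tracker.revFinished();
}
```

      Feeding one good revision and one that throws `std::runtime_error("Invalid delta")` through each handler leaves the unguarded tracker busy with one revision completed, while the guarded tracker returns to idle with both revisions completed and one recorded error. The guarded variant is just one possible design; whether such an error should then be treated as permanent or recoverable is still the open question under "Things to discuss".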

      Things to discuss:

      1. When an exception is thrown from applyDelta(), is it a permanent or a recoverable error?

      2. When an exception is thrown from parseAndInsert() by something other than applyDelta(), is it a permanent or a recoverable error?


            People

              jianmin.zhao Jianmin Zhao