Uploaded image for project: 'Couchbase Lite'
  1. Couchbase Lite
  2. CBL-1920

Crash when a query is destructed and unregistered from the database

    XMLWordPrintable

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 2.8.0
    • 3.0
    • iOS
    • Security Level: Public
    • None
    • Jay 51, Jay 52, Jay 53
    • 3

    Description

      There are a few users reporting the same issue that they experience multiple crashes in the query when the query is released. All of them said that they don't close the database in their application either. Currently, they don't have reproducible steps and mostly the issue happens to their end-users.

      As this happens to multiple users, I think we should at least review the issue and code from our side.

      Issue: https://github.com/couchbase/couchbase-lite-ios/issues/2781

      Crash Log: https://pastebin.com/raw/dBwFGR4d

      Exception : EXC_BAD_ACCESS KERN_INVALID_ADDRESS

      Forum Post: https://forums.couchbase.com/t/c4error-code-2-ios-application-crash/29921/5

      Attachments

        1. bt-all.txt
          13 kB
        2. swift.zip
          32.50 MB

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            From Jim Borden

            > It looks like what happened is that _keyStore->dataFile() returned null

            > Nevermind about the null there, it looks like someone in the middle was posting a different issue
             
            > It's still a possibility though since that last frame is inlined
             
            > It's a reference so it cannot exactly become null but it might have been deleted for some reason....)

            > I get a similar error by calling unregisterQuery on a null data file

            pasin Pasin Suriyentrakorn added a comment - From Jim Borden > It looks like what happened is that  _keyStore->dataFile()  returned null > Nevermind about the null there, it looks like someone in the middle was posting a different issue   > It's still a possibility though since that last frame is inlined   > It's a reference so it cannot exactly become null but it might have been deleted for some reason....) > I get a similar error by calling unregisterQuery on a null data file

            One user suspects that the issue might have something related to the closing of the database own by the replicator. From my understanding is that the replicator has its own copy (reopenned) of the database so the closing of the database by the replicator shouldn't be related. BUT I want to post the info here as I don't know if there is anything that could be shared between the original database and the copied version. 

            pasin Pasin Suriyentrakorn added a comment - One user suspects that the issue might have something related to the closing of the database own by the replicator. From my understanding is that the replicator has its own copy (reopenned) of the database so the closing of the database by the replicator shouldn't be related. BUT I want to post the info here as I don't know if there is anything that could be shared between the original database and the copied version. 
            Jayahari.Vavachan Jay Vavachan added a comment - - edited

            Attached the todo app swift.zip to reproduce the issue reported.

            • only on device
            • disable logs
            • if ran via debug build, change the optimizations to match the release build.

            1. logged in
            2. create a list
            3. create some tasks.
            4. keep the tasks list page open

            Jayahari.Vavachan Jay Vavachan added a comment - - edited Attached the todo app swift.zip to reproduce the issue reported. only on device disable logs if ran via debug build, change the optimizations to match the release build. 1. logged in 2. create a list 3. create some tasks. 4. keep the tasks list page open
            Jayahari.Vavachan Jay Vavachan added a comment -

            Attached the back trace with details bt-all.txt

            Jayahari.Vavachan Jay Vavachan added a comment - Attached the back trace with details bt-all.txt
            pasin Pasin Suriyentrakorn added a comment - - edited

            As the crash also happens in the unordered_set when creating a new query as well. I feel like the issue happens due to concurrent access to the unordered_set which is not thread-safe by default.

            Jay Vavachan You can try to synchronize this line with the database's lock and see if that helps. If that works, we may need to look to see if there is a better way to fix this or not.

            pasin Pasin Suriyentrakorn added a comment - - edited As the crash also happens in the unordered_set when creating a new query as well. I feel like the issue happens due to concurrent access to the unordered_set which is not thread-safe by default. Jay Vavachan You can try to synchronize this line with the database's lock and see if that helps. If that works, we may need to look to see if there is a better way to fix this or not.

            I have tried adding a small delay when calling the `performDBQuery` from within, which seems to be resolving the issue.

                private func performDBQuery() {
                     DBManager.shared.runQuery()
             
            //// add a small delay 0.01
                    DispatchQueue.global().asyncAfter(deadline: .now() + 0.01, execute: {
                        self.performDBQuery()
                    })
                 }
            

            Jayahari.Vavachan Jay Vavachan added a comment - I have tried adding a small delay when calling the `performDBQuery` from within, which seems to be resolving the issue. private func performDBQuery() { DBManager.shared.runQuery()   //// add a small delay 0.01 DispatchQueue.global().asyncAfter(deadline: .now() + 0.01, execute: { self.performDBQuery() }) }

            Adding performDBQuery() into the queue without delay should be OK for this test. I think it helps to prevent the crash because it makes the calls to performDBQuery() become serial.

            pasin Pasin Suriyentrakorn added a comment - Adding performDBQuery() into the queue without delay should be OK for this test. I think it helps to prevent the crash because it makes the calls to performDBQuery() become serial.

            From the crash log, I think the problem is about concurrent calls to DataFile::registerQuery() and DataFile::unregisterQuery(). Currently, we lock when creating c4query and when releasing c4query so those two operations should be fine. I'm suspecting that we will need to lock when releasing the query's enumerator (https://github.com/couchbase/couchbase-lite-ios/blob/master/Objective-C/CBLQueryResultSet.mm#L46) as well as the C4QueryEnumeratorImpl also retains the Query object.

            pasin Pasin Suriyentrakorn added a comment - From the crash log, I think the problem is about concurrent calls to DataFile::registerQuery() and DataFile::unregisterQuery(). Currently, we lock when creating c4query and when releasing c4query so those two operations should be fine. I'm suspecting that we will need to lock when releasing the query's enumerator ( https://github.com/couchbase/couchbase-lite-ios/blob/master/Objective-C/CBLQueryResultSet.mm#L46 ) as well as the C4QueryEnumeratorImpl also retains the Query object.

            Jay Vavachan I'm reopening this issue as I think this is the real issue not because of the user's test code. Also, there are multiple people getting this issue.

            pasin Pasin Suriyentrakorn added a comment - Jay Vavachan I'm reopening this issue as I think this is the real issue not because of the user's test code. Also, there are multiple people getting this issue.
            pasin Pasin Suriyentrakorn added a comment - - edited

            From Thread Safety Doc (https://github.com/couchbase/couchbase-lite-core/wiki/Thread-Safety) c4queryenum_release is not included in the database exclusive list. Given that the enumerator also retains the query object, I suspect that c4queryenum_release() might need to be in the database exclusive list as well to prevent concurrent access to DataFile's queries (unsorted_set) object when DataFile::registerQuery() and DataFile::unregisterQuery() are called.

            pasin Pasin Suriyentrakorn added a comment - - edited From Thread Safety Doc ( https://github.com/couchbase/couchbase-lite-core/wiki/Thread-Safety ) c4queryenum_release is not included in the database exclusive list. Given that the enumerator also retains the query object, I suspect that c4queryenum_release() might need to be in the database exclusive list as well to prevent concurrent access to DataFile's queries (unsorted_set) object when DataFile::registerQuery() and DataFile::unregisterQuery() are called.
            jianmin.zhao Jianmin Zhao added a comment -

            Agree with Pasin Suriyentrakorn: the exclusive list in the doc should include c4queryenum_release. Consequently, 

            "c4queryenum_release(_enumerator)" needs to be protected.

            jianmin.zhao Jianmin Zhao added a comment - Agree with Pasin Suriyentrakorn : the exclusive list in the doc should include c4queryenum_release. Consequently,  "c4queryenum_release(_enumerator)" needs to be protected.

            Build couchbase-lite-ios-3.0.0-238 contains couchbase-lite-ios commit bce4217 with commit message:
            CBL-1920: Protect the c4queryenum_release from multiple threads accessing it (#2873)

            build-team Couchbase Build Team added a comment - Build couchbase-lite-ios-3.0.0-238 contains couchbase-lite-ios commit bce4217 with commit message: CBL-1920 : Protect the c4queryenum_release from multiple threads accessing it (#2873)

            People

              Jayahari.Vavachan Jay Vavachan
              pasin Pasin Suriyentrakorn
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty