Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
2.5.2, 2.5.0
-
Security Level: Public
-
CBL platforms - iOS(2.5.2-3), Xamarin-ios (iPhone-plus-8), Xamarin-android (Google Pixel XL API25) and UWP
CBL build - 2.5.2-5
-
Critical
Description
I'm seeing a few failures in the functional testing jobs. The tests fail with delta sync enabled and continuous replication.
Command to reproduce the issue
pytest --timeout 1800 --liteserv-version=2.5.2-5 --liteserv-host=localhost --liteserv-port=8080 --delta-sync --no-conflicts --sg-ssl --enable-file-logging --sync-gateway-version=2.5.0 --mode=cc --server-version=6.0.0-1693 --liteserv-platform=xamarin-ios --create-db-per-test=cbl-test testsuites/CBLTester/CBL_Functional_tests/TestSetup_FunctionalTests -k test_replication_configuration_valid_values[1000-True]
I've tried the test with 2.5.0 (xamarin-ios) and I see the same failure. This is a miss from QA, as we didn't run other tests with the delta-sync flag for 2.5.0.
Test cases failed:
test_replication_configuration_valid_values[100-True]
test_replication_configuration_valid_values[1000-True]
test_default_conflict_scenario_delete_wins[sg-True-1]
test_default_conflict_scenario_delete_wins[cbl-True-1]
test_replication_access_revoke_event[10]
test_replication_access_revoke_event[100]
test_replication_filter_access_revoke_document[10]
test_replication_filter_access_revoke_document[100]
test_doc_removal_with_multipleChannels
test_replication_filter_access_revoke_document[1000]
Attachments
- android.zip
- 352 kB
- cbcollect_info.zip
- 7.14 MB
- CBL-2.5.3-2_logs.zip
- 25.01 MB
- python_client.log
- 156 kB
- sgcollect_info.zip
- 11.71 MB
- x-ios.zip
- 357 kB
Activity
Just uploaded the x-ios logs from running 2 test cases:
test_replication_configuration_valid_values
test_replication_filter_access_revoke_document
The zip file contains the CBL logs, SG logs and pytest logs.
The logs are in directories under each test case.
I am seeing the same failures on Android 2.5.2-3. Since I am having some problems making a local build of TestServer, I used the build from LatestBuilds. The adb logcat output may not provide enough detail; if needed, I can collect more once I get rid of the TestServer build issue in my local env.
Jens Alfke should be aware of this, but it looks like the problem is that the base revision for the delta is being calculated incorrectly. The gen 3 revision is coming in and trying to apply itself on the gen 1 revision instead of the gen 2 revision and so the body does not exist anymore. This happens once in a while in Iridium but in Cobalt it is almost every change that this happens on. Fairly easily reproducible and after lunch I will debug some more.
I think I have an idea about what is happening. From what I remember, when a new revision comes in, the older revisions before it have their bodies removed. This is going to interfere with the replication process in the following way:
1. There are two rapid updates to a single document on Sync Gateway (gen 2 and gen 3)
2. Sync Gateway proposes the gen 2 change to CBL
- [[106,"cbl_2","2-87508ce48fd49c23d9a0823a2eb73477"]]
3. CBL responds with its current revision
- [["1-47f386a444bae8cfdeffc4402bf253f0e9df626b"]]
4. Sync Gateway sends a delta
- deltaSrc:1-47f386a444bae8cfdeffc4402bf253f0e9df626b:Content-Type:application/json:Profile:rev:id:cbl_2:rev:2-87508ce48fd49c23d9a0823a2eb73477:sequence:106:history:1-47f386a444bae8cfdeffc4402bf253f0e9df626b
5. CBL begins asynchronously inserting the delta
6. Sync Gateway proposes the gen 3 change to CBL
- [[107,"cbl_2","3-83a4076fc0b2b1df7a0c058372d2cf35"]]
7. CBL responds with its (still) current revision
- [["1-47f386a444bae8cfdeffc4402bf253f0e9df626b"]]
8. CBL finishes inserting the gen 2 revision
9. Sync Gateway sends a delta
- Content-Type:application/json:Profile:rev:id:cbl_2:rev:3-83a4076fc0b2b1df7a0c058372d2cf35:sequence:107:history:2-87508ce48fd49c23d9a0823a2eb73477,1-47f386a444bae8cfdeffc4402bf253f0e9df626b:deltaSrc:1-47f386a444bae8cfdeffc4402bf253f0e9df626b
10. The insertion of generation 2 has invalidated generation 1's body and so the delta recreation is no longer possible.
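For anyone following along, the ten steps above can be modeled in a few lines of Python. This is only a toy sketch, not LiteCore code: the ToyDoc class, the short revision IDs and the dict-merge "delta" are all made up for illustration, and the rule that only the current revision keeps its body is taken from the recollection above rather than verified against the implementation.

```python
class ToyDoc:
    """Hypothetical stand-in for a CBL document; keeps only the current body."""

    def __init__(self, rev_id, body):
        self.current = rev_id
        self.bodies = {rev_id: body}

    def insert(self, rev_id, body):
        # Assumption from the comment above: inserting a new revision
        # removes the bodies of all older revisions.
        self.bodies = {rev_id: body}
        self.current = rev_id

    def apply_delta(self, delta_src, delta):
        base = self.bodies.get(delta_src)
        if base is None:
            raise LookupError(f"body of delta source {delta_src} is gone")
        merged = dict(base)
        merged.update(delta)  # toy "delta": just overwrite changed keys
        return merged


def simulate_rapid_pull():
    doc = ToyDoc("1-aaa", {"n": 1})

    # Steps 2-4: SG proposes gen 2, CBL answers with gen 1, SG sends a delta.
    gen2 = ("2-bbb", doc.current, {"n": 2})

    # Steps 5-7: the gen 2 insert is still pending when SG proposes gen 3,
    # so CBL again answers with gen 1.
    gen3_delta_src = doc.current

    # Step 8: the gen 2 insert finishes; gen 1 loses its body.
    doc.insert(gen2[0], doc.apply_delta(gen2[1], gen2[2]))

    # Steps 9-10: the gen 3 delta arrives, still based on gen 1.
    gen3 = ("3-ccc", gen3_delta_src, {"n": 3})
    try:
        doc.insert(gen3[0], doc.apply_delta(gen3[1], gen3[2]))
        return "applied"
    except LookupError as err:
        return f"failed: {err}"


if __name__ == "__main__":
    print(simulate_rapid_pull())
```

If step 7 answered with gen 2 instead (that is, if the reply waited for the pending insert), the same sketch would apply the gen 3 delta without error.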
I'll assign this over to see if Jens Alfke has an idea about where to put the fix for this.
Note: The above sequence has been verified with Wireshark capture (sidenote: Compression on long conversations inside of Wireshark is utterly broken and Sync Gateway does not completely obey the replicator_compression property)
Fix for review: https://github.com/couchbase/couchbase-lite-core/pull/803
(Not tested as a fix for this bug, but the LiteCore unit tests pass.)
I confirm that with this fix my setup no longer shows the issue. A similar PR will be needed for Cobalt because this area was refactored in between so I'll see if I can craft that tomorrow.
Build couchbase-lite-android-2.5.3-5 contains couchbase-lite-android commit a30f341 with commit message:
Update LiteCore submodule (CBL-136)
Build couchbase-lite-android-2.5.3-5 contains couchbase-lite-android-ee commit d1eb6ac with commit message:
Update couchbase-lite-android submodule (CBL-136)
Build couchbase-lite-android-2.5.3-5 contains couchbase-lite-core commit f3fb70e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-core-2.5.2-17 contains couchbase-lite-core commit f3fb70e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-log-2.5.0-139 contains couchbase-lite-core commit f3fb70e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-core-2.6.0-2187 contains couchbase-lite-core commit 779491e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-android-2.6.0-138 contains couchbase-lite-core commit 779491e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-net-2.7.0-7 contains couchbase-lite-core commit 779491e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-net-2.5.3-2 contains couchbase-lite-core commit f3fb70e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-net-2.6.0-123 contains couchbase-lite-core commit 779491e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
I'm still seeing the failure with delta sync enabled on the 2.5.3-2 build for iOS. Added logs for this run:
pytest -rsx -s --skip-provisioning --timeout 3600 --liteserv-version=2.5.3-2 --liteserv-host=localhost --liteserv-port=8080 --no-conflicts --enable-file-logging --delta-sync --sg-ssl --sync-gateway-version=2.6.0-110 --mode=cc --server-version=6.0.1-2037 --liteserv-platform=ios --create-db-per-test=cbl-test -k test_doc_removal_with_multipleChannels testsuites/CBLTester/CBL_Functional_tests/TestSetup_FunctionalTests
I don't see any evidence that the logs you have attached correspond to the same issue. The error message that indicated CBL-136 is not present in the client logs. Do any of the other tests mentioned in this ticket fail with 2.5.3-2?
Hemant Rajput Are all the test cases mentioned above in the description still failing with the 2.5.3 build?
Build couchbase-lite-cblite-2.6.0-106 contains couchbase-lite-core commit 792b070 with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-android-2.7.0-5 contains couchbase-lite-core commit 792b070 with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Hemant Rajput Could you please provide the additional info about the failed tests that Priya Rajagopal and Jim Borden have asked for?
Build couchbase-lite-core-2.7.0-11 contains couchbase-lite-core commit 792b070 with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-log-2.6.0-138 contains couchbase-lite-core commit 792b070 with commit message:
Fix for super-fast pulls with deltas (CBL-136)
I'm seeing failures for these tests:
test_default_conflict_scenario_delete_wins[sg-True-1] - STILL FAILING WITH 2.5.3-2
test_default_conflict_scenario_delete_wins[cbl-True-1] - STILL FAILING WITH 2.5.3-2
test_doc_removal_with_multipleChannels - STILL FAILING WITH 2.5.3-2
test_replication_access_revoke_event[10] - STILL FAILING WITH 2.5.3-2
test_replication_access_revoke_event[100] - STILL FAILING WITH 2.5.3-2
test_replication_filter_access_revoke_document[10] - STILL FAILING WITH 2.5.3-2
test_replication_filter_access_revoke_document[100] - STILL FAILING WITH 2.5.3-2
test_replication_filter_access_revoke_document[1000] - STILL FAILING WITH 2.5.3-2
Of the failures listed in the test summary, only the tests below passed:
test_replication_configuration_valid_values[100-True] - PASSED
test_replication_configuration_valid_values[1000-True] - PASSED
Added logs for
pytest -rsx -s --timeout 3600 --liteserv-version=2.5.3-2 --liteserv-host=localhost --liteserv-port=8080 --no-conflicts --enable-file-logging --delta-sync --sg-ssl --sync-gateway-version=2.6.0-110 --mode=cc --server-version=6.0.1-2037 --liteserv-platform=ios --create-db-per-test=cbl-test -k test_default_conflict_scenario_delete_wins[sg-True-1] testsuites/CBLTester/CBL_Functional_tests/TestSetup_FunctionalTests/
Added logs for the rest of the test failures as well. Also, the failures are not restricted to iOS; we are seeing them on all other platforms as well.
Pasin Suriyentrakorn There is no evidence of any error in the logs that I looked at. I modified my previous test to run the scenario in the test_default_conflict_scenario_delete_wins case ('sg', True, 1) and there was no problem (delta sync was enabled and I confirmed deltas were being generated via Wireshark). Let me know what you find out tomorrow as well.
I have looked at both the CBL and SG logs for the test_default_conflict_scenario_delete_wins test and I didn't see any issues in them. All the re-created docs (rev 4-xxx) were pushed to the SG, but somehow (according to python_client.log) getting _all_docs from SG returns an empty rows result. I think we need to look at the SG side to see why.
Can QE reproduce the issue by running test_default_conflict_scenario_delete_wins test and get the raw doc info of (one) document (:4985/[db]/_raw/[docid])?
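In case it helps QE, here is a rough sketch of pulling the raw doc info over the admin port. The helper names are made up for illustration, and it assumes the admin API is reachable over plain http on port 4985 without authentication; adjust the scheme and port to the actual test setup.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def raw_doc_url(host, db, doc_id, admin_port=4985):
    """Build the :4985/[db]/_raw/[docid] URL mentioned above (hypothetical helper)."""
    return f"http://{host}:{admin_port}/{quote(db, safe='')}/_raw/{quote(doc_id, safe='')}"


def fetch_raw_doc(host, db, doc_id):
    """Fetch and decode the raw document; needs a running Sync Gateway."""
    with urlopen(raw_doc_url(host, db, doc_id)) as resp:
        return json.load(resp)
```

For example, raw_doc_url("localhost", "db", "cbl_2") gives the URL to request for the cbl_2 document, where "db" is a placeholder database name.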
Actually this is starting to look like CBL-110 (which was not fixed for Iridium but for Cobalt right?)
There are two new issues here based on the new logs provided.
1. test_default_conflict_scenario_delete_wins: This seems to be a duplicate of CBL-110.
2. test_replication_access_revoke_event and test_replication_filter_access_revoke_document: It seems like there is an issue related to delta sync and _removed revision. (Please create a new issue and make sure to enable debug log on CBL side).
For (2) above in Pasin's comment, it looks like a bug on the sync gateway side: I filed CBG-449 so that the SG team can have a look.
Changed issue summary, as the bug is not related to only continuous replication.
Added logs for test_default_conflict_scenario_delete_wins with raw docs.
Attachment name - test_default_conflict_scenario_delete_wins[sg-True-1] _logs_with_sg_raw_docs.zip
PS: We can't run the CBL Release app with debug logging. Confirmed this with Pasin.
Hemant Rajput, Pasin Suriyentrakorn, can we close this ticket as CBG-449 is filed?
I want to have Ben Brooks or Adam Fraser confirm that it is legitimate first.
Looks like this change did fix some issues, so we shouldn't close this as a duplicate of the SG bug.
Build couchbase-lite-ios-2.7.0-12 contains couchbase-lite-core commit 792b070 with commit message:
Fix for super-fast pulls with deltas (CBL-136)
I still see failures on the delete_wins tests. The revoke access tests passed.
Did you use a build of SG with a fix for CBG-449? Because this issue is not a CBL only fix.
Oh I read the comment backwards. The delete wins test should work for Cobalt but not for 2.5.3.
Yep Jim, I forgot to update here that the tests passed on 2.6.0 but not on 2.5.3. It is not a new issue.
Build couchbase-lite-net-2.7.0-11 contains couchbase-lite-core commit 792b070 with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-ios-2.6.0-140 contains couchbase-lite-core commit 779491e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-cblite-2.6.0-213 contains couchbase-lite-core commit 779491e with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-log-2.7.0-79 contains couchbase-lite-core commit 792b070 with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Build couchbase-lite-cblite-2.7.0-5 contains couchbase-lite-core commit 792b070 with commit message:
Fix for super-fast pulls with deltas (CBL-136)
Where are the Couchbase Lite logs?