Details
- Bug
- Resolution: Fixed
- Critical
- 7.6.0
- Untriaged
- 0
- Unknown
Description
The following 500 (unexpected server error) response is returned on recent runs of the XDCR collections suite:
2023-11-15 23:21:32 | ERROR | MainProcess | test_thread | [on_prem_rest_client._http_request] POST http://172.23.109.195:8091/controller/createReplication body: replicationType=continuous&toBucket=default&fromBucket=default&toCluster=remote_cluster_C1-C2&type=xmem headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'} error: 500 reason: unknown b'["Unexpected server error, request logged."]' auth: Administrator:password |
Log file: http://qa.sc.couchbase.com/job/test_suite_executor/638143/consoleText
Test: xdcr.checkpointXDCR.XDCRCheckpointUnitTest.test_source_bucket_delete_recreate
Input params:
{'rdirection': 'unidirection', 'topology': 'chain', 'replication_type': 'xmem', 'java_sdk_client': 'True', 'fail_on_errors': '1', 'get-cbcollect-info': 'False', 'sirius_url': 'http://172.23.120.103:4000', 'ini': '/data/workspace/debian-p0-xdcr-vset00-00-collections_7.0_P0/testexec.37151.ini', 'cluster_name': 'testexec.37151', 'spec': 'py-xdcr-collections-P0', 'conf_file': 'xdcr/py-xdcr-collections-P0.conf', 'num_nodes': 8, 'case_number': 1, 'total_testcases': 27, 'last_case_fail': 'False', 'teardown_run': 'False', 'logs_folder': '/data/workspace/debian-p0-xdcr-vset00-00-collections_7.0_P0/logs/testrunner-23-Nov-15_23-14-19/test_1'} |
Number of nodes: 8 (2 clusters of 4 nodes each)
Source node: 172.23.109.195:8091
Target node: 172.23.109.33:8091
Steps to reproduce:
1. Set up 2 clusters of 4 nodes each.
2. Create bucket default with a scope (scope_1) and a collection (collection_1) in each cluster.
3. Add a remote cluster reference to the target cluster on the source cluster.
4. Create a continuous replication (xmem) from C1->C2.
5. Update the checkpoint interval to 60 on bucket default in the source cluster.
6. Wait 10 seconds and grep for "num_failedckpts"; the result should be 0.
7. Add a mutation in vb0 in the source cluster and verify the checkpoint record:
{'failover_uuid': 194689296992387, 'seqno': 7, 'dcp_snapshot_seqno': 7, 'dcp_snapshot_end_seqno': 7, 'target_vb_opaque': {'target_vb_uuid': 254685919091719}, 'target_seqno': 7, 'filtered_items_cnt': 0, 'filtered_failed_cnt': 0, 'expirations_filtered_cnt': 0, 'deletions_filtered_cnt': 0, 'set_filtered_cnt': 0, 'expiry_stripped_cnt': 0, 'binary_docs_filtered_cnt': 0, 'ATR_docs_filtered_cnt': 0, 'client_txn_records_filtered_cnt': 0, 'docs_with_txn_xattrs_filtered_cnt': 0, 'mobile_records_filtered_cnt': 0, 'docs_filtered_on_user_defined_filters_cnt': 0, 'source_manifest_dcp': 3, 'source_manifest_backfill_mgr': 3, 'target_manifest': 0, 'brokenCollectionsMapSha256': '', 'creationTime': 1700119073, 'guardrail_resident_ratio_cnt': 0, 'guardrail_data_size_cnt': 0, 'guardrail_disk_space_cnt': 0} |
8. Delete bucket default on the source node.
9. Wait 60 seconds and create a new default bucket with 1 scope and 1 collection.
10. Try to create a replication from the source node to the target node.
11. Replication creation fails with an internal server error:
2023-11-15 23:21:32 | ERROR | MainProcess | test_thread | [on_prem_rest_client._http_request] POST http://172.23.109.195:8091/controller/createReplication body: replicationType=continuous&toBucket=default&fromBucket=default&toCluster=remote_cluster_C1-C2&type=xmem headers: {'Content-Type': 'application/x-www-form-urlencoded', 'Authorization': 'Basic QWRtaW5pc3RyYXRvcjpwYXNzd29yZA==', 'Accept': '*/*'} error: 500 reason: unknown b'["Unexpected server error, request logged."]' auth: Administrator:password |
This has been coming up consistently in recent runs of 7.6.0-1767.
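For anyone trying to replay step 10 outside the test framework, the failing request can be rebuilt from the log line above. The following is a minimal sketch using only the Python standard library, not the testrunner's own REST client; the helper name build_create_replication_request is hypothetical, and the endpoint, form fields, and credentials are copied from the logged request.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_create_replication_request(source_node, to_cluster,
                                     from_bucket="default", to_bucket="default",
                                     user="Administrator", password="password"):
    """Build (but do not send) the createReplication POST seen in the log.

    Hypothetical helper for illustration; field names and order follow the
    logged request body.
    """
    body = urlencode({
        "replicationType": "continuous",
        "toBucket": to_bucket,
        "fromBucket": from_bucket,
        "toCluster": to_cluster,
        "type": "xmem",
    })
    # HTTP Basic auth: base64 of "user:password"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        url=f"http://{source_node}/controller/createReplication",
        data=body.encode(),
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Authorization": f"Basic {token}",
            "Accept": "*/*",
        },
        method="POST",
    )


req = build_create_replication_request("172.23.109.195:8091", "remote_cluster_C1-C2")
print(req.get_method(), req.full_url)
print(req.data.decode())
```

To actually send it, pass the request to urllib.request.urlopen; a 500 response is raised as urllib.error.HTTPError, whose read() method returns the response body.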
Issue Links
- is a backport of MB-59696 XDCR - Unexpected server error (500) on collections suite (Closed)