Details
- Bug
- Resolution: Duplicate
- Critical
- 7.6.0
- Untriaged
- 0
- Unknown
Description
Windows XDCR tests consistently hit an issue where xdcr_changes_left_total does not reach 0 for hours. The runs were aborted after 10 hours.
I saw the following error messages in the log.
Service 'goxdcr' exited with status 3221226505. Restarting. Messages:
2023-10-09T15:23:55.256-07:00 INFO GOXDCR.PipelineMgr: checking pipeline spec=Id: 6e8f258c854d7383633a5859bb294794/bucket-1/bucket-1 InternalId: wF0TRce_IvQhZq5563cewA== SourceBucketName: bucket-1 SourceBucketUUID: e271402181523e3cd61765d3ba1f9a87 TargetClusterUUID: 6e8f258c854d7383633a5859bb294794 TargetBucketName: bucket-1 TargetBucketUUID: 90f7340424e20a6d3fa7f5ed1335ae00 Settings: map[CollectionsMgtMulti:ExplicitMapping: false Mirroring: false Migration: false OSO: true active:true backlogThreshold:50 bandwidth_limit:0 checkpoint_interval:600 ckptSvcCacheEnabled:true colMappingRules:map[] collectionsSkipSrcValidation:false compression_type:3 dcpEnablePurgeRollback:false delAllBackfills:false delSpecificBackfillForVb:-1 dismissEvent:-1 doc_batch_size_kb:2048 failure_restart_interval:10 filterSystemScope:true filter_exp_del:0 filter_expression: filter_expression_version:0 filter_skip_restream:false hlvPruningWindowSec:259200 jsFunctionTimeoutMs:20000 log_level:Info manualBackfill: mergeFunctionMapping:map[] mobile:1 optimistic_replication_threshold:256 preReplicateVBMasterCheck:true priority:High replicateCkptIntervalMin:20 replication_type:xmem retryOnErrExceptAuthErrMaxWaitSec:360 retryOnRemoteAuthErr:true retryOnRemoteAuthErrMaxWaitSec:360 source_nozzle_per_node:2 stats_interval:1000 target_nozzle_per_node:4 worker_batch_size:500 xdcrDevBackfillReplUpdateDelayMs:0 xdcrDevBackfillRollbackTo0VB:-1 xdcrDevBackfillSendDelayMs:0 xdcrDevBucketTopologyLegacyDelay:0 xdcrDevCkptMgrForceGCWaitSec:0 xdcrDevColManifestSvcDelaySec:0 xdcrDevMainRollbackTo0VB:-1 xdcrDevMainSendDelayMs:0 xdcrDevNsServerPort:0], source bucket uuid=e271402181523e3cd61765d3ba1f9a87
2023-10-09T15:23:55.256-07:00 INFO GOXDCR.ResourceMgr: Resource Manager State = <nil>
2023-10-09T15:23:55.258-07:00 INFO GOXDCR.ResourceMgr: backlogCount=0, noBacklogCount=0 extraQuota=false cpuNotMaxedCount=0 throughputDropCount=0
2023-10-09T15:23:55.258-07:00 INFO GOXDCR.ResourceMgr: DcpPriorityMap=map[] ongoingReplMap=map[]
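For triage, it can help to read the decimal exit status in the log above as a Windows NTSTATUS value. The sketch below is a hypothetical helper (`describe_exit_status` and the small lookup table are illustrative, not part of goxdcr or ns_server). It shows that 3221226505 is 0xC0000409, which Windows names STATUS_STACK_BUFFER_OVERRUN. That code is also reported for fast-fail terminations, so it does not by itself identify the root cause.

```python
# Hypothetical triage helper: map a Windows process exit status to its
# NTSTATUS hex form and, for a few well-known codes, its symbolic name.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exit_status(status: int) -> str:
    """Return the status as 32-bit hex plus a name if it is in the table."""
    # Mask to 32 bits so signed representations of the same code also work.
    code = status & 0xFFFFFFFF
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} ({name})"


print(describe_exit_status(3221226505))
# prints: 0xC0000409 (STATUS_STACK_BUFFER_OVERRUN)
```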
Build: 7.6.0-1607
Job:
https://perf.jenkins.couchbase.com/job/zeus/11249/
https://perf.jenkins.couchbase.com/job/zeus/11228/
Logs:
source:
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-10-09T222332-ns_1%40zeus-srv-01.perf.couchbase.com.zip
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-10-09T222332-ns_1%40zeus-srv-02.perf.couchbase.com.zip
target:
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-10-09T222428-ns_1%40zeus-srv-03.perf.couchbase.com.zip
https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-10-09T222428-ns_1%40zeus-srv-04.perf.couchbase.com.zip
2023-10-06T18:21:55 [INFO] Monitoring XDCR queues: bucket-1
2023-10-06T18:22:18 [INFO] xdcr_changes_left_total = 12,071,937
...
2023-10-07T03:55:52 [INFO] xdcr_changes_left_total = 10,542,397
Build timed out (after 600 minutes). Marking the build as aborted.
The last good run was on build 7.6.0-1419 and finished within 90 minutes.
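To put the stalled run in perspective, the two xdcr_changes_left_total samples from the monitoring log above give a rough drain rate. This is a back-of-the-envelope sketch only: `drain_estimate` is a hypothetical helper, and it assumes the backlog drains linearly between the two samples, which may not hold while goxdcr keeps restarting.

```python
from datetime import datetime


def drain_estimate(t0: str, n0: int, t1: str, n1: int):
    """Return (items per second, hours left) from two backlog samples.

    Assumes a constant drain rate between the samples (an assumption,
    not something the test harness reports).
    """
    elapsed = (datetime.fromisoformat(t1) - datetime.fromisoformat(t0)).total_seconds()
    rate = (n0 - n1) / elapsed
    if rate <= 0:
        # Backlog is flat or growing: it never drains at this rate.
        return rate, float("inf")
    return rate, n1 / rate / 3600


# Samples copied from the monitoring log in this ticket.
rate, hours_left = drain_estimate(
    "2023-10-06T18:22:18", 12_071_937,
    "2023-10-07T03:55:52", 10_542_397,
)
print(f"{rate:.1f} items/s, about {hours_left:.0f} more hours to reach 0")
```

At roughly 44 items per second, the remaining backlog would need on the order of days to drain, compared with the 90 minutes the whole run took on 7.6.0-1419.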
Issue Links
- duplicates
  - MB-60021 [XDCR][Windows] : Service 'goxdcr' exited with status 3221226505 (Closed)
- is duplicated by
  - MB-60351 [Windows] : Many dumps seen due to process js-evaluator.exe (Closed)
  - MB-59928 goxdcr crashes repeatedly due to js-evaluator (Closed)
  - MB-60020 [Query][Windows] : panic: runtime error: index out of range [3] with length 0 (Closed)
  - MB-60022 [Query][Windows] : panic: runtime error: slice bounds out of range [251658240:60] (Closed)
  - MB-60057 [Query][Windows] panic: runtime error: invalid memory address or nil pointer dereference (Closed)
- relates to
  - MB-59931 js-evaluator causes goxdcr to repeatedy crash (Closed)