  Couchbase Server / MB-59031

xdcr_changes_left_total in Windows XDCR tests can't reach 0 for hours

Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Critical
    • Affects Version/s: 7.6.0
    • Fix Version/s: 7.6.0
    • Component/s: XDCR
    • Triage: Untriaged
    • 0
    • Unknown

    Description

      Windows XDCR tests consistently hit an issue where xdcr_changes_left_total does not reach 0, even after several hours. The runs were aborted after 10 hours.

      The log shows goxdcr exiting abnormally and being restarted:

      Service 'goxdcr' exited with status 3221226505. Restarting. Messages:
      2023-10-09T15:23:55.256-07:00 INFO GOXDCR.PipelineMgr: checking pipeline spec=Id: 6e8f258c854d7383633a5859bb294794/bucket-1/bucket-1 InternalId: wF0TRce_IvQhZq5563cewA== SourceBucketName: bucket-1 SourceBucketUUID: e271402181523e3cd61765d3ba1f9a87 TargetClusterUUID: 6e8f258c854d7383633a5859bb294794 TargetBucketName: bucket-1 TargetBucketUUID: 90f7340424e20a6d3fa7f5ed1335ae00 Settings: map[CollectionsMgtMulti:ExplicitMapping: false Mirroring: false Migration: false OSO: true active:true backlogThreshold:50 bandwidth_limit:0 checkpoint_interval:600 ckptSvcCacheEnabled:true colMappingRules:map[] collectionsSkipSrcValidation:false compression_type:3 dcpEnablePurgeRollback:false delAllBackfills:false delSpecificBackfillForVb:-1 dismissEvent:-1 doc_batch_size_kb:2048 failure_restart_interval:10 filterSystemScope:true filter_exp_del:0 filter_expression: filter_expression_version:0 filter_skip_restream:false hlvPruningWindowSec:259200 jsFunctionTimeoutMs:20000 log_level:Info manualBackfill: mergeFunctionMapping:map[] mobile:1 optimistic_replication_threshold:256 preReplicateVBMasterCheck:true priority:High replicateCkptIntervalMin:20 replication_type:xmem retryOnErrExceptAuthErrMaxWaitSec:360 retryOnRemoteAuthErr:true retryOnRemoteAuthErrMaxWaitSec:360 source_nozzle_per_node:2 stats_interval:1000 target_nozzle_per_node:4 worker_batch_size:500 xdcrDevBackfillReplUpdateDelayMs:0 xdcrDevBackfillRollbackTo0VB:-1 xdcrDevBackfillSendDelayMs:0 xdcrDevBucketTopologyLegacyDelay:0 xdcrDevCkptMgrForceGCWaitSec:0 xdcrDevColManifestSvcDelaySec:0 xdcrDevMainRollbackTo0VB:-1 xdcrDevMainSendDelayMs:0 xdcrDevNsServerPort:0], source bucket uuid=e271402181523e3cd61765d3ba1f9a87
      2023-10-09T15:23:55.256-07:00 INFO GOXDCR.ResourceMgr: Resource Manager State = <nil>
      2023-10-09T15:23:55.258-07:00 INFO GOXDCR.ResourceMgr: backlogCount=0, noBacklogCount=0 extraQuota=false cpuNotMaxedCount=0 throughputDropCount=0
      2023-10-09T15:23:55.258-07:00 INFO GOXDCR.ResourceMgr: DcpPriorityMap=map[] ongoingReplMap=map[]
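
      The goxdcr exit status is logged in decimal. On Windows, an exit code this large is usually an NTSTATUS value, and those are documented in hex. The sketch below converts the logged status to hex. 3221226505 is 0xC0000409, which Windows defines as STATUS_STACK_BUFFER_OVERRUN. Windows also uses that code for fail-fast process terminations, so the code alone does not prove that a stack buffer overrun actually happened. The names ntstatusHex and statusStackBufferOverrun are illustrative only. They do not come from goxdcr or ns_server.

```go
package main

import "fmt"

// statusStackBufferOverrun is the NTSTATUS value STATUS_STACK_BUFFER_OVERRUN.
// Only this one code is listed; this is not a general NTSTATUS lookup table.
const statusStackBufferOverrun uint32 = 0xC0000409

// ntstatusHex formats a decimal Windows exit status in the 8-digit hex form
// in which NTSTATUS values are normally documented.
func ntstatusHex(exitStatus uint32) string {
	return fmt.Sprintf("0x%08X", exitStatus)
}

func main() {
	// Value copied from "Service 'goxdcr' exited with status 3221226505".
	const goxdcrExit uint32 = 3221226505
	fmt.Println(ntstatusHex(goxdcrExit))                // 0xC0000409
	fmt.Println(goxdcrExit == statusStackBufferOverrun) // true
}
```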

      Build: 7.6.0-1607

      Jobs:

      https://perf.jenkins.couchbase.com/job/zeus/11249/

      https://perf.jenkins.couchbase.com/job/zeus/11228/ 

      Logs:

      source:
      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-10-09T222332-ns_1%40zeus-srv-01.perf.couchbase.com.zip
      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-10-09T222332-ns_1%40zeus-srv-02.perf.couchbase.com.zip
      target:
      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-10-09T222428-ns_1%40zeus-srv-03.perf.couchbase.com.zip
      https://s3.amazonaws.com/bugdb/jira/qe/collectinfo-2023-10-09T222428-ns_1%40zeus-srv-04.perf.couchbase.com.zip

      2023-10-06T18:21:55 [INFO] Monitoring XDCR queues: bucket-1

      2023-10-06T18:22:18 [INFO] xdcr_changes_left_total = 12,071,937

      ...

      2023-10-07T03:55:52 [INFO] xdcr_changes_left_total = 10,542,397

      Build timed out (after 600 minutes). Marking the build as aborted.

      The last good run was on build 7.6.0-1419 and finished within 90 minutes.

      Job: https://perf.jenkins.couchbase.com/job/zeus/10975/

    People

      Assignee: Bo-Chun Wang
      Reporter: Bo-Chun Wang
      Votes: 0
      Watchers: 4
