Details
- Bug
- Resolution: Fixed
- Critical
- 7.1.0
- 7.1.0-2438
- Untriaged
- 1
- Unknown
Description
Steps:
- Step 1: Create a 3 node cluster
2022-03-04 16:41:58,838 | test | INFO | pool-3-thread-26 | [task:check:474] Rebalance completed with progress: 100% in 15.0710000992 sec
- Step 1*: Create a 3 node XDCR remote cluster
2022-03-04 16:42:38,523 | test | INFO | pool-3-thread-28 | [task:check:474] Rebalance completed with progress: 100% in 25.0929999352 sec
- Step 2: Create required buckets and collections.
- Step 2*: Create required buckets and collections on XDCR remote.
- Step 1: Create 10000000 items sequentially
- Step 2: Update 10000000 RandomKey keys to create 50 percent fragmentation
- Step 3: Create 10000000 items sequentially
- Step 4: Update 10000000 RandomKey keys to create 50 percent fragmentation
- Step 5: Rebalance in with Loading of docs
2022-03-05 20:01:13,065 | test | INFO | pool-3-thread-22 | [task:check:474] Rebalance completed with progress: 100% in 19683.95 sec
- Step 6: Rebalance Out with Loading of docs
2022-03-06 03:16:48,865 | test | INFO | pool-3-thread-19 | [task:check:474] Rebalance completed with progress: 100% in 26089.494 sec
- Step 7: Rebalance In_Out with Loading of docs
2022-03-06 06:41:04,989 | test | INFO | pool-3-thread-18 | [task:check:474] Rebalance completed with progress: 100% in 12209.365 sec
- Step 8: Swap with Loading of docs
2022-03-06 09:39:43,861 | test | INFO | pool-3-thread-21 | [task:check:474] Rebalance completed with progress: 100% in 10676.6900001 sec
- Step 9: Failover 2 node and RebalanceOut that node with loading in parallel
- Step 10: Rebalance in with Loading of docs
2022-03-06 16:07:05,046 | test | INFO | pool-3-thread-25 | [task:check:474] Rebalance completed with progress: 100% in 14182.8969998 sec
- Step 11: Failover a node and FullRecovery that node
2022-03-06 23:56:31,661 | test | INFO | pool-3-thread-26 | [task:check:474] Rebalance completed with progress: 100% in 27306.309 sec
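The rebalance durations in the test log above are easier to compare in hours. The following is a minimal sketch, not part of the test framework: the helper name `rebalance_durations` and the regular expression are assumptions based only on the log format shown in this ticket.

```python
import re

# Matches the duration reported at the end of a "Rebalance completed" test-log line.
# The pattern is inferred from the lines quoted in this ticket (an assumption).
REBALANCE_RE = re.compile(r"Rebalance completed with progress: 100% in ([\d.]+) sec")


def rebalance_durations(log_text):
    """Return the rebalance durations (seconds) found in log_text, in order."""
    return [float(value) for value in REBALANCE_RE.findall(log_text)]


# Two lines copied from the steps above.
sample = """\
2022-03-05 20:01:13,065 | test | INFO | pool-3-thread-22 | [task:check:474] Rebalance completed with progress: 100% in 19683.95 sec
2022-03-06 23:56:31,661 | test | INFO | pool-3-thread-26 | [task:check:474] Rebalance completed with progress: 100% in 27306.309 sec
"""

for seconds in rebalance_durations(sample):
    print(f"{seconds:.1f} sec = {seconds / 3600:.2f} h")
```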
XDCR crash is seen at 6 Mar 7:35 PM:
172.23.121.74 at 7:35:13 PM 6 Mar, 2022
Service 'goxdcr' exited with status 137. Restarting. Messages:
2022-03-06T19:34:56.819-08:00 INFO GOXDCR.PipelineMgr: Replication Status = map[4f89306bc90199f5b722458fb4c62d2b/GleamBookUsers0/GleamBookUsers0:name={4f89306bc90199f5b722458fb4c62d2b/GleamBookUsers0/GleamBookUsers0}, status={Replicating}, errors={[]}, oldProgress={All incoming nozzles have been opened}, progress={Pipeline is running}, oldBackfillProgress={Source nozzles have been closed}, backfillProgress={Pipeline has been stopped}]
2022-03-06T19:34:57.279-08:00 INFO GOXDCR.TopoChangeDet: TopologyChangeDetectorSvc for pipeline 4f89306bc90199f5b722458fb4c62d2b/GleamBookUsers0/GleamBookUsers0 handleTargetTopologyChange completed
2022-03-06T19:35:00.105-08:00 INFO GOXDCR.StatsMgr: 4f89306bc90199f5b722458fb4c62d2b/GleamBookUsers0/GleamBookUsers0 total_docs=464055518, docs_processed=237476905, changes_left=226578613
2022-03-06T19:35:00.941-08:00 WARN GOXDCR.ThrSeqTrackSvc: 4f89306bc90199f5b722458fb4c62d2b/GleamBookUsers0/GleamBookUsers0_ThroughSeqnoTracker GetThroughSeqnos completed after 737.439322ms
2022-03-06T19:35:02.445-08:00 INFO GOXDCR.TopoChangeDet: TopologyChangeDetectorSvc for pipeline 4f89306bc90199f5b722458fb4c62d2b/GleamBookUsers0/GleamBookUsers0 handleTargetTopologyChange completed
2022-03-06T19:35:08.124-08:00 INFO GOXDCR.StatsMgr: 4f89306bc90199f5b722458fb4c62d2b/GleamBookUsers0/GleamBookUsers0 total_docs=464055518, docs_processed=238603165, changes_left=225452353
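The two GOXDCR.StatsMgr lines in the excerpt above give a rough view of how fast replication was progressing just before the crash. The sketch below parses them and estimates throughput; the regular expression and the helper names `parse_stats` and `docs_per_second` are assumptions made for illustration, not goxdcr tooling. In both samples `docs_processed + changes_left` equals `total_docs`; that is an observation about these two lines, not a documented guarantee.

```python
import re
from datetime import datetime

# Pattern inferred from the StatsMgr lines quoted in this ticket (an assumption).
STATS_RE = re.compile(
    r"^(\S+) INFO GOXDCR\.StatsMgr: \S+ "
    r"total_docs=(\d+), docs_processed=(\d+), changes_left=(\d+)"
)


def parse_stats(line):
    """Return (timestamp, total_docs, docs_processed, changes_left) or None."""
    match = STATS_RE.match(line)
    if match is None:
        return None
    timestamp = datetime.fromisoformat(match.group(1))
    total, processed, left = (int(match.group(i)) for i in (2, 3, 4))
    return timestamp, total, processed, left


def docs_per_second(first, second):
    """Average docs processed per second between two parsed samples."""
    elapsed = (second[0] - first[0]).total_seconds()
    return (second[2] - first[2]) / elapsed


pipeline = "4f89306bc90199f5b722458fb4c62d2b/GleamBookUsers0/GleamBookUsers0"
lines = [
    f"2022-03-06T19:35:00.105-08:00 INFO GOXDCR.StatsMgr: {pipeline} "
    "total_docs=464055518, docs_processed=237476905, changes_left=226578613",
    f"2022-03-06T19:35:08.124-08:00 INFO GOXDCR.StatsMgr: {pipeline} "
    "total_docs=464055518, docs_processed=238603165, changes_left=225452353",
]

samples = [parse_stats(line) for line in lines]
rate = docs_per_second(samples[0], samples[1])
print(f"approx. {rate:,.0f} docs/sec over the sampled interval")
```

With roughly 225 million changes still pending at the second sample, the pipeline was far from caught up when the service exited. Any time-to-drain estimate derived from this rate would assume a constant rate and no new mutations, which does not hold here because document loading ran in parallel.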
Logs from the source cluster where the crash was seen, collected on 7 Mar at 00:02 AM, are attached.
Logs from the current time are linked in the ticket.
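On the "exited with status 137" message: by the common shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL). Whether the service supervisor that logged this message follows that convention is an assumption here, and the excerpt does not say what sent the signal. A minimal sketch of the decoding, with the hypothetical helper `describe_exit_status`:

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status.

    Assumes the 128 + signal-number convention used by POSIX shells.
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


print(describe_exit_status(137))
```

A SIGKILL is frequently, though not always, the result of the kernel's out-of-memory killer, so the memory usage of goxdcr in the attached logs is a reasonable first thing to check.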
QE Test:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i /tmp/magma_temp_job1.ini -p bucket_storage=magma,bucket_eviction_policy=fullEviction,rerun=False -t aGoodDoctor.Hospital.Murphy.ClusterOpsVolume,nodes_init=3,graceful=True,skip_cleanup=True,num_items=10000000,num_buckets=1,bucket_names=GleamBook,doc_size=1024,bucket_type=membase,eviction_policy=fullEviction,iterations=1,batch_size=1000,sdk_timeout=60,log_level=debug,infra_log_level=debug,rerun=False,skip_cleanup=True,key_size=18,randomize_doc_size=False,randomize_value=True,assert_crashes_on_load=True,num_collections=50,maxttl=10,num_indexes=5,pc=25,index_nodes=0,xdcr_collections=50,xdcr_remote_nodes=3,cbas_nodes=0,fts_nodes=0,ops_rate=80000,ramQuota=10240,doc_ops=create:update:delete:read,rebl_ops_rate=20000,key_type=RandomKey,vbuckets=1024,mutation_perc=30,replicas=2 -m rest'
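The `-t` argument in the command above is a test path followed by comma-separated `key=value` parameters. To make the configuration easier to read, a sketch like the following splits it into a dictionary. The helper `parse_test_spec` is hypothetical and is not how the test runner itself parses arguments; the sample string is an abridged copy of the original. Repeated keys (the command passes `rerun` and `skip_cleanup` twice) simply keep the last value in this sketch.

```python
def parse_test_spec(spec):
    """Split 'test.path,key=value,...' into (test_path, params).

    Values stay as strings; later duplicates overwrite earlier ones
    (a simplification chosen for this sketch).
    """
    test_path, *pairs = spec.split(",")
    params = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        params[key] = value
    return test_path, params


# Abridged from the -t argument of the command in this ticket.
spec = (
    "aGoodDoctor.Hospital.Murphy.ClusterOpsVolume,nodes_init=3,graceful=True,"
    "num_items=10000000,doc_ops=create:update:delete:read,"
    "xdcr_remote_nodes=3,replicas=2"
)

test_path, params = parse_test_spec(spec)
print(test_path)
for key, value in sorted(params.items()):
    print(f"  {key} = {value}")
```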
Issue Links
- is duplicated by MB-51384: Rebalance in of a node failed due to wait_seqno_persisted_failed. (Closed)