Details
Description
Steps:
1. Create a 3 node cluster
2. Create buckets and 50 collections.
3. Create 40000000 items in each collection:
   Read Start: 0, Read End: 0
   Update Start: 0, Update End: 0
   Expiry Start: 0, Expiry End: 0
   Delete Start: 0, Delete End: 0
   Create Start: 0, Create End: 40000000
   Final Start: 0, Final End: 40000000
4. Update 40000000 keys to create 50 percent fragmentation:
   Read Start: 0, Read End: 0
   Update Start: 0, Update End: 40000000
   Expiry Start: 0, Expiry End: 0
   Delete Start: 0, Delete End: 0
   Create Start: 0, Create End: 0
   Final Start: 0, Final End: 40000000
5. Create another 40000000 items:
   Read Start: 0, Read End: 0
   Update Start: 0, Update End: 0
   Expiry Start: 0, Expiry End: 0
   Delete Start: 0, Delete End: 0
   Create Start: 40000000, Create End: 80000000
   Final Start: 40000000, Final End: 80000000
6. Update 40000000 keys (created in step 5) to maintain 50 percent fragmentation
   Read Start: 0, Read End: 0
   Update Start: 40000000, Update End: 80000000
   Expiry Start: 0, Expiry End: 0
   Delete Start: 0, Delete End: 0
   Create Start: 0, Create End: 0
   Final Start: 40000000, Final End: 80000000
7. Start ASYNC load:
   Read Start: 0, Read End: 40000000
   Update Start: 0, Update End: 40000000
   Expiry Start: 0, Expiry End: 0
   Delete Start: 40000000, Delete End: 80000000
   Create Start: 80000000, Create End: 120000000
   Final Start: 80000000, Final End: 120000000
8. Rebalance IN with Loading of docs in step 7
9. Rebalance OUT with Loading of docs in step 7
10. Rebalance SWAP with Loading of docs in step 7
11. Rebalance IN/OUT with Loading of docs in step 7
12. Rebalance OUT/IN with Loading of docs in step 7
13. Validate all docs mutated in step 7. All is well until here.
14. Repeat the test from steps 7-13. After 3 iterations, add the XDCR destination cluster.
15. Repeat the test from steps 7-13 with XDCR connected.
16. All rebalances passed. 2 more iterations passed.
17. Stopped the test and checked the xdcr destination cluster.
18. The XDCR destination cluster holds more data than the src cluster, possibly because of the 27B pending mutations. However, no item replication is running any more and the pending mutations remain at 27B.
src = 5 collections * 80M in each collection = 400M items
dstn = 595,084,885
19. Another observation: while there are no mutations on the src cluster, the remaining XDCR mutations keep increasing on their own.
Seeing error in xdcr:
2021-10-22 02:51:57 172.23.110.67:genericPipeline.RunP2PProtocol:Execution timed out
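The 80M-per-collection figure used for the src count in step 18 can be cross-checked against the step 7 ranges. The following is only a sketch of that arithmetic: the `keyRange` type and `itemsAfterLoad` helper are hypothetical names, the ranges are assumed to be half-open, and it assumes every created key is new and every deleted key already exists.

```go
package main

import "fmt"

// keyRange is a hypothetical half-open range [Start, End) of document key indexes.
type keyRange struct{ Start, End int }

func (r keyRange) size() int { return r.End - r.Start }

// itemsAfterLoad returns the item count of one collection after a load that
// creates the keys in create and deletes the keys in del, starting from
// before existing items. Assumes created keys are new and deleted keys exist.
func itemsAfterLoad(before int, create, del keyRange) int {
	return before + create.size() - del.size()
}

func main() {
	const collections = 5 // collection count used in the report's src calculation

	// Items per collection after steps 3-6 (keys 0 to 80000000).
	before := keyRange{0, 80000000}.size()

	perCollection := itemsAfterLoad(before,
		keyRange{80000000, 120000000}, // step 7 Create range
		keyRange{40000000, 80000000},  // step 7 Delete range
	)

	src := collections * perCollection
	dstn := 595084885 // destination item count quoted in step 18

	fmt.Println("items per collection:", perCollection)
	fmt.Println("expected src items:", src)
	fmt.Println("surplus on dstn:", dstn-src)
}
```

Under these assumptions the destination holds 195,084,885 more items than the 400M expected on the source.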
Issue Links
- is caused by MB-49101 XDCR - backfillSpec grows leading to set failure and stuck printing errMsg (Closed)
The XDCR process seems to be stuck printing a gigantic piece of log on node 67.
heap profile: 26846: 11892151624 [7378024: 3579106433776] @ heap/1048576
1: 3658186752 [1: 3658186752] @ 0x50542e 0x50557e 0x509e4e 0xa0b81d 0xa0b714 0x933cd6 0x933a8e 0x9da5d2 0x9da145 0x9c065f 0x9c0274 0xabb518 0xab84e5 0x471981
# 0x50542d log.(*Logger).Output+0x38d /home/couchbase/.cbdepscache/exploded/x86_64/go-1.15.8/go/src/log/log.go:177
# 0x50557d log.(*Logger).Printf+0x7d /home/couchbase/.cbdepscache/exploded/x86_64/go-1.15.8/go/src/log/log.go:188
# 0x509e4d github.com/couchbase/goxdcr/log.(*CommonLogger).logMsgf+0x12d /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/log/logger.go:170
# 0xa0b81c github.com/couchbase/goxdcr/log.(*CommonLogger).Warnf+0x27c /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/log/logger.go:189
# 0xa0b713 github.com/couchbase/goxdcr/metadata_svc.(*MetaKVMetadataSvc).set.func2+0x173 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/metakv_metadata_service.go:210
# 0x933cd5 github.com/couchbase/goxdcr/utils.(*Utilities).ExponentialBackoffExecutorWithOriginalError+0x75 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/utils/utils.go:2481
# 0x933a8d github.com/couchbase/goxdcr/utils.(*Utilities).ExponentialBackoffExecutor+0x8d /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/utils/utils.go:2469
# 0x9da5d1 github.com/couchbase/goxdcr/metadata_svc.(*MetaKVMetadataSvc).set+0x371 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/metakv_metadata_service.go:215
# 0x9da144 github.com/couchbase/goxdcr/metadata_svc.(*MetaKVMetadataSvc).Set+0xa4 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/metakv_metadata_service.go:165
# 0x9c065e github.com/couchbase/goxdcr/metadata_svc.(*BackfillReplicationService).setBackfillSpecUsingMarshalledData+0x17e /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/backfill_repl_service.go:505
# 0x9c0273 github.com/couchbase/goxdcr/metadata_svc.(*BackfillReplicationService).SetBackfillReplSpec+0x153 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/backfill_repl_service.go:490
# 0xabb517 github.com/couchbase/goxdcr/backfill_manager.(*BackfillRequestHandler).metaKvOp+0x57 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/backfill_manager/backfill_request_handler.go:682
# 0xab84e4 github.com/couchbase/goxdcr/backfill_manager.(*BackfillRequestHandler).run+0x1064 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/backfill_manager/backfill_request_handler.go:308
1: 3658186752 [1: 3658186752] @ 0x50542e 0x50557e 0x509e4e 0x9da73b 0x9da63b 0x9da145 0x9c065f 0x9c0274 0xabb518 0xab84e5 0x471981
# 0x50542d log.(*Logger).Output+0x38d /home/couchbase/.cbdepscache/exploded/x86_64/go-1.15.8/go/src/log/log.go:177
# 0x50557d log.(*Logger).Printf+0x7d /home/couchbase/.cbdepscache/exploded/x86_64/go-1.15.8/go/src/log/log.go:188
# 0x509e4d github.com/couchbase/goxdcr/log.(*CommonLogger).logMsgf+0x12d /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/log/logger.go:170
# 0x9da73a github.com/couchbase/goxdcr/log.(*CommonLogger).Errorf+0x4da /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/log/logger.go:185
# 0x9da63a github.com/couchbase/goxdcr/metadata_svc.(*MetaKVMetadataSvc).set+0x3da /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/metakv_metadata_service.go:220
# 0x9da144 github.com/couchbase/goxdcr/metadata_svc.(*MetaKVMetadataSvc).Set+0xa4 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/metakv_metadata_service.go:165
# 0x9c065e github.com/couchbase/goxdcr/metadata_svc.(*BackfillReplicationService).setBackfillSpecUsingMarshalledData+0x17e /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/backfill_repl_service.go:505
# 0x9c0273 github.com/couchbase/goxdcr/metadata_svc.(*BackfillReplicationService).SetBackfillReplSpec+0x153 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/metadata_svc/backfill_repl_service.go:490
# 0xabb517 github.com/couchbase/goxdcr/backfill_manager.(*BackfillRequestHandler).metaKvOp+0x57 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/backfill_manager/backfill_request_handler.go:682
# 0xab84e4 github.com/couchbase/goxdcr/backfill_manager.(*BackfillRequestHandler).run+0x1064 /home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/goxdcr/backfill_manager/backfill_request_handler.go:308
I can’t even VIM the file.
It looks like this:
==============================================================================
couchbase logs (goxdcr.log)
cbbrowse_logs goxdcr.log
==============================================================================
108 108 44 110 117 108 108 44 110 117 108 108 44 110 117 108 108 44 110 117 108 108 44 110 117 108 108 44 110 117 108 108 44 123 34 84 105 109 101 115 116 97 109 112 115 34 58 123 34 83 116 97 114 116 105 110 103 84 105 109 101 115 116 97 109 112 34 58 123 34 86 98 110
111 34 58 56 54 49 44 34 86 98 117 117 105 100 34 58 48 44 34 83 101 113 110 111 34 58 51 50 55 56 49 55 49 51 44 34 83 110 97 112 115 104 111 116 83 116 97 114 116 34 58 48 44 34 83 110 97 112 115 104 111 116 69 110 100 34 58 48 44 34 77 97 110 105 102 101 115 116 73 6
8 115 34 58 123 34 83 111 117 114 99 101 77 97 110 105 102 101 115 116 73 100 34 58 48 44 34 84 97 114 103 101 116 77 97 110 105 102 101 115 116 73 100 34 58 48 125 125 44 34 69 110 100 105 110 103 84 105 109 101 115 116 97 109 112 34 58 123 34 86 98 110 111 34 58 56 54
49 44 34 86 98 117 117 105 100 34 58 48 44 34 83 101 113 110 111 34 58 51 52 54 57 51 50 54 51 44 34 83 110 97 112 115 104 111 116 83 116 97 114 116 34 58 48 44 34 83 110 97 112 115 104 111 116 69 110 100 34 58 48 44 34 77 97 110 105 102 101 115 116 73 68 115 34 58 123
34 83 111 117 114 99 101 77 97 110 105 102 101 115 116 73 100 34 58 48 44 34 84 97 114 103 101 116 77 97 110 105 102 101 115 116 73 100 34 58 48 125 125 125 44 34 82 101 113 117 101 115 116 101 100 67 111 108 108 101 99 116 105 111 110 115 83 104 97 115 34 58 91 34 51
56 49 53 98 50 55 51 100 102 102 102 50 53 48 102 53 50 50 55 53 100 49 48 50 53 97 56 57 54 55 52 49 49 49 97 56 100 54 48 53 101 99 49 56 102 50 100 50 100 98 97 53 98 52 99 102 55 97 101 55 101 55 54 34 93 125 44 110 117 108 108 44 110 117 108 108 44 110 117 108 108
44 110 117 108 108 44 110 117 108 108 44 110 117 108 108 44 110 117 108 108 44 110 117 108 108 44 110 117 108 108 44 123 34 84 105 109 101 115 116 97 109 112 115 34 58 123 34 83 116 97 114 116 105 110 103 84 105 109 101 115 116 97 109 112 34 58 123 34 86 98 110 111 34 5
8 56 54 49 44 34 86 98 117 117 105 100 34 58 48 44 34 83 101 113 110 111 34 58 51 50 55 56 49 55 49 51 44 34 83 110 97 112 115 104 111 116 83 116 97 114 116 34 58 48 44 34 83 110 97 112 115 104 111 116 69 110 100 34 58 48 44 34 77 97 110 105 102 101 115 116 73 68 115 34
58 123 34 83 111 117 114 99 101 77 97 110 105 102 101 115 116 73 100 34 58 48 44 34 84 97 114 103 101 116 77 97 110 105 102 101 115 116 73 100 34 58 48 125 125 44 34 69 110 100 105 110 103 84 105 109 101 115 116 97 109 112 34 58 123 34 86 98 110 111 34 58 56 54 49 44 3
4 86 98 117 117 105 100 34 58 48 44 34 83 101 113 110 111 34 58 51 52 54 57 51 50 54 51 44 34 83 110 97 112 115 104 111 116 83 116 97 114 116 34 58 48 44 34 83 110 97 112 115 104 111 116 69 110 100 34 58 48 44 34 77 97 110 105 102 101 115 116 73 68 115 34 58 123 34 83 1
11 117 114 99 101 77 97 110 105 102 101 115 116 73 100 34 58 48 44 34 84 97 114 103 101 116 77 97 110 105 102 101 115 116 73 100 34 58 48 125 125 125 44 34 82 101 113 117 101 115 116 101 100 67 111 108 108 101 99 116 105 111 110 115 83 104 97 115 34 58 91 34 51 56 49 53
98 50 55 51 100 102 102 102 50 53 48 102 53 50 50 55 53 100 49 48 50 53 97 56 57 54 55 52 49 49 49 97 56 100 54 48 53 101 99 49 56 102 50 100 50 100 98 97 53 98 52 99 102
…
It looks like metakv.Set failed and it is printing the values out:
meta_svc.logger.Warnf("metakv.Set failed. key=%v, value=%v, err=%v\n", key, valueToPrint, err)
I’ll tag an MB to investigate. In the meantime, please turn off P2P by setting preReplicateVBMasterCheck to false.