The changes between the builds include:
- Remove staged transaction XATTR and replicate data, and add a switch to turn off this behavior
- Enforce TLS
- A bug fix for compressed binary data
- Coordinate calls to pools/default endpoint
- A file descriptor leak fix
I don't see any change that could cause this performance drop.
Looking at each node:
- 105 reached 0 for changes_left at 2022-01-08T16:03:10.405
- 106 reached 0 for changes_left at 2022-01-08T15:59:26.433
- 107 reached 0 for changes_left at 2022-01-08T15:58:32.535
- 108 reached 0 for changes_left at 2022-01-08T15:59:52.491
- 109 reached 0 for changes_left at 2022-01-08T15:59:51.557
Node 105 takes roughly 3 to 4.5 minutes longer than the other nodes to finish replication. That would account for the performance regression.
There are no errors or warnings in the logs on any node. Replication started quickly on all nodes.
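For reference, the gap can be checked directly from the changes_left timestamps listed above. This is only a rough sketch: the dictionary and function names are made up for illustration, and the timestamps are copied by hand from the list rather than parsed from the logs.

```python
from datetime import datetime

# Times at which changes_left reached 0, copied from the per-node list above.
DRAINED_AT = {
    "105": "2022-01-08T16:03:10.405",
    "106": "2022-01-08T15:59:26.433",
    "107": "2022-01-08T15:58:32.535",
    "108": "2022-01-08T15:59:52.491",
    "109": "2022-01-08T15:59:51.557",
}


def lag_behind_fastest(drained_at):
    """Return, per node, the seconds elapsed after the earliest node finished."""
    parsed = {
        node: datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%f")
        for node, ts in drained_at.items()
    }
    earliest = min(parsed.values())
    return {node: (t - earliest).total_seconds() for node, t in parsed.items()}


lags = lag_behind_fastest(DRAINED_AT)
for node, lag in sorted(lags.items()):
    print(f"node {node}: +{lag:.1f}s after the earliest node")
```

With these inputs, node 105 finishes about 278 seconds after the earliest node (107) and about 198 seconds after the next slowest node (108), while the other four nodes finish within about 80 seconds of each other.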
On node 105, there is a big drop in "rate received from DCP" during roughly the last 5 minutes of the test, after which the rate recovers (see attachment). This does not happen on the other nodes in the same test, nor on the same node in the previous test. However, I have not yet been able to find the reason for this drop.
Changes between the two builds for XDCR:
http://changelog.build.couchbase.com/?product=couchbase-server&fromVersion=6.6.3&fromBuild=9808&toVersion=6.6.5&toBuild=10068&f_analytics-dcp-client=off&f_asterixdb=off&f_backup=off&f_cbas-core=off&f_cbft=off&f_cbgt=off&f_couchbase-cli=off&f_couchdb=off&f_eventing=off&f_go-couchbase=off&f_go_json=off&f_goutils=off&f_goxdcr=on&f_indexing=off&f_kv_engine=off&f_n1fty=off&f_ns_server=off&f_query=off&f_testrunner=off&f_tlm=off&f_voltron=off