Fixed
Details
Assignee: Tor Colvin
Reporter: Tor Colvin
Story Points: 8
Priority: Major
Created February 13, 2024 at 2:04 AM
Updated September 27, 2024 at 3:19 PM
Resolved February 14, 2024 at 10:37 PM
The affected tests are the E2E and SGReplicate Multicluster (Blackholepuller/Newdocpusher) tests: http://showfast.sc.couchbase.com/#/timeline/Linux/syncgateway/sgreplicate/Multi-cluster
Tested on SG 3.2.0-242 and 4.0.0-3. The SG version does not affect throughput; only the server version causes the regression.
In summary: with 7.6.0, SG uses more CPU and memory and runs more goroutines, but spends more time waiting. Beam.smp and Indexer CPU usage is also higher with 7.6.0, and "base.(*Collection).WriteUpdateWithXattr" takes a lot more CPU time.
With 7.2.2, server memcached CPU usage and SG heapinuse are higher, while the average queue size stays close to 0.
Looking at these tests:
SG 3.2.0-242, CB 7.6.0-1887: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2732/consoleFull. Artifacts: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2732/artifact/. Throughput: 68,767 docs pushed/sec
SG 3.2.0-242, CB 7.2.2-6401: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2719/consoleFull. Artifacts: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2719/artifact/. Throughput: 83,010 docs pushed/sec
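To put a number on the regression, here is a minimal sketch that computes the relative throughput change between the two runs above. The helper name `relative_change` is only illustrative; the two throughput values are the ones reported by the Jenkins jobs.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Fractional change of candidate relative to baseline (negative means slower)."""
    return (candidate - baseline) / baseline


# Throughput in docs pushed/sec, taken from the two runs listed above.
throughput_722 = 83_010  # SG 3.2.0-242, CB 7.2.2-6401
throughput_760 = 68_767  # SG 3.2.0-242, CB 7.6.0-1887

change = relative_change(throughput_722, throughput_760)
print(f"Throughput change from 7.2.2 to 7.6.0: {change:.1%}")
```

With these two data points, 7.6.0 is roughly 17% slower than 7.2.2. This is a single pair of runs, so it does not account for run-to-run variance.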
Cbmonitor comparison: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2
Differences between the runs, on the SG side:
CPU utilization is up by almost 30% in the test with 7.6.0: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#034ba0a7d72a19ba5b487943020edc6c
SG memory usage is slightly higher for 7.6.0: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#ea232aa23dc9bc563c7feb6f02df365d
The number of goroutines and the goroutines high watermark are both up by about 10% for 7.6.0: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#2acbcb0023d0e2f7e48f1b2ba7077262
Heapalloc and heapinuse are slightly higher for 7.2.2, but heapidle is higher for 7.6.0: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#8d9cc6ea168d85997db4d06b82dc85d9
Pausetotalns is about 4 times higher for 7.6.0: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#1c48bfbf6aead72445f25cae23d87e3a
On the server side:
Beam.smp RSS is higher by about 10% for 7.2.2: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#f50f2c78d4dcb8696d90ffe49958d25b, but Beam.smp CPU is higher for 7.6.0: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#fdf2ee304164bba7f03f0e591ce3961d
Something similar seems to be happening for Indexer, where CPU is higher for 7.6.0: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#dd4c380f69814327c80a1dafb5a9e4c4
For memcached, CPU usage is higher in 7.2.2: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#3ce41eff4117be874f32415e764956f6
Another interesting datapoint is data_avgqusz, which is close to 0 at all points for 7.2.2, but is consistently higher for 7.6.0: http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_320-242_run_bp_test_333f&label=7.6.0&snapshot=hebe_320-242_run_bp_test_bc6b&label=7.2.2#47265a3b63dbdc6f140385c6911739fd
I also tried looking at 2 sg_cpu profiles:
7.2.2: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2719/artifact/172.23.100.205_syncgateway_sg_cpu_231209024220_1c6b69.pprof
7.6.0: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2732/artifact/172.23.100.205_syncgateway_sg_cpu_231213113212_835b76.pprof
The main difference seems to be that CPU time for "go-blip.(*Message).asyncRead.func1" is higher for 7.6.0 (35.82%) compared to 7.2.2 (21.73%). This increase seems to come mainly from "base.(*Collection).WriteUpdateWithXattr", which takes 26.80% of CPU time for 7.6.0, compared to 15.87% for 7.2.2.
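As a rough check of that attribution, the sketch below computes how many percentage points each function gained and what fraction of the asyncRead increase WriteUpdateWithXattr would explain. It assumes the WriteUpdateWithXattr time is contained within the asyncRead call tree, and it treats the shares as comparable even though each is a percentage of its own profile's total samples; both are assumptions, not something the profiles were checked for here.

```python
# CPU share (%) per function as (7.2.2, 7.6.0), copied from the profile figures above.
cpu_share = {
    "go-blip.(*Message).asyncRead.func1": (21.73, 35.82),
    "base.(*Collection).WriteUpdateWithXattr": (15.87, 26.80),
}

# Increase in percentage points from 7.2.2 to 7.6.0.
delta_points = {
    name: round(new - old, 2) for name, (old, new) in cpu_share.items()
}

async_read_delta = delta_points["go-blip.(*Message).asyncRead.func1"]
write_xattr_delta = delta_points["base.(*Collection).WriteUpdateWithXattr"]

# Fraction of the asyncRead increase explained by WriteUpdateWithXattr,
# under the nesting assumption stated in the lead-in.
explained = write_xattr_delta / async_read_delta

for name, delta in delta_points.items():
    print(f"{name}: +{delta:.2f} percentage points")
print(f"Share of asyncRead increase from WriteUpdateWithXattr: {explained:.0%}")
```

Under those assumptions, WriteUpdateWithXattr accounts for roughly three quarters of the increase in asyncRead's share.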
Comparing 2 sg_mutex profiles:
7.2.2: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2719/artifact/172.23.100.205_syncgateway_sg_mutex_231209024119_0c97d8.pprof
7.6.0: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2732/artifact/172.23.100.205_syncgateway_sg_mutex_231213113110_00b70e.pprof
"cbgt.(*Manager).JanitorLoop" accounts for a much larger share of the mutex profile for 7.6.0 (25.59%) compared to 7.2.2 (1.47%). Similarly, "gocbcore.(*memdClient).run.func2" is higher for 7.6.0 (11.93%) compared to 7.2.2 (1.14%).
I also looked at sg_block, sg_heap and goroutines profiles, but there weren't any big differences between the profiles.
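The profile pairs above can also be diffed directly with `go tool pprof`, using its `-diff_base` option so that only the per-function differences are shown. The sketch below only builds and prints the command; the local file names are hypothetical and assume the two .pprof artifacts have already been downloaded.

```python
import shlex


def pprof_diff_command(base_profile: str, new_profile: str) -> list[str]:
    """Build a `go tool pprof` invocation that reports new_profile relative to base_profile."""
    return [
        "go", "tool", "pprof",
        "-top",                          # print a flat text report instead of opening the shell
        f"-diff_base={base_profile}",    # subtract this profile from the one being analyzed
        new_profile,
    ]


# Hypothetical local copies of the 7.2.2 and 7.6.0 sg_mutex artifacts.
cmd = pprof_diff_command("sg_mutex_7.2.2.pprof", "sg_mutex_7.6.0.pprof")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell on a machine with the Go toolchain installed, or passed to `subprocess.run(cmd, check=True)`. The same helper applies to the sg_cpu pair.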
I ran a couple more tests to collect server cbcollect_info. They are linked below.
7.2.2: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2735/console
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-sg_hebe_sgreplicate_multicluster-2735/172.23.100.190.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-sg_hebe_sgreplicate_multicluster-2735/172.23.100.191.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-sg_hebe_sgreplicate_multicluster-2735/172.23.100.192.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-sg_hebe_sgreplicate_multicluster-2735/172.23.100.193.zip
7.6.0: https://perf.jenkins.couchbase.com/job/sg_hebe_sgreplicate_multicluster/2736/console
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-sg_hebe_sgreplicate_multicluster-2736/172.23.100.190.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-sg_hebe_sgreplicate_multicluster-2736/172.23.100.191.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-sg_hebe_sgreplicate_multicluster-2736/172.23.100.192.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-sg_hebe_sgreplicate_multicluster-2736/172.23.100.193.zip