Details
Type: Task
Resolution: Fixed
Priority: Major
Version: 2.0
Security Level: Public
Environment
Should be any environment, but:
c1.xlarge
us-east-1c
2 nodes, one client, one server
RightImage_CentOS_5.8_x64_v5.8.8.3_EBS (ami-100e8a79)
Description
System under test:
c1.xlarge
us-east-1c
2 nodes, one client, one server
RightImage_CentOS_5.8_x64_v5.8.8.3_EBS (ami-100e8a79)
Baseline:
iperf regularly shows that 943 Mbit/s of throughput is possible between these two systems. That's about 117 MByte/s.
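For reference, a minimal sketch of reproducing the baseline number, assuming iperf (v2) is installed on both nodes and "iperf -s" is already running on the server; SERVER_HOST is a placeholder:

    # Baseline throughput check, assuming iperf v2 on both hosts and
    # "iperf -s" already running on the server. SERVER_HOST is a placeholder.
    import subprocess

    SERVER_HOST = "10.0.0.1"  # hypothetical server address

    # -c: client mode, -t 30: run for 30 seconds, -f m: report in Mbits/sec
    result = subprocess.run(
        ["iperf", "-c", SERVER_HOST, "-t", "30", "-f", "m"],
        capture_output=True, text=True, check=True,
    )
    print(result.stdout)

    # 943 Mbit/s from iperf is 943 / 8, about 117 MByte/s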
I carried out three tests:
Test #1 Workload generator with the .properties slightly modified to use 32 client objects and simulate 128 users, 1KByte document size
Test #2 Same, with just the document size modified to 1MByte
Test #3 Same, with the document size modified to 128Byte
(various settings for number of documents to get sufficiently large runtimes)
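To make the test matrix concrete, here is a rough, hypothetical Python sketch of the workload shape only, not the actual generator or its property names: NUM_CLIENTS writer threads each send DOC_SIZE-byte documents to a trivial local TCP sink that stands in for the server, mirroring the 128Byte / 1KByte / 1MByte document sizes.

    # Hypothetical sketch of the workload shape; all names and sizes are
    # placeholders, and a local sink stands in for the real server.
    import socket
    import threading
    import time

    DOC_SIZE = 1024       # vary: 128, 1024, 1024 * 1024
    NUM_CLIENTS = 32      # stands in for the 32 client objects
    DOCS_PER_CLIENT = 1000

    def sink_server(sock):
        # Trivial server: accept connections and drain whatever arrives.
        while True:
            conn, _ = sock.accept()
            threading.Thread(target=drain, args=(conn,), daemon=True).start()

    def drain(conn):
        while conn.recv(65536):
            pass
        conn.close()

    def client(host, port, payload, count):
        s = socket.create_connection((host, port))
        for _ in range(count):
            s.sendall(payload)
        s.close()

    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(128)
    port = srv.getsockname()[1]
    threading.Thread(target=sink_server, args=(srv,), daemon=True).start()

    payload = b"x" * DOC_SIZE
    start = time.time()
    threads = [
        threading.Thread(target=client,
                         args=("127.0.0.1", port, payload, DOCS_PER_CLIENT))
        for _ in range(NUM_CLIENTS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    total = DOC_SIZE * DOCS_PER_CLIENT * NUM_CLIENTS
    print("%.1f MByte/s" % (total / (time.time() - start) / 1e6))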
Test #1 on ec2 peaked at 63MByte/s. One server, one client. Interestingly, the number of packets received is 1.1-1.4x the number sent. No mpstat data was captured for this.
Test #2 on ec2 peaked at 112MByte/s. One server, one client. The packet counts are roughly 1/100th those of the previous test, but the number of packets received is 2x the number sent. This is particularly interesting since, at peak, the throughput is 112MByte/s sending and 421KByte/s receiving. In that time window, the client is receiving 6453 packets and sending 3336. The test is sending 1M documents and is actually receiving more packets in acknowledgement than it sends.
Test #3 on ec2 peaked at a mere 22MByte/s, still with one server and one client. In this test, the number of packets received is regularly 1.6-1.8x the number of packets sent. This test also has mpstat data, which shows interrupt processing not quite consuming core 0. I suspect it actually is consuming that core, but the mpstat resolution is coarse enough that we can't really tell.
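The RX/TX packet ratios above can be sampled on Linux straight from /proc/net/dev; a minimal sketch, assuming the test NIC is eth0 (a placeholder) and a 10-second measurement window:

    # Sample RX/TX packet counts from /proc/net/dev over an interval.
    # IFACE is a placeholder for the actual test NIC.
    import time

    IFACE = "eth0"

    def packet_counts(iface):
        with open("/proc/net/dev") as f:
            for line in f:
                if ":" not in line:
                    continue
                name, data = line.split(":", 1)
                if name.strip() == iface:
                    fields = data.split()
                    # layout: rx_bytes rx_packets ... (8 rx cols),
                    # then tx_bytes tx_packets ... (8 tx cols)
                    return int(fields[1]), int(fields[9])
        raise ValueError("interface not found: %s" % iface)

    rx0, tx0 = packet_counts(IFACE)
    time.sleep(10)  # measurement window
    rx1, tx1 = packet_counts(IFACE)

    rx, tx = rx1 - rx0, tx1 - tx0
    print("rx=%d tx=%d ratio=%.2f" % (rx, tx, rx / tx if tx else float("inf")))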
Data attached.
Note that the client will see increased CPU usage if the server is too "noisy". We should investigate improvements to our binary protocol implementation at the server (this has been a known problem), or make TCP_NODELAY tunable.
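At the socket level, making TCP_NODELAY tunable is a one-line setsockopt; a minimal sketch, where the configure_connection helper and its tcp_nodelay parameter are hypothetical:

    # Hypothetical helper showing the shape of a TCP_NODELAY tunable.
    import socket

    def configure_connection(conn, tcp_nodelay=True):
        # Disabling Nagle (TCP_NODELAY=1) stops small writes from being
        # coalesced, trading a higher packet count for lower latency.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY,
                        1 if tcp_nodelay else 0)
        return conn

Whether Nagle stays on would then be a per-deployment decision rather than a hard-coded default; the small-document case (Test #3) is likely the most sensitive to it.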
I want to credit Mark Nunberg and his recent performance tests for really finding and validating this. We'll be reviewing it at our meeting on the topic next week.