Couchbase Server / MB-7912

couchbase server sends memcached protocol responses in far too many pieces


Details

    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 2.2.0
    • Affects Version/s: 2.0
    • Security Level: Public
    • Environment: Should be any environment, but:
      c1.xlarge
      us-east-1c
      2 nodes, one client, one server
      RightImage_CentOS_5.8_x64_v5.8.8.3_EBS (ami-100e8a79)

    Description

      System under test:
      c1.xlarge
      us-east-1c
      2 nodes, one client, one server
      RightImage_CentOS_5.8_x64_v5.8.8.3_EBS (ami-100e8a79)

      Baseline:
      iperf regularly shows 943Mbits/s throughput possible between these two systems. That's 117MByte/s possible.

      I carried out three tests:
      Test #1: Workload generator with the .properties slightly modified to use 32 client objects and simulate 128 users, with a 1KByte document size
      Test #2: Same, with just the document size changed to 1MByte
      Test #3: Same, with the document size changed to 128Byte
      (various settings for the number of documents to get sufficiently long runtimes)

      Test #1 on ec2 reached 63MByte/s at peak. One server, one client. Interesting here is that the number of packets received is 1.1-1.4x the number sent. No mpstat data was captured for this test.

      Test #2 on ec2 reached 112MByte/s at peak. One server, one client. The number of packets is roughly 1/100th of that in the previous test, but the number of packets received is 2x the number of packets sent. This is particularly interesting since at peak the throughput is 112MByte/s sending and 421KByte/s receiving. In that time window it is receiving 6453 packets and sending 3336. The test is sending 1M documents, and it is actually receiving more packets in acknowledgement than it sends.

      Test #3 on ec2 reached a mere 22MByte/s at peak. One server, one client still. In this test, the number of packets received is regularly 1.6-1.8x the number of packets sent. This test also has mpstat data, which shows interrupt processing not quite consuming core 0. I suspect it actually is consuming that core, but the mpstat resolution is coarse enough that we can't really tell.

      Data attached.

      Note that the client will also see the CPU cost if the server is too "noisy" on the wire. We should investigate improvements to our binary protocol implementation at the server so responses go out in fewer pieces (this has been a known problem), or allow TCP_NODELAY to be tuneable.
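
      For illustration only, here is a minimal sketch in C of those two directions; it is not the actual memcached/Couchbase server code, and the function names are hypothetical. It shows (a) toggling TCP_NODELAY on a connected socket so Nagle's behaviour can be made tuneable, and (b) coalescing the 24-byte binary protocol response header with the value body into a single writev() call, giving the kernel the chance to emit each response as one segment instead of several small ones.

          #include <stddef.h>
          #include <stdint.h>
          #include <sys/types.h>
          #include <sys/socket.h>
          #include <sys/uio.h>
          #include <netinet/in.h>
          #include <netinet/tcp.h>

          /* Toggle Nagle's algorithm on an already-connected socket.
             enable=1 disables Nagle (small writes go out immediately),
             enable=0 re-enables it (the kernel may batch small writes). */
          static int set_tcp_nodelay(int sock, int enable)
          {
              return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                                &enable, sizeof(enable));
          }

          /* Send one binary-protocol response (24-byte header + body)
             with a single syscall instead of two separate write()s. */
          static ssize_t send_response(int sock, const uint8_t header[24],
                                       const void *body, size_t body_len)
          {
              struct iovec iov[2];
              iov[0].iov_base = (void *)header;
              iov[0].iov_len  = 24;
              iov[1].iov_base = (void *)body;
              iov[1].iov_len  = body_len;
              return writev(sock, iov, 2);
          }

      Coalescing at the server attacks the packet count directly and keeps latency low; making TCP_NODELAY tuneable only trades latency for batching, since with Nagle re-enabled the kernel may still delay small responses while waiting for ACKs.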

      I want to credit Mark Nunberg and his recent performance tests for really finding and validating this. We'll be reviewing this in our meeting next week on the topic.

      Attachments

        1. 128btest-2_mpstat_-p_ALL_test.txt
          189 kB
        2. 128btest-2.txt
          23 kB
        3. 1ktest.txt
          5 kB
        4. 1mtest.txt
          7 kB
        5. notes.txt
          1 kB


          People

            Assignee: Trond Norbye (trond)
            Reporter: Matt Ingenthron (ingenthr)
            Votes: 0
            Watchers: 4

            Dates

              Created:
              Updated:
              Resolved:
