Couchbase Server / MB-7912

Couchbase Server sends memcached protocol responses in far too many pieces

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.2.0
    • Security Level: Public
    • Environment:
      Should be any environment, but:
      c1.xlarge
      us-east-1c
      2 nodes, one client, one server
      RightImage_CentOS_5.8_x64_v5.8.8.3_EBS (ami-100e8a79)

      Description

      System under test:
      c1.xlarge
      us-east-1c
      2 nodes, one client, one server
      RightImage_CentOS_5.8_x64_v5.8.8.3_EBS (ami-100e8a79)

      Baseline:
      iperf regularly shows 943Mbits/s throughput possible between these two systems. That's 117MByte/s possible.

      I carried out three tests:
      Test #1 Workload generator with the .properties slightly modified to use 32 client objects and simulate 128 users, 1KByte document size
      Test #2 Same, with just the document size modified to 1MByte
      Test #3 Same, with the document size modified to 128Byte
      (various settings for number of documents to get sufficiently large runtimes)

      Test #1 on ec2 peaked at 63MByte/s. One server, one client. What's interesting there is that the number of packets received is 1.1-1.4x the number sent. No mpstat data was captured for this.

      Test #2 on ec2 was 112MByte/s at peak. One server, one client. The number of packets is roughly 1/100th that of the previous test, but the number of packets received is 2x the number of packets sent. This is particularly interesting since at peak the throughput is 112MByte/s sending and 421KByte/s receiving; in that window it's receiving 6453 packets and sending 3336. The test is sending 1M documents, and it's actually receiving more packets in acknowledgement than it sends.

      Test #3 on ec2 was a mere 22MByte/s at peak. One server, one client still. In this test, the number of packets received is regularly 1.6-1.8x the number of packets sent. This test also has mpstat data, which shows interrupt processing not quite consuming core 0. I suspect it actually is consuming that core, but the mpstat resolution is coarse enough that we can't really tell.

      Data attached.

      Note that the client side will also see this CPU usage if the server is too "noisy". We should investigate improvements to our binary protocol implementation at the server (this has been a known problem), or allow TCP_NODELAY to be tuneable.
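
      For illustration only, the tuneable discussed here would amount to a guarded setsockopt() call on each accepted connection. This is a minimal sketch: the setsockopt(TCP_NODELAY) call is the standard socket API, but the flag name and the apply_nodelay_policy() helper are hypothetical, not the daemon's actual code.

        #include <errno.h>
        #include <netinet/in.h>
        #include <netinet/tcp.h>
        #include <stdio.h>
        #include <string.h>
        #include <sys/socket.h>

        /* Hypothetical process-wide setting; in the daemon this would live in a settings struct. */
        static int tcp_nodelay_enabled = 1;

        /* Apply the configurable TCP_NODELAY policy to a newly accepted socket. */
        static void apply_nodelay_policy(int sfd)
        {
            int flag = tcp_nodelay_enabled ? 1 : 0;
            if (setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY,
                           (void *)&flag, sizeof(flag)) != 0) {
                fprintf(stderr, "setsockopt(TCP_NODELAY): %s\n", strerror(errno));
            }
        }

      With NODELAY on, small writes hit the wire immediately (more packets, lower latency); with it off, Nagle's algorithm coalesces them (fewer packets, but responses can be delayed waiting for an ACK). Which trade-off is right depends on the workload, which is the argument for making it tuneable.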

      I want to credit Mark Nunberg and his recent performance tests for really finding and validating this. We'll be reviewing this in our meeting on the topic next week.

      Attachments

      1. 128btest-2_mpstat_-p_ALL_test.txt
        189 kB
        Matt Ingenthron
      2. 128btest-2.txt
        23 kB
        Matt Ingenthron
      3. 1ktest.txt
        5 kB
        Matt Ingenthron
      4. 1mtest.txt
        7 kB
        Matt Ingenthron
      5. notes.txt
        1 kB
        Matt Ingenthron

        Activity

        ingenthr Matt Ingenthron added a comment -

        Lines worthy of investigation:
        $ git grep TCP_NODELAY
        ...
        daemon/memcached.c: error = setsockopt(sfd, IPPROTO_TCP, TCP_NODELAY, (void *)&flags, sizeof(flags));
        daemon/memcached.c: "setsockopt(TCP_NODELAY): %s",
        daemon/thread.c: TCP_NODELAY, (void *)&flags, sizeof(flags));

        ingenthr Matt Ingenthron added a comment -

        To be clear, I don't think turning on TCP_NODELAY is necessarily the solution here. I believe we're writing the responses in multiple small pieces, something that has been discussed previously on the memcached list. If either memcached or the engine were to buffer the response until the complete response were available (or enough of it for the client to act on), then we'd probably be better off.

        That said, making TCP_NODELAY tuneable may make sense. Some deployments care more about latency, some care more about throughput; we can't say for certain in advance.

        It'd only make sense alongside other changes, because writing a partial response that a client can't do anything with isn't very useful.
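
        As a sketch of that buffering idea (illustrative only, not the actual memcached or engine code): assemble the complete response, then hand the 24-byte binary protocol header and the body to the kernel together, for example with writev(), instead of writing them as separate small pieces. The send_response() name and signature below are made up for the example.

          #include <stddef.h>
          #include <stdint.h>
          #include <sys/uio.h>
          #include <unistd.h>

          /* The memcached binary protocol response header is 24 bytes. */
          #define BINARY_HEADER_LEN 24

          /* Illustrative: send header and body in one syscall rather than two small writes. */
          static ssize_t send_response(int sfd,
                                       const uint8_t header[BINARY_HEADER_LEN],
                                       const void *body, size_t body_len)
          {
              struct iovec iov[2];
              iov[0].iov_base = (void *)header;
              iov[0].iov_len  = BINARY_HEADER_LEN;
              iov[1].iov_base = (void *)body;
              iov[1].iov_len  = body_len;

              /* One writev() tends to become one (or a few) TCP segments; two
               * separate write() calls with NODELAY on tend to become at least
               * two. A real server still has to handle short writes and EAGAIN
               * on non-blocking sockets. */
              return writev(sfd, iov, 2);
          }

        If the header is currently being written on its own, then with NODELAY on it tends to go out as its own tiny segment, which would be consistent with the inflated packet counts in the tests above.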

        ingenthr Matt Ingenthron added a comment -

        Trond: you have a patch which may help here. Can you post it for review?

        ingenthr Matt Ingenthron added a comment -

        Trond: any update here?

        I have a project for which I may need to test this change.

        trond Trond Norbye added a comment -

        2.1 and 2.0.2 have the patch that disables it, but it is a hassle to enable (which is why I wanted the ioctl call).

        ingenthr Matt Ingenthron added a comment -

        So we've just disabled NODELAY? That seems a bit concerning, since behavior may change.

        I guess for my project, I need to get 2.0.1, change it and then build?

        trond Trond Norbye added a comment -

        The following piece of code exists in memcached-server and allows you to tune its behavior:

        settings.tcp_nodelay = getenv("MEMCACHED_DISABLE_TCP_NODELAY") == NULL;
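
        Spelled out (a standalone sketch, not Couchbase code): tcp_nodelay defaults to true, and the variable only has to exist in the environment for NODELAY to be disabled; its value, even an empty string, is ignored.

          #include <stdbool.h>
          #include <stdio.h>
          #include <stdlib.h>

          int main(void)
          {
              /* Same idiom as the server: NODELAY stays on unless the
               * variable is present; its value is never inspected. */
              bool tcp_nodelay = getenv("MEMCACHED_DISABLE_TCP_NODELAY") == NULL;
              printf("tcp_nodelay = %s\n", tcp_nodelay ? "true" : "false");
              return 0;
          }

        So toggling it means restarting memcached with (or without) that variable in its environment, which is presumably the "hassle" mentioned above.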

        Please file another bug to make it settable from the UI if that's desired.

        Please reopen the bug (or create another bug referring to this one) if you want additional work done.


          People

          • Assignee:
            trond Trond Norbye
            Reporter:
            ingenthr Matt Ingenthron
          • Votes:
            0
            Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes