  1. Couchbase Server
  2. MB-19955

Fast Failover: Decrease the default NOOP interval from 180 seconds to 1 second

Details

    Description

      For the fast failover project, the plan is to monitor the dcp_proxy traffic in ns_server and use it to track the liveness of the nodes communicating over it.

      In the absence of mutations or other DCP messages, DCP NOOP messages indicate that a node is alive. For faster failure detection, NOOP messages need to be sent more frequently, so the default NOOP interval should be decreased from 180 seconds to 1 second.
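
      As a rough illustration of the liveness idea above, the sketch below records the last time any DCP message (NOOP included) was seen from each node and treats a node as alive only while that timestamp is recent. The class name, method names, and the `stale_after_secs` threshold are hypothetical and are not taken from ns_server; this is a toy model, not the actual implementation.

```python
class LivenessTracker:
    """Toy model: a node is alive if a DCP message was seen recently.

    All names here are illustrative assumptions, not ns_server APIs.
    """

    def __init__(self, stale_after_secs):
        # Assumed threshold: how long a node may stay silent before
        # it is no longer considered alive.
        self.stale_after_secs = stale_after_secs
        self.last_seen = {}

    def on_message(self, node, now):
        # Any DCP message counts, whether a mutation or a NOOP.
        self.last_seen[node] = now

    def is_alive(self, node, now):
        seen = self.last_seen.get(node)
        return seen is not None and (now - seen) <= self.stale_after_secs


tracker = LivenessTracker(stale_after_secs=2.0)
tracker.on_message("node_a", now=0.0)
alive_soon_after = tracker.is_alive("node_a", now=1.5)
alive_much_later = tracker.is_alive("node_a", now=3.0)
never_seen = tracker.is_alive("node_b", now=3.0)
```

      In this model, silence can only be detected as quickly as messages are expected to arrive, which is why a shorter NOOP interval matters for fast failure detection.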

      The following sub-items relate to this change:

      1. During tests, a DCP NOOP was sent only every 6 seconds even with the interval set to 1 second.
      I added log messages to get more information, and they showed a pattern: DcpProducer::step() (which calls maybeSendNoop()) is called 3 times within one second, once every 6 seconds. As a result, the NOOP is sent only every 6 seconds.

      2. Revisit the computation for dead connection detection:
      The consumer assumes the connection is dead if it has not seen any message for 2 * noop_interval, and then closes the connection.
      With a NOOP interval of 1 second, the consumer will close the connection after 2 seconds without a message from the producer. This seems aggressive, but it is left to the KV team to decide whether the existing computation should be changed or is good enough.
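
      The two sub-items above can be sketched with a small simulation. It assumes, as a simplification, that a NOOP is only considered at the moment step() runs, and that step() runs in bursts of 3 calls once every 6 seconds as the logs suggested. The function names, the 0.3 second spacing within a burst, and the 30 second duration are illustrative assumptions and do not reflect the actual DcpProducer code.

```python
def burst_schedule(period, calls_per_burst, spacing, duration):
    """Times at which step() runs: a short burst once per period."""
    times = []
    start = period
    while start <= duration:
        for i in range(calls_per_burst):
            times.append(start + i * spacing)
        start += period
    return times


def noop_send_times(noop_interval, step_times):
    """Times at which a NOOP goes out.

    The interval is only checked when step() runs, so the effective
    period cannot be shorter than the gap between step() bursts.
    """
    sent = []
    last_sent = 0.0
    for t in step_times:
        if t - last_sent >= noop_interval:
            sent.append(t)
            last_sent = t
    return sent


def consumer_timeout(noop_interval):
    """Dead-connection threshold as described: 2 * noop_interval."""
    return 2 * noop_interval


steps = burst_schedule(period=6.0, calls_per_burst=3, spacing=0.3, duration=30.0)
sent = noop_send_times(noop_interval=1.0, step_times=steps)
gaps = [later - earlier for earlier, later in zip(sent, sent[1:])]

# On an otherwise idle connection, a gap longer than the consumer's
# threshold would cause the consumer to drop the connection.
would_disconnect = any(gap > consumer_timeout(1.0) for gap in gaps)
```

      Under these assumptions only one NOOP goes out per burst, so the gaps are 6 seconds each, which exceeds the 2 second threshold that a 1 second interval implies. This suggests the two sub-items interact: the step() scheduling would need to be addressed before the shorter interval is usable with the existing threshold.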

          People

            poonam Poonam Dhavale