Couchbase Server / MB-19955

Fast Failover: Decrease the default NOOP interval from 180 seconds to 1 second




      For the fast failover project, the plan is to spy on the dcp_proxy traffic in ns_server and track the liveness of the nodes communicating over it.

      In the absence of mutations or other DCP messages, DCP NOOP messages indicate that a node is alive. Faster failure detection needs NOOP messages at a higher frequency, so the default NOOP interval should be decreased from 180 seconds to 1 second.
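To illustrate why the NOOP interval bounds how quickly a failure can be noticed, here is a minimal sketch of a last-seen tracker. It is not ns_server code (ns_server is written in Erlang); the class `LivenessTracker`, its methods, and the `max_silence` threshold are hypothetical names chosen only for this example.

```python
class LivenessTracker:
    """Hypothetical tracker: remembers when traffic was last seen per node."""

    def __init__(self, max_silence):
        # Assumed threshold: seconds of silence tolerated before a node
        # is no longer considered live.
        self.max_silence = max_silence
        self.last_seen = {}

    def observe(self, node, now):
        # Any DCP message counts, whether it is a mutation or a NOOP.
        self.last_seen[node] = now

    def is_live(self, node, now):
        seen = self.last_seen.get(node)
        return seen is not None and now - seen <= self.max_silence


tracker = LivenessTracker(max_silence=2.0)
tracker.observe("node_a", now=10.0)
print(tracker.is_live("node_a", now=11.0))
print(tracker.is_live("node_a", now=200.0))
```

In this model, an idle but healthy node stays silent for up to one NOOP interval, so `max_silence` cannot be set lower than that interval without producing false alarms. With a 180 second interval the tracker would need to tolerate minutes of silence; with a 1 second interval it can use a threshold of a few seconds.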

      The following sub-items are related to this dependency:

      1. During tests, the DCP NOOP was sent only every 6 seconds even when the interval was set to 1 second.
      I added some log messages to get more information, and they showed a pattern: DcpProducer::step() (which calls maybeSendNoop()) is called in a burst of 3 calls within one second, and the burst repeats every 6 seconds. As a result, the NOOP is sent only every 6 seconds.

      2. Modify the computation for dead connection detection:
      The consumer assumes the connection is dead if it has not seen any messages for 2 * noop_interval, and then closes the connection.
      With a NOOP interval of 1 second, the consumer will close the connection after receiving no messages from the producer for 2 seconds. This seems aggressive, but I am leaving it to the KV team to decide whether the existing computation needs to be modified or is good enough.
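The two sub-items interact, which a toy model can show. The sketch below is not the KV engine implementation: `noop_send_times` and `consumer_disconnect_time` are hypothetical helpers, and the model assumes that a NOOP can only go out when step() is invoked, that the connection starts at t = 0, and that NOOPs are the only traffic.

```python
def noop_send_times(step_times, noop_interval):
    """Times at which a NOOP goes out, given when step() is invoked.

    Hypothetical model: each step() call sends a NOOP only if at least
    noop_interval seconds have passed since the previous NOOP.
    """
    sent = []
    last_sent = 0.0  # assume the connection was established at t = 0
    for now in step_times:
        if now - last_sent >= noop_interval:
            sent.append(now)
            last_sent = now
    return sent


def consumer_disconnect_time(message_times, noop_interval):
    """First time the consumer would drop the connection, or None.

    Applies the rule described in this ticket: the connection is treated
    as dead after 2 * noop_interval seconds without any message.
    """
    idle_limit = 2 * noop_interval
    last_seen = 0.0
    for t in message_times:
        if t - last_seen > idle_limit:
            return last_seen + idle_limit
        last_seen = t
    return None


# step() runs in bursts of three calls, one burst every 6 seconds.
burst_steps = [base + offset
               for base in (6.0, 12.0, 18.0)
               for offset in (0.0, 0.3, 0.6)]
sent = noop_send_times(burst_steps, noop_interval=1.0)
print(sent)
print(consumer_disconnect_time(sent, noop_interval=1.0))
```

In this model only one NOOP per burst passes the interval check, so NOOPs go out at 6, 12, and 18 seconds, and a consumer applying the 2 * noop_interval rule would already have closed the connection at 2 seconds. If step() were instead invoked once per second, the same helpers would report a NOOP every second and no disconnect.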





            poonam Poonam Dhavale