  1. Couchbase Server
  2. MB-5406

ep-engine drops connections too fast when rebalancing out a node


Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 2.0
    • 1.8.0
    • ns_server
    • Security Level: Public
    • None
    • Release Note

    Description

      While investigating an issue in the Java client library with retrying operations based on not-my-vbucket responses, I noticed that at the end of a rebalance that removes a server, the server being removed drops the connection while operations are still in flight.

      Instead, there should be a period of time after the takeover, when the bucket transitions from active to dead, during which the server only responds with not-my-vbucket replies.

      Unfortunately, with the current behavior, application code must at best handle more complex failure logic. At worst, if the application does not handle the failure, it could lead to data loss.

      The challenge here is determining how long that period should be. Some clients do not disconnect on their own, and the server has no way to hang up politely.

      The attached log demonstrates the issue, and the attached test program lets one observe it. The test was carried out as follows:
      1) Set up a 3-node cluster with a default bucket of the Couchbase type
      2) Start the test program; the first argument is the number of seconds to run, and the remaining arguments are the hostnames/IPs of the nodes in the cluster
      3) Remove a node from the cluster

      Expected behavior: All operations sent to the server receive a not-my-vbucket reply and are rescheduled as we receive config updates from the server.
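
      The expected client-side flow can be sketched roughly as follows. This is a minimal simulation, not the Java client's actual code: `FakeNode`, `send_with_retry`, and `fetch_config` are hypothetical names introduced for illustration. Only the status value 0x0007 ("not my vbucket") comes from the memcached binary protocol.

```python
from dataclasses import dataclass, field

SUCCESS = 0x0000
NOT_MY_VBUCKET = 0x0007  # "not my vbucket" status in the memcached binary protocol


@dataclass
class FakeNode:
    """Hypothetical stand-in for a server that owns a set of vbucket ids."""
    name: str
    owned: set = field(default_factory=set)
    data: dict = field(default_factory=dict)

    def set(self, vbucket, key, value):
        # A node that no longer owns the vbucket answers instead of hanging up.
        if vbucket not in self.owned:
            return NOT_MY_VBUCKET
        self.data[key] = value
        return SUCCESS


def send_with_retry(key, value, vbucket, vbucket_map, nodes, fetch_config,
                    max_attempts=5):
    """Send a set; on not-my-vbucket, refresh the map and reschedule.

    Returns the number of attempts it took to succeed.
    """
    for attempt in range(1, max_attempts + 1):
        node = nodes[vbucket_map[vbucket]]
        status = node.set(vbucket, key, value)
        if status == SUCCESS:
            return attempt
        if status == NOT_MY_VBUCKET:
            # Assumed helper: returns the latest vbucket-to-node map.
            vbucket_map = fetch_config()
            continue
        raise RuntimeError(f"unexpected status {status:#06x}")
    raise TimeoutError(f"gave up on {key!r} after {max_attempts} attempts")


# Node "a" has been rebalanced out and owns nothing; "b" took over vbucket 0.
nodes = {"a": FakeNode("a"), "b": FakeNode("b", owned={0})}
stale_map = {0: "a"}  # the client has not yet seen the new configuration
attempts = send_with_retry("k", "v", 0, stale_map, nodes,
                           fetch_config=lambda: {0: "b"})
```

      In this sketch the operation is never lost: the first attempt is rejected by the node being removed, and the retry lands on the new owner once the map has been refreshed.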

      Observed behavior: At the end of the remove server/rebalance cycle, the connection is dropped and the client cancels the in-flight operations, since it cannot know the status of those operations.
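
      To illustrate why a dropped connection forces cancellation, here is a small sketch of in-flight tracking on the client side. The `Connection` class and its methods are hypothetical and do not mirror any real client API; the point is only that operations without a response at disconnect time have an unknown outcome.

```python
from enum import Enum


class OpState(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    CANCELLED = "cancelled"


class Connection:
    """Hypothetical client-side connection that tracks in-flight operations."""

    def __init__(self):
        self.in_flight = {}   # opaque id -> key of the operation awaiting a reply
        self.states = {}      # opaque id -> OpState
        self._next_opaque = 0

    def send(self, key):
        opaque = self._next_opaque
        self._next_opaque += 1
        self.in_flight[opaque] = key
        self.states[opaque] = OpState.PENDING
        return opaque

    def on_response(self, opaque):
        # Any reply, including not-my-vbucket, tells the client what happened.
        self.in_flight.pop(opaque)
        self.states[opaque] = OpState.COMPLETED

    def on_disconnect(self):
        # No reply arrived, so the client cannot tell whether the server
        # applied these operations; it can only cancel them.
        cancelled = sorted(self.in_flight)
        for opaque in cancelled:
            self.states[opaque] = OpState.CANCELLED
        self.in_flight.clear()
        return cancelled


conn = Connection()
first = conn.send("key-0")
second = conn.send("key-1")
third = conn.send("key-2")
conn.on_response(first)            # one reply arrives before the drop
cancelled = conn.on_disconnect()   # the server drops the connection
```

      The two operations still pending at disconnect end up cancelled, and the application is left to decide whether retrying them is safe.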

          People

            andreibaranouski Andrei Baranouski
            ingenthr Matt Ingenthron
