Couchbase C client library libcouchbase / CCBC-627

Poll regularly for config updates


Details

    • Task
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 2.5.1
    • 2.7.5
    • None
    • Security Level: Public
    • build 3508 running cbc-n1qlback

    Description

      1. Start a cluster with 2 query nodes.
      2. Start cbc-n1qlback with some query.
      3. Add a new query node to the cluster and rebalance.
      4. Observe the requests/sec per node.

      Expected: topology changes should be picked up automatically by the clients. After the rebalance, the new query node should be part of the round-robin rotation of requests sent to the cluster.

      Actual: the new node does not start taking traffic, even after a long wait.

      If the load is stopped and restarted, requests do go to the newly added node as well. However, this means topology changes would require restarting the app servers, which adds administrative overhead and can cause failed requests and downtime for the application.
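The expected behavior can be pictured as a round-robin selector that only knows about the nodes listed in the last config it was given. The following is a minimal C sketch under that assumption; `query_config`, `rr_selector`, `rr_apply_config` and `rr_next` are hypothetical names, not part of libcouchbase. A client that is never handed a config with a higher revision keeps rotating over the original two nodes indefinitely, which matches the reported symptom.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 16

/* Hypothetical snapshot of the query endpoints from one cluster config. */
typedef struct {
    int revision;                 /* config revision; higher is newer */
    size_t n_nodes;               /* number of query nodes in this config */
    const char *nodes[MAX_NODES]; /* query endpoints, e.g. "host:8093" */
} query_config;

/* Hypothetical round-robin selector over the client's current config. */
typedef struct {
    query_config cfg; /* config currently in use */
    size_t next;      /* index of the next node to hand out */
} rr_selector;

/* Install a newer config; configs with an older or equal revision are ignored. */
static void rr_apply_config(rr_selector *rr, const query_config *cfg) {
    if (cfg->revision <= rr->cfg.revision) {
        return;
    }
    rr->cfg = *cfg;
    rr->next = 0; /* restart the rotation over the new node list */
}

/* Return the next query node in round-robin order. */
static const char *rr_next(rr_selector *rr) {
    assert(rr->cfg.n_nodes > 0);
    const char *node = rr->cfg.nodes[rr->next];
    rr->next = (rr->next + 1) % rr->cfg.n_nodes;
    return node;
}
```

Only `rr_apply_config` can grow the rotation, so whatever mechanism delivers newer configs (streaming, polling, or error-triggered refresh) decides how quickly an added node starts receiving traffic.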


          Activity


            Mark Nunberg (Inactive) added a comment - Resurrecting this for fast-failover.


            Matt Ingenthron added a comment - I seem to remember solving for both cases, though, Mark. Let's see what Brett's comments are on expected behavior.

            Mark Nunberg (Inactive) added a comment (edited):

            IIRC, the issue with PHP was that it wasn't detecting when a node was removed, so it would return errors (non-200 HTTP status codes) when contacting that node. This was fixed by having LCB treat any view API request that returns a non-200 status code as a cue to refresh the config.

            The issue in this ticket, however, is that LCB fails to make use of a node newly added to the cluster: no errors are returned, but existing instances never start sending requests to the new node.
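The error-driven refresh described in this comment, and its blind spot, can be sketched in a few lines of C. This is a minimal illustration, assuming a per-client flag that some config fetcher later clears; `client_state`, `on_view_response` and `on_config_applied` are hypothetical and are not libcouchbase internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-client state; not libcouchbase's real structures. */
typedef struct {
    bool refresh_pending; /* a config refresh has been requested */
    unsigned refreshes;   /* number of refreshes triggered so far */
} client_state;

/* Called for every completed view (HTTP) request. Any non-200 status is
 * taken as a hint that the topology may have changed; further errors while
 * a refresh is already pending do not trigger another one. */
static void on_view_response(client_state *st, int http_status) {
    assert(st != NULL);
    if (http_status != 200 && !st->refresh_pending) {
        st->refresh_pending = true;
        st->refreshes++;
    }
}

/* Called once a new config has been fetched and applied. */
static void on_config_applied(client_state *st) {
    assert(st != NULL);
    st->refresh_pending = false;
}
```

Feeding this only 200 responses, which is what the client sees when a node is added rather than removed, never sets `refresh_pending`, so the new node is never discovered through this path.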



            Matt Ingenthron added a comment - The Java and .NET clients poll via Carrier Publication, not HTTP, so they do not touch port 8091 at all. The backstop is 10s on Java, IIRC. I remember this first coming up on PHP, where a user had a views-only workload back in the 2.0 days.
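A time-based backstop of this kind can be sketched as a deadline check: whenever a config arrives by any path the deadline is pushed back, and an explicit fetch is issued only when a full interval passes without one. This is a minimal C sketch, assuming the caller supplies a monotonic millisecond clock and performs the actual fetch, and assuming the backstop resets on every received config; `config_poller`, `poller_note_config` and `poller_due` are hypothetical and are not taken from the Java client or libcouchbase.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical backstop poller; times are milliseconds from a monotonic
 * clock supplied by the caller. */
typedef struct {
    uint64_t interval_ms;    /* backstop interval, e.g. 10000 for 10 s */
    uint64_t last_config_ms; /* when the last config was received */
} config_poller;

/* Record that a config arrived (pushed, piggybacked on a response, or polled). */
static void poller_note_config(config_poller *p, uint64_t now_ms) {
    p->last_config_ms = now_ms;
}

/* True if no config has been seen for a full interval, meaning the client
 * should actively fetch one now. */
static bool poller_due(const config_poller *p, uint64_t now_ms) {
    assert(now_ms >= p->last_config_ms);
    return now_ms - p->last_config_ms >= p->interval_ms;
}
```

Resetting the deadline on every received config means a busy client that already gets configs through other channels issues few or no extra fetches, while an idle or error-free client still refreshes at least once per interval.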

            Mark Nunberg (Inactive) added a comment (edited):

            Polling shouldn't be difficult to add to the C library. Please file a bug if you think this is the correct solution (rather than just using the streaming config). Polling will add roughly 8 KB of traffic every 10 seconds or so per client instance.
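The per-instance estimate in this comment scales linearly with the number of client instances. A minimal C sketch of that arithmetic, taking the comment's rough figures as inputs (8 KB read as 8192 bytes, one poll every 10 seconds); `poll_bytes_per_day` is a hypothetical helper, and real config payloads grow with the number of nodes in the cluster.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of config-polling traffic per day across a fleet of clients.
 * config_bytes and period_s are assumed inputs, not measured values. */
static uint64_t poll_bytes_per_day(uint64_t config_bytes, uint64_t period_s,
                                   uint64_t clients) {
    assert(period_s > 0);
    const uint64_t seconds_per_day = 86400;
    uint64_t polls_per_day = seconds_per_day / period_s; /* 8640 for 10 s */
    return polls_per_day * config_bytes * clients;
}
```

With these inputs a single instance polls 8,640 times a day, about 67.5 MiB per day (70,778,880 bytes); 100 instances come to roughly 6.6 GiB per day.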


            People

              Mark Nunberg (Inactive)
              Cihan Biyikoglu (Inactive)
              Votes: 0
              Watchers: 4
