  1. Couchbase C client library libcouchbase
  2. CCBC-627

Poll regularly for config updates

    Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.5.1
    • Fix Version/s: 2.7.5
    • Component/s: None
    • Security Level: Public
    • Environment:
      build 3508 running cbc-n1qlback

      Description

      start a cluster with 2 query nodes
      start cbc-n1qlback with some query
      add a new query node to the cluster and rebalance
      observe the request/sec per node

      Expected: topology changes should be picked up automatically by the clients. After the rebalance, the new query node should be included in the round-robin requests sent to the cluster. However, the new node does not start taking traffic, even after a long wait.

      If the load is stopped and restarted, requests do go to the newly added node as well. However, this means topology changes would require a restart of the app servers, which causes admin overhead and possibly failed requests and downtime for the app.
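
The reported behavior can be illustrated with a small sketch. It assumes a client that round-robins over a node list captured at startup; `RoundRobinDispatcher`, `update_nodes`, and the node names are hypothetical and are not taken from libcouchbase or cbc-n1qlback.

```python
from collections import Counter
from itertools import cycle


class RoundRobinDispatcher:
    """Hypothetical client-side dispatcher; not libcouchbase code."""

    def __init__(self, nodes):
        self.update_nodes(nodes)

    def update_nodes(self, nodes):
        # Replace the node snapshot and restart the rotation.
        self.nodes = list(nodes)
        self._rotation = cycle(self.nodes)

    def next_node(self):
        return next(self._rotation)


def send_queries(dispatcher, count):
    """Count how many of `count` requests each node would receive."""
    return Counter(dispatcher.next_node() for _ in range(count))


# The client starts against a two-node cluster (assumed node names).
dispatcher = RoundRobinDispatcher(["query1", "query2"])

# A third query node is added and rebalanced in on the server side.
cluster_nodes = ["query1", "query2", "query3"]

# Without a refresh, the client keeps using its stale snapshot.
stale = send_queries(dispatcher, 6)

# Only after the snapshot is refreshed does the new node get traffic.
dispatcher.update_nodes(cluster_nodes)
fresh = send_queries(dispatcher, 6)
```

In this sketch, restarting the load corresponds to constructing a new dispatcher, which is why the new node only receives requests after a restart.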

            Activity

            mnunberg Mark Nunberg (Inactive) added a comment - - edited

            Polling shouldn't be difficult to add to the C library. Please file a bug if you think this is the correct solution (rather than just using the streaming config). This will add ~8k of traffic every 10 seconds or so per client instance.
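
As a rough illustration of interval-based polling, here is a minimal sketch. It is not the libcouchbase implementation: `ConfigPoller`, `tick`, the `(revision, nodes)` config shape, and the simulated cluster are all assumptions made for the example, and only the 10-second interval comes from the comment above.

```python
class ConfigPoller:
    """Hypothetical interval-based poller; not the libcouchbase implementation."""

    def __init__(self, fetch_config, interval_seconds=10.0):
        self.fetch_config = fetch_config  # callable returning (revision, nodes)
        self.interval = interval_seconds
        self.last_poll = None
        self.revision = None
        self.nodes = []

    def tick(self, now):
        """Poll if the interval has elapsed; return True if the config changed."""
        if self.last_poll is not None and now - self.last_poll < self.interval:
            return False
        self.last_poll = now
        revision, nodes = self.fetch_config()
        if revision == self.revision:
            return False
        self.revision, self.nodes = revision, list(nodes)
        return True


# Simulated server-side config (assumed shape), changed at t = 15 s.
cluster = {"revision": 1, "nodes": ["query1", "query2"]}
poller = ConfigPoller(lambda: (cluster["revision"], cluster["nodes"]))

change_times = []
for now in range(0, 31, 5):  # ticks at 0, 5, 10, ..., 30 seconds
    if now == 15:
        cluster["revision"] = 2
        cluster["nodes"] = ["query1", "query2", "query3"]
    if poller.tick(now):
        change_times.append(now)
```

With a fixed interval, a topology change is noticed at the next poll at the latest, so the interval bounds the staleness at the cost of one config fetch per interval per client instance.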

            ingenthr Matt Ingenthron added a comment -

            The Java and .NET polling is via Carrier Publication, not HTTP. Thus, it does not bother port 8091 at all. The backstop is 10s IIRC on Java. I remember this first coming up on PHP where a user had a views only workload back in the 2.0 days.

            mnunberg Mark Nunberg (Inactive) added a comment - - edited

            IIRC the issue with PHP was that it wasn't detecting when a node was removed, and would return errors (non-200 HTTP return codes) when contacting that node. This was fixed by having LCB treat any view API request with a non-200 return code as a cue to refresh the config.

            The issue in this ticket, however, is that lcb is failing to take advantage of a new node added to the cluster: no errors are returned, but existing instances never start using the newly added node.
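
The difference between the two cases can be sketched as follows. This is an illustration under stated assumptions, not LCB code: `ErrorDrivenClient` is hypothetical, and the 503 status is simply a stand-in for whatever non-200 code a removed node would produce.

```python
class ErrorDrivenClient:
    """Hypothetical client that refreshes its node list only on HTTP errors."""

    def __init__(self, cluster):
        self.cluster = cluster      # shared, mutable stand-in for the real cluster
        self.nodes = list(cluster)  # client-side snapshot of the topology
        self.refreshes = 0
        self._next = 0

    def _refresh(self):
        self.nodes = list(self.cluster)
        self.refreshes += 1

    def request(self):
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        # Assumption: a node no longer in the cluster answers with a non-200 code.
        status = 200 if node in self.cluster else 503
        if status != 200:
            self._refresh()  # a non-200 response is the cue to refresh the config
        return node, status


cluster = ["n1", "n2", "n3"]
client = ErrorDrivenClient(cluster)

# Case 1: a node is removed. The failed request triggers a refresh.
cluster.remove("n3")
after_removal = [client.request() for _ in range(3)]

# Case 2: a node is added. Every request succeeds, so nothing triggers a refresh.
cluster.append("n4")
after_addition = [client.request() for _ in range(4)]
```

Because an added node never produces an error, an error-driven refresh alone cannot discover it; some form of polling or streaming config update is needed for that case.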

            ingenthr Matt Ingenthron added a comment -

            I seem to remember solving for both cases though, Mark. Let's see what Brett's comments are on expected behavior.

            mnunberg Mark Nunberg (Inactive) added a comment -

            Resurrecting this for fast-failover.


              People

              • Assignee:
                mnunberg Mark Nunberg (Inactive)
                Reporter:
                cihan Cihan Biyikoglu (Inactive)