Couchbase C client library libcouchbase: CCBC-801

add a function that executes a cluster healthcheck for keepalive purposes


    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.7.5
    • Fix Version/s: 2.8.4
    • Component/s: docs, library
    • Labels:
      None
    • Sprint:
      SDK 45: IPv6, HC, SDK 47: HC, Log Redact, SDK49: HC, Log Reda, CertAuth

      Description

      In some deployments, particularly cloud deployments whose network setup is beyond the user's control (e.g., Azure), a connection may be terminated after a period of inactivity. Because it may be terminated without a FIN being sent, recovery can be troublesome.

      Users have occasionally implemented a regular 'health check' ping by retrieving a single document. The problem is that fetching one document exercises only the connection to the node that owns it, so it does not check the health of all of a cluster's connections.

      The request here is to add a health check function that would, for the current configuration, dispatch a NOOP request on every existing open connection to every service and verify the response.

      If a connection is unhealthy (i.e., no response is received before a timeout), report that in the health check response. Optionally, schedule a reconnection, and possibly drive that reconnection as well.
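The NOOP check described above can be sketched at the wire level. On the KV service, Couchbase speaks the memcached binary protocol, where NOOP is opcode 0x0a and a request is a bare 24-byte header; the server answers with magic 0x81, the same opcode, a 16-bit status, and the opaque value copied from the request. The helpers below (`encode_noop`, `noop_response_ok`) are hypothetical illustrations of building and checking that frame, not libcouchbase's internal code, and they leave out the socket I/O and the timeout handling.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Memcached binary protocol constants used by the Couchbase KV service. */
#define MCBP_HEADER_LEN 24
#define MCBP_MAGIC_REQ  0x80
#define MCBP_MAGIC_RES  0x81
#define MCBP_OP_NOOP    0x0a

/* Hypothetical helper: fill buf with a NOOP request header.
 * Multi-byte fields are big-endian. NOOP has no key, extras or value,
 * so everything except magic, opcode and opaque stays zero. */
static void encode_noop(uint8_t buf[MCBP_HEADER_LEN], uint32_t opaque)
{
    memset(buf, 0, MCBP_HEADER_LEN);
    buf[0] = MCBP_MAGIC_REQ;
    buf[1] = MCBP_OP_NOOP;
    /* bytes 12-15: opaque, echoed back by the server */
    buf[12] = (uint8_t)(opaque >> 24);
    buf[13] = (uint8_t)(opaque >> 16);
    buf[14] = (uint8_t)(opaque >> 8);
    buf[15] = (uint8_t)opaque;
}

/* Hypothetical helper: return 1 if buf holds a successful NOOP response
 * carrying the opaque we sent, 0 otherwise. */
static int noop_response_ok(const uint8_t *buf, size_t len, uint32_t opaque)
{
    if (len < MCBP_HEADER_LEN)
        return 0;
    if (buf[0] != MCBP_MAGIC_RES || buf[1] != MCBP_OP_NOOP)
        return 0;
    /* bytes 6-7 of a response: status (0 = success) */
    uint16_t status = (uint16_t)((buf[6] << 8) | buf[7]);
    uint32_t got = ((uint32_t)buf[12] << 24) | ((uint32_t)buf[13] << 16) |
                   ((uint32_t)buf[14] << 8) | (uint32_t)buf[15];
    return status == 0 && got == opaque;
}
```

Matching on the opaque lets one connection carry several outstanding NOOPs and still attribute each reply (or its absence after the timeout) to the right probe.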

      I think the signature here should probably be along the lines of:

      request: health_check(void)

      response =>

      {
        "services": {
          "kv": [{"10.1.2.3": true}, {"10.1.2.4": false}],
          "query": [{"10.1.2.3": true}]
        },
        "details": {
          "kv": [{"10.1.2.4": "NOOP (0x0a) operation timed out after 2500ms"}]
        }
      }

      …but I'm flexible on this, and it should be reviewed with others. My thinking is that the format above makes it easy to scan for failures (is anything false?) and also easy to find the details. The details could instead be inline with each failure, though.
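A minimal sketch of the "is anything false?" check, assuming the variant where details are kept inline with each failure. The `hc_entry` type and both functions are hypothetical names chosen for illustration; they are not part of libcouchbase.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result record: one per (service, node) pair, with the
 * failure details inline instead of in a separate "details" map. */
typedef struct {
    const char *service;  /* e.g. "kv", "query" */
    const char *server;   /* e.g. "10.1.2.3" */
    bool healthy;
    const char *details;  /* NULL when healthy */
} hc_entry;

/* Answer "is anything false?" by counting unhealthy entries. */
static size_t hc_count_failures(const hc_entry *entries, size_t n)
{
    size_t failures = 0;
    for (size_t i = 0; i < n; i++) {
        if (!entries[i].healthy)
            failures++;
    }
    return failures;
}

/* Return the first unhealthy entry (whose details explain why), or NULL. */
static const hc_entry *hc_first_failure(const hc_entry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!entries[i].healthy)
            return &entries[i];
    }
    return NULL;
}
```

A flat record per (service, node) pair lets a caller detect any failure in one pass, and the explanation for a failing node sits next to its status rather than in a parallel structure keyed by address.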


            Activity

            ingenthr Matt Ingenthron created issue -
            ingenthr Matt Ingenthron made changes -
            Field Original Value New Value
            Link This issue relates to CBSE-3880 [ CBSE-3880 ]
            ingenthr Matt Ingenthron made changes -
            Link This issue blocks JSCBC-388 [ JSCBC-388 ]
            ingenthr Matt Ingenthron made changes -
            Link This issue blocks JSCBC-389 [ JSCBC-389 ]
            ingenthr Matt Ingenthron made changes -
            Link This issue blocks PCBC-498 [ PCBC-498 ]
            avsej Sergey Avseyev made changes -
            Status New [ 10003 ] Open [ 1 ]
            ingenthr Matt Ingenthron made changes -
            Assignee Matt Ingenthron [ ingenthr ] Sergey Avseyev [ avsej ]
            merrick.huang Merrick Huang (Inactive) made changes -
            Link This issue relates to CBSE-4016 [ CBSE-4016 ]
            avsej Sergey Avseyev added a comment -

            The output generated by 'cbc-ping' currently looks like this:

            $ cbc ping --details  -Ucouchbase://192.168.1.101
            {
               "services" : {
                  "fts" : [
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "7.833ms",
                        "server" : "192.168.1.101:8094",
                        "status" : 0
                     },
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "9.898ms",
                        "server" : "192.168.1.102:8094",
                        "status" : 0
                     },
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "10.979ms",
                        "server" : "192.168.1.103:8094",
                        "status" : 0
                     },
                     {
                        "details" : "Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout",
                        "latency" : "75.000s",
                        "server" : "192.168.1.104:8094",
                        "status" : 23
                     }
                  ],
                  "kv" : [
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "2.617ms",
                        "server" : "192.168.1.101:11210",
                        "status" : 0
                     },
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "19.330ms",
                        "server" : "192.168.1.102:11210",
                        "status" : 0
                     },
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "19.334ms",
                        "server" : "192.168.1.103:11210",
                        "status" : 0
                     },
                     {
                        "details" : "Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout",
                        "latency" : "2.505s",
                        "server" : "192.168.1.104:11210",
                        "status" : 23
                     }
                  ],
                  "n1ql" : [
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "5.671ms",
                        "server" : "192.168.1.102:8093",
                        "status" : 0
                     },
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "6.595ms",
                        "server" : "192.168.1.103:8093",
                        "status" : 0
                     },
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "11.106ms",
                        "server" : "192.168.1.101:8093",
                        "status" : 0
                     },
                     {
                        "details" : "Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout",
                        "latency" : "75.000s",
                        "server" : "192.168.1.104:8093",
                        "status" : 23
                     }
                  ],
                  "views" : [
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "7.001ms",
                        "server" : "192.168.1.101:8092",
                        "status" : 0
                     },
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "9.022ms",
                        "server" : "192.168.1.102:8092",
                        "status" : 0
                     },
                     {
                        "details" : "Success (Not an error)",
                        "latency" : "10.866ms",
                        "server" : "192.168.1.103:8092",
                        "status" : 0
                     },
                     {
                        "details" : "Client-Side timeout exceeded for operation. Inspect network conditions or increase the timeout",
                        "latency" : "75.000s",
                        "server" : "192.168.1.104:8092",
                        "status" : 23
                     }
                  ]
               }
            }
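Tools that consume this output have to turn the human-readable latency strings back into numbers before applying any threshold. The sketch below is a hypothetical helper for that, not a libcouchbase API: `latency_to_us` and `ping_entry_unhealthy` are illustrative names, only the `ms` and `s` suffixes are confirmed by the output above, and the `us`/`ns` branches are an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a cbc-ping latency string such as "7.833ms"
 * or "2.505s" to whole microseconds (rounded). Returns -1 if the string
 * does not start with a number or has an unrecognised unit suffix.
 * Only "ms" and "s" appear in the output above; "us" and "ns" are guesses. */
static long long latency_to_us(const char *text)
{
    char *end = NULL;
    double value = strtod(text, &end);
    if (end == text)
        return -1;
    if (strcmp(end, "ns") == 0)
        return (long long)(value / 1000.0 + 0.5);
    if (strcmp(end, "us") == 0)
        return (long long)(value + 0.5);
    if (strcmp(end, "ms") == 0)
        return (long long)(value * 1000.0 + 0.5);
    if (strcmp(end, "s") == 0)
        return (long long)(value * 1000000.0 + 0.5);
    return -1;
}

/* Hypothetical policy: an entry is unhealthy if its status is non-zero
 * (23 accompanies the client-side timeout message above), if its latency
 * cannot be parsed, or if it is slower than threshold_us. */
static int ping_entry_unhealthy(int status, const char *latency,
                                long long threshold_us)
{
    long long us = latency_to_us(latency);
    return status != 0 || us < 0 || us > threshold_us;
}
```

With a 100 ms threshold, every `status: 0` entry above passes, while all four 192.168.1.104 entries fail on their non-zero status.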
            

            avsej Sergey Avseyev made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            avsej Sergey Avseyev added a comment -

            Reopening this ticket, as a health check is defined more broadly: it may include a ping, but by default it is a passive function that exposes the state of the current network connections.
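The passive mode mentioned in this comment reports state the client already tracks instead of sending traffic. A sketch, assuming the client records a monotonic timestamp of the last I/O on each connection; `conn_state`, `conn_idle_us`, and `conns_needing_ping` are hypothetical names, not libcouchbase API. Connections idle longer than a threshold are the ones an active ping would need to refresh before an idle timeout somewhere on the network path drops them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical passive view of one open connection. Nothing is sent on the
 * wire to produce it; the client only reports what it already tracks. */
typedef struct {
    const char *service;        /* "kv", "n1ql", ... */
    const char *server;         /* "host:port" */
    uint64_t last_activity_us;  /* monotonic timestamp of the last I/O */
} conn_state;

/* Idle time at now_us; returns 0 if last_activity_us is not before now_us. */
static uint64_t conn_idle_us(const conn_state *c, uint64_t now_us)
{
    return now_us > c->last_activity_us ? now_us - c->last_activity_us : 0;
}

/* Count connections idle for longer than max_idle_us: the candidates for
 * an active keepalive ping (or a reconnect if that ping fails). */
static size_t conns_needing_ping(const conn_state *conns, size_t n,
                                 uint64_t now_us, uint64_t max_idle_us)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (conn_idle_us(&conns[i], now_us) > max_idle_us)
            count++;
    }
    return count;
}
```

Keeping the passive report separate from the ping means a caller can inspect connection state cheaply and decide for itself when the extra round trips of an active check are worth it.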

            avsej Sergey Avseyev made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            avsej Sergey Avseyev made changes -
            Fix Version/s 2.8.2 [ 14806 ]
            Fix Version/s 2.7.7 [ 14532 ]
            ingenthr Matt Ingenthron made changes -
            Epic Link CBD-2088 [ 72721 ]
            avsej Sergey Avseyev made changes -
            Fix Version/s 2.8.3 [ 14820 ]
            Fix Version/s 2.8.2 [ 14806 ]
            ingenthr Matt Ingenthron made changes -
            Sprint SDK 45: IPv6, HC, LRedact [ 482 ]
            mike.goldsmith Michael Goldsmith made changes -
            Link This issue blocks PCBC-514 [ PCBC-514 ]
            ingenthr Matt Ingenthron made changes -
            Link This issue blocks NCBC-1574 [ NCBC-1574 ]
            brett19 Brett Lawson made changes -
            Link This issue blocks GOCBC-245 [ GOCBC-245 ]
            avsej Sergey Avseyev made changes -
            Status Reopened [ 4 ] In Progress [ 3 ]
            avsej Sergey Avseyev made changes -
            Fix Version/s 2.8.4 [ 14917 ]
            Fix Version/s 2.8.3 [ 14820 ]
            ingenthr Matt Ingenthron made changes -
            Sprint SDK 45: IPv6, HC [ 482 ] SDK 45: IPv6, HC, SDK 47: HC, Serv ID, Log Red [ 482, 506 ]
            ingenthr Matt Ingenthron made changes -
            Rank Ranked higher
            avsej Sergey Avseyev made changes -
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]
            avsej Sergey Avseyev made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            ingenthr Matt Ingenthron made changes -
            Sprint SDK 45: IPv6, HC, SDK 47: HC, Log Redact [ 482, 506 ] SDK 45: IPv6, HC, SDK 47: HC, Log Redact, SDK49: HC, Log Reda, CertAuth [ 482, 506, 510 ]
            ingenthr Matt Ingenthron made changes -
            Rank Ranked lower
            avsej Sergey Avseyev made changes -
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            perry Perry Krug made changes -
            Link This issue relates to CBSE-4016 [ CBSE-4016 ]
            perry Perry Krug made changes -
            Link This issue blocks CBSE-4016 [ CBSE-4016 ]
            tyler.mitchell Tyler Mitchell (Inactive) made changes -
            Remote Link This issue links to "Page (Couchbase, Inc. Wiki)" [ 16291 ]

              People

              • Assignee:
                avsej Sergey Avseyev
                Reporter:
                ingenthr Matt Ingenthron
              • Votes:
                0
                Watchers:
                5

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Gerrit Reviews

                  There are no open Gerrit changes
