Couchbase Server / MB-57688

RequestCanceledExceptions are seen when a CBAS node rebalance-out is nearing completion.


Details

    Description

      1. Create a 3-node cluster with co-located services (k:q:i:a, i.e. KV, Query, Index, Analytics), a bucket, and 2 collections. Load some data into the collections, build indexes, create datasets, etc.
      2. Start an N1QL/CBAS query workload.
      3. Scale up the cluster from 3 to 6 nodes, then from 6 to 9 nodes. Everything worked as expected.
      4. Scale down the cluster from 9 to 6 nodes; nodes are removed from the cluster one at a time.
      5. As each node removal nears completion, some CBAS/N1QL queries start hitting RequestCanceledExceptions, depending on which service is being rebalanced and where its traffic is being switched.
      6. This looks like a race: the updated cluster map reaches the client only after the client has already dispatched a few requests to the outgoing node.
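The race in step 6 can be illustrated with a toy simulation: the client picks nodes from a stale cluster map, the ejected node's connections close while requests are in flight, and only idempotent requests are resent. This is a hypothetical sketch. `run`, the short node names, and the round-robin dispatch are assumptions for illustration, not the Couchbase SDK's retry logic. The rule that non-idempotent requests are not retried is inferred from the `"idempotent":false` and `"retried":0` fields in the logged exceptions.

```python
# Hypothetical toy model of the race in step 6; NOT the Couchbase SDK's code.
# Assumption: a request whose channel closes while in flight is resent only if
# it is idempotent (the failures in this report logged "idempotent": false,
# "retried": 0).

def run(requests, old_map, new_map, ejected):
    """Simulate dispatch with a stale cluster map followed by a node ejection.

    requests: list of (request_id, idempotent) tuples.
    Returns a dict mapping request_id to a short outcome string.
    """
    outcomes = {}
    # 1. The client dispatches round-robin from the map it still holds,
    #    which includes the node being ejected.
    in_flight = [(rid, idem, old_map[i % len(old_map)])
                 for i, (rid, idem) in enumerate(requests)]
    # 2. The rebalance removes `ejected` and closes its connections before
    #    the client applies `new_map`.
    for rid, idem, node in in_flight:
        if node != ejected:
            outcomes[rid] = f"ok@{node}"
        elif idem:
            # 3a. Safe to resend: retry on a node from the updated map.
            outcomes[rid] = f"retried@{new_map[rid % len(new_map)]}"
        else:
            # 3b. Unsafe to resend: the caller sees RequestCanceledException.
            outcomes[rid] = "canceled: NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT)"
    return outcomes


old_map = ["node-007", "node-008", "node-009"]  # hypothetical short node names
new_map = ["node-008", "node-009"]
for rid, outcome in run([(1, False), (2, False), (3, True), (4, True)],
                        old_map, new_map, "node-007").items():
    print(rid, outcome)
```

In this run, requests 1 and 4 both land on the ejected node-007; request 1 (non-idempotent) is canceled with the same reason seen in the logs, while request 4 (idempotent) is resent to node-008.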

      First instance of the exception for CBAS:

      At 2023-06-29 22:36:57,984 PDT, i.e. 05:36:57 AM 30 Jun 2023 GMT

      com.couchbase.client.core.error.RequestCanceledException: AnalyticsRequest, Reason: NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT) {"cancelled":true,"completed":true,"coreId":"0x987046a500000002","idempotent":false,"lastDispatchedFrom":"172.23.107.120:40634","lastDispatchedTo":"svc-dqisa-node-007.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com:18095","reason":"NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT)","requestId":538504,"requestType":"AnalyticsRequest","retried":0,"service":{"httpMethod":"POST","operationId":"query_thread_default04","priority":0,"statement":"select v.name, animal from default0_VolumeCollection0_ds1 as v unnest v.animals as animal where v.attributes.hair = \"Burgundy\" limit 10;","type":"analytics","uri":"/analytics/service"},"timeoutMs":75000,"timings":{"totalMicros":5037850}}
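Each logged exception carries a JSON context. The fields that matter for triage here are `idempotent`, `retried`, `lastDispatchedTo`, and `timings.totalMicros`: both reported requests were non-idempotent, were never retried, and failed well before the 75 s `timeoutMs` (about 5.04 s for the CBAS request above, about 21 ms for the N1QL request below). A minimal sketch of a triage helper, assuming the context object starts at the first `{` on the log line as in the messages quoted here; `parse_cancel_context` is a hypothetical helper, not part of the SDK or the test framework.

```python
import json

def parse_cancel_context(line: str) -> dict:
    """Pull triage fields out of a logged RequestCanceledException.

    Assumption: the JSON context object starts at the first '{' on the line,
    as in the messages quoted in this report.
    """
    # raw_decode parses one JSON value starting at the given index and
    # ignores anything that follows it on the line.
    ctx, _ = json.JSONDecoder().raw_decode(line, line.index("{"))
    host, port = ctx["lastDispatchedTo"].rsplit(":", 1)
    return {
        "type": ctx["requestType"],
        "reason": ctx["reason"],
        "idempotent": ctx["idempotent"],
        "retried": ctx["retried"],
        "node": host,
        "port": int(port),
        "elapsed_ms": ctx["timings"]["totalMicros"] / 1000,
        "timeout_ms": ctx["timeoutMs"],
    }
```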
      

      Rebalance-out start time: 5:34:31 AM 30 Jun, 2023

      Starting rebalance, KeepNodes = ['ns_1@svc-dqisa-node-001.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-002.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-003.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-004.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-005.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-006.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-008.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-009.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com'], EjectNodes = ['ns_1@svc-dqisa-node-007.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com'],
      

      Rebalance-out completion time: 5:37:18 AM 30 Jun, 2023

      Rebalance completed successfully.
      Rebalance Operation Id = 018a2576213a6310d43a42ff45d5a3d2
      ns_orchestrator 000
      ns_1@svc-dqisa-node-001.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com
      5:37:18 AM 30 Jun, 2023
      

      N1QL exception at 2023-06-29 22:42:41,141, i.e. 5:42:41 AM 30 Jun 2023 GMT

      2023-06-29 22:42:41,141 | infra | CRITICAL | query_thread_default09 | [hostedN1QL:_run_query:447] com.couchbase.client.core.error.RequestCanceledException: QueryRequest, Reason: NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT) {"cancelled":true,"completed":true,"coreId":"0x987046a500000001","idempotent":false,"lastDispatchedFrom":"172.23.107.120:51262","lastDispatchedTo":"svc-dqisa-node-009.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com:18093","reason":"NO_MORE_RETRIES (CHANNEL_CLOSED_WHILE_IN_FLIGHT)","requestId":655084,"requestType":"QueryRequest","retried":0,"service":{"bucket":"default0","operationId":"query_thread_default09","scope":"_default","statement":"select name from VolumeCollection0 where age between 30 and 50 limit 100;","type":"query"},"timeoutMs":75000,"timings":{"totalMicros":20851}}
      

      The N1QL exception was seen during this rebalance:

      Rebalance started at 5:40:13 AM 30 Jun, 2023

      Starting rebalance, KeepNodes = ['ns_1@svc-dqisa-node-001.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-002.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-003.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-004.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-005.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-006.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com',
      'ns_1@svc-dqisa-node-008.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com'], EjectNodes = ['ns_1@svc-dqisa-node-009.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com'], Failed over and being ejected nodes = []; no delta recovery nodes; Operation Id = 0337aebb09f83d6978dd19373021a094
      

      Rebalance completed at 5:42:59 AM 30 Jun, 2023

      Rebalance completed successfully.
      Rebalance Operation Id = 0337aebb09f83d6978dd19373021a094
      ns_orchestrator 000
      ns_1@svc-dqisa-node-001.t0kbgowmbawjgwwz.sandbox.nonprod-project-avengers.com
      5:42:59 AM 30 Jun, 2023
      

      Rebalance order of services on a node: KV -> FTS -> Index -> Analytics -> N1QL
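The timestamps above place both failures in the last part of their rebalance windows, which is consistent with Analytics and N1QL being the last services rebalanced in this order. A short sketch to check the arithmetic, assuming the rebalance start and end times shown in the UI are GMT (as the report's own GMT conversions imply); `position_in_window` is a hypothetical helper.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def position_in_window(start: str, event: str, end: str):
    """Return (seconds after start, seconds before end, fraction of window elapsed)."""
    s = datetime.strptime(start, FMT)
    ev = datetime.strptime(event, FMT)
    en = datetime.strptime(end, FMT)
    elapsed = (ev - s).total_seconds()
    remaining = (en - ev).total_seconds()
    return elapsed, remaining, elapsed / (en - s).total_seconds()

# Times (assumed GMT, 30 Jun 2023) copied from this report.
cbas = position_in_window("05:34:31", "05:36:57", "05:37:18")
n1ql = position_in_window("05:40:13", "05:42:41", "05:42:59")
for name, (elapsed, remaining, frac) in (("CBAS", cbas), ("N1QL", n1ql)):
    print(f"{name}: {elapsed:.0f}s in, {remaining:.0f}s before completion ({frac:.0%})")
```

This prints that the CBAS failure occurred 146 s into a 167 s rebalance (21 s before completion, 87%) and the N1QL failure 148 s into a 166 s rebalance (18 s before completion, 89%).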

      cc: Ritam Sharma

      Attachments

        1. testLogs.txt
          404 kB
          Ritesh Agarwal
        2. testLogs_7.2.0.txt
          668 kB
          Ritesh Agarwal
        3. TestLogs.txt
          530 kB
          Ritesh Agarwal
        4. image-2023-06-30-00-03-05-165.png
          396 kB
          Ritesh Agarwal


      People

        Graham Pople (graham.pople)
        Ritesh Agarwal (ritesh.agarwal)
        Votes: 0
        Watchers: 19
