  1. Java Couchbase JVM Core
  2. JVMCBC-1384

100% of queries timing out during cluster rebalance.

Details

    • Bug
    • Resolution: Incomplete
    • Test Blocker
    • None
    • 2.4.10
    • None
    • Enterprise Edition 7.6.0 build 1493
      Java Client: 3.4.10
    • 0

    Description

      1. Create a 5-node cluster (3 KV, 2 GSI+N1QL) with one bucket and 10 collections. Load some data into the collections and build indexes.
      2. Start the N1QL query workload.
      3. Scale up the cluster by 1 node for each service group. Repeat for the next 4 iterations.
      4. Scale down the cluster by 1 node for each service group. Repeat for the next 4 iterations.
      5. While the cluster is rebalancing, all queries time out due to ENDPOINT_NOT_AVAILABLE.

      The first scaling started at 2023-09-14 16:47:57,778:

      2023-09-14 16:47:57,778 | test  | CRITICAL | MainThread | [task:__init__:354] Scale_params: [{'count': 4, 'services': [{'type': 'kv'}], 'compute': {'type': 'n2-standard-16'}, 'disk': {'type': 'pd-ssd', 'sizeInGb': 1000}, 'diskAutoScaling': {'enabled': True}}, {'count': 3, 'services': [{'type': 'index'}, {'type': 'n1ql'}], 'compute': {'type': 'n2-standard-16'}, 'disk': {'type': 'pd-ssd', 'sizeInGb': 500}, 'diskAutoScaling': {'enabled': True}}]
      

      At 2023-09-14 16:49:52,890

      N1QL Query Statistics
      +----------+---------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      | Bucket   | Total Queries | Failed Queries | Success Queries | Rejected Queries | Cancelled Queries | Timeout Queries | Errored Queries |
      +----------+---------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      | default0 | count(14474)  | count(0)       | count(14427)    | count(0)         | count(0)          | count(30)       | count(0)        |
      +----------+---------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      

      At 2023-09-14 17:55:52,993

      N1QL Query Statistics
      +----------+---------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      | Bucket   | Total Queries | Failed Queries | Success Queries | Rejected Queries | Cancelled Queries | Timeout Queries | Errored Queries |
      +----------+---------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      | default0 | count(14994)  | count(0)       | count(14427)    | count(0)         | count(0)          | count(550)      | count(0)        |
      +----------+---------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      

      Cluster rebalance progress at this moment: 56.56%

      After a certain point in the test, all queries time out with the same error. Between the two snapshots below, the success count stays at 1009327 while the timeout count rises from 6745 to 25415.

      From:

      2023-09-15 02:35:53,832 | test  | INFO    | Thread-1   | [table_view:display:72] N1QL Query Statistics
      +----------+----------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      | Bucket   | Total Queries  | Failed Queries | Success Queries | Rejected Queries | Cancelled Queries | Timeout Queries | Errored Queries |
      +----------+----------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      | default0 | count(1016407) | count(0)       | count(1009327)  | count(0)         | count(0)          | count(6745)     | count(0)        |
      +----------+----------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      

      To:

      2023-09-15 11:13:54,719 | test  | INFO    | Thread-1   | [table_view:display:72] N1QL Query Statistics
      +----------+----------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      | Bucket   | Total Queries  | Failed Queries | Success Queries | Rejected Queries | Cancelled Queries | Timeout Queries | Errored Queries |
      +----------+----------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
      | default0 | count(1035153) | count(0)       | count(1009327)  | count(0)         | count(0)          | count(25415)    | count(0)        |
      +----------+----------------+----------------+-----------------+------------------+-------------------+-----------------+-----------------+
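The counters in these tables are cumulative, so the behaviour during an interval is easier to see from the difference between two snapshots. The following is a minimal sketch of that arithmetic using only the values quoted above. The class and method names (`QueryStatsDelta`, `Snapshot`, `describe`) are illustrative assumptions and are not part of the test framework or the SDK.

```java
import java.util.Locale;

/** Compares two cumulative "N1QL Query Statistics" rows copied from this ticket. */
final class QueryStatsDelta {

    /** One table row; only the three non-zero counters are modelled. */
    static final class Snapshot {
        final long total;
        final long success;
        final long timeout;

        Snapshot(long total, long success, long timeout) {
            this.total = total;
            this.success = success;
            this.timeout = timeout;
        }
    }

    /** Queries issued between the two snapshots. */
    static long newTotal(Snapshot from, Snapshot to) {
        return to.total - from.total;
    }

    /** Queries that succeeded between the two snapshots. */
    static long newSuccesses(Snapshot from, Snapshot to) {
        return to.success - from.success;
    }

    /** Queries that timed out between the two snapshots. */
    static long newTimeouts(Snapshot from, Snapshot to) {
        return to.timeout - from.timeout;
    }

    /** One-line summary of the interval. */
    static String describe(String label, Snapshot from, Snapshot to) {
        return String.format(Locale.ROOT, "%s: +%d total, +%d success, +%d timeout",
                label, newTotal(from, to), newSuccesses(from, to), newTimeouts(from, to));
    }

    public static void main(String[] args) {
        // Values copied from the four tables in this description.
        Snapshot first = new Snapshot(14474, 14427, 30);
        Snapshot second = new Snapshot(14994, 14427, 550);
        Snapshot third = new Snapshot(1016407, 1009327, 6745);
        Snapshot fourth = new Snapshot(1035153, 1009327, 25415);

        System.out.println(describe("first pair", first, second));
        // prints "first pair: +520 total, +0 success, +520 timeout"
        System.out.println(describe("second pair", third, fourth));
        // prints "second pair: +18746 total, +0 success, +18670 timeout"
    }
}
```

In both intervals the success counter does not move at all. In the second pair the total grows by 76 more than the timeout counter; the tables in this ticket do not say how those queries were classified.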
      

      Server Logs:
      http://supportal.couchbase.com/snapshot/38cc3f5f7e39b79e68a781fb6bf25839::0

      s3://cb-customers-secure/query_timeout/2023-09-15/collectinfo-2023-09-15t160922-ns_1@svc-d-node-001.vg-9bboethcxizu4.sandbox.nonprod-project-avengers.com.zip
      s3://cb-customers-secure/query_timeout/2023-09-15/collectinfo-2023-09-15t160922-ns_1@svc-d-node-002.vg-9bboethcxizu4.sandbox.nonprod-project-avengers.com.zip
      s3://cb-customers-secure/query_timeout/2023-09-15/collectinfo-2023-09-15t160922-ns_1@svc-d-node-003.vg-9bboethcxizu4.sandbox.nonprod-project-avengers.com.zip
      s3://cb-customers-secure/query_timeout/2023-09-15/collectinfo-2023-09-15t160922-ns_1@svc-d-node-012.vg-9bboethcxizu4.sandbox.nonprod-project-avengers.com.zip
      s3://cb-customers-secure/query_timeout/2023-09-15/collectinfo-2023-09-15t160922-ns_1@svc-qi-node-004.vg-9bboethcxizu4.sandbox.nonprod-project-avengers.com.zip
      s3://cb-customers-secure/query_timeout/2023-09-15/collectinfo-2023-09-15t160922-ns_1@svc-qi-node-005.vg-9bboethcxizu4.sandbox.nonprod-project-avengers.com.zip
      s3://cb-customers-secure/query_timeout/2023-09-15/collectinfo-2023-09-15t160922-ns_1@svc-qi-node-007.vg-9bboethcxizu4.sandbox.nonprod-project-avengers.com.zip

      SDK logs are attached.

      Attachments

        1. consoleText.txt
          4.23 MB
        2. JavaSDK.log
          1.04 MB
        3. JavaSDK.log.zip
          3.34 MB
        4. logsCollectionFailed
          73 kB
        5. screenshot-1.png
          74 kB


            People

              ritesh.agarwal Ritesh Agarwal
              ritesh.agarwal Ritesh Agarwal