Uploaded image for project: 'Java Couchbase JVM Core'
  1. Java Couchbase JVM Core
  2. JVMCBC-1531

AmbiguousTimeouts with retry reason KV_NOT_MY_VBUCKET and throughput drops to 0 while cluster is in rebalancing

    XMLWordPrintable

Details

    • Bug
    • Resolution: Unresolved
    • Major
    • 2.7.1
    • None
    • None
    • None
    • 0

    Description

      FIT-SIT runs on GitHub Actions have shown errors when executed against the latest commit of the Java SDK.

      Scaling test with private endpoint encounters SDK_AMBIGUOUS_TIMEOUT_EXCEPTIONwith retry reason KV_NOT_MY_VBUCKET.

      There is throughput drop to 0 seen before instance for test swapinstancetype and aws instance gets terminated. 

      Tests are passing with java version 3.6.2. 

      Passing on java version 3.6.2- 
          run: https://github.com/couchbaselabs/fit-app-deployment/actions/runs/9711640977/job/26804780918
           record: https://performance-sdk.couchbase.com:8080/situationalSingle?situationalRunId=7bd3a306-416c-4c00-ad15-ee27e82e6f93&runId=14f7f30d-3c85-43f4-8fef-4b9634115c57
        swapinstance test run: https://github.com/couchbaselabs/fit-app-deployment/actions/runs/9710305378/job/26800757748
        record: https://performance-sdk.couchbase.com:8080/situationalSingle?situationalRunId=5d51399a-0d13-4242-816a-d62e78a1212b&runId=412bd15e-efc7-44e1-8d29-7e8893543f4c

      Failing on latest commit - 

          run: https://github.com/couchbaselabs/fit-app-deployment/actions/runs/9711645957/job/26804794947

         record: https://performance-sdk.couchbase.com:8080/situationalSingle?situationalRunId=e26f9ecf-2e31-49e7-b10c-e87468f8cdaa&runId=41048cb4-df0c-4b6a-8ce2-b738818487e8
        swapinstance test run:[ https://github.com/couchbaselabs/fit-app-deployment/actions/runs/9739597859|https://github.com/couchbaselabs/fit-app-deployment/actions/runs/9739597859]
       record: was not able to save record, as instance get terminated before saving record. There is throughput drop to 0 seen before instance terminates. 

      08:01:36.900 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 6 ops/sec, 7176373 ops total, overall throughput: 3086 ops/sec
      1664808:01:41.900 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3079 ops/sec
      1664908:01:46.900 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3073 ops/sec
      1665008:01:51.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3066 ops/sec
      1665108:01:56.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3060 ops/sec
      1665208:02:01.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3053 ops/sec
      1665308:02:06.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3047 ops/sec
      1665408:02:11.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3040 ops/sec
      1665508:02:16.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3034 ops/sec
      1665608:02:21.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3028 ops/sec
      1665708:02:26.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3021 ops/sec
      1665808:02:31.901 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3015 ops/sec
      1665908:02:36.902 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3008 ops/sec
      1666008:02:41.902 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 3002 ops/sec
      1666108:02:46.902 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2996 ops/sec
      1666208:02:51.902 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2990 ops/sec
      1666308:02:56.902 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2983 ops/sec
      1666408:03:01.902 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2977 ops/sec
      1666508:03:06.902 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2971 ops/sec
      1666608:03:11.902 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2965 ops/sec
      1666708:03:16.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2959 ops/sec
      1666808:03:21.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2953 ops/sec
      1666908:03:26.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2947 ops/sec
      1667008:03:31.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2941 ops/sec
      1667108:03:36.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2935 ops/sec
      1667208:03:41.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2929 ops/sec
      1667308:03:46.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2923 ops/sec
      1667408:03:51.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2917 ops/sec
      1667508:03:56.903 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2911 ops/sec
      1667608:04:01.904 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2905 ops/sec
      1667708:04:06.904 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2899 ops/sec
      1667808:04:11.904 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2893 ops/sec
      1667908:04:16.904 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2887 ops/sec
      1668008:04:21.904 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2882 ops/sec
      1668108:04:26.904 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2876 ops/sec
      1668208:04:31.904 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2870 ops/sec
      1668308:04:36.904 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2864 ops/sec
      1668408:04:41.905 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2859 ops/sec
      1668508:04:46.905 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2853 ops/sec
      1668608:04:51.905 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2847 ops/sec
      1668708:04:56.905 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2842 ops/sec
      1668808:05:01.905 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2836 ops/sec
      1668908:05:06.905 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2830 ops/sec
      1669008:05:11.905 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2825 ops/sec
      1669108:05:16.905 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2819 ops/sec
      1669208:05:21.906 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2814 ops/sec
      1669308:05:26.906 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2808 ops/sec
      1669408:05:31.906 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2803 ops/sec
      1669508:05:36.906 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2797 ops/sec
      1669608:05:41.906 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2792 ops/sec
      1669708:05:46.906 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2786 ops/sec
      1669808:05:51.906 [ Thread-58] INFO  e.GrpcPerformanceMeasureThread - Driver received throughput over last 5.0 seconds: 0 ops/sec, 7176373 ops total, overall throughput: 2781 ops/sec
      1669908:05:51.906 [ Thread-58] ERROR e.GrpcPerformanceMeasureThread - Received nothing from the performer too often, something must be wrong, bailing 

      server and performer logs are attached for failing test. 

      Repro steps :- 

      1. Run this workflow with test CbDinoTest#rebalance4To3NodesMixedKvAndQuery on any env, to get the AmbigousTimeoutExceptions with retry reason KV_NOT_MY_VBUCKET during scaling. 
      2. Run this workflow with test CbDinoTest#swapInstanceTypeMixedKvAndQuery on any env, to repro the throughput drop to 0 and vm exit during scaling. 
      3. Server logs(if ran sbx env) and performer logs will get attached in the artifacts of the run.

      Analysis :-

      Get/Insert requests are getting timed out after retrying for max 7 times, while bucket open is in progress and server does not know about the vBucket. {"description":"Server does not know about this vBucket","name":"NOT_MY_VBUCKET"}.

      performer log:-

      11:17:47.606 cb-events DEBUG LoggingEventConsumer.java.122  [com.couchbase.request][RequestRetryScheduledEvent][1000ms] Request InsertRequest retry scheduled per RetryStrategy (Reason: KV_NOT_MY_VBUCKET) {"completed":false,"coreId":"0x4372461900000001","idempotent":false,"lastChannelId":"4372461900000001/00000000450FDE2F","lastDispatchedFrom":"172.18.0.2:41316","lastDispatchedTo":"private-endpoint.cfvcfpat0ndbf1u2.sandbox.nonprod-project-avengers.com:11208","requestId":17,"requestType":"InsertRequest","retried":6,"retryReasons":["ENDPOINT_NOT_AVAILABLE","KV_NOT_MY_VBUCKET","BUCKET_OPEN_IN_PROGRESS"],"service":{"bucket":"default","collection":"_default","documentId":"<ud>7afd32fb-4ced-411f-869f-9de5a29a48c9</ud>","errorCode":{"description":"Server does not know about this vBucket","name":"NOT_MY_VBUCKET"},"opaque":"0x30","rejectedWithNotMyVbucket":1,"scope":"_default","type":"kv","vbucket":475},"timeoutMs":2500,"timings":{"dispatchMicros":1500,"encodingMicros":1439,"totalDispatchMicros":1500,"totalServerMicros":0,"serverMicros":0}}
      11:17:47.606 cb-events TRACE LoggingEventConsumer.java.115  [com.couchbase.config][ConfigIgnoredEvent] The proposed configuration (default) was ignored because of: OLD_OR_SAME_REVISION {"coreId":"0x4372461900000001"}
      11:17:47.606 cb-events DEBUG LoggingEventConsumer.java.122  [com.couchbase.io][NotMyVbucketReceivedEvent] Received a NotMyVBucket response from the server for partition 191, will retry. {"bucket":"default","coreId":"0x4372461900000001","local":"/172.18.0.2:41316","remote":"private-endpoint.cfvcfpat0ndbf1u2.sandbox.nonprod-project-avengers.com/10.0.102.89:11208"} 

         

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              michael.reiche Michael Reiche
              saurabh.mishra Saurabh Mishra
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty