Java Couchbase JVM Core / JVMCBC-1169

waitUntilReady intermittently times out

Details

    • Bug
    • Resolution: Unresolved
    • Major

    Description

      This issue is seen intermittently, but often, on CI.

      I've seen three variants:

      Variant 1: Stuck waiting for... config?

          .doOnNext(v -> logger.info("waitUntilReady 1 {}", v))
          .onBackpressureDrop()
          .filter(i -> !(core.configurationProvider().bucketConfigLoadInProgress()
            || core.configurationProvider().globalConfigLoadInProgress()
            || (bucketName.isPresent() && core.configurationProvider().collectionRefreshInProgress())
            || (bucketName.isPresent() && core.clusterConfig().bucketConfig(bucketName.get()) == null))
          )
          .flatMap(i -> {
            logger.info("waitUntilReady 2 {} {} {} {}", i, serviceTypes, bucketName, state);

      The last log line from WaitUntilReadyHelper is:

      18:01:07 INFO  [com.couchbase.client.core.diagnostics.WaitUntilReadyHelper:74] waitUntilReady 1 2 

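      To check: why are we stuck here? "waitUntilReady 1" is logged but "waitUntilReady 2" never follows, so presumably the filter never passes. Which of its conditions is still true, and why does it not clear?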
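One way to answer the "what are we waiting for?" question would be to give each condition in the filter a name and log the ones that are still true. The following is only a minimal sketch of that idea: `ReadinessGate`, `blockWhile` and `activeBlockers` are hypothetical names, not SDK classes, and the lambdas in `main` are stand-in values rather than calls into the real configuration provider.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of the SDK: names each "not ready yet" condition
// so that a log line can say which one is holding the filter closed.
class ReadinessGate {
  private final Map<String, BooleanSupplier> blockers = new LinkedHashMap<>();

  // Registers a condition that keeps the gate closed for as long as it returns true.
  ReadinessGate blockWhile(String name, BooleanSupplier condition) {
    blockers.put(name, condition);
    return this;
  }

  // Returns the names of all conditions that are currently true, in registration order.
  List<String> activeBlockers() {
    List<String> active = new ArrayList<>();
    for (Map.Entry<String, BooleanSupplier> entry : blockers.entrySet()) {
      if (entry.getValue().getAsBoolean()) {
        active.add(entry.getKey());
      }
    }
    return active;
  }

  // Same outcome as a negated "a || b || c" predicate: open only when nothing blocks.
  boolean isOpen() {
    return activeBlockers().isEmpty();
  }

  public static void main(String[] args) {
    // Stand-in values; the real code would query the configuration provider here.
    ReadinessGate gate = new ReadinessGate()
      .blockWhile("bucketConfigLoadInProgress", () -> false)
      .blockWhile("globalConfigLoadInProgress", () -> true)
      .blockWhile("collectionRefreshInProgress", () -> false)
      .blockWhile("bucketConfigMissing", () -> false);

    System.out.println("open=" + gate.isOpen() + " blockedBy=" + gate.activeBlockers());
  }
}
```

With the stand-in values above, `main` prints `open=false blockedBy=[globalConfigLoadInProgress]`. Unlike the short-circuiting `||` chain in the filter, this sketch evaluates every condition on each call, which is the point when diagnosing a hang but would need the same `bucketName.isPresent()` guards before it could wrap the real checks.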

       

      Variant 2: Sends request into core.  No response back.  Hangs.

      https://sdk.jenkins.couchbase.com/view/Nightly/job/jvm/job/jvm-clients/job/couchbase-jvm-clients-scipted-build-pipeline/1501/testReport/junit/com.couchbase.client.java/SubdocIntegrationTest/CE_testing__Linux__cbdyncluster_7_0_2__Oracle_JDK_8____com_couchbase_client_java_SubdocIntegrationTest/
      Last line from WaitUntilReadyHelper: 

      18:02:40 INFO  [com.couchbase.client.core.diagnostics.WaitUntilReadyHelper:82] waitUntilReady 2 2 null Optional[ec00a00f-cac9-4fad-b2ae-1f565b8b73da] {current_stage=CONFIG_LOAD, current_stage_since_ms=32, timings_ms={}, total_ms=32} 

      To check: is there a timeout on the request? What do (or should) we do when the timeout fires? Why are we stuck here?
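For the timeout question, a generic illustration of the behaviour we would want is below: a wait on a response that is bounded, and that reports a timeout as an explicit outcome instead of hanging. This is a sketch in plain `java.util.concurrent` terms only. `BoundedWait`, `Outcome` and `await` are hypothetical names, and nothing here reflects how the core actually schedules or cancels request timeouts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch, not SDK code: bound the wait on a response so that a lost
// reply surfaces as an explicit outcome rather than a silent hang.
class BoundedWait {
  enum Outcome { COMPLETED, FAILED, TIMED_OUT, INTERRUPTED }

  static Outcome await(CompletableFuture<?> response, long timeout, TimeUnit unit) {
    try {
      response.get(timeout, unit);
      return Outcome.COMPLETED;
    } catch (TimeoutException e) {
      // Cancel the future so that anything chained on it is released as well.
      response.cancel(true);
      return Outcome.TIMED_OUT;
    } catch (ExecutionException e) {
      // The response arrived, but as a failure.
      return Outcome.FAILED;
    } catch (InterruptedException e) {
      // Restore the interrupt flag for callers further up the stack.
      Thread.currentThread().interrupt();
      return Outcome.INTERRUPTED;
    }
  }

  public static void main(String[] args) {
    // A future that nobody ever completes stands in for the missing response.
    CompletableFuture<String> neverAnswered = new CompletableFuture<>();
    System.out.println(await(neverAnswered, 100, TimeUnit.MILLISECONDS));
  }
}
```

Run as-is, `main` prints `TIMED_OUT` after roughly 100 ms. The sketch uses `get(timeout, unit)` rather than `orTimeout` so that it also compiles on JDK 8, which is what the failing CI job runs on.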

      Variant 3: Getting stuck inside HealthPinger

      This is the most common variant.

      WaitUntilReadyHelper has progressed to the point of pinging services, and it's stuck somewhere inside HealthPinger.

      https://sdk.jenkins.couchbase.com/view/Nightly/job/jvm/job/jvm-clients/job/couchbase-jvm-clients-scipted-build-pipeline/1501/testReport/junit/com.couchbase.client.java.manager.collection/CollectionManagerIntegrationTest/CE_testing__Linux__cbdyncluster_7_0_2__Oracle_JDK_8____com_couchbase_client_java_manager_collection_CollectionManagerIntegrationTest/

      The "...[truncated 49704 chars]..." marker in the test output means we've lost crucial logs, so we need to dial the logging back a bit before we can get any insight here.
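As an illustration of what dialling the logging back could look like, the sketch below quietens everything under the SDK's package while leaving WaitUntilReadyHelper at INFO. It assumes `java.util.logging` is the backend, which this ticket does not confirm, so the calls would need translating if the tests log through something else. `QuietLogging` and `apply` are hypothetical names, and the `com.couchbase.client` prefix is inferred from the logger name in the log lines above.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch, assuming java.util.logging is the backend in use.
class QuietLogging {
  // Held in static fields because java.util.logging only keeps weak references to
  // loggers, so a level set on an otherwise unreferenced logger can be lost.
  static final Logger SDK = Logger.getLogger("com.couchbase.client");
  static final Logger HELPER =
    Logger.getLogger("com.couchbase.client.core.diagnostics.WaitUntilReadyHelper");

  static void apply() {
    // Everything under the SDK package drops to WARNING and above...
    SDK.setLevel(Level.WARNING);
    // ...except the helper being debugged, which keeps its INFO lines.
    HELPER.setLevel(Level.INFO);
  }

  public static void main(String[] args) {
    apply();
    Logger other = Logger.getLogger("com.couchbase.client.core.Core");
    System.out.println("helper INFO enabled: " + HELPER.isLoggable(Level.INFO));
    System.out.println("other INFO enabled: " + other.isLoggable(Level.INFO));
  }
}
```

The idea is that the "waitUntilReady" lines survive while the surrounding volume drops far enough for the CI report to stop truncating. Which loggers are actually responsible for the bulk of the output still has to be checked.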

       

       

            People

              emilien.bevierre Emilien Bevierre
              graham.pople Graham Pople

                Gerrit Reviews

                  There is 1 open Gerrit change