  1. Couchbase Server
  2. MB-36693

Rebalance operation for Analytics online upgrade for multiple nodes is stuck or very slow


Details

    Description

      Build: 6.5.0-4676

      A test performs an online upgrade of a cluster that has Analytics nodes, 2 nodes at a time, using the rebalance-out, upgrade, rebalance-in method.

      The rebalance-in of the 2 upgraded Analytics nodes has been stuck for over 17 hours.

      Steps:
      1. Start with a 5-node cluster on the 6.0.1 release; 4 of the nodes run the Analytics service.
      2. Create some buckets and datasets.
      3. Rebalance out 2 nodes that run the Analytics service.
      4. Install 6.5 on both of these nodes.
      5. Rebalance the 2 nodes back in with the Analytics service on them.
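
The rebalance-out and rebalance-in steps above could be scripted roughly as follows. This is a minimal sketch, not the test's actual code: it only builds the form fields for the Couchbase REST endpoints `/controller/rebalance` and `/controller/addNode` and sends nothing over the network. The host names, the credentials, and the `ns_1@<host>` node-name form are assumptions made for illustration.

```python
from urllib.parse import urlencode

ANALYTICS_SERVICE = "cbas"  # REST service name for the Analytics service


def otp_node(host):
    """Node name as ns_server expects it (assumed 'ns_1@<host>' form)."""
    return f"ns_1@{host}"


def rebalance_out_fields(known_hosts, eject_hosts):
    """Fields for POST /controller/rebalance that eject eject_hosts (step 3)."""
    return {
        "knownNodes": ",".join(otp_node(h) for h in known_hosts),
        "ejectedNodes": ",".join(otp_node(h) for h in eject_hosts),
    }


def add_node_fields(host, user, password, services=(ANALYTICS_SERVICE,)):
    """Fields for POST /controller/addNode for one upgraded node (step 5)."""
    return {
        "hostname": host,
        "user": user,
        "password": password,
        "services": ",".join(services),
    }


def rebalance_in_fields(known_hosts):
    """Fields for POST /controller/rebalance with nothing ejected (step 5)."""
    return {
        "knownNodes": ",".join(otp_node(h) for h in known_hosts),
        "ejectedNodes": "",
    }


def to_form(fields):
    """Encode fields as an application/x-www-form-urlencoded request body."""
    return urlencode(fields)


if __name__ == "__main__":
    # Hypothetical 5-node cluster; the last two nodes are upgraded first.
    hosts = [f"node{i}.example" for i in range(1, 6)]
    print(to_form(rebalance_out_fields(hosts, hosts[3:])))
    for host in hosts[3:]:
        print(to_form(add_node_fields(host, "Administrator", "password")))
    print(to_form(rebalance_in_fields(hosts)))
```

Keeping the payload construction separate from the HTTP calls makes it easy to check exactly which nodes are ejected and which services are requested before anything touches a live cluster.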

      This rebalance operation has been hung for 17+ hours.
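
A test harness could flag a hang like this long before 17 hours pass by tracking when the reported rebalance progress last changed. The sketch below assumes progress samples are collected elsewhere (for example by polling the cluster) as `(timestamp_seconds, progress)` pairs; the function names and the one-hour threshold are illustrative assumptions, not part of the existing test.

```python
def stalled_for(samples, now):
    """Seconds since the progress value last changed.

    samples: iterable of (timestamp_seconds, progress) in time order.
    Returns 0.0 when there are no samples yet.
    """
    last_change = None
    previous = None
    for timestamp, progress in samples:
        if previous is None or progress != previous:
            last_change = timestamp
            previous = progress
    if last_change is None:
        return 0.0
    return now - last_change


def is_stuck(samples, now, threshold_s=3600.0):
    """True when progress has not moved for at least threshold_s seconds."""
    return stalled_for(samples, now) >= threshold_s


if __name__ == "__main__":
    # Hypothetical samples: progress stops moving after the first minute.
    samples = [(0, 10.0), (60, 25.0), (120, 25.0), (7200, 25.0)]
    print(stalled_for(samples, now=7200), is_stuck(samples, now=7200))
```

A stall threshold distinguishes "stuck" from "very slow": a slow rebalance keeps resetting the timer, while a hung one does not.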

      This might be an intermittent failure, as it is not seen in the job that upgrades the cluster from 6.0.0 to 6.5.0.

      On one of the nodes still on 6.0.1 (172.23.108.16), errors like these appear in the logs:

      2019-10-29T06:51:11.992-07:00 WARN CBAS.server.QueryServiceServlet [HttpExecutor(port:9110)-11] handleException: unexpected exception: {"host":"127.0.0.1:9110","path":"/query/service","statement":"select * from `Metadata`.`Link`","pretty":false,"mode":null,"clientContextID":"e49f1a4c-fab1-48ca-b0a0-ba846952281b","format":null,"timeout":null,"maxResultReads":null,"planFormat":null,"expressionTree":false,"rewrittenExpressionTree":false,"logicalPlan":false,"optimizedLogicalPlan":false,"job":false,"signature":false,"multiStatement":false}
      java.lang.InterruptedException: null
      	at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:385) ~[?:?]
      	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022) ~[?:?]
      	at org.apache.asterix.api.http.server.NCQueryServiceServlet.executeStatement(NCQueryServiceServlet.java:98) [asterix-app.jar:6.0.1-2037]
      	at com.couchbase.analytics.servlet.AnalyticsQueryServiceServlet.executeStatement(AnalyticsQueryServiceServlet.java:44) ~[cbas-server.jar:6.0.1-2037]
      	at org.apache.asterix.api.http.server.QueryServiceServlet.handleRequest(QueryServiceServlet.java:556) [asterix-app.jar:6.0.1-2037]
      	at org.apache.asterix.api.http.server.QueryServiceServlet.post(QueryServiceServlet.java:119) [asterix-app.jar:6.0.1-2037]
      	at com.couchbase.analytics.servlet.DiagnosticsQueryServiceServlet.get(DiagnosticsQueryServiceServlet.java:25) [cbas-server.jar:6.0.1-2037]
      	at org.apache.hyracks.http.server.AbstractServlet.handle(AbstractServlet.java:88) [hyracks-http.jar:6.0.1-2037]
      	at com.couchbase.analytics.servlet.AuthenticatedServlet.handle(AuthenticatedServlet.java:79) [cbas-server.jar:6.0.1-2037]
      	at org.apache.hyracks.http.server.HttpRequestHandler.handle(HttpRequestHandler.java:80) [hyracks-http.jar:6.0.1-2037]
      	at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:65) [hyracks-http.jar:6.0.1-2037]
      	at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:37) [hyracks-http.jar:6.0.1-2037]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
      	at java.lang.Thread.run(Thread.java:834) [?:?]
      2019-10-29T06:51:11.993-07:00 WARN CBAS.server.QueryServiceServlet [HttpExecutor(port:9110)-11] Error flushing output writer
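
To see which statements are being interrupted across a large log, entries like the ones above can be parsed mechanically. The following is a rough sketch that assumes each entry starts with `timestamp LEVEL logger [thread] message`, as in the excerpt; the regular expression and helper names are illustrative and are not part of any Couchbase tooling.

```python
import json
import re
from typing import NamedTuple, Optional

# timestamp, level, logger, [thread], then the free-form message
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+"
    r"\[(?P<thread>.+?)\]\s+(?P<msg>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: str
    level: str
    logger: str
    thread: str
    message: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; stack-trace and other continuation lines give None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        match["ts"], match["level"], match["logger"], match["thread"], match["msg"]
    )


def failed_statement(entry: LogEntry) -> Optional[str]:
    """Return the 'statement' field of the JSON request dump, if the entry has one."""
    start = entry.message.find("{")
    if start < 0:
        return None
    try:
        payload = json.loads(entry.message[start:])
    except json.JSONDecodeError:
        return None
    return payload.get("statement")
```

Applied to the first WARN entry in the excerpt, `failed_statement` would return the query against `Metadata`.`Link`; for the "Error flushing output writer" entry and for stack-trace lines it returns `None` or is never reached.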
      


            People

              michael.blow Michael Blow
              mihir.kamdar Mihir Kamdar (Inactive)