Couchbase Server / MB-34928

CBAS Failing Scale Down to Single Node

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Fix Version/s: 6.0.3
    • Affects Version/s: 6.0.0, 6.0.1
    • Component/s: analytics
    • Environment: GKE, stock docker images and Operator 2.0.0
    • Triage: Untriaged
    • Unknown
    • Sprint: CX Sprint 159, CX Sprint 160, CX Sprint 161, CX Sprint 162, CX Sprint 163, CX Sprint 164, CX Sprint 165

    Description

      • Test starts with one node and default bucket
      • Creates datasets for all documents, documents of type 1 and documents of type 2
      • Continuously populates bucket with alternating type 1/2 documents during the test
      • Scales nodes from 1->2, 2->3, 3->2, 2->1
      • When going from 2->1, the rebalance fails in the UI with:

      Service 'cbas' exited with status 1. Restarting. Messages:
          at com.couchbase.analytics.servlet.AuthenticatedServlet.handle(AuthenticatedServlet.java:79) [cbas-server.jar:6.0.1-2037]
          at org.apache.hyracks.http.server.HttpRequestHandler.handle(HttpRequestHandler.java:80) [hyracks-http.jar:6.0.1-2037]
          at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:65) [hyracks-http.jar:6.0.1-2037]
          at org.apache.hyracks.http.server.HttpRequestHandler.call(HttpRequestHandler.java:37) [hyracks-http.jar:6.0.1-2037]
          at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
          at java.lang.Thread.run(Thread.java:834) [?:?]
      2019-07-08T11:07:28.085+00:00 ERRO CBAS.cbas Failed to remove replica Unexpected response 500
      2019-07-08T11:07:28.086+00:00 FATA CBAS.cbas Unexpected failure while sync'ing replicas Unexpected response 500
      [goport(/opt/couchbase/bin/cbas)] 2019/07/08 11:07:28 Timeout while flushing stderr
      [goport(/opt/couchbase/bin/cbas)] 2019/07/08 11:07:28 child process exited with status 1

      • The subsequent rebalance triggered by the Operator progresses extremely slowly (Operator log below) until the test times out. In the attached logs, node 0 even goes down.

      {"level":"info","ts":1562584069.4712546,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000133333333}
      {"level":"info","ts":1562584073.4745781,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000266666667}
      {"level":"info","ts":1562584077.4778988,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.000004}
      {"level":"info","ts":1562584081.4819095,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000533333333}
      {"level":"info","ts":1562584085.4851,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000666666666}
      {"level":"info","ts":1562584089.4883785,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000799999999}
      {"level":"info","ts":1562584093.4917228,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000933333334}

      • Mitigated on our end by not scaling down to one node.
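
      To put a number on how slow the second rebalance is, the Operator log entries quoted in this description can be fed through a small script. This is only a sketch: the function name `rebalance_rate` is made up for illustration, and it assumes that `progress` is a percentage out of 100 and that the rate stays constant, neither of which is confirmed by the logs.

```python
import json

def rebalance_rate(log_lines):
    """Estimate rebalance speed from Operator "Rebalancing" log entries.

    Returns (percentage points per second, seconds left to reach 100).
    Assumption: "progress" is a percentage and the rate stays constant.
    """
    samples = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("msg") == "Rebalancing":
            samples.append((entry["ts"], entry["progress"]))
    if len(samples) < 2:
        raise ValueError("need at least two progress samples")
    # Compare the first and last sample to average out jitter.
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    rate = (p1 - p0) / (t1 - t0)
    eta = (100.0 - p1) / rate if rate > 0 else float("inf")
    return rate, eta

# The seven entries quoted in the description, one per line.
LOG = """
{"level":"info","ts":1562584069.4712546,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000133333333}
{"level":"info","ts":1562584073.4745781,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000266666667}
{"level":"info","ts":1562584077.4778988,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.000004}
{"level":"info","ts":1562584081.4819095,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000533333333}
{"level":"info","ts":1562584085.4851,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000666666666}
{"level":"info","ts":1562584089.4883785,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000799999999}
{"level":"info","ts":1562584093.4917228,"logger":"couchbaseutil","msg":"Rebalancing","progress":50.00000933333334}
"""

rate, eta = rebalance_rate(LOG.splitlines())
print(f"rate: {rate:.3e} percentage points/s")
print(f"time to 100% at this rate: {eta / 86400:.0f} days")
```

      On these samples, progress moves by about 0.000008 percentage points in 24 seconds. If that pace held, the remaining half of the rebalance would take years, so a test timeout is the expected outcome.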

            People

              simon.murray Simon Murray