Couchbase Server / MB-7661

[Done, RN 2.1.0] View query on Rebalance.Out fails with Reason: A view spec can not consist of merges exclusively.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.2.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Four machines running Couchbase Server 2.0.0 on Linux.

      Description

      I've been running SDKQE stester runs for the Java SDK, specifically testing the following scenario:

      /stester -C 127.0.0.1:8050 -i 20devcluster.ini -c rebalance.Once --vdsw_dvname ddoc/vquery --rbcount 2 --dsw_timeres 1 --hdsw_mc_threads 10 --workload dsw.Hybrid --mode out --hdsw_http_threads 5 --hdsw_cb_threads 10 --rebound 90 -d -o rebalance-once.log

      My cluster has four nodes, and this scenario rebalances two of them out of the cluster. View queries are issued during the rebalance.

      During this rebalance (I assume toward the end of it), I get the following exceptions:

      java.lang.RuntimeException: Failed to access the view
      at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:838)
      at com.couchbase.sdkd.cbclient.ViewQueryCommandContext.execIter(ViewQueryCommandContext.java:252)
      at com.couchbase.sdkd.cbclient.CommandContext.execute(CommandContext.java:311)
      at com.couchbase.sdkd.server.SdkServer.executeCommand(SdkServer.java:135)
      at com.couchbase.sdkd.server.SdkServer.handleRequest(SdkServer.java:156)
      at com.couchbase.sdkd.server.SdkServer.run(SdkServer.java:212)
      Caused by: java.util.concurrent.ExecutionException: OperationException: SERVER: error Reason: A view spec can not consist of merges exclusively.
      at com.couchbase.client.internal.HttpFuture.waitForAndCheckOperation(HttpFuture.java:89)
      at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:73)
      at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:63)
      at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:834)
      ... 5 more
      Caused by: OperationException: SERVER: error Reason: A view spec can not consist of merges exclusively.
      at com.couchbase.client.protocol.views.NoDocsOperationImpl.parseError(NoDocsOperationImpl.java:106)
      at com.couchbase.client.protocol.views.ViewOperationImpl.handleResponse(ViewOperationImpl.java:68)
      at com.couchbase.client.ViewNode$MyHttpRequestExecutionHandler.handleResponse(ViewNode.java:199)
      at org.apache.http.nio.protocol.AsyncNHttpClientHandler.processResponse(AsyncNHttpClientHandler.java:417)
      at org.apache.http.nio.protocol.AsyncNHttpClientHandler.inputReady(AsyncNHttpClientHandler.java:242)
      at com.couchbase.client.http.AsyncConnectionManager$ManagedClientHandler.inputReady(AsyncConnectionManager.java:244)
      at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:172)
      at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:155)
      at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:161)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:335)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:275)
      at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
      at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:542)
      at java.lang.Thread.run(Thread.java:680)
      Feb 1, 2013 1:04:30 PM com.couchbase.sdkd.cbclient.CommandResult warnAbout
      WARNING: Unknown exception encountered (for operation) future warnings will be suppressed
      java.lang.RuntimeException: Failed to access the view
      at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:838)
      at com.couchbase.sdkd.cbclient.ViewQueryCommandContext.execIter(ViewQueryCommandContext.java:252)
      at com.couchbase.sdkd.cbclient.CommandContext.execute(CommandContext.java:311)
      at com.couchbase.sdkd.server.SdkServer.executeCommand(SdkServer.java:135)
      at com.couchbase.sdkd.server.SdkServer.handleRequest(SdkServer.java:156)
      at com.couchbase.sdkd.server.SdkServer.run(SdkServer.java:212)
      Caused by: java.util.concurrent.ExecutionException: OperationException: SERVER: no_active_vbuckets Reason: Cannot execute view query since the node has no active vbuckets
      at com.couchbase.client.internal.HttpFuture.waitForAndCheckOperation(HttpFuture.java:89)
      at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:73)
      at com.couchbase.client.internal.HttpFuture.get(HttpFuture.java:63)
      at com.couchbase.client.CouchbaseClient.query(CouchbaseClient.java:834)
      ... 5 more
      Caused by: OperationException: SERVER: no_active_vbuckets Reason: Cannot execute view query since the node has no active vbuckets
      at com.couchbase.client.protocol.views.NoDocsOperationImpl.parseError(NoDocsOperationImpl.java:106)
      at com.couchbase.client.protocol.views.ViewOperationImpl.handleResponse(ViewOperationImpl.java:68)
      at com.couchbase.client.ViewNode$MyHttpRequestExecutionHandler.handleResponse(ViewNode.java:199)
      at org.apache.http.nio.protocol.AsyncNHttpClientHandler.processResponse(AsyncNHttpClientHandler.java:417)
      at org.apache.http.nio.protocol.AsyncNHttpClientHandler.inputReady(AsyncNHttpClientHandler.java:242)
      at com.couchbase.client.http.AsyncConnectionManager$ManagedClientHandler.inputReady(AsyncConnectionManager.java:244)
      at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:172)
      at org.apache.http.impl.nio.DefaultClientIOEventDispatch.inputReady(DefaultClientIOEventDispatch.java:155)
      at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:161)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:335)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315)
      at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:275)
      at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
      at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:542)

      Note that the Java details are not of particular interest for this ticket, but the error messages are. First, I get

      error Reason: A view spec can not consist of merges exclusively.
      and then
      no_active_vbuckets Reason: Cannot execute view query since the node has no active vbuckets

      The Java client currently removes the ViewNode during a rebalance once it vanishes from the couchNodes list. I'll attach a rebalance log from one of the nodes being rebalanced out. As expected, the removal happens at the very last step, so the node is still in the list when it no longer has any vbuckets.

      Should the server not be able to handle view requests, even when it has no vbuckets attached?

      Thanks,
      Michael

          Activity

          daschl Michael Nitschinger added a comment -

          For reference, the nodes have IP addresses 192.168.56.101 through 192.168.56.104; .103 and .104 get removed. The log is from .104.

          alkondratenko Aleksey Kondratenko (Inactive) added a comment - - edited

          This is a known problem. A duplicate ticket should exist somewhere. We cannot easily fix it. Perhaps the best thing we can do is send you back a redirect to a node that likely works.

          ingenthr Matt Ingenthron added a comment -

          Alk: but do we currently redirect, or do we currently error? It looks like we currently error.

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          No. We don't redirect AFAIK.

          You can try to mitigate on your side by either trying a different node on any error, or by trying to detect this particular error.

          I cannot yet promise any particular release in which we'll start redirecting.
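
The second mitigation mentioned above, detecting this particular error, could look roughly like the following on the client side. This is a minimal sketch, not SDK code: `ViewErrorClassifier` and `isRebalanceTransient` are hypothetical names, and matching on message substrings assumes the server keeps the wording quoted in this ticket.

```java
// Hypothetical helper; not part of the Couchbase Java SDK.
final class ViewErrorClassifier {
    // Substrings copied from the two server error messages quoted in this ticket.
    private static final String[] TRANSIENT_MARKERS = {
        "no_active_vbuckets",
        "A view spec can not consist of merges exclusively"
    };

    private ViewErrorClassifier() {}

    /**
     * Walks the cause chain (RuntimeException -> ExecutionException ->
     * OperationException in the traces above) and reports whether any
     * message contains one of the known markers.
     */
    static boolean isRebalanceTransient(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            for (String marker : TRANSIENT_MARKERS) {
                if (message.contains(marker)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A caller would treat a `true` result as "try again, possibly on another node" and let every other error surface unchanged.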

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          Too late for 2.0.2, but IMHO a must-have for 2.1.

          ingenthr Matt Ingenthron added a comment -

          Is the most recent comment here still relevant? We still see this in integration tests. 2.1 or ...

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          Yes. We will not have it in the next upcoming release.

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          Let me clarify. We still don't plan to address that in the next release (previously known as 2.0.2 and being renamed to 2.1).

          maria Maria McDuff (Inactive) added a comment -

          Moving to 2.0.3.

          anil Anil Kumar added a comment -

          This is a must-have for 2.0.3.

          daschl Michael Nitschinger added a comment -

          FYI, the way we try to remedy that on the client side in the meantime is to just retry on another node (if we get a 300 or 500 back).
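
The retry-on-another-node approach described above might be sketched as follows. This is an illustration under stated assumptions rather than the SDK's actual code: `NodeFailover`, `Response`, and the `sendQuery` callback are hypothetical, and the 300/500 check simply mirrors the status codes named in the comment.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; the real client keeps its own node list and HTTP layer.
final class NodeFailover {
    /** Minimal stand-in for an HTTP response: status code plus body. */
    static final class Response {
        final int status;
        final String body;

        Response(int status, String body) {
            this.status = status;
            this.body = body;
        }
    }

    private NodeFailover() {}

    /** Status codes that, per the comment above, trigger a retry on another node. */
    static boolean shouldTryNextNode(int status) {
        return status == 300 || status == 500;
    }

    /**
     * Sends the query to each node in turn and returns the first response
     * that does not ask for a retry. If every node fails, the last response
     * is returned so the caller can report it.
     */
    static Response queryWithFailover(List<String> nodes, Function<String, Response> sendQuery) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("at least one node is required");
        }
        Response last = null;
        for (String node : nodes) {
            last = sendQuery.apply(node);
            if (!shouldTryNextNode(last.status)) {
                return last;
            }
        }
        return last;
    }
}
```

Each node is tried at most once per query, so a cluster in which every node is failing produces a bounded number of requests instead of an endless loop.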

          kzeller kzeller added a comment - - edited

          Added to 2.1, 2.0.1, and 2.0 RN:

          <rnentry type="knownissue">

          <version ver="2.1.0a"/>

          <class id="cluster"/>

          <issue type="cb" ref="MB-7661"/>

          <rntext>

          <para>
          If you query a view during cluster rebalance it will fail and return the messages
          "error Reason: A view spec can not consist of merges exclusively" and then
          "no_active_vbuckets Reason: Cannot execute view query since the node has no active vbuckets."
          The workaround for this situation is to handle this error and retry later in your code. Alternatively
          the latest version of the Java SDK will automatically retry upon these errors.
          </para>

          </rntext>

          </rnentry>

          ingenthr Matt Ingenthron added a comment -

          I would turn that last sentence around and say in general an application developer should retry the request if they get these error responses. The 1.1.8 (not yet released) version of the Java SDK is planned to automatically retry those for applications.
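
For applications that need to retry these responses themselves, a bounded retry wrapper along the following lines is one option. It is a sketch only: `RetryingCaller`, its parameters, and the doubling delay are assumptions made for illustration, not behavior documented for any SDK version.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical application-side helper; not part of any Couchbase SDK.
final class RetryingCaller {
    private RetryingCaller() {}

    /**
     * Runs the operation up to maxAttempts times. A failure is retried only
     * if isRetryable accepts it; the delay doubles after each failed attempt.
     * The last failure is rethrown once attempts are exhausted.
     */
    static <T> T callWithRetry(Callable<T> operation,
                               Predicate<Exception> isRetryable,
                               int maxAttempts,
                               long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isRetryable.test(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

The predicate keeps the decision about which errors are transient in the caller's hands, so unrelated failures such as a missing design document are not retried.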

          cweirich Christian Weirich added a comment -

          And what about the non-Java SDKs? Will they be updated too?

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          As usual, things are getting messy rapidly due to undetected duplicates.

          As of 2.0 we do send a redirect. Matt, is that something you're reasonably happy about (i.e. all SDKs will honor those redirects)?

          Corresponding commit is this:

          commit 06eed990928b99d5223094a61d68971f427df9ed
          Author: Aliaksey Kandratsenka <alk@tut.by>
          Date: Mon Oct 15 17:07:57 2012 -0700

          MB-6922: send 302 when handling no active vbuckets on view query

          So that clients can clearly distinguish hitting node being
          rebalanced-in or -out and hitting dead ddoc or bucket. Also Location
          header will point client to better node which is helpful as well.

          Change-Id: I5ed1066ba646a67d0197b67f3988251822dfec31
          Reviewed-on: http://review.couchbase.org/21657
          Tested-by: Aliaksey Kandratsenka <alkondratenko@gmail.com>
          Reviewed-by: Aliaksey Artamonau <aliaksiej.artamonau@gmail.com>
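
On the client side, honoring the 302 described in this commit message comes down to reading the Location header and reissuing the query there. The sketch below shows only that decision step. `ViewRedirects` and `nextLocation` are hypothetical names, and the URLs in the usage note are made-up examples built from the test cluster's IP addresses.

```java
import java.net.URI;
import java.util.Optional;

// Hypothetical helper; shows only how a client might pick the redirect target.
final class ViewRedirects {
    private ViewRedirects() {}

    /**
     * Returns the URI to retry against when the response is a 302 carrying a
     * Location header, resolved against the original request URI so that
     * relative Location values also work. Returns empty otherwise.
     */
    static Optional<URI> nextLocation(URI requestUri, int status, String locationHeader) {
        if (status != 302 || locationHeader == null || locationHeader.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(requestUri.resolve(locationHeader));
    }
}
```

For example, a request to `http://192.168.56.104:8092/default/_design/ddoc/_view/vquery` answered with a 302 and a Location on 192.168.56.101 yields the latter URI. A real client would also cap the number of redirects it follows.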

          ingenthr Matt Ingenthron added a comment -

          Alk: did you mean "as of 2.0", or some other version? I ask because this has definitely been observed with recent releases.

          ingenthr Matt Ingenthron added a comment -

          Christian: yes, we'll update other clients as soon as possible.

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          Yes. 2.0. So my own comments above from Apr are somewhat invalid. Apparently I forgot we did redirect.

          Now the question is: is a redirect a sensible solution, or do you want the server to deal with that (at some additional cost in complexity and efficiency)?

          ingenthr Matt Ingenthron added a comment -

          Two parts. One, I think a redirect is better than proxying or some other expensive solution. Two: we've seen this recently, so something must be wrong.

          alkondratenko Aleksey Kondratenko (Inactive) added a comment -

          Noted. Good that we agree on redirect.

          If that issue still occurs with regular views, it must be a bug. We'll need reproduction instructions or diags from a recent reproduction in order to do something about it.

          cweirich Christian Weirich added a comment -

          Matt: thank you. Good to know.

          ingenthr Matt Ingenthron added a comment -

          Deepti: have we seen the error message identified in the summary line of this issue in any recent testing of 2.1.0 or 2.1.1? My recollection is that we have. Can you put 30 minutes or so into looking over past results? If you can identify a situation, we may need to repro again to gather logs for the cluster dev team.

          deeptida Deepti Dawar added a comment -

          Hi Matt, I looked through the test results from the 2.1.1 and cbc 1.1.8 testing and did not find these errors repeating. Now only the Invalid view exception appears frequently, and it has been raised as a bug.

          ingenthr Matt Ingenthron added a comment -

          Marking as resolved, as SDKQE reports this is not something we see any longer when client libraries correctly handle redirects.


            People

            • Assignee:
              andreibaranouski Andrei Baranouski
              Reporter:
              daschl Michael Nitschinger
            • Votes:
              1
              Watchers:
              11