Uploaded image for project: 'Couchbase Java Client'
  1. Couchbase Java Client
  2. JCBC-305

Client does not recover with Memcached connection to the server and primary node goes down.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Security Level: Public
    • Labels:
      None

      Description

      2013-05-21 12:59:04.158 ERROR net.spy.memcached.protocol.ascii.StoreOperationImpl: Error: SERVER_ERROR temporary failure
      Deepti Exception 1 java.util.concurrent.ExecutionException: OperationException: SERVER: SERVER_ERROR temporary failure
      2013-05-21 12:59:04.158 INFO net.spy.memcached.MemcachedConnection: Reconnection due to exception handling a memcached operation on

      {QA sa=/10.3.121.207:11211, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: add Key: Emp0000000365cbc Flags: 0 Exp: 0 Data Length: 79, topWop=null, toWrite=0, interested=8}

      . This may be due to an authentication failure.
      OperationException: SERVER: SERVER_ERROR temporary failure
      at net.spy.memcached.protocol.BaseOperationImpl.handleError(BaseOperationImpl.java:164)
      at net.spy.memcached.protocol.ascii.OperationImpl.readFromBuffer(OperationImpl.java:151)
      at net.spy.memcached.MemcachedConnection.handleReads(MemcachedConnection.java:537)
      at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:430)
      at net.spy.memcached.MemcachedConnection.handleIO(MemcachedConnection.java:247)
      at net.spy.memcached.MemcachedConnection.run(MemcachedConnection.java:915)
      2013-05-21 12:59:04.159 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000366cbc.
      2013-05-21 12:59:04.159 WARN net.spy.memcached.MemcachedConnection: Closing, and reopening

      {QA sa=/10.3.121.207:11211, #Rops=2, #Wops=0, #iq=0, topRop=Cmd: add Key: Emp0000000365cbc Flags: 0 Exp: 0 Data Length: 79, topWop=null, toWrite=0, interested=8}

      , attempt 22.
      2013-05-21 12:59:04.159 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000366cbc.
      2013-05-21 12:59:04.159 WARN net.spy.memcached.protocol.ascii.AsciiMemcachedNodeImpl: Discarding partially completed op: Cmd: add Key: Emp0000000365cbc Flags: 0 Exp: 0 Data Length: 79
      2013-05-21 12:59:04.160 WARN net.spy.memcached.protocol.ascii.AsciiMemcachedNodeImpl: Discarding partially completed op: Cmd: version
      2013-05-21 12:59:04.160 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000366cbc.
      2013-05-21 12:59:04.160 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000366cbc.
      Deepti Exception 1 java.lang.RuntimeException: Timed out waiting for operation
      2013-05-21 12:59:06.660 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000367cbc.
      2013-05-21 12:59:06.661 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000367cbc.
      Deepti Exception 1 java.lang.RuntimeException: Timed out waiting for operation
      2013-05-21 12:59:09.163 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000368cbc.
      2013-05-21 12:59:09.164 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000368cbc.
      Deepti Exception 1 java.lang.RuntimeException: Timed out waiting for operation
      2013-05-21 12:59:11.666 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000369cbc.
      2013-05-21 12:59:11.667 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000369cbc.
      Deepti Exception 1 java.lang.RuntimeException: Timed out waiting for operation
      2013-05-21 12:59:14.168 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000370cbc.
      2013-05-21 12:59:14.169 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000370cbc.
      Deepti Exception 1 java.lang.RuntimeException: Timed out waiting for operation
      2013-05-21 12:59:16.671 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000371cbc.
      2013-05-21 12:59:16.672 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000371cbc.
      Deepti Exception 1 java.lang.RuntimeException: Timed out waiting for operation
      2013-05-21 12:59:19.174 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000372cbc.
      2013-05-21 12:59:19.175 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000372cbc.
      Deepti Exception 1 java.lang.RuntimeException: Timed out waiting for operation
      2013-05-21 12:59:21.677 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000373cbc.
      2013-05-21 12:59:21.678 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000373cbc.
      Deepti Exception 1 java.lang.RuntimeException: Timed out waiting for operation
      2013-05-21 12:59:24.179 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000374cbc.
      2013-05-21 12:59:24.180 WARN net.spy.memcached.MemcachedConnection: Could not redistribute to another node, retrying primary node for Emp0000000374cbc.

      1. mem_all_nodes_input.log
        81 kB
        Deepti Dawar
      2. mem_primary_node_failure.log
        93 kB
        Deepti Dawar

        Issue Links

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

          Hide
          deeptida Deepti Dawar added a comment -

          In this case, server is stopped one by one on all the server nodes starting with the primary node.
          Even when the first node is brought up, the client does not recover. It recovers much after all the nodes are started and are active in the cluster.

          Show
          deeptida Deepti Dawar added a comment - In this case, server is stopped one by one on all the server nodes starting with the primary node. Even when the first node is brought up, the client does not recover. It recovers much after all the nodes are started and are active in the cluster.
          Hide
          deeptida Deepti Dawar added a comment - - edited

          One more observation, the client is only trying to insert data to the server nodes. Even if the primary node fails, it should not wait for the recovery of this node, instead should point to the other active node of the cluster and add the data there.
          Not sure of the behavior here. Matt, can you please help me in understanding what should be the correct behavior in this case.

          Show
          deeptida Deepti Dawar added a comment - - edited One more observation, the client is only trying to insert data to the server nodes. Even if the primary node fails, it should not wait for the recovery of this node, instead should point to the other active node of the cluster and add the data there. Not sure of the behavior here. Matt, can you please help me in understanding what should be the correct behavior in this case.
          Hide
          deeptida Deepti Dawar added a comment -

          When all nodes are given as input, then also the behavior remains the same or even worse. Now getting the ConcurrentExecutionException while cancelling the connection to the primary node when it goes down. Attatched is the log.

          Show
          deeptida Deepti Dawar added a comment - When all nodes are given as input, then also the behavior remains the same or even worse. Now getting the ConcurrentExecutionException while cancelling the connection to the primary node when it goes down. Attatched is the log.
          Hide
          daschl Michael Nitschinger added a comment -

          This should be fixed as of now, because the resubscribing is now handling on memcache buckets as well.

          Please note that we track pure SPY issues in the SPY jira.

          Show
          daschl Michael Nitschinger added a comment - This should be fixed as of now, because the resubscribing is now handling on memcache buckets as well. Please note that we track pure SPY issues in the SPY jira.

            People

            • Assignee:
              deeptida Deepti Dawar
              Reporter:
              deeptida Deepti Dawar
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Gerrit Reviews

                There are no open Gerrit changes