Couchbase Java Client / JCBC-70

Client fails to reconnect to server of non-default memcached bucket after failover and add back

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Duplicate
    • Affects Version/s: 1.0.3
    • Fix Version/s: 1.1.2
    • Component/s: Core
    • Security Level: Public
    • Labels:

      Description

      In earlier tests of reconnecting to a node on failover, we used the default memcached bucket. When we tested the same scenario with a non-default bucket, the client did not reconnect because of an internal NullPointerException. I have attached the SDK logs for this scenario, in which we used the "IndexByLniataData" memcached bucket. The problem appears when the node is added back after a failover.

      11:34:43,411 DEBUG [Memcached IO over {MemcachedConnection to /10.14.5.119:11210}] [CouchbaseMemcachedConnection] Selecting with delay of 3038ms
      Exception in thread "Thread-3" java.lang.NullPointerException
      at net.spy.memcached.auth.AuthThread.buildOperation(AuthThread.java:117)
      at net.spy.memcached.auth.AuthThread.run(AuthThread.java:86)

      Logs/stack trace attached.


          Activity

          perry Perry Krug created issue -
          ingenthr Matt Ingenthron added a comment -

          I've spent a bit of time analyzing this issue, and the cause is not clear. It is correct, though, that this would cause the auth thread to die, so authentication to the node would never complete.

          There is already a safeguard: the continuous timeout threshold will kick in, and the connection will then be rebuilt. I don't know whether this issue comes up every time, but assuming it's a rare event, we'd see 1000 operations time out (by default), followed by the connection being rebuilt.
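
          To illustrate the safeguard described above, here is a minimal sketch of a consecutive-timeout counter. The class name ContinuousTimeoutTracker and its methods are hypothetical and are not taken from the client source; whether the real client rebuilds on reaching or on exceeding the threshold is also an assumption here.

```java
/**
 * Hypothetical sketch of a "continuous timeout" safeguard.
 * This is NOT the client's actual class; names and exact semantics are assumed.
 */
final class ContinuousTimeoutTracker {
    private final int threshold;
    private int consecutiveTimeouts;

    ContinuousTimeoutTracker(int threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        this.threshold = threshold;
    }

    /**
     * Records one timed-out operation.
     * Returns true when the run of consecutive timeouts reaches the threshold,
     * meaning the caller should rebuild the connection. The counter is then reset.
     */
    synchronized boolean recordTimeout() {
        consecutiveTimeouts++;
        if (consecutiveTimeouts >= threshold) {
            consecutiveTimeouts = 0;
            return true;
        }
        return false;
    }

    /** Any successful operation breaks the run of consecutive timeouts. */
    synchronized void recordSuccess() {
        consecutiveTimeouts = 0;
    }

    /** Current length of the run of consecutive timeouts. */
    synchronized int current() {
        return consecutiveTimeouts;
    }
}
```

          With a threshold of 1000, a node whose authentication never completes would cause 1000 operations in a row to time out before recordTimeout() reports that the connection should be rebuilt, which matches the behavior expected above.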

          We'd have to add some diagnostic information to the client and reliably reproduce this to identify the issue. I think the scenario is:
          1) set up a cluster of, say, 3 nodes
          2) configure a client and have it work with an authenticated memcached bucket on the cluster
          3) fail over a node by clicking "failover" in the console
          4) add the node back by clicking "add back"

          Is this correct?
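
          As one possible form of the diagnostic information mentioned above, the sketch below captures an exception that would otherwise silently kill a worker thread. It uses only the standard Thread.setUncaughtExceptionHandler API; the class AuthThreadDiagnostics and the method runAndCapture are hypothetical names for illustration and do not exist in the client.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical diagnostic helper: runs a task on its own thread and reports
 * the uncaught exception (if any) that terminated it.
 */
final class AuthThreadDiagnostics {

    /** Returns the Throwable that killed the thread, or null if it finished normally. */
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(task, "auth-sketch");
        // Without a handler, the exception is only printed to stderr and the thread ends.
        worker.setUncaughtExceptionHandler((thread, ex) -> failure.set(ex));
        worker.start();
        try {
            // join() guarantees the handler's write is visible once the thread has died.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return failure.get();
    }
}
```

          A handler like this would let the client log which value was null when the auth thread dies, rather than leaving only the bare stack trace shown in the description.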

          perry Perry Krug added a comment -

          That appears correct. The customer has been able to reliably reproduce this, but since so much time has passed, I would be hesitant to go back to them unless necessary...

          rags Raghavan Srinivas (Inactive) made changes -
          Field Original Value New Value
          Status Open [ 1 ] Resolved [ 5 ]
          Fix Version/s 1.1beta [ 10370 ]
          Resolution Incomplete [ 4 ]
          rags Raghavan Srinivas (Inactive) made changes -
          Resolution Incomplete [ 4 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          ingenthr Matt Ingenthron made changes -
          Link This issue is duplicated by JCBC-120 [ JCBC-120 ]
          ingenthr Matt Ingenthron made changes -
          Link This issue is duplicated by JCBC-120 [ JCBC-120 ]
          daschl Michael Nitschinger made changes -
          Fix Version/s 1.0.4 [ 10364 ]
          rags Raghavan Srinivas (Inactive) made changes -
          Status Reopened [ 4 ] In Progress [ 3 ]
          ingenthr Matt Ingenthron made changes -
          Assignee Raghavan Srinivas [ rags ] Matt Ingenthron [ ingenthr ]
          daschl Michael Nitschinger made changes -
          Priority Major [ 3 ] Blocker [ 1 ]
          ingenthr Matt Ingenthron made changes -
          Link This issue depends on SPY-102 [ SPY-102 ]
          ingenthr Matt Ingenthron made changes -
          Status In Progress [ 3 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          ingenthr Matt Ingenthron added a comment -

          There is an open changeset for this. Please determine whether it is correct and needs to go in.

          ingenthr Matt Ingenthron made changes -
          Resolution Fixed [ 1 ]
          Status Resolved [ 5 ] Reopened [ 4 ]
          Assignee Matt Ingenthron [ ingenthr ] Michael Nitschinger [ daschl ]
          daschl Michael Nitschinger made changes -
          Fix Version/s 1.1.1 [ 10430 ]
          Fix Version/s 1.0.4 [ 10364 ]
          Fix Version/s 1.1-beta [ 10370 ]
          daschl Michael Nitschinger made changes -
          Fix Version/s 1.1.2 [ 10480 ]
          Fix Version/s 1.1.1 [ 10430 ]
          daschl Michael Nitschinger added a comment -

          Duplicate of SPY-111

          daschl Michael Nitschinger made changes -
          Status Reopened [ 4 ] Resolved [ 5 ]
          Resolution Duplicate [ 3 ]
          daschl Michael Nitschinger made changes -
          Link This issue is duplicated by SPY-111 [ SPY-111 ]
          ingenthr Matt Ingenthron made changes -
          Workflow jira [ 18123 ] Couchbase SDK Workflow [ 38351 ]

            People

            • Assignee: daschl Michael Nitschinger
            • Reporter: perry Perry Krug
            • Votes: 0
            • Watchers: 3
