  1. Couchbase .NET client library
  2. NCBC-1502

Fatal error: unexpected end of JSON input

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.5.2
    • Component/s: None
    • Labels:
      None

      Description

      I'm using the current top of tree of couchbase-net-client (e205bc140519d117cbacb93ef563e1fe4248545b).

      The test uses the following 2-node cluster:

      node 1, server version 4.5 GA: kv

      node 2, server version 4.5 GA: kv,index,n1ql

      plus an extra 2-node cluster running 5.0.0-3217, to be swap rebalanced in.

       

      The test was started against the 2-node cluster and then followed these steps:

      1. Swap rebalance the kv node by removing the 4.5 node and adding a 5.0 node.
      2. Add a 5.0 node and rebalance, then copy the index under a different index name (at this point, there are two indexes with the same index value).
      3. Remove the 4.5 GA node and rebalance (at this point, the upgrade to 5.0 is complete, leaving a 2-node cluster on 5.0).

       

      Expected: no errors

      Actual: Fatal errors continue even after the upgrade has completed. Successful operations also continue, but the total success count is lower than before the upgrade.

       

      The client log is attached.


        Attachments

        1. log.zip
          164 kB
        2. log2.zip
          129 kB
        3. log.zip
          815 kB
        4. log3.zip
          82 kB
        5. net_fatal.jpg
          960 kB
        6. repro.sh
          1 kB

            Activity

            jaekwon.park Jae Park [X] (Inactive) added a comment - - edited

            Matt Ingenthron: yes, I kept the header and just hand-modified the request body. Also, I found a bug in SDKD-Java: it does not run the prepared statement test because of this code:

            https://github.com/couchbase/sdkd-java/blob/master/src/main/java/com/couchbase/sdkd/cbclient/N1QLQueryCommandContext.java#L259

            n1qlparams.adhoc(true) should be n1qlparams.adhoc(false)

             

            Some corrections: LCB and Java work, but .NET doesn't.

            I did some more testing with Query QE, and it turned out that the prepared statement was copied to Spock when I tested with Java and LCB, but with .NET the prepared statement was empty.

            It sounds weird, but I assume the Java SDK and LCB do some magic to copy the prepared statement that the .NET SDK doesn't do?

            ingenthr Matt Ingenthron added a comment -

            That could be the case Jae Park [X].

            Jeff Morris: I looked at the original plan ( https://docs.google.com/document/d/1h89KV1GD7kNLsMqQjf1KqKRs7plsVVcuj7SpYq6KAsk/edit ) and it does call for re-prepare if the cluster returns code 4070.  It looks like .NET is parsing strings on 4070, but it's not clear why.  Can you have a look at the original plan and impl and see if we need to fix something?

            jmorris Jeff Morris added a comment - - edited

            The .NET SDK does retry in the case of 4070 (and the other cases outlined in the spec above). The retry is done once in the QueryClient, as opposed to letting the RequestExecutor do the retry like all other retries. I think the request should simply be flagged for retry and the original prepared statement purged from the cache in the QueryClient, and then the standard retry loop in the CouchbaseRequestExecutor should be used. This would allow the plan to be retried until the operation times out.

            https://github.com/couchbase/couchbase-net-client/commit/96be2380924b9ae1d02228d7ed2362a4377c39c9
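
For illustration only, the flow proposed in this comment could be sketched roughly as below. This is not the couchbase-net-client code: it is written in Python for brevity, and every name in it (`QueryError`, `QueryClient`, `execute_with_retry`, the `prepare`/`execute` callables, the timeout and backoff values) is a hypothetical stand-in. The only detail taken from the thread is that error code 4070 should lead to a re-prepare.

```python
import time

STALE_PLAN_CODE = 4070  # code the thread says should trigger a re-prepare


class QueryError(Exception):
    """Hypothetical error type carrying a server error code."""

    def __init__(self, code):
        super().__init__(f"query failed with code {code}")
        self.code = code


class QueryClient:
    """Hypothetical client that caches prepared plans per statement."""

    def __init__(self, prepare, execute):
        self._prepare = prepare    # callable: statement -> plan
        self._execute = execute    # callable: plan -> result, may raise QueryError
        self._plan_cache = {}

    def query(self, statement):
        """Return (ok, result); ok=False means 'flagged for retry'."""
        plan = self._plan_cache.get(statement)
        if plan is None:
            plan = self._prepare(statement)
            self._plan_cache[statement] = plan
        try:
            return True, self._execute(plan)
        except QueryError as err:
            if err.code == STALE_PLAN_CODE:
                # Purge the stale plan and let the caller's loop retry,
                # instead of retrying once inside the client.
                self._plan_cache.pop(statement, None)
                return False, None
            raise


def execute_with_retry(client, statement, timeout_s=2.0, backoff_s=0.01):
    """Hypothetical stand-in for the executor's standard retry loop."""
    deadline = time.monotonic() + timeout_s
    while True:
        ok, result = client.query(statement)
        if ok:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("query still flagged for retry at timeout")
        time.sleep(backoff_s)
```

The point of the split is that the client only purges the cached plan and reports "retry", while the outer loop owns how long to keep trying, so a plan that is stale after the upgrade gets re-prepared on the next attempt and retried until the operation times out.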

            jmorris Jeff Morris added a comment -

            Changing this to a non-blocker; likely release note for 2.5.0 and fix in a later version.

            mike.goldsmith Michael Goldsmith added a comment -

            Reopening because it caused NCBC1547. Moving target fix to 2.5.2.


              People

              • Assignee:
                jmorris Jeff Morris
                Reporter:
                jaekwon.park Jae Park [X] (Inactive)