  Couchbase PHP client library / PCBC-518

Operations can return null if a request is terminated within the SDK


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.2
    • Fix Version/s: 2.4.3
    • Component/s: library
    • Security Level: Public
    • Labels:
      None

      Description

      Issue
      It appears that under certain circumstances the PHP SDK can begin returning null for operations (e.g. get, upsert).

      This happens when a PHP request is terminated prematurely while it is performing an operation within the SDK. I have observed it when running under Apache; I haven't tested with php-fpm or nginx.

      Reproduction
      This can be reproduced easily with the following PHP script (a modified version of 'Hello Couchbase'):

      <?php
      // Establish username and password for bucket-access
      $authenticator = new \Couchbase\PasswordAuthenticator();
      $authenticator->username(getenv('USER'))->password(getenv('PASS'));
       
      // Connect to Couchbase Server
      $connStr = "couchbase://" . getenv('HOST') . "/";
      $cluster = new \Couchbase\Cluster($connStr);
       
      // Authenticate, then open bucket
      $cluster->authenticate($authenticator);
      $bucket = $cluster->openBucket(getenv('BUCKET'));
       
      // Store a document
      $result = $bucket->upsert('u:king_arthur', array(
          "email" => "kingarthur@couchbase.com",
          "interests" => array("African Swallows")
      ));
       
      // Retrieve the document in an endless loop until max_execution_time terminates the request
      while (true) {
          $result = $bucket->get("u:king_arthur");
          if ($result === null) {
              echo "I got null when I shouldn't";
              break;
          }
      }
      

      The script calls get in an endless loop so that the request eventually exceeds max_execution_time and is terminated.
      It also checks whether the result is ever null and, if so, reports it to the user.

      If you request the page and then, a few seconds later, request it again in a different tab, the first request eventually hits the max execution time and the second page then displays the null message.

      To make this easy to reproduce, I have created a Docker image that runs the script above under Apache.

      You can run it as follows (you need a reachable Couchbase Server instance):

      docker run -p 8080:80 -e HOST=<your host> -e BUCKET=<your bucket> -e USER=<your user> -e PASS=<your password> mattcarabine/pcbc-518
      

      You can then access the page at http://localhost:8080/index.php

      Summary
      I suspect that the underlying (shared) libcouchbase object becomes disposed of or corrupted when the first request terminates prematurely, so the other request sharing it can no longer use it (hence the null return).
      If this is the case, I'm not sure what the best way to handle it would be, but at the very least I would expect an explicit error to be returned to the user, rather than the SDK failing 'silently' by returning null.
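A minimal sketch of the explicit-error behaviour requested above, written as user-side PHP rather than as an SDK change. The helper name getOrThrow is hypothetical and not part of the Couchbase SDK; it assumes only that $bucket->get() returns either a result or null, as observed in this report.

```php
<?php
// Hypothetical helper (not part of the Couchbase SDK): converts a silent null
// from $bucket->get() into an explicit exception the caller cannot miss.
function getOrThrow($bucket, string $id)
{
    $result = $bucket->get($id);
    if ($result === null) {
        // Assumed meaning of null here: the shared connection is in a bad state.
        throw new \RuntimeException(
            "get('$id') returned null; the connection may be in a bad state"
        );
    }
    return $result;
}
```

In the reproduction script, calling getOrThrow($bucket, "u:king_arthur") instead of $bucket->get("u:king_arthur") would surface the failure as a RuntimeException rather than a null that has to be checked by hand.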


          Activity

          avsej Sergey Avseyev added a comment -

          What does "terminated within the SDK" mean?

          matt.carabine Matt Carabine added a comment -

          Sorry, Sergey Avseyev, I was still writing the bug report; assigning it to you now!

          I mean the case where the web server terminates the request while it is carrying out a CRUD operation inside the SDK (as opposed to while it is executing user code that does not call the SDK).

          avsej Sergey Avseyev added a comment -

          I see, I will take a look at it.

          avsej Sergey Avseyev added a comment -

          This is definitely an SDK error, and I'm working on a fix.

          avsej Sergey Avseyev added a comment -

          Matt Carabine, I've uploaded a patch with the fix to Gerrit (http://review.couchbase.org/87388). You were right: the max_execution_time timeout left the connection instance in a bad state. The next request picks it up from the persistent cache, and the library returns immediately on any operation because it thinks it is already inside the event loop (but it is not).

          My patch detects such connections, destroys them, and creates a fresh one.
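The failure mode and the detect-and-recreate fix described above can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not the code from the Gerrit change: FakeInstance and PersistentCache are hypothetical stand-ins for the libcouchbase instance and the extension's persistent connection cache, and the inEventLoop flag is an assumed simplification of how the library tracks that it is inside the event loop.

```php
<?php
// Illustrative model only: these classes are hypothetical and do not reflect
// the actual PCBC/libcouchbase internals changed in the Gerrit patch.

class FakeInstance
{
    // Set while an operation drives the event loop. If the PHP request is
    // killed mid-operation (e.g. by max_execution_time), it is never cleared.
    public $inEventLoop = false;

    public function get(string $id)
    {
        if ($this->inEventLoop) {
            // Mirrors the bug: the library believes it is already inside the
            // event loop and returns immediately without a result.
            return null;
        }
        $this->inEventLoop = true;
        $value = "value-of-$id"; // stand-in for real network I/O
        $this->inEventLoop = false;
        return $value;
    }
}

class PersistentCache
{
    private $instances = [];

    public function open(string $connStr): FakeInstance
    {
        $instance = $this->instances[$connStr] ?? null;
        if ($instance !== null && $instance->inEventLoop) {
            // Between requests nothing can legitimately be in the event loop,
            // so a set flag means the previous request was aborted: destroy it.
            unset($this->instances[$connStr]);
            $instance = null;
        }
        if ($instance === null) {
            $instance = new FakeInstance();
            $this->instances[$connStr] = $instance;
        }
        return $instance;
    }
}
```

Opening a connection, setting inEventLoop to true to mimic a request killed mid-operation, and calling get() again returns null (the reported symptom). Calling open() again with the same connection string then discards the stale instance and returns a fresh one whose get() works.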

          One small note about your Docker image: you install the libev plugin but do not activate it with the LCB_IOPS_NAME=libev environment variable, so it falls back to select, which is not the best IO plugin for production.


            People

            • Assignee:
              avsej Sergey Avseyev
              Reporter:
              matt.carabine Matt Carabine
            • Votes:
              0
              Watchers:
              3

              Dates

              • Created:
                Updated:
                Resolved:

                Gerrit Reviews

                There are no open Gerrit changes
