Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-19358

Indexing service should return an error code as well as an error string

    XMLWordPrintable

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • 4.5.0, 5.0.0, 5.1.0, 6.5.0, 6.0.0, 5.5.0, Cheshire-Cat, 6.6.0
    • feature-backlog
    • secondary-index

    Description

      When n1ql passes a request to the indexing service, and this fails, n1ql may have a need to return a specific error code.
      For instance - if an index we are trying to create already exists, we should be returning a 4300.
      We currently return 5000 (a general error) which is incorrect.
      In order to do that - the only choice currently open to n1ql is to scan the indexing error string for keywords, which is inelegant, and less than ideal.
      This could be solved if the indexing service could pass n1ql an integer error code alongside the error string.
      n1ql conversely would provide a new NewIndexingError() error method, which the indexing client could call, so that n1ql could set the error appropriately.

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            Marco Greco, If indexer passes a code along with error string, is the change on n1ql end being targeted for vulcan? If yes, we can pass an integer along side error.

            prathibha Prathibha Bisarahalli (Inactive) added a comment - Marco Greco , If indexer passes a code along with error string, is the change on n1ql end being targeted for vulcan? If yes, we can pass an integer along side error.
            marco.greco Marco Greco added a comment -

            Prathibha Bisarahalli to be fair this was important, when it was logged, prior to the auto_reprepare feature that I just coded in Vulcan.
            Now the SDKs don't really have to parse error texts to decide if they have to take any action, so we don't really have to target this for Vulcan.
            I'm happy to have a new API or have an integer along side the error.
            You just decide what works best for you (new API / integer) and I'll code my side accordingly.

            marco.greco Marco Greco added a comment - Prathibha Bisarahalli to be fair this was important, when it was logged, prior to the auto_reprepare feature that I just coded in Vulcan. Now the SDKs don't really have to parse error texts to decide if they have to take any action, so we don't really have to target this for Vulcan. I'm happy to have a new API or have an integer along side the error. You just decide what works best for you (new API / integer) and I'll code my side accordingly.

            Hi Marco Greco,

            1. I do not have the context on the requirement that SDKs had. If there is any related ticket please do share.
            2. Pls also share how error codes map to error string. In many scenarios, gsi client is unaware of what the exact issue, in which case we have to respond with a generic error code. Example: Error from Count2 implementation of datastore.CountIndex2 might be coming from server. In some cases, errors are detected within gsi client in which case we can give proper error code. Example: Alter API of datastore.Index3 returns "GSI AlterIndex() Unsupported action value" which is clearly a "bad request".

            prathibha Prathibha Bisarahalli (Inactive) added a comment - Hi Marco Greco , 1. I do not have the context on the requirement that SDKs had. If there is any related ticket please do share. 2. Pls also share how error codes map to error string. In many scenarios, gsi client is unaware of what the exact issue, in which case we have to respond with a generic error code. Example: Error from Count2 implementation of datastore.CountIndex2 might be coming from server. In some cases, errors are detected within gsi client in which case we can give proper error code. Example: Alter API of datastore.Index3 returns "GSI AlterIndex() Unsupported action value" which is clearly a "bad request".

            Thanks Marco Greco for chasing down getting a stable interface here. Note that it's not just for SDKs N1QL is a public interface and you can't really program to something that may change behavior on you with different expectations at different times.

            While you're looking at this, you may want to look into some kind of error classification or bracketing. This is common with HTTP, of course. A programmer can look at a spec and know if an error is transient and retryable or fatal and there's no choice but to bubble it up the stack (as 5000 was originally intended, I believe).

            One source of minor inspiration here might be the KVErrorMap sdk-rfc. In that, we defined various behaviors defined by error codes and defined the contract between the two actors more clearly. This will help us in the future since a mixed cluster of old SDKs will be able to work with new server behaviors. 2i and query have this challenge too in mixed clusters. If you work out the contract, then 2i can change the strings and behaviors knowing that you won't have a set of upgrade bugs.

            ingenthr Matt Ingenthron added a comment - Thanks Marco Greco for chasing down getting a stable interface here. Note that it's not just for SDKs N1QL is a public interface and you can't really program to something that may change behavior on you with different expectations at different times. While you're looking at this, you may want to look into some kind of error classification or bracketing. This is common with HTTP, of course. A programmer can look at a spec and know if an error is transient and retryable or fatal and there's no choice but to bubble it up the stack (as 5000 was originally intended, I believe). One source of minor inspiration here might be the KVErrorMap sdk-rfc . In that, we defined various behaviors defined by error codes and defined the contract between the two actors more clearly. This will help us in the future since a mixed cluster of old SDKs will be able to work with new server behaviors. 2i and query have this challenge too in mixed clusters. If you work out the contract, then 2i can change the strings and behaviors knowing that you won't have a set of upgrade bugs.
            adamf Adam Fraser added a comment -

            Sync Gateway (as an SDK consumer) would benefit from this as well. In particular, we're currently having to do brittle string matching for the following errors, in order to properly respond to the 'will retry in background' handling provided by GSI:

            [5000] GSI CreateIndex() - cause: Encountered transient error. Index creation will be retried in background. Error: Index [indexname] will retry building in the background for reason: Bucket [bucketname] In Recovery.
            [5000] GSI Drop() - cause: Fail to drop index on some indexer nodes. Error=Encountered error when dropping index: Indexer In Recovery. Drop index will be retried in background.
            [5000] BuildIndexes - cause: Build index fails. %vIndex [indexname] will retry building in the background for reason: Build Already In Progress. Bucket [bucketname].

            adamf Adam Fraser added a comment - Sync Gateway (as an SDK consumer) would benefit from this as well. In particular, we're currently having to do brittle string matching for the following errors, in order to properly respond to the 'will retry in background' handling provided by GSI: [5000] GSI CreateIndex() - cause: Encountered transient error. Index creation will be retried in background. Error: Index [indexname] will retry building in the background for reason: Bucket [bucketname] In Recovery. [5000] GSI Drop() - cause: Fail to drop index on some indexer nodes. Error=Encountered error when dropping index: Indexer In Recovery. Drop index will be retried in background. [5000] BuildIndexes - cause: Build index fails. %vIndex [indexname] will retry building in the background for reason: Build Already In Progress. Bucket [bucketname] .
            adamf Adam Fraser added a comment -

            Sync Gateway seems to be hitting an issue that's related to this brittle string matching on error messages.  We were previously getting this error message when trying to build deferred indexes that are already building, and were string matching on "will retry building in the background":

            [5000] BuildIndexes - cause: Build index fails. %vIndex [indexname] will retry building in the background for reason: Build Already In Progress. Bucket [bucketname].

            Recent tests (using 6.5.1 I believe) are instead hitting:

            [5000] BuildIndexes - cause: Build index fails. Some index will be retried building in the background. For more details, please check index status.

            I'm modifying our code to attempt to match both error messages, but having an error code (or range) associated with operations that will be retried would allow consumer code like ours to be much more robust.

            adamf Adam Fraser added a comment - Sync Gateway seems to be hitting an issue that's related to this brittle string matching on error messages.  We were previously getting this error message when trying to build deferred indexes that are already building, and were string matching on "will retry building in the background": [5000]  BuildIndexes - cause: Build index fails. %vIndex  [indexname]  will retry building in the background for reason: Build Already In Progress. Bucket  [bucketname] . Recent tests (using 6.5.1 I believe) are instead hitting: [5000] BuildIndexes - cause: Build index fails. Some index will be retried building in the background. For more details, please check index status. I'm modifying our code to attempt to match both error messages, but having an error code (or range) associated with operations that will be retried would allow consumer code like ours to be much more robust.

            People

              jeelan.poola Jeelan Poola
              marco.greco Marco Greco
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty