Couchbase Server / MB-31805

Index information is not updated on UI when some of the index nodes are not responsive


Details


    Description

      Steps to repro:

      a. Setup a cluster with 2 KV+Query and 3 indexer nodes

      b. Load sample buckets and create indexes

      c. Stop one of the indexer services

      d. Create a new index

       

      The newly created index is not seen on the UI. The issue can also be seen with the following steps:

      a. Setup a cluster with 2 KV+Query and 3 indexer nodes

      b. Load sample buckets and create indexes

      c. Stop all 3 indexer nodes

      d. Bring back 2 of the indexer nodes

      e. The UI does not show any indexes

       

      The reason for this inconsistency is that the "getIndexStatus" request collects index information from all the nodes, and if there is a failed node in the quorum, it returns the HTTP response "500 Internal Server Error" along with the list of indexes from the active nodes. The "code" field in the response JSON is also set to "error". Because of this, the UI does not update the index information even though the response contains the indexes from the active indexer nodes (see the sketch below).
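      To make the described behaviour concrete, here is a minimal sketch in Go of the response shape: only the "code" field and the presence of an index list are confirmed by this report; the other field names are illustrative assumptions, not the indexer's actual wire format.

```go
// Illustrative sketch of the getIndexStatus response described above.
// Only "code" is confirmed by this report; other field names are assumed.
package main

import (
	"encoding/json"
	"fmt"
)

type indexStatus struct {
	Name   string   `json:"name"`   // assumed field
	Bucket string   `json:"bucket"` // assumed field
	Hosts  []string `json:"hosts"`  // assumed field
}

type getIndexStatusResponse struct {
	Code   string        `json:"code"`   // set to "error" when a node has failed
	Status []indexStatus `json:"status"` // indexes gathered from active nodes (assumed name)
}

func main() {
	// Current behaviour per the report: HTTP 500 is returned together with a
	// body like this, so the payload still carries usable index data.
	body := []byte(`{"code":"error","status":[{"name":"idx1","bucket":"travel-sample","hosts":["node1:9102"]}]}`)

	var resp getIndexStatusResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		panic(err)
	}
	fmt.Printf("code=%s, %d indexes from active nodes\n", resp.Code, len(resp.Status))
}
```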

       

      Ideally, the expectation is that the UI shows the indexes that are currently available and also shows a warning like "the index information shown is not complete as some index nodes are not responsive".



          Activity

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-3814 contains ns_server commit 956fbf9 with commit message: MB-31805: add "stale" property of the index item

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-3814 contains ns_server commit da5641d with commit message: MB-31805: added "stale" label to the index row

            Added "stale" label next to index name

            pavel Pavel Blagodov added a comment - Added "stale" label next to index name

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-2924 contains indexing commit 5679275 with commit message: MB-31805: Cache LocalIndexMetadata and stats locally

            build-team Couchbase Build Team added a comment - Build couchbase-server-6.5.0-1970 contains ns_server commit ecabef9 with commit message: MB-31805: Add node UUID to /pools/default output.

            varun.velamuri Varun Velamuri added a comment -

            When there are multiple indexer nodes in the quorum, as per the current behaviour, ns_server makes a getIndexStatus request to one of the nodes (say, the leader node). The leader node collects the local metadata from all the other indexer nodes in the cluster, consolidates it, and sends the response back to ns_server. This response is then processed and shown on the UI.

            If the indexer process on any of the nodes is unavailable, then currently we show cached information about the indexes, i.e. the index list from when all the nodes were active. This means that any index created in this state will not be reflected on the UI. Also, if all the nodes go down and some of them come back again, we do not show any indexes on the UI.

            This is happening because, on seeing a failed indexer node, we send a "500 Internal Server Error" to ns_server, which then ignores the index list and shows the cached information. This behaviour has been changed so that we send 200 OK to ns_server even if some of the indexer nodes are not available. With this solution, all the indexes that are available on the active nodes will be shown on the UI (see the sketch below).
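            A minimal sketch in Go of the changed consolidation on the leader node: unreachable nodes are recorded instead of failing the whole request, and 200 OK is returned with whatever was gathered. fetchLocalMetadata, the endpoint path, and the response field names are illustrative assumptions, not the actual indexer code.

```go
// Sketch of the changed consolidation on the leader: collect local metadata
// from every indexer node, record unreachable nodes instead of aborting, and
// report 200 OK with whatever was gathered. Helper names are hypothetical.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type indexStatus struct {
	Name string `json:"name"`
	Host string `json:"host"`
}

// fetchLocalMetadata is a hypothetical helper that asks one indexer node for
// its local index metadata.
func fetchLocalMetadata(node string) ([]indexStatus, error) {
	resp, err := http.Get("http://" + node + "/getLocalIndexMetadata")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var statuses []indexStatus
	err = json.NewDecoder(resp.Body).Decode(&statuses)
	return statuses, err
}

func handleGetIndexStatus(w http.ResponseWriter, nodes []string) {
	var all []indexStatus
	var failedNodes []string
	for _, node := range nodes {
		statuses, err := fetchLocalMetadata(node)
		if err != nil {
			// Previously this aborted the request with 500; now the node is
			// only recorded as failed and consolidation continues.
			failedNodes = append(failedNodes, node)
			continue
		}
		all = append(all, statuses...)
	}
	w.WriteHeader(http.StatusOK) // 200 OK even with failed nodes
	json.NewEncoder(w).Encode(map[string]interface{}{
		"code":        "success",
		"status":      all,
		"failedNodes": failedNodes, // lets the UI warn about incompleteness
	})
}

func main() {
	fmt.Println("sketch only; wire handleGetIndexStatus into an HTTP mux to use it")
}
```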

            However, the indexes on failed nodes will still not be shown on the UI. To address this problem, the plan is to persist the index instance to metakv whenever an instance is created (and delete it from metakv whenever an index is dropped). The leader node, on observing that some failed nodes exist in the cluster, will retrieve the index information from metakv. A sketch of this plan follows.

             

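            A sketch of the proposed metakv persistence, under stated assumptions: the metaKV interface and the path layout below are hypothetical stand-ins for a metakv-style store, not Couchbase's actual metakv API or its signatures.

```go
// Sketch of the proposed plan: persist each index instance to a metakv-style
// store on creation, delete it on drop, and fall back to the store for nodes
// that do not respond. The metaKV interface and path layout are hypothetical.
package main

import (
	"encoding/json"
	"fmt"
)

type indexInstance struct {
	InstID uint64 `json:"instId"`
	Name   string `json:"name"`
	Node   string `json:"node"`
}

// metaKV is a hypothetical abstraction over a metakv-like store.
type metaKV interface {
	Set(path string, value []byte) error
	Delete(path string) error
	ListChildren(prefix string) (map[string][]byte, error)
}

const instancePrefix = "/indexing/instances/" // assumed path layout

// persistInstance is called when an index instance is created.
func persistInstance(kv metaKV, inst indexInstance) error {
	raw, err := json.Marshal(inst)
	if err != nil {
		return err
	}
	return kv.Set(fmt.Sprintf("%s%d", instancePrefix, inst.InstID), raw)
}

// dropInstance is called when an index is dropped.
func dropInstance(kv metaKV, instID uint64) error {
	return kv.Delete(fmt.Sprintf("%s%d", instancePrefix, instID))
}

// instancesForFailedNodes lets the leader recover index definitions for
// nodes that did not answer the local-metadata request.
func instancesForFailedNodes(kv metaKV, failed map[string]bool) ([]indexInstance, error) {
	entries, err := kv.ListChildren(instancePrefix)
	if err != nil {
		return nil, err
	}
	var out []indexInstance
	for _, raw := range entries {
		var inst indexInstance
		if err := json.Unmarshal(raw, &inst); err != nil {
			continue // skip unreadable entries
		}
		if failed[inst.Node] {
			out = append(out, inst)
		}
	}
	return out, nil
}

func main() { fmt.Println("sketch only") }
```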

            jeelan.poola Jeelan Poola added a comment -

            The expectation is as follows:

            1. Available indexes from active nodes in the HTTP response must be refreshed and shown on the UI.
            2. A warning must be shown stating that some index nodes are down and the list of indexes is not complete.

            We could possibly do it in two ways:

            1. Keep the current getIndexStatus() response as it is today. The UI will consume the index list even though the HTTP response code is 500 Internal Server Error and "code" is set to "error", and will refresh the indexes on the UI.
            2. Change the getIndexStatus() response to return 200 OK with "code" still set to "error". The UI will refresh the index list and show the warning because "code" is set to "error" (see the sketch after this comment).

            Requesting Rob Ashcom for inputs on the fix approaches above. 

            One long-term solution that can always refresh complete index data on the UI, even when some indexer nodes are down, is to write the index data to metakv and read from it for unreachable nodes during getIndexStatus() processing.

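            To make option 2 concrete, here is a sketch in Go of the consumer-side decision logic: on HTTP 200 the index list is always rendered, and "code" set to "error" only triggers a staleness warning. Field names other than "code" are assumptions for illustration.

```go
// Sketch of option 2 from the consumer's side: on HTTP 200 the index list is
// always rendered, and "code":"error" only triggers a staleness warning.
// Field names other than "code" are assumptions for illustration.
package main

import (
	"encoding/json"
	"fmt"
)

type statusResponse struct {
	Code        string   `json:"code"`
	Status      []string `json:"status"`      // assumed: index names
	FailedNodes []string `json:"failedNodes"` // assumed field
}

// renderDecision captures what the UI should do with one response.
type renderDecision struct {
	Indexes     []string
	ShowWarning bool
	Warning     string
}

func decide(httpStatus int, body []byte) (renderDecision, error) {
	var d renderDecision
	if httpStatus != 200 {
		// Under option 2, anything other than 200 keeps the old "ignore body" path.
		return d, fmt.Errorf("getIndexStatus failed with HTTP %d", httpStatus)
	}
	var resp statusResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return d, err
	}
	d.Indexes = resp.Status // refresh with whatever the active nodes reported
	if resp.Code == "error" || len(resp.FailedNodes) > 0 {
		d.ShowWarning = true
		d.Warning = "the index information shown is not complete as some index nodes are not responsive"
	}
	return d, nil
}

func main() {
	body := []byte(`{"code":"error","status":["idx1","idx2"],"failedNodes":["node3:9102"]}`)
	d, err := decide(200, body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("render %d indexes, warning=%v: %s\n", len(d.Indexes), d.ShowWarning, d.Warning)
}
```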

            People

              pavel Pavel Blagodov
              varun.velamuri Varun Velamuri
