Couchbase Server / MB-43921

[BP 6.5.2] - listReplicaCount is taking more than 10Sec and timing out

    Details

    • Story Points:
      1

      Description

      listReplicaCount times out, as shown in the log below:
      2020-10-08T15:20:31.868+00:00 [Verbose] GetWithAuth: <url to get listReplicaCount> elapsed 10.000339766s
      2020-10-08T15:20:31.868+00:00 [Error] Planner::getIndexNumReplica: Error from reading index num replica for node <removed>. Error = Get <url to get listReplicaCount> net/http: request canceled (Client.Timeout exceeded while awaiting headers)

      The Index Planner makes this call to compute an optimal load distribution.
      I tried reproducing the issue locally, and it looks like the HTTP calls to MetaKv that check for DDL tokens are causing the delay.

            Activity

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.2-6625 contains indexing commit e7559c2 with commit message:
            MB-43921 - [BP] Reduce time taken for listReplicaCount

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.5.2-6625 contains indexing commit 8db3659 with commit message:
            MB-43921 : [BP] Make Timeout for GetWithCbauth Configurable.

            girish.benakappa Girish Benakappa added a comment -

            Verified the scenarios below with 6.5.2-6625:

            • Cluster with 3 KV, 2 index+N1QL, and 4 search nodes
            • 6 buckets with 5000 docs each
            • Built 200 GSI indexes with replica 1 (50 indexes on each of 4 buckets)
            • Created 30 custom FTS indexes (10 indexes on each of 3 buckets), just to add more entries to metakv
            • Created and dropped 100 GSI indexes sequentially on each of 4 buckets (adding more entries for the create/drop of 400 indexes)
            • Index creation was fast at first (milliseconds) but eventually slowed to 30 to 40 sec, even with defer_build = true
            • This took around 4 to 5 hrs
            • Then did a rebalance, adding a search node

            Additional testing with cluster operations on FTS and GSI nodes: failover, rebalance in/out, and swap rebalance.


              People

              Assignee:
              girish.benakappa Girish Benakappa
              Reporter:
              jeelan.poola Jeelan Poola
