Details
-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
7.2.1
-
Untriaged
-
-
0
-
Unknown
Description
Test Steps:
- 15 scopes and 60 collections across 3 buckets
- Each bucket -> 10M data; 1 index on each collection -> 60 indexes (each index ~400k docs)
- Sleep - 15 mins
- Run FTS queries and FTS Flex queries in random fashion for 2 hrs
- Kill CBFT
- Sleep for 15 mins
- Scale out to 5 nodes
- Create 10 more FTS indexes with the following config on the default scope and default index (500k docs per index):
- 1 Index | 1 Replica | 5 Partitions
- 3 Indexes | 2 Replicas | 6 Partitions
- 6 Indexes | 0 Replicas | 4 Partitions
- Sleep for 15 mins
- Run FTS queries and FTS Flex queries in random fashion for 30 mins again
- Kill CBFT
- Sleep for 15 mins
- Rebalance/Scale in to 4 nodes
- Create 10 more FTS indexes with the following config:
- 1 Index | 1 Replica | 5 Partitions
- 3 Indexes | 2 Replicas | 6 Partitions
- 6 Indexes | 0 Replicas | 4 Partitions
- Run FTS queries and FTS Flex queries in random fashion for 30 mins again
- Kill memcached
- Sleep for 15 mins
- Scale in back to 3 nodes
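For the sizing question, here is a rough sketch of how many index partition instances each batch of 10 additional indexes adds. It assumes every replica holds a full copy of every partition, which is my assumption and not something taken from the test logs. The names `INDEX_BATCH`, `index_count` and `partition_instances` are illustrative and not part of the test framework.

```python
# (index_count, replicas, partitions) for each configuration in one batch,
# copied from the test steps above.
INDEX_BATCH = [
    (1, 1, 5),
    (3, 2, 6),
    (6, 0, 4),
]


def index_count(batch):
    """Number of index definitions created by one batch."""
    return sum(count for count, _replicas, _partitions in batch)


def partition_instances(batch):
    """Partition instances to place: indexes * partitions * (1 active + replicas).

    Assumption: each replica is a full copy of every partition.
    """
    return sum(
        count * partitions * (1 + replicas)
        for count, replicas, partitions in batch
    )


if __name__ == "__main__":
    print("indexes per batch:", index_count(INDEX_BATCH))
    print("partition instances per batch:", partition_instances(INDEX_BATCH))
```

Under that assumption one batch adds 10 index definitions and 88 partition instances (10 + 54 + 24), and the test creates this batch twice.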
Test Logs: http://qe-jenkins1.sc.couchbase.com/job/cp-cli-fts-system-test/7/console
Seeing that KV rebalance has hung.
I suspect a sizing issue, as I am seeing the default1 bucket go to 0% resident ratio (RR). However, even fewer docs cause the same behaviour, as filed in MB-58014.
On this bucket the RAM used is 3.6GB/4GB, which is around 90% of allocated memory. I am not sure why, because all buckets have the same size and number of docs, but this hints towards an undersized cluster.
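A minimal sketch of the arithmetic behind that observation, using the 3.6GB and 4GB figures reported here. The helper names are hypothetical, and the figures are the rounded values from the UI, so the result is approximate.

```python
def quota_used_pct(used_gb, quota_gb):
    """Percentage of the bucket RAM quota currently in use."""
    if quota_gb <= 0:
        raise ValueError("quota_gb must be positive")
    return 100.0 * used_gb / quota_gb


def headroom_gb(used_gb, quota_gb):
    """RAM left in the bucket quota, in GB."""
    return quota_gb - used_gb


if __name__ == "__main__":
    used, quota = 3.6, 4.0  # values reported for the default1 bucket
    print(f"quota used: {quota_used_pct(used, quota):.1f}%")
    print(f"headroom:   {headroom_gb(used, quota):.1f} GB")
```

With these inputs the bucket has roughly 0.4 GB of quota left, which fits the low resident ratio seen on default1.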
I am also seeing a node getting failed over in this process:
Failed over ['ns_1@svc-dqs-node-004.lie3v0iv5ulitlp.sandbox.nonprod-project-avengers.com']: ok | failover 000 | ns_1@svc-dqs-node-001.lie3v0iv5ulitlp.sandbox.nonprod-project-avengers.com | 8:13:29 PM 10 Aug, 2023
Please check whether it is actually a sizing problem or whether something actually went wrong.