Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Fix Version: Cheshire-Cat
- Environment: Centos 7 64-bit; CB EE 7.0.0-5219
- Triage: Untriaged
- Operating System: Centos 64-bit
- 1
- Unknown
Description
Summary:
I had an 18-node cluster with close to 10K GSI indexes (roughly 10 indexes per collection) across 1K collections in a single bucket.
I then performed a quorum failover of 13 nodes, dropped all collections, and flushed all items. After this, a couple of indexes still remained. Trying to drop them failed with an error like "Keyspace not found", and a rebalance-out also failed.
Output from /getIndexStatus
{"indexes":[{"storageMode":"plasma","partitionMap":{"172.23.106.238:8091":[0]},"numPartition":1,"partitioned":false,"instId":1464317576398494102,"hosts":["172.23.106.238:8091"],"stale":false,"progress":100,"definition":"CREATE INDEX `gsi703` ON `09wpGw1pyr-1-589000`.`_default`.`GiBlm8Z_IQ3tkMs3E-1-610000`(`age`) WITH { \"defer_build\":true }","status":"Ready","collection":"GiBlm8Z_IQ3tkMs3E-1-610000","scope":"_default","bucket":"09wpGw1pyr-1-589000","numReplica":0,"lastScanTime":"NA","indexName":"gsi703","index":"gsi703","id":4423815996602674555},{"storageMode":"plasma","partitionMap":{"172.23.106.250:8091":[0]},"numPartition":1,"partitioned":false,"instId":13487405438025204320,"hosts":["172.23.106.250:8091"],"stale":false,"progress":100,"definition":"CREATE INDEX `gsi707` ON `09wpGw1pyr-1-589000`.`_default`.`GiBlm8Z_IQ3tkMs3E-1-610000`(`age`) WITH { \"defer_build\":true }","status":"Ready","collection":"GiBlm8Z_IQ3tkMs3E-1-610000","scope":"_default","bucket":"09wpGw1pyr-1-589000","numReplica":0,"lastScanTime":"NA","indexName":"gsi707","index":"gsi707","id":15628560032118528258}],"version":31485457,"warnings":[]}
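For reference, the leftover indexes in the /getIndexStatus payload above can be summarized with a short script. This is only a sketch for reading the response shown in this ticket; the payload below is a trimmed copy of the output above, keeping just the fields used here.

```python
import json

# Trimmed copy of the /getIndexStatus payload from this ticket
# (only the fields needed for the summary are kept).
payload = json.loads("""
{"indexes":[
  {"storageMode":"plasma","hosts":["172.23.106.238:8091"],"status":"Ready",
   "bucket":"09wpGw1pyr-1-589000","scope":"_default",
   "collection":"GiBlm8Z_IQ3tkMs3E-1-610000","indexName":"gsi703"},
  {"storageMode":"plasma","hosts":["172.23.106.250:8091"],"status":"Ready",
   "bucket":"09wpGw1pyr-1-589000","scope":"_default",
   "collection":"GiBlm8Z_IQ3tkMs3E-1-610000","indexName":"gsi707"}],
 "warnings":[]}
""")

def leftover_indexes(status):
    """Return (host, keyspace, index name) tuples for every index in the payload."""
    out = []
    for idx in status["indexes"]:
        keyspace = "`{}`.`{}`.`{}`".format(idx["bucket"], idx["scope"], idx["collection"])
        out.append((idx["hosts"][0], keyspace, idx["indexName"]))
    return out

for host, keyspace, name in leftover_indexes(payload):
    print(host, keyspace, name)
```

Both entries resolve to the same dropped collection, which is what makes them orphans after the flush.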
Timeline
1. Create a 17-node cluster
2021-05-24 09:38:51,283 | test | INFO | pool-2-thread-21 | [table_view:display:72] Rebalance Overview
+----------------+-----------+-----------------------+----------------+--------------+
| Nodes | Services | Version | CPU | Status |
+----------------+-----------+-----------------------+----------------+--------------+
| 172.23.105.175 | kv | 7.0.0-5219-enterprise | 0.300827275006 | Cluster node |
| 172.23.106.233 | ['kv'] | | | <--- IN --- |
| 172.23.106.236 | ['n1ql'] | | | <--- IN --- |
| 172.23.106.238 | ['index'] | | | <--- IN --- |
| 172.23.106.250 | ['index'] | | | <--- IN --- |
| 172.23.106.251 | ['index'] | | | <--- IN --- |
| 172.23.121.74 | ['index'] | | | <--- IN --- |
| 172.23.121.78 | ['index'] | | | <--- IN --- |
| 172.23.107.43 | ['index'] | | | <--- IN --- |
| 172.23.107.58 | ['index'] | | | <--- IN --- |
| 172.23.107.44 | ['index'] | | | <--- IN --- |
| 172.23.107.45 | ['index'] | | | <--- IN --- |
| 172.23.107.54 | ['index'] | | | <--- IN --- |
| 172.23.107.47 | ['index'] | | | <--- IN --- |
| 172.23.107.78 | ['index'] | | | <--- IN --- |
| 172.23.107.84 | ['index'] | | | <--- IN --- |
| 172.23.107.85 | ['index'] | | | <--- IN --- |
+----------------+-----------+-----------------------+----------------+--------------+
2. Create 10K indexes and build them.
2021-05-24 23:15:40,164 | test | INFO | MainThread | [Metadata:build_deferred_indexes:159] online indexes count: 10000
3. Rebalance-in an index node, 172.23.107.88, at 2021-05-24 23:17:57
4. Stop the server on these nodes and quorum-failover them:
nodes = ["172.23.106.251", "172.23.107.43", "172.23.107.44", "172.23.107.45", "172.23.107.47", "172.23.107.54", "172.23.107.58", "172.23.107.78", "172.23.107.84", "172.23.107.85", "172.23.107.88", "172.23.121.74", "172.23.121.78"]
This happened at 12:38:51 AM, 25 May 2021, per the UI logs (see attached screenshot).
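As background, a quorum failover in this version goes through the ns_server REST endpoint POST /controller/startFailover with allowUnsafe=true. A minimal sketch of building that request body for the failed-over nodes (the ns_1@&lt;ip&gt; otpNode naming and the comma-separated node list are assumptions to verify against the actual cluster; nothing is sent here):

```python
from urllib.parse import urlencode

# First few of the nodes quorum-failed-over in step 4 (full list in the ticket).
failed_nodes = ["172.23.106.251", "172.23.107.43", "172.23.107.44"]

def failover_request_body(nodes):
    """Form body for POST /controller/startFailover.

    allowUnsafe=true is what turns this into a quorum (unsafe) failover.
    otpNode names follow the usual ns_1@<ip> convention -- an assumption;
    check the actual node names via /pools/default before using this.
    """
    otp = ",".join("ns_1@" + ip for ip in nodes)
    return urlencode({"otpNode": otp, "allowUnsafe": "true"})

print(failover_request_body(failed_nodes))
```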
5. Drop all collections and flush the bucket.
Checked the Indexes page and observed that 2 indexes were not cleaned up.
RAM used by the indexer at this point: 1009 MiB
Attaching two sets of logs: one from the resulting cluster, and the other from the quorum-failed-over nodes after they were removed and their configs wiped.
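For completeness, the manual drop attempts that failed with "Keyspace not found" were N1QL statements of the following shape, built from the bucket/scope/collection fields in the /getIndexStatus output. A sketch of the statement construction (standard 7.0 collection-level DROP INDEX syntax):

```python
def drop_statement(bucket, scope, collection, index):
    """N1QL DROP INDEX statement for a collection-level index (7.0 syntax).

    Statements of this shape failed with 'Keyspace not found' for the two
    leftover indexes, since their collection had already been dropped.
    """
    return "DROP INDEX `{}` ON `{}`.`{}`.`{}`".format(index, bucket, scope, collection)

print(drop_statement("09wpGw1pyr-1-589000", "_default",
                     "GiBlm8Z_IQ3tkMs3E-1-610000", "gsi703"))
```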