Details
- Type: Bug
- Resolution: Fixed
- Priority: Critical
- Fix Version: Cheshire-Cat
- Environment: Centos 7 64-bit; CB EE 7.0.0-5219
- Triage: Untriaged
- Operating System: Centos 64-bit
- 1
- Unknown
Description
Summary:
I had an 18-node cluster with close to 10K GSI indexes (roughly 10 indexes per collection) across 1K collections in a single bucket.
I then performed a quorum failover of 13 nodes, dropped all collections, and flushed all items. After this, a couple of indexes still remained. Trying to drop them failed with an error like "Keyspace not found", and a rebalance-out also failed.
Output from /getIndexStatus
{"indexes":[{"storageMode":"plasma","partitionMap":{"172.23.106.238:8091":[0]},"numPartition":1,"partitioned":false,"instId":1464317576398494102,"hosts":["172.23.106.238:8091"],"stale":false,"progress":100,"definition":"CREATE INDEX `gsi703` ON `09wpGw1pyr-1-589000`.`_default`.`GiBlm8Z_IQ3tkMs3E-1-610000`(`age`) WITH { \"defer_build\":true }","status":"Ready","collection":"GiBlm8Z_IQ3tkMs3E-1-610000","scope":"_default","bucket":"09wpGw1pyr-1-589000","numReplica":0,"lastScanTime":"NA","indexName":"gsi703","index":"gsi703","id":4423815996602674555},{"storageMode":"plasma","partitionMap":{"172.23.106.250:8091":[0]},"numPartition":1,"partitioned":false,"instId":13487405438025204320,"hosts":["172.23.106.250:8091"],"stale":false,"progress":100,"definition":"CREATE INDEX `gsi707` ON `09wpGw1pyr-1-589000`.`_default`.`GiBlm8Z_IQ3tkMs3E-1-610000`(`age`) WITH { \"defer_build\":true }","status":"Ready","collection":"GiBlm8Z_IQ3tkMs3E-1-610000","scope":"_default","bucket":"09wpGw1pyr-1-589000","numReplica":0,"lastScanTime":"NA","indexName":"gsi707","index":"gsi707","id":15628560032118528258}],"version":31485457,"warnings":[]}
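For reference, the leftover indexes in the /getIndexStatus payload above can be summarized with a short script. This is only a sketch for reading the response shown in this ticket; the payload below is a trimmed copy of the output above, keeping just the fields used here.

```python
import json

# Trimmed copy of the /getIndexStatus payload from this ticket
# (only the fields needed for the summary are kept).
payload = json.loads("""
{"indexes":[
  {"storageMode":"plasma","hosts":["172.23.106.238:8091"],"status":"Ready",
   "bucket":"09wpGw1pyr-1-589000","scope":"_default",
   "collection":"GiBlm8Z_IQ3tkMs3E-1-610000","indexName":"gsi703"},
  {"storageMode":"plasma","hosts":["172.23.106.250:8091"],"status":"Ready",
   "bucket":"09wpGw1pyr-1-589000","scope":"_default",
   "collection":"GiBlm8Z_IQ3tkMs3E-1-610000","indexName":"gsi707"}],
 "warnings":[]}
""")

def leftover_indexes(status):
    """Return (host, keyspace, index name) tuples for every index in the payload."""
    out = []
    for idx in status["indexes"]:
        keyspace = "`{}`.`{}`.`{}`".format(idx["bucket"], idx["scope"], idx["collection"])
        out.append((idx["hosts"][0], keyspace, idx["indexName"]))
    return out

for host, keyspace, name in leftover_indexes(payload):
    print(host, keyspace, name)
```

Both entries resolve to the same dropped collection, which is what makes them orphans after the flush.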
Timeline
1. Create a 17-node cluster
2021-05-24 09:38:51,283 | test | INFO | pool-2-thread-21 | [table_view:display:72] Rebalance Overview
+----------------+-----------+-----------------------+----------------+--------------+
| Nodes | Services | Version | CPU | Status |
+----------------+-----------+-----------------------+----------------+--------------+
| 172.23.105.175 | kv | 7.0.0-5219-enterprise | 0.300827275006 | Cluster node |
| 172.23.106.233 | ['kv'] | | | <--- IN --- |
| 172.23.106.236 | ['n1ql'] | | | <--- IN --- |
| 172.23.106.238 | ['index'] | | | <--- IN --- |
| 172.23.106.250 | ['index'] | | | <--- IN --- |
| 172.23.106.251 | ['index'] | | | <--- IN --- |
| 172.23.121.74 | ['index'] | | | <--- IN --- |
| 172.23.121.78 | ['index'] | | | <--- IN --- |
| 172.23.107.43 | ['index'] | | | <--- IN --- |
| 172.23.107.58 | ['index'] | | | <--- IN --- |
| 172.23.107.44 | ['index'] | | | <--- IN --- |
| 172.23.107.45 | ['index'] | | | <--- IN --- |
| 172.23.107.54 | ['index'] | | | <--- IN --- |
| 172.23.107.47 | ['index'] | | | <--- IN --- |
| 172.23.107.78 | ['index'] | | | <--- IN --- |
| 172.23.107.84 | ['index'] | | | <--- IN --- |
| 172.23.107.85 | ['index'] | | | <--- IN --- |
+----------------+-----------+-----------------------+----------------+--------------+
2. Create 10K indexes and build them.
2021-05-24 23:15:40,164 | test | INFO | MainThread | [Metadata:build_deferred_indexes:159] online indexes count: 10000
3. Rebalance-in an index node, 172.23.107.88, at 2021-05-24 23:17:57
4. Stop the server on these nodes and quorum-failover them:
nodes = ["172.23.106.251", "172.23.107.43", "172.23.107.44", "172.23.107.45", "172.23.107.47", "172.23.107.54", "172.23.107.58", "172.23.107.78", "172.23.107.84", "172.23.107.85", "172.23.107.88", "172.23.121.74", "172.23.121.78"]
This happened at 12:38:51 AM, 25 May 2021, per the UI logs (see attached screenshot).
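As background, a quorum failover in this version goes through the ns_server REST endpoint POST /controller/startFailover with allowUnsafe=true. A minimal sketch of building that request body for the failed-over nodes (the ns_1@&lt;ip&gt; otpNode naming and the comma-separated node list are assumptions to verify against the actual cluster; nothing is sent here):

```python
from urllib.parse import urlencode

# First few of the nodes quorum-failed-over in step 4 (full list in the ticket).
failed_nodes = ["172.23.106.251", "172.23.107.43", "172.23.107.44"]

def failover_request_body(nodes):
    """Form body for POST /controller/startFailover.

    allowUnsafe=true is what turns this into a quorum (unsafe) failover.
    otpNode names follow the usual ns_1@<ip> convention -- an assumption;
    check the actual node names via /pools/default before using this.
    """
    otp = ",".join("ns_1@" + ip for ip in nodes)
    return urlencode({"otpNode": otp, "allowUnsafe": "true"})

print(failover_request_body(failed_nodes))
```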
5. Drop all collections and flush the bucket.
Checked the Indexes page and observed that 2 indexes were not cleaned up.
RAM used by the indexer at this point: 1009 MiB
Attaching two sets of logs: one from the resulting cluster, and the other from the quorum-failed-over nodes after they were removed and their configs wiped.
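For completeness, the manual drop attempts that failed with "Keyspace not found" were N1QL statements of the following shape, built from the bucket/scope/collection fields in the /getIndexStatus output. A sketch of the statement construction (standard 7.0 collection-level DROP INDEX syntax):

```python
def drop_statement(bucket, scope, collection, index):
    """N1QL DROP INDEX statement for a collection-level index (7.0 syntax).

    Statements of this shape failed with 'Keyspace not found' for the two
    leftover indexes, since their collection had already been dropped.
    """
    return "DROP INDEX `{}` ON `{}`.`{}`.`{}`".format(index, bucket, scope, collection)

print(drop_statement("09wpGw1pyr-1-589000", "_default",
                     "GiBlm8Z_IQ3tkMs3E-1-610000", "gsi703"))
```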