Couchbase Server / MB-46751

MonitorKeyspace - Indexes did not get cleaned up after quorum_failover followed by dropping all collections


Details

    Description

      Summary:
      I had an 18-node cluster with close to 10K GSI indexes (almost 10 indexes per collection) and a total of 1K collections in one bucket.
      I then performed a quorum failover of 13 nodes, dropped all collections, and flushed all items. Afterwards, two indexes still remained. Trying to drop them gave an error like "Keyspace not found". Trying to rebalance out also failed.

      Output from /getIndexStatus

      {"indexes":[{"storageMode":"plasma","partitionMap":{"172.23.106.238:8091":[0]},"numPartition":1,"partitioned":false,"instId":1464317576398494102,"hosts":["172.23.106.238:8091"],"stale":false,"progress":100,"definition":"CREATE INDEX `gsi703` ON `09wpGw1pyr-1-589000`.`_default`.`GiBlm8Z_IQ3tkMs3E-1-610000`(`age`) WITH {  \"defer_build\":true }","status":"Ready","collection":"GiBlm8Z_IQ3tkMs3E-1-610000","scope":"_default","bucket":"09wpGw1pyr-1-589000","numReplica":0,"lastScanTime":"NA","indexName":"gsi703","index":"gsi703","id":4423815996602674555},{"storageMode":"plasma","partitionMap":{"172.23.106.250:8091":[0]},"numPartition":1,"partitioned":false,"instId":13487405438025204320,"hosts":["172.23.106.250:8091"],"stale":false,"progress":100,"definition":"CREATE INDEX `gsi707` ON `09wpGw1pyr-1-589000`.`_default`.`GiBlm8Z_IQ3tkMs3E-1-610000`(`age`) WITH {  \"defer_build\":true }","status":"Ready","collection":"GiBlm8Z_IQ3tkMs3E-1-610000","scope":"_default","bucket":"09wpGw1pyr-1-589000","numReplica":0,"lastScanTime":"NA","indexName":"gsi707","index":"gsi707","id":15628560032118528258}],"version":31485457,"warnings":[]}
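The single-line payload above is easier to read when grouped by keyspace. The following is a minimal sketch, not part of the original report: it parses a trimmed copy of that payload (only the fields used here are kept) and lists the leftover indexes per bucket, scope, and collection. The helper name `leftover_by_keyspace` is hypothetical.

```python
import json
from collections import defaultdict

# Trimmed copy of the /getIndexStatus payload quoted above.
# Only the fields needed for this summary are kept.
raw = """
{"indexes": [
  {"indexName": "gsi703", "status": "Ready",
   "bucket": "09wpGw1pyr-1-589000", "scope": "_default",
   "collection": "GiBlm8Z_IQ3tkMs3E-1-610000",
   "hosts": ["172.23.106.238:8091"]},
  {"indexName": "gsi707", "status": "Ready",
   "bucket": "09wpGw1pyr-1-589000", "scope": "_default",
   "collection": "GiBlm8Z_IQ3tkMs3E-1-610000",
   "hosts": ["172.23.106.250:8091"]}
], "warnings": []}
"""


def leftover_by_keyspace(payload):
    """Group index entries by (bucket, scope, collection)."""
    grouped = defaultdict(list)
    for idx in payload["indexes"]:
        keyspace = (idx["bucket"], idx["scope"], idx["collection"])
        grouped[keyspace].append((idx["indexName"], idx["hosts"], idx["status"]))
    return dict(grouped)


leftovers = leftover_by_keyspace(json.loads(raw))
for keyspace, entries in leftovers.items():
    print(".".join(keyspace))
    for name, hosts, status in entries:
        print(f"  {name} on {', '.join(hosts)} ({status})")
```

Both leftover indexes belong to the same collection, and each is hosted on a different index node.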

      Timeline
      1. Create a 17-node cluster

      2021-05-24 09:38:51,283 | test  | INFO    | pool-2-thread-21 | [table_view:display:72] Rebalance Overview
      +----------------+-----------+-----------------------+----------------+--------------+
      | Nodes          | Services  | Version               | CPU            | Status       |
      +----------------+-----------+-----------------------+----------------+--------------+
      | 172.23.105.175 | kv        | 7.0.0-5219-enterprise | 0.300827275006 | Cluster node |
      | 172.23.106.233 | ['kv']    |                       |                | <--- IN ---  |
      | 172.23.106.236 | ['n1ql']  |                       |                | <--- IN ---  |
      | 172.23.106.238 | ['index'] |                       |                | <--- IN ---  |
      | 172.23.106.250 | ['index'] |                       |                | <--- IN ---  |
      | 172.23.106.251 | ['index'] |                       |                | <--- IN ---  |
      | 172.23.121.74  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.121.78  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.43  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.58  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.44  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.45  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.54  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.47  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.78  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.84  | ['index'] |                       |                | <--- IN ---  |
      | 172.23.107.85  | ['index'] |                       |                | <--- IN ---  |
      +----------------+-----------+-----------------------+----------------+--------------+

      2. Create 10K indexes and build them. 
      2021-05-24 23:15:40,164 | test | INFO | MainThread | [Metadata:build_deferred_indexes:159] online indexes count: 10000
      3. Rebalance in an index node, 172.23.107.88, at
      2021-05-24 23:17:57

      4.  Stop server on these nodes and quorum-failover them

      nodes = ["172.23.106.251", "172.23.107.43", "172.23.107.44", "172.23.107.45", "172.23.107.47", "172.23.107.54", "172.23.107.58", "172.23.107.78", "172.23.107.84", "172.23.107.85", "172.23.107.88", "172.23.121.74", "172.23.121.78"]

      This happened at 12:38:51 AM on 25 May 2021, according to the UI logs (refer to the screenshot).
      5. Drop all collections and flush the bucket.
      Checked the Indexes page and observed that 2 indexes were not cleaned up.
      RAM used by indexer at this point: 1009MiB

      Attaching two sets of logs: one from the resulting cluster, and the other from the nodes that were quorum-failed-over and removed, and whose configs were then wiped.
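The node counts in the timeline can be cross-checked with a short script. This is a sketch built only from the addresses listed in this report. The strict-majority calculation is an assumption used for illustration; the report does not state how quorum is computed.

```python
# Nodes from the "Rebalance Overview" table in step 1.
initial_nodes = [
    "172.23.105.175", "172.23.106.233", "172.23.106.236", "172.23.106.238",
    "172.23.106.250", "172.23.106.251", "172.23.121.74", "172.23.121.78",
    "172.23.107.43", "172.23.107.58", "172.23.107.44", "172.23.107.45",
    "172.23.107.54", "172.23.107.47", "172.23.107.78", "172.23.107.84",
    "172.23.107.85",
]
# Index node rebalanced in at step 3.
added_node = "172.23.107.88"
# Nodes stopped and quorum-failed-over at step 4.
failed_over = [
    "172.23.106.251", "172.23.107.43", "172.23.107.44", "172.23.107.45",
    "172.23.107.47", "172.23.107.54", "172.23.107.58", "172.23.107.78",
    "172.23.107.84", "172.23.107.85", "172.23.107.88", "172.23.121.74",
    "172.23.121.78",
]
# Hosts of the two leftover indexes, from the /getIndexStatus output.
leftover_hosts = {"172.23.106.238", "172.23.106.250"}

cluster = set(initial_nodes) | {added_node}
failed = set(failed_over)
survivors = cluster - failed

# Assumption: quorum means a strict majority of cluster nodes.
majority = len(cluster) // 2 + 1

print("cluster size:", len(cluster))
print("failed over:", len(failed))
print("survivors:", sorted(survivors))
print("strict majority would need:", majority)
print("leftover index hosts survived:", leftover_hosts <= survivors)
```

Under that assumption, the surviving nodes are fewer than a majority of the original cluster, and both leftover indexes are hosted on nodes that stayed in the cluster.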

            People

              sumedh.basarkod Sumedh Basarkod (Inactive)
              jeelan.poola Jeelan Poola