Couchbase Server / MB-30998

[FTS System test] rebalance failed with buckets_cleanup_failed error


Details

    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Affects Version: 6.0.0
    • Fix Version: 6.0.0
    • Component: ns_server

    Description

      Build
      6.0.0-1529

      Testcase
      ./sequoia -scope tests/fts/scope_component_fts.yml -test tests/fts/test_fts_alice_component.yml -provider file:centos_second_cluster.yml -version 6.0.0-1529
       
      Steps:
      1. Create a single-node kv+fts cluster (.206).
      2. Create 2 buckets on it and load 10M docs.
      3. Create 2 default indexes on the cluster - one scorch and one upside_down (see the index-definition sketch after this list).
      4. While indexing is in progress, add .207 (kv), .209 (kv+fts), .210 (fts) and .212 (kv+fts), then rebalance - the rebalance goes through fine.
      5. Create 2 more indexes with custom mappings - one scorch and one upside_down.
      6. Add .215 (fts), .216 (kv+fts) and .48 (kv), and remove .209 and .212 (added in step 4). Rebalance fails.
      7. We then fail over .212 (the request times out) and rebalance again (see the REST sketch after the log excerpt below). The rebalance fails with the buckets_cleanup_failed error shown below.
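
      The index-creation part of steps 3 and 5 can be expressed directly against the FTS REST API. The following is only a minimal sketch, assuming placeholder Administrator/password credentials, the default FTS port 8094 and bucket "default"; the exact index params used by the sequoia test may differ.

      # Minimal sketch: create one scorch and one upside_down FTS index over bucket "default".
      # Endpoint: PUT /api/index/<name> on the FTS port (8094). Credentials are placeholders.
      import requests

      FTS_NODE = "http://172.23.96.206:8094"
      AUTH = ("Administrator", "password")

      def create_index(name, index_type):
          body = {
              "type": "fulltext-index",
              "name": name,
              "sourceType": "couchbase",
              "sourceName": "default",
              "params": {"store": {"indexType": index_type}},  # "scorch" or "upside_down"
          }
          r = requests.put(f"{FTS_NODE}/api/index/{name}", auth=AUTH, json=body)
          r.raise_for_status()

      create_index("default_scorch", "scorch")
      create_index("default_upside_down", "upside_down")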

      Rebalance exited with reason {buckets_cleanup_failed,['ns_1@172.23.96.216']}
      [ns_orchestrator 000 | ns_1@172.23.96.206 | 1:41:49 PM, Tue Aug 21, 2018]

      Failed to cleanup old buckets on some nodes: ['ns_1@172.23.96.216']
      [ns_rebalancer 000 | ns_1@172.23.96.206 | 1:41:49 PM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.206' saw that node 'ns_1@172.23.96.216' went down. Details: [{nodedown_reason,net_tick_timeout}]
      [ns_node_disco 005 | ns_1@172.23.96.206 | 1:41:49 PM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.209' saw that node 'ns_1@172.23.96.216' went down. Details: [{nodedown_reason,net_tick_timeout}]
      [ns_node_disco 005 | ns_1@172.23.96.209 | 1:41:47 PM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.207' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}]
      [ns_node_disco 005 | ns_1@172.23.96.207 | 1:20:28 AM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.210' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}]
      [ns_node_disco 005 | ns_1@172.23.96.210 | 1:20:28 AM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.209' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}]
      [ns_node_disco 005 | ns_1@172.23.96.209 | 1:20:28 AM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.206' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}]
      [ns_node_disco 005 | ns_1@172.23.96.206 | 1:20:28 AM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.215' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}]
      [ns_node_disco 005 | ns_1@172.23.96.215 | 1:20:28 AM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.48' saw that node 'ns_1@172.23.96.212' went down. Details: [{nodedown_reason,connection_closed}]
      [ns_node_disco 005 | ns_1@172.23.96.48 | 1:20:28 AM, Tue Aug 21, 2018]

      Deleting old data files of bucket "default"
      [ns_storage_conf 000 | ns_1@172.23.96.209 | 1:20:26 AM, Tue Aug 21, 2018]

      Deleting old data files of bucket "other"
      [ns_storage_conf 000 | ns_1@172.23.96.209 | 1:20:26 AM, Tue Aug 21, 2018]

      Node 'ns_1@172.23.96.212' is leaving cluster.
      [ns_cluster 001 | ns_1@172.23.96.212 | 1:20:26 AM, Tue Aug 21, 2018]

      Starting rebalance, KeepNodes = ['ns_1@172.23.96.206','ns_1@172.23.96.207',
      'ns_1@172.23.96.209','ns_1@172.23.96.210','ns_1@172.23.96.215',
      'ns_1@172.23.96.216','ns_1@172.23.96.48'], EjectNodes = [],
      Failed over and being ejected nodes = ['ns_1@172.23.96.212']; no delta recovery nodes
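
      For reference, the failover-and-rebalance in step 7 can also be driven directly against the ns_server REST API instead of through sequoia. The following is a minimal sketch, assuming the orchestrator is reachable at 172.23.96.206:8091 and using placeholder Administrator/password credentials; /controller/failOver, /controller/rebalance and /pools/default/rebalanceProgress are the standard cluster-management endpoints, and node names use the ns_1@<ip> otpNode form seen in the log above.

      # Minimal sketch: hard-fail-over .212, then rebalance it out and poll for the result.
      import time
      import requests

      ORCHESTRATOR = "http://172.23.96.206:8091"
      AUTH = ("Administrator", "password")          # placeholder credentials

      KEEP_NODES = [
          "ns_1@172.23.96.206", "ns_1@172.23.96.207", "ns_1@172.23.96.209",
          "ns_1@172.23.96.210", "ns_1@172.23.96.215", "ns_1@172.23.96.216",
          "ns_1@172.23.96.48",
      ]
      FAILED_NODE = "ns_1@172.23.96.212"

      # Hard failover of the unresponsive node (the request that timed out in step 7).
      r = requests.post(f"{ORCHESTRATOR}/controller/failOver",
                        auth=AUTH, data={"otpNode": FAILED_NODE}, timeout=120)
      r.raise_for_status()

      # Rebalance, keeping the seven healthy nodes and ejecting the failed-over node.
      r = requests.post(f"{ORCHESTRATOR}/controller/rebalance", auth=AUTH,
                        data={"knownNodes": ",".join(KEEP_NODES + [FAILED_NODE]),
                              "ejectedNodes": FAILED_NODE})
      r.raise_for_status()

      # Poll until the rebalance finishes or ns_server reports an error such as
      # buckets_cleanup_failed.
      while True:
          progress = requests.get(f"{ORCHESTRATOR}/pools/default/rebalanceProgress",
                                  auth=AUTH).json()
          print(progress)
          if progress.get("status") != "running":
              break
          time.sleep(10)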
      

      We also see some errors from the Janitor on .216:

      Janitor cleanup of "default" failed after failover of ['ns_1@172.23.96.212']:
      {error,{badmatch,{error,{failed_nodes,['ns_1@172.23.96.216']}}}}
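
      The nodedown events above (net_tick_timeout for .216) suggest .216 was unreachable from the orchestrator at the moment ns_rebalancer tried to delete old bucket files on it, which is what surfaces as buckets_cleanup_failed and as the failed_nodes badmatch from the janitor. Before retrying the rebalance it is worth confirming that every node is reachable and healthy; the snippet below is a hypothetical helper (not part of the sequoia test) that reads the per-node status from /pools/default, again with placeholder credentials.

      # Hypothetical helper: print membership and health for every node so an
      # unreachable node (e.g. .216 during the failed cleanup) stands out.
      import requests

      ORCHESTRATOR = "http://172.23.96.206:8091"    # placeholder endpoint/credentials
      AUTH = ("Administrator", "password")

      pool = requests.get(f"{ORCHESTRATOR}/pools/default", auth=AUTH).json()
      for node in pool["nodes"]:
          print(f'{node["otpNode"]:30} membership={node["clusterMembership"]:15} '
                f'status={node["status"]}')

      A node reported as status=unhealthy (or missing from the list) points at the same connectivity problem the ns_node_disco events record, and a rebalance retried in that state is likely to hit the same buckets_cleanup_failed error.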
      


          People

            Assignee: apiravi Aruna Piravi (Inactive)
            Reporter: apiravi Aruna Piravi (Inactive)
            Votes: 0
            Watchers: 3
