Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-18517

[FTS] Indexing stuck at 99.55% following failover(hard) and add back in full recovery (of the only fts node)

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Critical
    • 4.5.0
    • 4.5.0
    • cbft

    Description

      Build
      4.5.0-1666

      Testcase(to be merged)
      ./testrunner -i INI_FILE.ini -p skip-cleanup=True,stop-on-failure=False -t fts.moving_topology_fts.MovingTopFTS.hard_failover_and_full_recovery_during_index_building,items=30000,cluster=D,D+F,GROUP=P0

      I've run this test at least 3 times yesterday and I'm seeing this problem for the first time.

      Steps
      1. Cluster: D, D+F [1 kv, 1 kv+fts node]
      2. Create a default index, while it is building, hard-failover the D+F node.
      3. Immediately add back D+F in full recovery mode. Rebalance completes successfully.
      4. Index building progresses, only to get stuck at 99%.

      Live cluster - http://172.23.106.175:8091/ui/index.html#/fts_view/default_index_1

      2016-03-03 10:38:13 | INFO | MainProcess | test_thread | [fts_base.create] Creating fulltext-index default_index_1 on 172.23.106.175
      2016-03-03 10:38:13 | INFO | MainProcess | test_thread | [rest_client.create_fts_index] {"params": {}, "name": "default_index_1", "planParams": {"hierarchyRules": null, "nodePlanParams": null, "numReplicas": 0, "planFrozen": false, "maxPartitionsPerPIndex": 32}, "sourceName": "default", "sourceUUID": "", "sourceType": "couchbase", "type": "fulltext-index", "sourceParams": {"authUser": "default", "dataManagerSleepMaxMS": 20000, "authSaslUser": "", "clusterManagerSleepMaxMS": 20000, "authSaslPassword": "", "clusterManagerSleepInitMS": 0, "dataManagerBackoffFactor": 0, "authPassword": "", "dataManagerSleepInitMS": 0, "feedBufferAckThreshold": 0, "feedBufferSizeBytes": 0, "clusterManagerBackoffFactor": 0}, "uuid": ""}
      2016-03-03 10:38:13 | INFO | MainProcess | test_thread | [rest_client.create_fts_index] Index default_index_1 created
      2016-03-03 10:38:13 | INFO | MainProcess | test_thread | [fts_base.sleep] sleep for 5 secs. No pindexes found, waiting for index to get created ...
      2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Number of PIndexes = 32
      2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Every pIndex serves 32 partitions or lesser
      2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: pIndexes are distributed across [u'e730424ee9bf53090dee6cc35ed1e032'] 
      2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Expecting num of partitions in each node in range 992-1056
      2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Node e730424ee9bf53090dee6cc35ed1e032 houses 32 pindexes which serve 1024 partitions
      2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.sleep] sleep for 10 secs.  ...
      2016-03-03 10:38:28 | INFO | MainProcess | test_thread | [moving_topology_fts.graceful_failover_and_full_recovery_during_index_building] Index building has begun...
      2016-03-03 10:38:29 | INFO | MainProcess | test_thread | [moving_topology_fts.graceful_failover_and_full_recovery_during_index_building] Index count for default_index_1: 3118
      2016-03-03 10:38:29 | INFO | MainProcess | test_thread | [fts_base.__async_failover] Starting failover for nodes:[ip:172.23.106.175 port:8091 ssh_username:root] at C1 cluster 172.23.106.139
      2016-03-03 10:38:29 | INFO | MainProcess | Cluster_Thread | [task._failover_nodes] Failing over 172.23.106.175:8091 with graceful=False
      2016-03-03 10:38:29 | INFO | MainProcess | Cluster_Thread | [rest_client.fail_over] fail_over node ns_1@172.23.106.175 successful
      2016-03-03 10:38:29 | INFO | MainProcess | Cluster_Thread | [task.execute] 0 seconds sleep after failover, for nodes to go pending....
      2016-03-03 10:38:30 | INFO | MainProcess | test_thread | [rest_client.add_back_node] add_back_node ns_1@172.23.106.175 successful
      2016-03-03 10:38:30 | INFO | MainProcess | test_thread | [rest_client.set_recovery_type] Going to set recoveryType=full for node :: ns_1@172.23.106.175
      2016-03-03 10:38:30 | INFO | MainProcess | test_thread | [rest_client.set_recovery_type] recoveryType for node ns_1@172.23.106.175 set successful
      2016-03-03 10:38:31 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%40172.23.106.175%2Cns_1%40172.23.106.139
      2016-03-03 10:38:31 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance operation started
      2016-03-03 10:38:31 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 0.00 %
      2016-03-03 10:38:41 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 10.16 %
      2016-03-03 10:38:51 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 21.03 %
      2016-03-03 10:39:01 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 31.17 %
      2016-03-03 10:39:11 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 40.82 %
      2016-03-03 10:39:21 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 51.56 %
      2016-03-03 10:39:31 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 63.48 %
      2016-03-03 10:39:41 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 74.74 %
      2016-03-03 10:39:51 | INFO | MainProcess | Cluster_Thread | [task.check] rebalancing was completed with progress: 100% in 80.3003251553 sec
      2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Number of PIndexes = 32
      2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Every pIndex serves 32 partitions or lesser
      2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: pIndexes are distributed across [u'e730424ee9bf53090dee6cc35ed1e032'] 
      2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Expecting num of partitions in each node in range 992-1056
      2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Node e730424ee9bf53090dee6cc35ed1e032 houses 32 pindexes which serve 1024 partitions
      2016-03-03 10:39:52 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 14547
      2016-03-03 10:40:03 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 17110
      2016-03-03 10:40:14 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 19483
      2016-03-03 10:40:25 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 22092
      2016-03-03 10:40:36 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 24683
      2016-03-03 10:40:47 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 27185
      2016-03-03 10:40:57 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864
      2016-03-03 10:41:08 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864
      2016-03-03 10:41:19 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864
      2016-03-03 10:41:29 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864
      2016-03-03 10:41:40 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864
      2016-03-03 10:41:50 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864
      2016-03-03 10:42:01 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864
      2016-03-03 10:42:11 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] FTS indexed 29864 docs in 2.33 mins
      
      

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              apiravi Aruna Piravi (Inactive)
              apiravi Aruna Piravi (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty