Loading...

Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 4.5.0
Affects Version/s: 4.5.0
Component/s: cbft
Labels:
- functional-test

Triage:
Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

Hide
https://s3.amazonaws.com/cb-customers/Aruna/collectinfo-2016-03-03T184642-ns_1%40172.23.106.139.zip <==kv
https://s3.amazonaws.com/cb-customers/Aruna/collectinfo-2016-03-03T184642-ns_1%40172.23.106.175.zip <== kv+fts

Show
https://s3.amazonaws.com/cb-customers/Aruna/collectinfo-2016-03-03T184642-ns_1%40172.23.106.139.zip <==kv https://s3.amazonaws.com/cb-customers/Aruna/collectinfo-2016-03-03T184642-ns_1%40172.23.106.175.zip <== kv+fts
Is this a Regression?:
No

Description

Build
4.5.0-1666

Testcase(to be merged)
./testrunner -i INI_FILE.ini -p skip-cleanup=True,stop-on-failure=False -t fts.moving_topology_fts.MovingTopFTS.hard_failover_and_full_recovery_during_index_building,items=30000,cluster=D,D+F,GROUP=P0

I've run this test at least 3 times yesterday and I'm seeing this problem for the first time.

Steps
1. Cluster: D, D+F [1 kv, 1 kv+fts node]
2. Create a default index, while it is building, hard-failover the D+F node.
3. Immediately add back D+F in full recovery mode. Rebalance completes successfully.
4. Index building progresses, only to get stuck at 99%.

Live cluster - http://172.23.106.175:8091/ui/index.html#/fts_view/default_index_1

2016-03-03 10:38:13 | INFO | MainProcess | test_thread | [fts_base.create] Creating fulltext-index default_index_1 on 172.23.106.175

2016-03-03 10:38:13 | INFO | MainProcess | test_thread | [rest_client.create_fts_index] {"params": {}, "name": "default_index_1", "planParams": {"hierarchyRules": null, "nodePlanParams": null, "numReplicas": 0, "planFrozen": false, "maxPartitionsPerPIndex": 32}, "sourceName": "default", "sourceUUID": "", "sourceType": "couchbase", "type": "fulltext-index", "sourceParams": {"authUser": "default", "dataManagerSleepMaxMS": 20000, "authSaslUser": "", "clusterManagerSleepMaxMS": 20000, "authSaslPassword": "", "clusterManagerSleepInitMS": 0, "dataManagerBackoffFactor": 0, "authPassword": "", "dataManagerSleepInitMS": 0, "feedBufferAckThreshold": 0, "feedBufferSizeBytes": 0, "clusterManagerBackoffFactor": 0}, "uuid": ""}

2016-03-03 10:38:13 | INFO | MainProcess | test_thread | [rest_client.create_fts_index] Index default_index_1 created

2016-03-03 10:38:13 | INFO | MainProcess | test_thread | [fts_base.sleep] sleep for 5 secs. No pindexes found, waiting for index to get created ...

2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Number of PIndexes = 32

2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Every pIndex serves 32 partitions or lesser

2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: pIndexes are distributed across [u'e730424ee9bf53090dee6cc35ed1e032']

2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Expecting num of partitions in each node in range 992-1056

2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Node e730424ee9bf53090dee6cc35ed1e032 houses 32 pindexes which serve 1024 partitions

2016-03-03 10:38:18 | INFO | MainProcess | test_thread | [fts_base.sleep] sleep for 10 secs.  ...

2016-03-03 10:38:28 | INFO | MainProcess | test_thread | [moving_topology_fts.graceful_failover_and_full_recovery_during_index_building] Index building has begun...

2016-03-03 10:38:29 | INFO | MainProcess | test_thread | [moving_topology_fts.graceful_failover_and_full_recovery_during_index_building] Index count for default_index_1: 3118

2016-03-03 10:38:29 | INFO | MainProcess | test_thread | [fts_base.__async_failover] Starting failover for nodes:[ip:172.23.106.175 port:8091 ssh_username:root] at C1 cluster 172.23.106.139

2016-03-03 10:38:29 | INFO | MainProcess | Cluster_Thread | [task._failover_nodes] Failing over 172.23.106.175:8091 with graceful=False

2016-03-03 10:38:29 | INFO | MainProcess | Cluster_Thread | [rest_client.fail_over] fail_over node ns_1@172.23.106.175 successful

2016-03-03 10:38:29 | INFO | MainProcess | Cluster_Thread | [task.execute] 0 seconds sleep after failover, for nodes to go pending....

2016-03-03 10:38:30 | INFO | MainProcess | test_thread | [rest_client.add_back_node] add_back_node ns_1@172.23.106.175 successful

2016-03-03 10:38:30 | INFO | MainProcess | test_thread | [rest_client.set_recovery_type] Going to set recoveryType=full for node :: ns_1@172.23.106.175

2016-03-03 10:38:30 | INFO | MainProcess | test_thread | [rest_client.set_recovery_type] recoveryType for node ns_1@172.23.106.175 set successful

2016-03-03 10:38:31 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance params : password=password&ejectedNodes=&user=Administrator&knownNodes=ns_1%40172.23.106.175%2Cns_1%40172.23.106.139

2016-03-03 10:38:31 | INFO | MainProcess | Cluster_Thread | [rest_client.rebalance] rebalance operation started

2016-03-03 10:38:31 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 0.00 %

2016-03-03 10:38:41 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 10.16 %

2016-03-03 10:38:51 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 21.03 %

2016-03-03 10:39:01 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 31.17 %

2016-03-03 10:39:11 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 40.82 %

2016-03-03 10:39:21 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 51.56 %

2016-03-03 10:39:31 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 63.48 %

2016-03-03 10:39:41 | INFO | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] rebalance percentage : 74.74 %

2016-03-03 10:39:51 | INFO | MainProcess | Cluster_Thread | [task.check] rebalancing was completed with progress: 100% in 80.3003251553 sec

2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Number of PIndexes = 32

2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Every pIndex serves 32 partitions or lesser

2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: pIndexes are distributed across [u'e730424ee9bf53090dee6cc35ed1e032']

2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Expecting num of partitions in each node in range 992-1056

2016-03-03 10:39:51 | INFO | MainProcess | test_thread | [fts_base.is_index_partitioned_balanced] Validated: Node e730424ee9bf53090dee6cc35ed1e032 houses 32 pindexes which serve 1024 partitions

2016-03-03 10:39:52 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 14547

2016-03-03 10:40:03 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 17110

2016-03-03 10:40:14 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 19483

2016-03-03 10:40:25 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 22092

2016-03-03 10:40:36 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 24683

2016-03-03 10:40:47 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 27185

2016-03-03 10:40:57 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864

2016-03-03 10:41:08 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864

2016-03-03 10:41:19 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864

2016-03-03 10:41:29 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864

2016-03-03 10:41:40 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864

2016-03-03 10:41:50 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864

2016-03-03 10:42:01 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] Docs in bucket = 30000, docs in FTS index 'default_index_1': 29864

2016-03-03 10:42:11 | INFO | MainProcess | test_thread | [fts_base.wait_for_indexing_complete] FTS indexed 29864 docs in 2.33 mins

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

Screen Shot 2016-03-03 at 10.51.55 AM.png
69 kB
03/Mar/16 11:19 AM

Issue Links

relates to

MB-18447 [FTS] After kv node in cluster is rebooted during index building, indexing does not resume

Closed

Gerrit Reviews

- Issue Only
- Show All Reviews
- Show Open Reviews
- Show All Issues
- Show Open Issues

No reviews matched the request. Check your Options in the drop-down menu of this sections header.

[FTS] Indexing stuck at 99.55% following failover(hard) and add back in full recovery (of the only fts node)

Details

Description

Attachments

Attachments

Issue Links

Gerrit Reviews

Activity

People

Dates

Gerrit Reviews

PagerDuty