Details
- Bug
- Resolution: Fixed
- Critical
- 6.6.0
- Untriaged
- 1
- Unknown
Description
This bug covers two different failures that occurred more than one month apart for different proximate reasons but possibly share the same root cause. The test scenario was created to stress metakv. The two failures were analyzed separately:
- Dave Finlay analyzed the first one, from 2020-12-10, which has 10 log URLs, then requested a newer run.
- Kevin Cherkauer analyzed the second one, from 2021-01-21, which has a Supportal snapshot. This run failed for a different reason than the first one.
Description of the first failure (2020-12-10):
Build: 6.6.0-7909
- Cluster with 3 kv, 2 index+n1ql, and 4 search nodes
- 6 buckets with 5000 docs each
- Built 200 GSI indexes with replica 1 (50 indexes on each of 4 buckets)
- Created 30 FTS custom indexes (10 indexes on each of 3 buckets), just to add more entries to metakv
- Created and dropped 100 GSI indexes sequentially on each of 4 buckets (adding metakv entries for the create/drop of 400 indexes)
- Created and dropped 50 FTS indexes on 3 buckets
- Note for QE: examples for creating and dropping FTS and GSI indexes can be found here: https://github.com/couchbaselabs/productivitynautomation/tree/master/create-drop-gsi-fts-indexes
- Gracefully failed over a node with the kv service. The failover rebalance failed with the following:
{view_fragmentation_threshold,{30,undefined}}]
[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28966.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.121.47'.
Acquire options were [{timeout,0},{period,15000}]
[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28988.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.107.197'.
Acquire options were [{timeout,0},{period,15000}]
[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.29045.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.107.87'.
Acquire options were [{timeout,0},{period,15000}]
[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.21396.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.121.41'.
Acquire options were [{timeout,0},{period,15000}]
[ns_server:warn,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:<0.29093.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.106.245'.
Acquire options were [{timeout,0},{period,15000}]
[ns_server:debug,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:leader_lease_agent<0.23775.0>:leader_lease_agent:handle_lease_expired:286]Lease held by {lease_holder,<<"5d86e482ae739d52262a6ebd2d87c1ca">>,
'ns_1@172.23.106.245'} expired. Starting expirer.
[ns_server:debug,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:leader_activities<0.23774.0>:leader_activities:terminate_activities:635]Terminating activities (reason is {shutdown,
{quorum_lost,
{lease_lost,'ns_1@172.23.106.245'}}}):
[{activity,<0.26428.889>,#Ref<0.2150457706.1058275335.246268>,default,
<<"59a2588ae4959d8162566e0e9c3ab763">>,
[rebalance],
majority,[]}]
[ns_server:info,2020-12-14T09:40:57.000-08:00,ns_1@172.23.106.245:rebalance_agent<0.23812.0>:rebalance_agent:handle_down:296]Rebalancer process <0.26591.889> died (reason shutdown).
[ns_server:error,2020-12-14T09:40:57.000-08:00,ns_1@172.23.106.245:<0.26556.889>:leader_activities:report_error:1011]Activity {default,rebalance} failed with error {quorum_lost,
{lease_lost,
'ns_1@172.23.106.245'}}
[ns_server:debug,2020-12-14T09:40:57.004-08:00,ns_1@172.23.106.245:ns_config_log<0.201.0>:ns_config_log:log_common:229]config change:
{local_changes_count,<<"fda11948c11fa3107f8aa6d83b739109">>} ->
[{'_vclock',[{<<"fda11948c11fa3107f8aa6d83b739109">>,{1572,63775186856}}]}]
[user:error,2020-12-14T09:40:57.008-08:00,ns_1@172.23.106.245:<0.29120.245>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {{badmatch,
{leader_activities_error,
{default,rebalance},
{quorum_lost,
{lease_lost,'ns_1@172.23.106.245'}}}},
[{ns_rebalancer,rebalance,5,
[{file,"src/ns_rebalancer.erl"},{line,480}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,247}]}]}.
Rebalance Operation Id = f5876e978c98291d711dd1704249b07b
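For anyone re-running the scenario, the sequential GSI create/drop step described above could be scripted roughly as follows. This is only a sketch that generates the N1QL statements; the bucket names, index names, indexed field, and the use of replica 1 for the churn indexes are assumptions made for illustration and are not taken from the original test. Sending the statements to the query service is left out.

```python
import json


def gsi_churn_statements(buckets, per_bucket, num_replica=1, field="name"):
    """Yield CREATE INDEX / DROP INDEX pairs, one index at a time.

    Index names ("churn_<bucket>_<i>") and the indexed field are
    hypothetical; the real test may have used different definitions.
    """
    with_clause = json.dumps({"num_replica": num_replica})
    for bucket in buckets:
        for i in range(per_bucket):
            idx = f"churn_{bucket}_{i}"
            # Create the index, then drop it before moving to the next one,
            # so each index adds both a create and a drop to the metadata churn.
            yield f"CREATE INDEX `{idx}` ON `{bucket}`(`{field}`) WITH {with_clause}"
            yield f"DROP INDEX `{bucket}`.`{idx}`"


if __name__ == "__main__":
    # Hypothetical bucket names; the scenario used 4 buckets with 100 indexes each.
    buckets = [f"bucket{n}" for n in range(4)]
    statements = list(gsi_churn_statements(buckets, per_bucket=100))
    print(len(statements), "statements; first:", statements[0])
```

With 4 buckets and 100 indexes per bucket this produces 400 create/drop pairs, matching the count given in the scenario.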
Logs:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.106.245.zip (kv node)
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.104.zip (kv node)
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.197.zip (kv node)
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.87.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.41.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.45.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.46.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.47.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.49.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.52.zip
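Once the archives above are downloaded and extracted, a quick way to see which nodes the orchestrator failed to get a lease from is to tally the "Timeout while trying to acquire lease" warnings per target node. The sketch below assumes the line format shown in the excerpt in this ticket; the function name and the way lines are supplied (any iterable of strings, for example an open ns_server log file) are illustrative choices, not part of any Couchbase tooling.

```python
import re
from collections import Counter

# Matches the leader_lease_acquire_worker warning as it appears in this ticket.
LEASE_TIMEOUT = re.compile(
    r"^\[ns_server:warn,(?P<ts>[^,]+),.*"
    r"Timeout while trying to acquire lease from '(?P<target>[^']+)'"
)


def tally_lease_timeouts(lines):
    """Return a Counter mapping target node -> number of lease-acquire timeouts."""
    counts = Counter()
    for line in lines:
        match = LEASE_TIMEOUT.search(line)
        if match:
            counts[match.group("target")] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally.py <path to an extracted ns_server log file>
    with open(sys.argv[1], errors="replace") as log:
        for node, count in tally_lease_timeouts(log).most_common():
            print(node, count)
```

A node that times out acquiring a lease from itself, as 172.23.106.245 does in the excerpt, suggests the local node was stalled rather than a network partition between nodes, though the tally alone does not establish the cause.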
Issue Links
- backports to: MB-46532 [BP MB-43285 to 6.6.3] Rebalance exited with reason {{badmatch : Timeout while trying to acquire lease (Closed)