Couchbase Server / MB-43285

Rebalance exited with reason {{badmatch : Timeout while trying to acquire lease


Details

    • Untriaged
    • 1
    • Unknown

    Description

      This bug covers two failures, more than a month apart, with different proximate causes that may nonetheless share the same root cause. The test scenario was designed to stress metakv:

      • Dave Finlay analyzed the first one, from 2020-12-10, which has 10 log URLs, then requested a newer run.
      • Kevin Cherkauer analyzed the second one, from 2021-01-21, which has a Supportal snapshot. This run failed for a different reason than the first one.

       

      Description of the first failure (2020-12-10):

      Build: 6.6.0-7909

      • Cluster with 3 KV, 2 index+N1QL, and 4 search nodes
      • 6 buckets with 5,000 docs each
      • Built 200 GSI indexes with 1 replica (50 indexes on each of 4 buckets)
      • Created 30 custom FTS indexes (10 on each of 3 buckets), solely to add more entries to metakv
      • Created and dropped 100 GSI indexes sequentially on each of 4 buckets (adding metakv entries for 400 index create/drop cycles)
      • Created and dropped 50 FTS indexes on 3 buckets
      • Note for QE: examples of creating and dropping FTS and GSI indexes can be found here: https://github.com/couchbaselabs/productivitynautomation/tree/master/create-drop-gsi-fts-indexes
      • Gracefully failed over a node running the KV service. The failover rebalance fails with the following:

      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28966.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.121.47'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28988.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.107.197'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.29045.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.107.87'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.21396.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.121.41'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:warn,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:<0.29093.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.106.245'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:debug,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:leader_lease_agent<0.23775.0>:leader_lease_agent:handle_lease_expired:286]Lease held by {lease_holder,<<"5d86e482ae739d52262a6ebd2d87c1ca">>,
                                  'ns_1@172.23.106.245'} expired. Starting expirer.
      [ns_server:debug,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:leader_activities<0.23774.0>:leader_activities:terminate_activities:635]Terminating activities (reason is {shutdown,
                                         {quorum_lost,
                                          {lease_lost,'ns_1@172.23.106.245'}}}):
      [{activity,<0.26428.889>,#Ref<0.2150457706.1058275335.246268>,default,
                 <<"59a2588ae4959d8162566e0e9c3ab763">>,
                 [rebalance],
                 majority,[]}]
      [ns_server:info,2020-12-14T09:40:57.000-08:00,ns_1@172.23.106.245:rebalance_agent<0.23812.0>:rebalance_agent:handle_down:296]Rebalancer process <0.26591.889> died (reason shutdown).
      [ns_server:error,2020-12-14T09:40:57.000-08:00,ns_1@172.23.106.245:<0.26556.889>:leader_activities:report_error:1011]Activity {default,rebalance} failed with error {quorum_lost,
                                                      {lease_lost,
                                                       'ns_1@172.23.106.245'}}
      [ns_server:debug,2020-12-14T09:40:57.004-08:00,ns_1@172.23.106.245:ns_config_log<0.201.0>:ns_config_log:log_common:229]config change:
      {local_changes_count,<<"fda11948c11fa3107f8aa6d83b739109">>} ->
      [{'_vclock',[{<<"fda11948c11fa3107f8aa6d83b739109">>,{1572,63775186856}}]}]
      [user:error,2020-12-14T09:40:57.008-08:00,ns_1@172.23.106.245:<0.29120.245>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {{badmatch,
                                     {leader_activities_error,
                                      {default,rebalance},
                                      {quorum_lost,
                                       {lease_lost,'ns_1@172.23.106.245'}}}},
                                    [{ns_rebalancer,rebalance,5,
                                      [{file,"src/ns_rebalancer.erl"},{line,480}]},
                                     {proc_lib,init_p_do_apply,3,
                                      [{file,"proc_lib.erl"},{line,247}]}]}.
      Rebalance Operation Id = f5876e978c98291d711dd1704249b07b
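
In the excerpt above, the leader (ns_1@172.23.106.245) times out acquiring leases from four peers and from itself, and the rebalance activity, whose tuple appears to declare a `majority` quorum, is terminated with `quorum_lost`. The sketch below is a deliberately simplified, hypothetical model of a majority-of-leases check, not ns_server's implementation: the reported reason specifically names the lease on the leader's own node, so the real logic is evidently more specific than a head count. The function `has_majority` and the assumption that all ten nodes with collectinfo archives count toward the quorum are illustrative only.

```python
from typing import Iterable


def has_majority(all_nodes: Iterable[str], nodes_with_lease: Iterable[str]) -> bool:
    """Return True if leases are held on a strict majority of all_nodes.

    Hypothetical, simplified model for illustration; not taken from ns_server.
    """
    members = set(all_nodes)
    # Ignore any lease reported for a node that is not a cluster member.
    held = members & set(nodes_with_lease)
    # Strict majority: more than half of the members.
    return 2 * len(held) > len(members)


# Assumption: the ten nodes with collectinfo archives all count toward the quorum.
NODES = [f"ns_1@172.23.{suffix}" for suffix in (
    "106.245", "107.104", "107.197", "107.87", "121.41",
    "121.45", "121.46", "121.47", "121.49", "121.52",
)]

# Nodes named in the lease-acquire timeout warnings of the excerpt.
TIMED_OUT = {f"ns_1@172.23.{s}" for s in ("121.47", "107.197", "107.87", "121.41", "106.245")}

if __name__ == "__main__":
    held = [n for n in NODES if n not in TIMED_OUT]
    # Half the leases remain, which is not a strict majority.
    print(len(held), has_majority(NODES, held))
```

Under these assumptions, losing five of ten leases already fails a strict-majority test, since half is not more than half.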
      

      Logs:
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.106.245.zip (kv node)
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.104.zip (kv node)
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.197.zip (kv node)
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.87.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.41.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.45.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.46.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.47.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.49.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.52.zip
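
To see which peers a node failed to lease from over a longer window than the excerpt above, the ns_server log text from these archives can be scanned for the `handle_acquire_timeout` warning. This is a minimal sketch, assuming the log lines keep exactly the format quoted in the excerpt; the regular expression and the `lease_timeouts` / `timeouts_per_peer` helpers are hypothetical and not part of any Couchbase tool.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches warnings in the format quoted in the excerpt:
# [ns_server:warn,<timestamp>,<node>:<pid>:leader_lease_acquire_worker:handle_acquire_timeout:112]
# Timeout while trying to acquire lease from '<peer>'.
LEASE_TIMEOUT = re.compile(
    r"\[ns_server:warn,(?P<ts>[^,]+),(?P<node>ns_1@[^:]+):[^\]]*"
    r"handle_acquire_timeout:\d+\]"
    r"Timeout while trying to acquire lease from '(?P<peer>[^']+)'"
)


def lease_timeouts(log_text: str) -> List[Tuple[str, str, str]]:
    """Return (timestamp, reporting node, peer) for every lease-acquire timeout."""
    return [(m["ts"], m["node"], m["peer"]) for m in LEASE_TIMEOUT.finditer(log_text)]


def timeouts_per_peer(log_text: str) -> Counter:
    """Count lease-acquire timeouts per peer node."""
    return Counter(peer for _, _, peer in lease_timeouts(log_text))


if __name__ == "__main__":
    sample = (
        "[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28966.245>:"
        "leader_lease_acquire_worker:handle_acquire_timeout:112]"
        "Timeout while trying to acquire lease from 'ns_1@172.23.121.47'.\n"
        "Acquire options were [{timeout,0},{period,15000}]\n"
    )
    for ts, node, peer in lease_timeouts(sample):
        print(ts, node, "->", peer)
```

Grouping by peer rather than by timestamp makes it easier to tell whether timeouts cluster on particular nodes or, as in the excerpt, hit every peer at once.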

      Attachments

        1. mb43285_2.sh
          0.5 kB
        2. mb43285_3a.sh
          0.3 kB
        3. mb43285_3b.sh
          0.3 kB
        4. mb43285_3c.sh
          0.3 kB
        5. mb43285.sh
          0.3 kB
        6. metakv.Get.times
          218 kB
        7. metakv.Get.times.png
          198 kB
        8. metakv.ListAllChildren.create100.drop100+create100.rebal+create100
          4 kB
        9. metakv.ListAllChildren.create100.drop100+create100.rebal+create100.png
          19 kB
        10. points1000
          19 kB
        11. points1000.png
          53 kB
        12. points500
          8 kB
        13. points500.png
          30 kB
        14. points800
          14 kB
        15. points800.png
          44 kB
        16. screenshot-1.png
          156 kB
        17. screenshot-2.png
          171 kB
        18. tombstones72500
          9 kB
        19. tombstones72500.excl_8_outliers
          9 kB
        20. tombstones72500.excl_8_outliers.png
          33 kB
        21. tombstones72500.excl_8_outliers.short.png
          29 kB
        22. tombstones72500.png
          31 kB


            People

              Girish Benakappa
              Votes: 0
              Watchers: 13
