Couchbase Server / MB-43285

Rebalance exited with reason {{badmatch : Timeout while trying to acquire lease


Details

    • Untriaged
    • 1
    • Unknown

    Description

      This bug covers two different failures. They have different proximate causes and occurred more than one month apart, but they may share the same root cause. The test scenario was designed to stress metakv:

      • Dave Finlay analyzed the first failure, from 2020-12-10, which has 10 log URLs, and then requested a newer run.
      • Kevin Cherkauer analyzed the second failure, from 2021-01-21, which has a Supportal snapshot. This run failed for a different reason than the first.

       

      Description of the first failure (2020-12-10):

      Build: 6.6.0-7909

      • Cluster with 3 KV, 2 index+n1ql, and 4 search nodes
      • 6 buckets with 5000 docs each
      • Build 200 GSI indexes with 1 replica (50 indexes on each of 4 buckets)
      • Create 30 custom FTS indexes (10 indexes on each of 3 buckets), only to add more entries to metakv
      • Create and drop 100 GSI indexes sequentially on each of 4 buckets (adding metakv entries for the create/drop of 400 indexes)
      • Create and drop 50 FTS indexes on 3 buckets.
      • Note for QE: examples for creating and dropping FTS and GSI indexes can be found here: https://github.com/couchbaselabs/productivitynautomation/tree/master/create-drop-gsi-fts-indexes
      • Gracefully fail over a node running the KV service. The failover rebalance fails with the following:

      {view_fragmentation_threshold,{30,undefined}}]
      [ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28966.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.121.47'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28988.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.107.197'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.29045.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.107.87'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.21396.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.121.41'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:warn,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:<0.29093.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.106.245'.
      Acquire options were [{timeout,0},{period,15000}]
      [ns_server:debug,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:leader_lease_agent<0.23775.0>:leader_lease_agent:handle_lease_expired:286]Lease held by {lease_holder,<<"5d86e482ae739d52262a6ebd2d87c1ca">>,
                                  'ns_1@172.23.106.245'} expired. Starting expirer.
      [ns_server:debug,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:leader_activities<0.23774.0>:leader_activities:terminate_activities:635]Terminating activities (reason is {shutdown,
                                         {quorum_lost,
                                          {lease_lost,'ns_1@172.23.106.245'}}}):
      [{activity,<0.26428.889>,#Ref<0.2150457706.1058275335.246268>,default,
                 <<"59a2588ae4959d8162566e0e9c3ab763">>,
                 [rebalance],
                 majority,[]}]
      [ns_server:info,2020-12-14T09:40:57.000-08:00,ns_1@172.23.106.245:rebalance_agent<0.23812.0>:rebalance_agent:handle_down:296]Rebalancer process <0.26591.889> died (reason shutdown).
      [ns_server:error,2020-12-14T09:40:57.000-08:00,ns_1@172.23.106.245:<0.26556.889>:leader_activities:report_error:1011]Activity {default,rebalance} failed with error {quorum_lost,
                                                      {lease_lost,
                                                       'ns_1@172.23.106.245'}}
      [ns_server:debug,2020-12-14T09:40:57.004-08:00,ns_1@172.23.106.245:ns_config_log<0.201.0>:ns_config_log:log_common:229]config change:
      {local_changes_count,<<"fda11948c11fa3107f8aa6d83b739109">>} ->
      [{'_vclock',[{<<"fda11948c11fa3107f8aa6d83b739109">>,{1572,63775186856}}]}]
      [user:error,2020-12-14T09:40:57.008-08:00,ns_1@172.23.106.245:<0.29120.245>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {{badmatch,
                                     {leader_activities_error,
                                      {default,rebalance},
                                      {quorum_lost,
                                       {lease_lost,'ns_1@172.23.106.245'}}}},
                                    [{ns_rebalancer,rebalance,5,
                                      [{file,"src/ns_rebalancer.erl"},{line,480}]},
                                     {proc_lib,init_p_do_apply,3,
                                      [{file,"proc_lib.erl"},{line,247}]}]}.
      Rebalance Operation Id = f5876e978c98291d711dd1704249b07b
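      The excerpt above shows two things: which peers the rebalancing node (ns_1@172.23.106.245) logged lease-acquire timeouts for, and how soon the rebalance died afterwards. Both can be checked mechanically across the collected logs. Below is a minimal Python sketch. It assumes the one-line header layout visible in this excerpt ([component:level,timestamp,node:process...:line]message). The regular expressions are inferred from these lines only and are not a documented ns_server format. The sample lines are copied verbatim from the excerpt, with multi-line continuations omitted.

```python
import re
from datetime import datetime

# Header lines copied verbatim from the excerpt above; multi-line
# continuations (e.g. "Acquire options were ...") are omitted.
SAMPLE_LOG = """\
[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28966.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.121.47'.
[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.28988.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.107.197'.
[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.29045.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.107.87'.
[ns_server:warn,2020-12-14T09:40:56.993-08:00,ns_1@172.23.106.245:<0.21396.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.121.41'.
[ns_server:warn,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:<0.29093.245>:leader_lease_acquire_worker:handle_acquire_timeout:112]Timeout while trying to acquire lease from 'ns_1@172.23.106.245'.
[ns_server:debug,2020-12-14T09:40:56.994-08:00,ns_1@172.23.106.245:leader_lease_agent<0.23775.0>:leader_lease_agent:handle_lease_expired:286]Lease held by {lease_holder,<<"5d86e482ae739d52262a6ebd2d87c1ca">>,
[user:error,2020-12-14T09:40:57.008-08:00,ns_1@172.23.106.245:<0.29120.245>:ns_orchestrator:log_rebalance_completion:1445]Rebalance exited with reason {{badmatch,
"""

# Assumed header layout (inferred from the excerpt only):
# [component:level,timestamp,node:process:module:function:line]message
HEADER = re.compile(
    r"^\[(?P<component>\w+):(?P<level>\w+),(?P<ts>[^,]+),"
    r"(?P<node>ns_1@[\d.]+):[^\]]*\](?P<msg>.*)$"
)
TIMEOUT = re.compile(r"Timeout while trying to acquire lease from '(?P<peer>ns_1@[\d.]+)'")


def parse_entries(text):
    """Yield (timestamp, level, node, message) for every header line in text."""
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            yield datetime.fromisoformat(m["ts"]), m["level"], m["node"], m["msg"]


def lease_timeouts(text):
    """Map each peer named in a lease-acquire timeout to its first timeout time."""
    peers = {}
    for ts, _level, _node, msg in parse_entries(text):
        hit = TIMEOUT.search(msg)
        if hit:
            peers.setdefault(hit["peer"], ts)
    return peers


def first_time(text, needle):
    """Timestamp of the first entry whose message contains needle."""
    return next(ts for ts, _level, _node, msg in parse_entries(text) if needle in msg)


if __name__ == "__main__":
    for peer, ts in sorted(lease_timeouts(SAMPLE_LOG).items()):
        print(peer, ts.time())
    gap = first_time(SAMPLE_LOG, "Rebalance exited") - first_time(
        SAMPLE_LOG, "Timeout while trying to acquire lease"
    )
    print(f"rebalance exited {gap.total_seconds() * 1000:.0f} ms after the first lease timeout")
```

      On the excerpt above, the sketch reports five peers, including the rebalancing node itself. All of them time out within about a millisecond of each other, and the rebalance exits 15 ms after the first timeout. To scan full logs, feed it each node's ns_server log from the collectinfo archives listed below instead of SAMPLE_LOG.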
      

      Logs:
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.106.245.zip (kv node)
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.104.zip (kv node)
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.197.zip (kv node)
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.107.87.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.41.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.45.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.46.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.47.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.49.zip
      url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1607975035/collectinfo-2020-12-14T194357-ns_1%40172.23.121.52.zip
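      The GSI part of the test scenario above creates metakv churn: 200 long-lived indexes with one replica, then 400 indexes each created and immediately dropped. That workload can be expressed as an ordered list of N1QL statements. The couchbaselabs repository linked in the scenario is the QE reference. The following is only a minimal Python sketch that builds the statement text without connecting to a cluster. It reads "50 indexes on 4 buckets" and "100 gsi indexes sequentially on 4 buckets" as per-bucket counts, as the stated totals imply. Bucket, index, and field names are hypothetical. FTS indexes, which are created through the Search service REST API, are not covered.

```python
# Hypothetical names: the scenario does not give bucket, index, or field names.
GSI_BUCKETS = ["bucket1", "bucket2", "bucket3", "bucket4"]
INDEXED_FIELD = "name"


def create_ddl(bucket, name, replicas=0):
    """N1QL CREATE INDEX statement; num_replica is added only when requested."""
    stmt = f"CREATE INDEX `{name}` ON `{bucket}`(`{INDEXED_FIELD}`)"
    if replicas:
        stmt += f' WITH {{"num_replica": {replicas}}}'
    return stmt


def drop_ddl(bucket, name):
    """Pre-7.0 N1QL DROP INDEX syntax (bucket.index), as used on a 6.6 cluster."""
    return f"DROP INDEX `{bucket}`.`{name}`"


def persistent_indexes(per_bucket=50):
    """Long-lived indexes with 1 replica: 50 on each of 4 buckets (200 total)."""
    return [
        create_ddl(b, f"keep_{b}_{i}", replicas=1)
        for b in GSI_BUCKETS
        for i in range(per_bucket)
    ]


def churn(per_bucket=100):
    """Create then immediately drop each index, one at a time, per bucket."""
    stmts = []
    for b in GSI_BUCKETS:
        for i in range(per_bucket):
            name = f"churn_{b}_{i}"
            stmts += [create_ddl(b, name), drop_ddl(b, name)]
    return stmts


if __name__ == "__main__":
    keep, cycle = persistent_indexes(), churn()
    print(len(keep), "persistent CREATE statements")
    print(len(cycle) // 2, "create/drop pairs")
    print(keep[0])
    print(cycle[0])
    print(cycle[1])
```

      The statements are kept strictly sequential (create, drop, next index) to mirror "sequentially" in the scenario. Each pair therefore adds its own create and drop entries to metakv rather than batching them.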

      Attachments

        1. mb43285_2.sh (0.5 kB, Kevin Cherkauer)
        2. mb43285_3a.sh (0.3 kB, Kevin Cherkauer)
        3. mb43285_3b.sh (0.3 kB, Kevin Cherkauer)
        4. mb43285_3c.sh (0.3 kB, Kevin Cherkauer)
        5. mb43285.sh (0.3 kB, Kevin Cherkauer)
        6. metakv.Get.times (218 kB, Kevin Cherkauer)
        7. metakv.Get.times.png (198 kB, Kevin Cherkauer)
        8. metakv.ListAllChildren.create100.drop100+create100.rebal+create100 (4 kB, Kevin Cherkauer)
        9. metakv.ListAllChildren.create100.drop100+create100.rebal+create100.png (19 kB, Kevin Cherkauer)
        10. points1000 (19 kB, Kevin Cherkauer)
        11. points1000.png (53 kB, Kevin Cherkauer)
        12. points500 (8 kB, Kevin Cherkauer)
        13. points500.png (30 kB, Kevin Cherkauer)
        14. points800 (14 kB, Kevin Cherkauer)
        15. points800.png (44 kB, Kevin Cherkauer)
        16. screenshot-1.png (156 kB, Dave Finlay)
        17. screenshot-2.png (171 kB, Dave Finlay)
        18. tombstones72500 (9 kB, Kevin Cherkauer)
        19. tombstones72500.excl_8_outliers (9 kB, Kevin Cherkauer)
        20. tombstones72500.excl_8_outliers.png (33 kB, Kevin Cherkauer)
        21. tombstones72500.excl_8_outliers.short.png (29 kB, Kevin Cherkauer)
        22. tombstones72500.png (31 kB, Kevin Cherkauer)


            People

              girish.benakappa Girish Benakappa
              girish.benakappa Girish Benakappa
              Votes: 0
              Watchers: 13
