Couchbase Server / MB-31279

[System test]: Rebalance failed with badarg error

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Critical
    • None
    • 6.0.0
    • couchbase-bucket
    • Centos cluster 1

    Description

      Build:  6.0.0 build 1529

      Test Job: http://qa.sc.couchbase.com/job/centos-systest-launcher/1580/console 

      Cluster: http://172.23.108.103:8091/ 

      The CentOS longevity system test runs the following steps:

      1. Create a 22-node cluster (9 KV, 5 index, 2 query, 2 FTS, 2 eventing, 2 CBAS)
      2. Create 10 buckets (default bucket with Active compression)
      3. Create views
      4. Load data
      5. Remove kv node
      6. Deploy eventing functions
      7. Create dataset on analytics on 4 buckets
      8. Create index on 2 datasets
      9. Create 2i index
      10. Load more data
      11. Run queries on 2i
      12. Swap a KV node
      13. Run 240 queries per second on analytics
      14. Connect link Local
      15. Load more data to default bucket
      16. Add eventing node
      17. Remove eventing node
      18. Swap eventing node
      19. Disconnect link Local
      20. Add analytics node
      21. Connect link Local
      22. Disconnect link Local
      23. Remove analytics node
      24. Connect link Local
      25. Swap analytics node
      26. Kill analytics nodes
      27. Run views
      28. Create fts indexes 
      29. Regex search on FTS
      30. XDCR replication
      31. Add rbac users
      32. Undeploy eventing handlers
      33. Load 1M doc
      34. Create 2i indexes 
      35. Rebalance in index
      36. Rebalance out index
      37. Swap index node
      38. Rebalance in 2 index nodes
      39. Rebalance out 2 index nodes
      40. Rebalance out 1 KV
      41. Rebalance in 1 KV
      42. Failover -> Full recovery index node
      43. Failover -> Rebalance out index node
      44. Add index node
      45. Redeploy eventing handlers
      46. Run Tpcc
      47. Update Doc
      48. Add a KV node, fail over a KV node -> rebalance
      49. Swap hard failover -> add 1 KV, remove 2 KV as soft and hard failover
      50. Multinode autofailover -> fail over 3 KV nodes and rebalance

      The rebalance operation fails because of a “badarg” exception. This issue was seen while debugging MB-30967:

      [user:error,2018-09-11T20:27:39.219-07:00,ns_1@172.23.108.103:<0.8021.0>:ns_orchestrator:do_log_rebalance_completion:1117]Rebalance exited with reason {mover_crashed,
                                    {unexpected_exit,
                                     {'EXIT',<0.684.1355>,
                                      {badarg,
                                       {gen_server,call,
                                        [{'janitor_agent-DISTRICT',
                                          'ns_1@172.23.108.104'},
                                         {if_rebalance,<0.17033.1318>,
                                          {dcp_takeover,'ns_1@172.23.99.21',679}},
                                         infinity]}}}}}
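
When triaging, it can help to reduce the exit reason above to the few fields that matter: the error class, the process and node that were called, and the arguments of the takeover request. The following is a minimal Python sketch, not part of any Couchbase tooling. The helper name `summarize_exit_reason` and the dictionary keys are hypothetical, and reading the trailing integer of `dcp_takeover` as a vBucket id is an assumption that this report does not confirm.

```python
import re

# The exit reason as logged by ns_orchestrator in this report.
EXIT_REASON = """\
{mover_crashed,
 {unexpected_exit,
  {'EXIT',<0.684.1355>,
   {badarg,
    {gen_server,call,
     [{'janitor_agent-DISTRICT',
       'ns_1@172.23.108.104'},
      {if_rebalance,<0.17033.1318>,
       {dcp_takeover,'ns_1@172.23.99.21',679}},
      infinity]}}}}}
"""


def summarize_exit_reason(text):
    """Hypothetical helper: pull key fields out of a mover_crashed term."""
    # Line breaks and indentation in the log are only layout, so drop them.
    flat = re.sub(r"\s+", "", text)
    error = re.search(r"\{'EXIT',<[\d.]+>,\{(\w+),", flat)
    target = re.search(r"\[\{'([^']+)','([^']+)'\}", flat)
    takeover = re.search(r"\{dcp_takeover,'([^']+)',(\d+)\}", flat)
    return {
        "error": error.group(1),
        "called_process": target.group(1),
        "called_node": target.group(2),
        "takeover_node": takeover.group(1),
        # Assumption: the integer is likely a vBucket id.
        "takeover_id": int(takeover.group(2)),
    }


if __name__ == "__main__":
    for key, value in summarize_exit_reason(EXIT_REASON).items():
        print(f"{key}: {value}")
```

The summary points at `janitor_agent-DISTRICT` on `ns_1@172.23.108.104` as the callee, which is the node whose crash report is worth reading next.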

      The following crash can be seen on node 172.23.108.104:

      =========================CRASH REPORT=========================
        crasher:
          initial call: janitor_agent:-spawn_rebalance_subprocess/3-fun-0-/0
          pid: <0.18154.445>                                              
          registered_name: []
          exception error: bad argument
            in function  link/1
               called as link(undefined)                                                                                       
            in call from janitor_agent:'-handle_call/3-fun-5-'/3 (src/janitor_agent.erl, line 721)                             
            in call from janitor_agent:'-spawn_rebalance_subprocess/3-fun-0-'/3 (src/janitor_agent.erl, line 896)
          ancestors: ['janitor_agent-DISTRICT','janitor_agent_sup-DISTRICT',                                                     
                        'single_bucket_kv_sup-DISTRICT',ns_bucket_sup,                                                           
                        ns_bucket_worker_sup,ns_server_sup,ns_server_nodes_sup,
                        <0.5984.169>,ns_server_cluster_sup,<0.89.0>]                                                             
          messages: []                                                                                                           
          links: [<0.25295.403>,<0.25397.403>]
          dictionary: []                                                                                                          
          trap_exit: false                                                                                                        
          status: running
          heap_size: 987                                                                                                          
          stack_size: 27                                                                                                          
          reductions: 1004
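
The crash report shows `link/1` being called with the atom `undefined`. In Erlang, `link/1` raises `badarg` when its argument is not a pid or port, which matches the "bad argument" error. The report does not establish why the value was `undefined`. The Python sketch below only extracts the failing call and the stack frames from an excerpt of the report, so the source locations can be looked up; `failing_call` and `stack_frames` are hypothetical helper names.

```python
import re

# Excerpt of the crash report above, with trailing whitespace removed.
CRASH_EXCERPT = """\
exception error: bad argument
  in function  link/1
     called as link(undefined)
  in call from janitor_agent:'-handle_call/3-fun-5-'/3 (src/janitor_agent.erl, line 721)
  in call from janitor_agent:'-spawn_rebalance_subprocess/3-fun-0-'/3 (src/janitor_agent.erl, line 896)
"""

CALLED_AS = re.compile(r"called as (\w+)\((.*?)\)")
FRAME = re.compile(r"in call from (\S+) \(([^,]+), line (\d+)\)")


def failing_call(report):
    """Return (function, argument text) of the call that raised, or None."""
    match = CALLED_AS.search(report)
    return match.groups() if match else None


def stack_frames(report):
    """Return (function, source file, line number) tuples, innermost first."""
    return [(fn, path, int(line)) for fn, path, line in FRAME.findall(report)]


if __name__ == "__main__":
    print(failing_call(CRASH_EXCERPT))
    for frame in stack_frames(CRASH_EXCERPT):
        print(frame)
```

The innermost frame, `src/janitor_agent.erl` line 721 in this build, is where the `link/1` call sits.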

      The following are the logs:

      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.104.61.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.104.67.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.104.69.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.104.70.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.104.87.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.104.88.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.106.188.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.108.103.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.108.104.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.96.145.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.96.148.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.96.168.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.96.56.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.96.95.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.97.239.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.97.242.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.98.135.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.99.11.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.99.20.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.99.21.zip
      https://s3.amazonaws.com/cb-engineering/mb30967/collectinfo-2018-09-12T191303-ns_1@172.23.99.25.zip
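
Before downloading the archives, it can be useful to confirm that the nodes named in the error and the crash report are among the collected logs. The sketch below assumes the file-name pattern visible in the URLs above (`collectinfo-<timestamp>-<node>.zip`) and uses only a few of the listed URLs. The helper names `nodes_with_logs` and `missing_logs` are hypothetical.

```python
import re

# A few of the collectinfo URLs listed above (not the full set).
BASE = "https://s3.amazonaws.com/cb-engineering/mb30967/"
LOG_URLS = [
    BASE + "collectinfo-2018-09-12T191303-ns_1@172.23.104.61.zip",
    BASE + "collectinfo-2018-09-12T191303-ns_1@172.23.108.103.zip",
    BASE + "collectinfo-2018-09-12T191303-ns_1@172.23.108.104.zip",
    BASE + "collectinfo-2018-09-12T191303-ns_1@172.23.99.21.zip",
]

# Assumed pattern: collectinfo-<timestamp>-ns_1@<IPv4 address>.zip
NODE_IN_URL = re.compile(r"collectinfo-[^/]*-(ns_1@\d+(?:\.\d+){3})\.zip$")


def nodes_with_logs(urls):
    """Return the set of node names that appear in the given log URLs."""
    found = set()
    for url in urls:
        match = NODE_IN_URL.search(url)
        if match:
            found.add(match.group(1))
    return found


def missing_logs(nodes_of_interest, urls):
    """Return the nodes of interest that have no matching log archive."""
    available = nodes_with_logs(urls)
    return sorted(node for node in nodes_of_interest if node not in available)


if __name__ == "__main__":
    # Orchestrator, crashing node, and takeover node from this report.
    wanted = ["ns_1@172.23.108.103", "ns_1@172.23.108.104", "ns_1@172.23.99.21"]
    print(missing_logs(wanted, LOG_URLS))
```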

       

            People

              vikas.chaudhary Vikas Chaudhary
              ajit.yagaty Ajit Yagaty [X] (Inactive)
              Votes: 0
              Watchers: 7
