  1. Couchbase Server
  2. MB-30967

[System test] : Rebalance failed with noproc and mover_crashed


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 6.0.0
    • Affects Version/s: 6.0.0
    • Component/s: go-couchbase
    • Environment: Centos cluster 1

    Description

      Build:  6.0.0 build 1529

      Test Job: http://qa.sc.couchbase.com/job/centos-systest-launcher/1580/console 

      Cluster: http://172.23.108.103:8091/ 

      We run the following steps in the CentOS longevity test:

      Longevity :

      1. Create a 22-node cluster (9 KV, 5 index, 2 query, 2 FTS, 2 eventing, 2 CBAS)
      2. Create 10 buckets (default bucket with Active compression)
      3. Create views
      4. Load data
      5. Remove kv node
      6. Deploy eventing functions
      7. Create dataset on analytics on 4 buckets
      8. Create index on 2 datasets
      9. Create 2i index
      10. Load more data
      11. Run queries on 2i
      12. Swap a KV node
      13. Run 240 queries per second on analytics
      14. Connect link Local
      15. Load more data to default bucket
      16. Add eventing node
      17. Remove eventing node
      18. Swap eventing node
      19. Disconnect link Local 
      20. Add analytics node 
      21. connect link Local
      22. Disconnect link Local 
      23. Remove analytics node
      24. Connect link Local
      25. Swap analytics node
      26. Kill analytics nodes
      27. Run views
      28. Create fts indexes 
      29. Regex search on FTS
      30. XDCR replication
      31. Add rbac users
      32. Undeploy eventing handlers
      33. Load 1M doc
      34. Create 2i indexes 
      35. Rebalance in index
      36. Rebalance out index
      37. Swap index node
      38. Rebalance in 2 index nodes
      39. Rebalance out 2 index nodes
      40. Rebalance out 1 KV
      41. Rebalance in 1 KV
      42. Failover -> Full recovery index node
      43. Failover -> Rebalance out index node
      44. Add index node
      45. Redeploy eventing handlers
      46. Run Tpcc
      47. Update Doc
      48. Add a kv node , failover kv node -> rebalance 
      49. swap hard failover -> Add 1 KV remove 2 KV as soft and hard failover
      50. Multinode autofailover -> failover 3 KV nodes and rebalance 

      We observed failures at the following steps:

      • Adding a data node back

      [2018-08-18T02:50:04-07:00, sequoiatools/pillowfight:d91da7] -U couchbase://172.23.108.103/default?select_bucket=true -I 3000 -B 300 -t 4 -c 100 -P password
      [2018-08-18T02:54:55-07:00, sequoiatools/couchbase-cli:262814] server-add -c 172.23.108.103:8091 --server-add 172.23.108.104:8091 -u Administrator -p password --server-add-username Administrator --server-add-password password --services data
      [2018-08-18T02:55:18-07:00, sequoiatools/couchbase-cli:4cc1a4] rebalance -c 172.23.108.103:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:[rebalance -c 172.23.108.103:8091 -u Administrator -p password]
       
      docker logs 4cc1a4
      docker start 4cc1a4
       
      *Unable to display progress bar on this os
      ERROR: Rebalance failed. See logs for detailed reason. You can try again.

      [user:error,2018-08-18T03:08:17.682-07:00,ns_1@172.23.108.103:<0.9637.0>:ns_orchestrator:do_log_rebalance_completion:1117]Rebalance exited with reason {noproc,
      {gen_server,call,
      [{'janitor_agent-WAREHOUSE',
      'ns_1@172.23.96.56'},
      {get_dcp_docs_estimate,134,
      ['ns_1@172.23.108.104']},
      infinity]}}

      • Removing data node

      [2018-08-18T03:38:18-07:00, sequoiatools/couchbase-cli:8c9dd6] rebalance -c 172.23.108.103:8091 --server-remove 172.23.108.104:8091 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:[rebalance -c 172.23.108.103:8091 --server-remove 172.23.108.104:8091 -u Administrator -p password]
       
      docker logs 8c9dd6
      docker start 8c9dd6
       
      *Unable to display progress bar on this os

      [user:error,2018-08-18T03:55:49.367-07:00,ns_1@172.23.108.103:<0.9637.0>:ns_orchestrator:do_log_rebalance_completion:1117]Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.28031.797>,
      {{error,{badrpc,nodedown}},
      {gen_server,call,
      [{'janitor_agent-DISTRICT',
      'ns_1@172.23.96.56'},
      {if_rebalance,<0.1608.796>,
      {update_vbucket_state,135,active,
      undefined,undefined}},
      infinity]}}}}}

      • Swap of analytics node

      [2018-08-18T04:24:55-07:00, sequoiatools/couchbase-cli:97ad55] server-add -c 172.23.108.103:8091 --server-add 172.23.96.148:8091 -u Administrator -p password --server-add-username Administrator --server-add-password password --services analytics
      [2018-08-18T04:25:20-07:00, sequoiatools/couchbase-cli:d8b44d] rebalance -c 172.23.108.103:8091 --server-remove 172.23.99.25 -u Administrator -p password
       
      Error occurred on container - sequoiatools/couchbase-cli:[rebalance -c 172.23.108.103:8091 --server-remove 172.23.99.25 -u Administrator -p password]
       
      docker logs d8b44d
      docker start d8b44d
       
      *Unable to display progress bar on this os

      [user:error,2018-08-18T04:35:38.929-07:00,ns_1@172.23.108.103:<0.9637.0>:ns_orchestrator:do_log_rebalance_completion:1117]Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.26855.819>,
      {noproc,
      {gen_server,call,
      [{'janitor_agent-DISTRICT',
      'ns_1@172.23.96.56'},
      {if_rebalance,<0.18658.819>,
      {inhibit_view_compaction,<0.18658.819>}},
      infinity]}}}}}
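All three exit reasons above wrap a `gen_server` call to a `janitor_agent-...` process on the same node, `ns_1@172.23.96.56`. The following is a minimal sketch, not part of the original report, that pulls the call target out of each reason to make that visible. The step labels, the helper names `call_target` and `summarize`, and the regular expression are assumptions for illustration; the pattern only covers the `{'janitor_agent-<name>','<node>'}` shape seen in these three entries, where the suffix appears to be the bucket name.

```python
import re

# Exit reasons copied from the three ns_orchestrator log entries above,
# collapsed onto single lines. The dictionary keys are illustrative labels.
REASONS = {
    "add data node": (
        "{noproc,{gen_server,call,[{'janitor_agent-WAREHOUSE','ns_1@172.23.96.56'},"
        "{get_dcp_docs_estimate,134,['ns_1@172.23.108.104']},infinity]}}"
    ),
    "remove data node": (
        "{mover_crashed,{unexpected_exit,{'EXIT',<0.28031.797>,"
        "{{error,{badrpc,nodedown}},{gen_server,call,"
        "[{'janitor_agent-DISTRICT','ns_1@172.23.96.56'},"
        "{if_rebalance,<0.1608.796>,"
        "{update_vbucket_state,135,active,undefined,undefined}},infinity]}}}}}"
    ),
    "swap analytics node": (
        "{mover_crashed,{unexpected_exit,{'EXIT',<0.26855.819>,"
        "{noproc,{gen_server,call,"
        "[{'janitor_agent-DISTRICT','ns_1@172.23.96.56'},"
        "{if_rebalance,<0.18658.819>,"
        "{inhibit_view_compaction,<0.18658.819>}},infinity]}}}}}"
    ),
}

# Matches the {'janitor_agent-<name>','<node>'} tuple inside an exit reason.
TARGET = re.compile(r"\{'janitor_agent-([^']+)',\s*'([^']+)'\}")


def call_target(reason):
    """Return (name suffix, node) of the janitor_agent call in an exit reason."""
    match = TARGET.search(reason)
    if match is None:
        raise ValueError("no janitor_agent call found in reason")
    return match.group(1), match.group(2)


def summarize(reasons):
    """Map each step label to the (name suffix, node) its failed call targeted."""
    return {step: call_target(text) for step, text in reasons.items()}


if __name__ == "__main__":
    for step, (name, node) in summarize(REASONS).items():
        print(f"{step}: janitor_agent-{name} on {node}")
```

Grouping by node rather than by top-level reason (`noproc` versus `mover_crashed`) suggests checking the state of `ns_1@172.23.96.56` at the three failure times first.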

      Note: With Alice, this is the first time we were able to complete the first cycle, so this is not a regression.
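For the three failed rebalances above, the time between issuing the `rebalance` command and the logged failure can be estimated from the timestamps in the excerpts. This is a rough sketch, not part of the original report: it assumes the test driver's timestamp marks the start of the rebalance and that the driver and server clocks agree. The step labels and function names are illustrative.

```python
from datetime import datetime

# (step label, time the rebalance command was issued, time of the
# ns_orchestrator error), all copied from the log excerpts above.
EVENTS = [
    ("add data node", "2018-08-18T02:55:18-07:00", "2018-08-18T03:08:17.682-07:00"),
    ("remove data node", "2018-08-18T03:38:18-07:00", "2018-08-18T03:55:49.367-07:00"),
    ("swap analytics node", "2018-08-18T04:25:20-07:00", "2018-08-18T04:35:38.929-07:00"),
]


def seconds_until_failure(issued, failed):
    """Seconds elapsed between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(failed) - datetime.fromisoformat(issued)
    return delta.total_seconds()


def durations(events):
    """Map each step label to the seconds its rebalance ran before failing."""
    return {step: seconds_until_failure(a, b) for step, a, b in events}


if __name__ == "__main__":
    for step, secs in durations(EVENTS).items():
        print(f"{step}: {secs / 60:.1f} min")
```

Under those assumptions, each rebalance ran for roughly 10 to 18 minutes before failing, so none of the three failed immediately on start.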


            People

              Assignee: Vikas Chaudhary (vikas.chaudhary)
              Reporter: Vikas Chaudhary (vikas.chaudhary)