  1. Couchbase Server
  2. MB-19394

Repeated graceful failover fails

Details

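    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Critical
    • Fix Version/s: 4.5.0
    • Affects Version/s: 4.5.0
    • Component/s: ns_server
    • Labels: None
    • Triage: Untriaged
    • Is this a Regression?: Yes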

    Description

      Running a 4-node cluster with cluster_run on OS X.
      In the UI:

      • Select failover of node 1
      • Select graceful failover
      • Select delta recovery
      • When the failover has finished, select rebalance
      • When the rebalance has finished, select failover of node 1 again, same as above

      The UI now hangs showing "Failing over 1 node" but no progress is made.

      The ns_rebalancer process is blocked in wait_for_mover_tail:

      erlang:process_info(Pid).
      [{current_function,{ns_rebalancer,wait_for_mover_tail,2}},
       {initial_call,{proc_lib,init_p,5}},
       {status,waiting},
       {message_queue_len,0},
       {messages,[]},
       {links,[<0.1368.0>,<0.18016.2>]},
       {dictionary,[{'$ancestors',[<0.1368.0>,ns_orchestrator_sup,
                                   mb_master_sup,mb_master,<0.690.0>,ns_server_sup,
                                   ns_server_nodes_sup,<0.155.0>,ns_server_cluster_sup,
                                   <0.89.0>]},
                    {'$initial_call',{erlang,apply,2}}]},
       {trap_exit,false},
       {error_handler,error_handler},
       {priority,normal},
       {group_leader,<0.88.0>},
       {total_heap_size,318186},
       {heap_size,121536},
       {stack_size,24},
       {reductions,256621},
       {garbage_collection,[{min_bin_vheap_size,46422},
                            {min_heap_size,233},
                            {fullsweep_after,512},
                            {minor_gcs,1}]},
       {suspending,[]}]
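
The dump above is the full erlang:process_info/1 output. As a minimal sketch, the same check can be narrowed to the few items that matter for a hang, plus a stack trace. The process spawned here is only a stand-in so the snippet runs in any Erlang shell; on the affected node, Pid would be the stuck rebalancer process instead. Note that one of the linked pids in the dump, <0.18016.2>, is the same pid that writes the ns_vbucket_mover log line in this report.

```erlang
%% Stand-in for the stuck process (an assumption for illustration only):
%% a process parked in a receive with nothing in its mailbox.
Pid = spawn(fun() -> receive stop -> ok end end).

%% Request only the items relevant to a hang instead of the full dump.
erlang:process_info(Pid, [current_function, status, message_queue_len, links]).

%% current_stacktrace returns the call chain, not just the innermost function.
erlang:process_info(Pid, current_stacktrace).

%% Release the stand-in process.
Pid ! stop.
```

A status of waiting together with a message_queue_len of 0, as in the dump, means the process is idle in a receive and no pending message will wake it.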
      

      ns_vbucket_mover got an empty Actions list:

      [ns_server:debug,2016-04-26T18:50:20.076-05:00,n_0@192.168.1.70:<0.18016.2>:ns_vbucket_mover:spawn_workers:326]Got actions: []
      

      But vbucket_move_scheduler:is_done does not return true. The vbucket_move_scheduler state seems to have 490 moves_left.
      It is blocked in gen_server:loop.
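
The combination described above (an empty action list while moves are still left) can be written down as a small predicate. This is a hypothetical sketch, not ns_server code: the name Stalled and the InFlight argument are assumptions introduced for illustration, and the report does not say how many moves were in flight. The idea is that if nothing is scheduled and nothing is running, no completion event will arrive to trigger further scheduling, so the remaining moves are never started.

```erlang
%% Hypothetical stall check; NOT taken from ns_server.
%% InFlight is an assumed count of moves started but not yet completed.
Stalled = fun(Actions, MovesLeft, InFlight) ->
              Actions =:= [] andalso MovesLeft > 0 andalso InFlight =:= 0
          end.

%% Values from the report: empty action list, 490 moves left.
%% Zero in-flight moves is an assumption.
Stalled([], 490, 0).
```

With a non-zero InFlight value the predicate is false, since a completing move could still cause new actions to be generated.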

      I have confirmed that this works in Sherlock.

          People

            Anders Nygren (Inactive)