Couchbase Server / MB-36484

[Volume Test] Rebalance Fails after delta recovery of the failed over node

Details

    Description

      Steps to Reproduce:

      1. Create a 7 node cluster.

        +----------------+----------+--------------+
        | Nodes          | Services | Status       |
        +----------------+----------+--------------+
        | 172.23.106.134 | [u'kv']  | Cluster node |
        | 172.23.106.136 | None     | <--- IN ---  |
        | 172.23.106.137 | None     | <--- IN ---  |
        | 172.23.106.138 | None     | <--- IN ---  |
        | 172.23.105.168 | None     | <--- IN ---  |
        | 172.23.106.82  | None     | <--- IN ---  |
        | 172.23.106.83  | None     | <--- IN ---  |
        +----------------+----------+--------------+
         

      1. Create a membase bucket with eviction policy=valueOnly, compression=off, replicas=1.
      2. Load 1K docs in the bucket with durability = MAJORITY.
      3. Rebalance In 1 node (172.23.106.86) with 200 creates, 400 updates, 200 deletes in parallel with durability = MAJORITY.
      4. Rebalance Out 1 node (172.23.106.136) with 200 creates, 400 updates, 200 deletes in parallel with durability = MAJORITY.
      5. Rebalance In 2 nodes (172.23.106.136, 172.23.106.85) and Rebalance Out 1 node (172.23.106.82) with 200 creates, 400 updates, 200 deletes in parallel with durability = MAJORITY.
      6. Swap Rebalance 1 node (IN=172.23.106.82, OUT=172.23.106.83) with 200 creates, 400 updates, 200 deletes in parallel with durability = MAJORITY.
      7. Update replica number of bucket from 1 to 2.
      8. Rebalance In 1 node (172.23.106.83) and perform 200 creates, 400 updates, 200 deletes in parallel with durability = MAJORITY.
      9. Start Rebalancing the cluster with 200 creates, 400 updates, 200 deletes in parallel.
      10. While the rebalance in Step 9 is in progress, stop the memcached process for 20 seconds and start it again.
      11. Failover a node (172.23.106.83).
      12. Rebalance Out the node failed over in Step 11 and perform 200 creates, 400 updates, 200 deletes in parallel with durability=MAJORITY.
      13. Rebalance In a node (172.23.106.83).
      14. Failover a node (172.23.106.83).
      15. Perform Full Recovery of the node failed over in Step 14 (172.23.106.83).
      16. Perform Rebalance operation with 200 creates, 400 updates, 200 deletes in parallel with durability=MAJORITY.
      17. Failover a node (172.23.106.83).
      18. Perform Delta Recovery of the node failed over in Step 17 (172.23.106.83).
      19. Perform Rebalance operation with 200 creates, 400 updates, 200 deletes in parallel with durability=MAJORITY.

      Rebalance fails with mover_crashed.
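
For anyone reproducing only the tail of the scenario, Steps 17 to 19 can be sketched as three REST calls against the cluster manager. This is a rough sketch, not the test framework's code. It assumes a hard failover (the report does not say hard or graceful), the default admin port 8091, and the `ns_1@<ip>` OTP node naming seen in the error output. The helper names are made up for illustration, and authentication and the parallel document load are left out. The functions only build the requests, so nothing is sent until they are passed to `urllib.request.urlopen`.

```python
from urllib.parse import urlencode
from urllib.request import Request


def otp(ip):
    """OTP node name in the form used in the error output (ns_1@<ip>)."""
    return f"ns_1@{ip}"


def failover_request(base_url, ip):
    """Step 17 (assumed hard failover): POST /controller/failOver."""
    data = urlencode({"otpNode": otp(ip)}).encode()
    return Request(f"{base_url}/controller/failOver", data=data, method="POST")


def delta_recovery_request(base_url, ip):
    """Step 18: mark the failed-over node for delta recovery."""
    data = urlencode({"otpNode": otp(ip), "recoveryType": "delta"}).encode()
    return Request(f"{base_url}/controller/setRecoveryType", data=data, method="POST")


def rebalance_request(base_url, known_ips, ejected_ips=()):
    """Step 19: start the rebalance that brings the recovered node back."""
    data = urlencode({
        "knownNodes": ",".join(otp(ip) for ip in known_ips),
        "ejectedNodes": ",".join(otp(ip) for ip in ejected_ips),
    }).encode()
    return Request(f"{base_url}/controller/rebalance", data=data, method="POST")


if __name__ == "__main__":
    base = "http://172.23.106.134:8091"  # assumed admin endpoint
    # knownNodes must list every node in the cluster; two are shown for brevity.
    for req in (
        failover_request(base, "172.23.106.83"),
        delta_recovery_request(base, "172.23.106.83"),
        rebalance_request(base, ["172.23.106.134", "172.23.106.83"]),
    ):
        print(req.get_method(), req.full_url, req.data.decode())
```

By my count from the steps above, the cluster should have 9 nodes at Step 19, so a real `knownNodes` value would need all of them.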

      Error Messages:

      Rebalance exited with reason {mover_crashed,
      {unexpected_exit,
      {'EXIT',<0.25354.24>,
      {{{{badmatch,{error,closed}},
      [{mc_client_binary,cmd_vocal_recv,5,
      [{file,"src/mc_client_binary.erl"},
      {line,155}]},
      {mc_client_binary,set_vbucket,4,
      [{file,"src/mc_client_binary.erl"},
      {line,394}]},
      {ns_memcached,do_handle_call,3,
      [{file,"src/ns_memcached.erl"},
      {line,547}]},
      {ns_memcached,worker_loop,3,
      [{file,"src/ns_memcached.erl"},
      {line,246}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,247}]}]},
      {gen_server,call,
      ['ns_memcached-GleamBookUsers',
      {set_vbucket,698,active,
      [['ns_1@172.23.106.83',
      'ns_1@172.23.106.136',
      'ns_1@172.23.106.85']]},
      180000]}},
      {gen_server,call,
      [{'janitor_agent-GleamBookUsers',
      'ns_1@172.23.106.83'},
      {if_rebalance,<0.20969.24>,
      {dcp_takeover,'ns_1@172.23.106.137',1016}},
      infinity]}}}}}.
      Rebalance Operation Id = 666b89fd799eb3304812b51e9045a7b1 
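
For triage, the nested exit term above can be reduced to a few fields. The sketch below is only a regex-based reading aid, not a real Erlang term parser. It strips whitespace first so the same patterns work on both the compact and the one-token-per-line form of the trace. The embedded text is an excerpt of the error above with most stack frames omitted, and the function name is made up for illustration.

```python
import re

# Excerpt of the error text from this report (most stack frames omitted).
ERROR_TEXT = """
Rebalance exited with reason {mover_crashed,
{unexpected_exit,
{'EXIT',<0.25354.24>,
{{{{badmatch,{error,closed}},
[{mc_client_binary,cmd_vocal_recv,5,
[{file,"src/mc_client_binary.erl"},
{line,155}]}]},
{gen_server,call,
['ns_memcached-GleamBookUsers',
{set_vbucket,698,active,
[['ns_1@172.23.106.83',
'ns_1@172.23.106.136',
'ns_1@172.23.106.85']]},
180000]}},
{gen_server,call,
[{'janitor_agent-GleamBookUsers',
'ns_1@172.23.106.83'},
{if_rebalance,<0.20969.24>,
{dcp_takeover,'ns_1@172.23.106.137',1016}},
infinity]}}}}}.
"""


def summarize(text):
    """Pull the key fields out of a rebalance exit term."""
    flat = re.sub(r"\s+", "", text)  # layout-independent matching

    def first(pattern):
        m = re.search(pattern, flat)
        return m.groups() if m else None

    return {
        "reason": first(r"exitedwithreason\{(\w+),"),
        "badmatch": first(r"\{badmatch,\{error,(\w+)\}\}"),
        "failing_frame": first(r"\[\{(\w+),(\w+),(\d+),"),
        "bucket": first(r"'ns_memcached-([^']+)'"),
        "set_vbucket": first(r"\{set_vbucket,(\d+),(\w+),"),
        "dcp_takeover": first(r"\{dcp_takeover,'([^']+)',(\d+)\}"),
    }


if __name__ == "__main__":
    for key, value in summarize(ERROR_TEXT).items():
        print(f"{key}: {value}")
```

Read this way, the innermost failure is `{badmatch,{error,closed}}` in `mc_client_binary:cmd_vocal_recv/5`. That suggests the connection to memcached was closed while `set_vbucket` for vBucket 698 (bucket GleamBookUsers) was in flight on 172.23.106.83, and the failure surfaced during the `dcp_takeover` of vBucket 1016 from 172.23.106.137.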

      Worker <0.24826.24> (for action {move,{1016,
      ['ns_1@172.23.106.137',
      'ns_1@172.23.106.86',
      'ns_1@172.23.106.83'],
      ['ns_1@172.23.106.83',
      'ns_1@172.23.106.137',
      'ns_1@172.23.106.86'],
      []}}) exited with reason {unexpected_exit,
      {'EXIT',<0.25354.24>,
      {{{{badmatch,{error,closed}},
      [{mc_client_binary,cmd_vocal_recv,5,
      [{file,"src/mc_client_binary.erl"},
      {line,155}]},
      {mc_client_binary,set_vbucket,4,
      [{file,"src/mc_client_binary.erl"},
      {line,394}]},
      {ns_memcached,do_handle_call,3,
      [{file,"src/ns_memcached.erl"},
      {line,547}]},
      {ns_memcached,worker_loop,3,
      [{file,"src/ns_memcached.erl"},
      {line,246}]},
      {proc_lib,init_p_do_apply,3,
      [{file,"proc_lib.erl"},{line,247}]}]},
      {gen_server,call,
      ['ns_memcached-GleamBookUsers',
      {set_vbucket,698,active,
      [['ns_1@172.23.106.83',
      'ns_1@172.23.106.136',
      'ns_1@172.23.106.85']]},
      180000]}},
      {gen_server,call,
      [{'janitor_agent-GleamBookUsers',
      'ns_1@172.23.106.83'},
      {if_rebalance,<0.20969.24>,
      {dcp_takeover,'ns_1@172.23.106.137',1016}},
      infinity]}}}}
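
The worker's `move` action can be read the same way. The sketch below assumes the action has the shape `{move,{VBucket,OldChain,NewChain,Quirks}}` and that the first entry of each chain is the active copy. Both are assumptions about ns_server internals that this report does not state. The function name is made up for illustration.

```python
import re

# The move action copied from the worker message in this report.
MOVE_ACTION = """{move,{1016,
['ns_1@172.23.106.137',
'ns_1@172.23.106.86',
'ns_1@172.23.106.83'],
['ns_1@172.23.106.83',
'ns_1@172.23.106.137',
'ns_1@172.23.106.86'],
[]}}"""


def parse_move(text):
    """Return (vbucket, old_chain, new_chain) from a move action term.

    Assumes the layout {move,{VBucket,OldChain,NewChain,Quirks}}.
    """
    flat = re.sub(r"\s+", "", text)
    m = re.search(r"\{move,\{(\d+),\[([^\]]*)\],\[([^\]]*)\],", flat)
    if m is None:
        raise ValueError("no move action found")
    old_chain = re.findall(r"'([^']+)'", m.group(2))
    new_chain = re.findall(r"'([^']+)'", m.group(3))
    return int(m.group(1)), old_chain, new_chain


if __name__ == "__main__":
    vbucket, old_chain, new_chain = parse_move(MOVE_ACTION)
    print(f"vBucket {vbucket}: active {old_chain[0]} -> {new_chain[0]}")
    print(f"replicas {old_chain[1:]} -> {new_chain[1:]}")
```

Under those assumptions, vBucket 1016 was moving its active copy from 172.23.106.137 to 172.23.106.83, the node that was delta-recovered in Step 18. That matches the `dcp_takeover` from 172.23.106.137 in the exit reason.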

          People

            prateek.kumar Prateek Kumar (Inactive)