  1. Couchbase Server
  2. MB-2911

Two thirds of vbuckets were in the dead state, and vbucketmigrator processes exited and restarted repeatedly


Details

    Description

      I did the following steps:

      • Set up a two-node cluster (6 GB RAM on each node) with one replica
      • Load 10 million items by running memcachetest and mc_loader
      • Add another node and rebalance
      • Validate the items loaded by mc_loader

      The rebalance succeeded, but after some time the vbucketmigrator processes on all three nodes were killed and restarted repeatedly. Data validation also failed: 3 million items were missing from the system.

      Output of the vbucketctl command, counting the active vbuckets on each host:

      chiyoung:management chiyoung$ python ./vbucketctl 10.2.1.51:11210 list | grep active | wc -l
      0
      chiyoung:management chiyoung$ python ./vbucketctl 10.2.1.53:11210 list | grep active | wc -l
      342
      chiyoung:management chiyoung$ python ./vbucketctl 10.2.1.54:11210 list | grep active | wc -l
      0

      Only 342 vbuckets were active, all of them on one node (10.2.1.53); the rest were in the dead state.
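The per-host counts above can also be gathered for every state in one pass. The following is a minimal sketch, assuming that each line printed by `vbucketctl <host> list` describes one vbucket and contains its state as a separate word (which is what the `grep active` pipeline relies on). The state names other than "active" and "dead", the function name `count_states`, and the sample text are assumptions for illustration and are not taken from this report.

```python
import re
from collections import Counter

# Assumed state names; the report itself only mentions "active" and "dead".
STATES = ("active", "replica", "pending", "dead")
_STATE_RE = re.compile(r"\b(" + "|".join(STATES) + r")\b")


def count_states(listing: str) -> Counter:
    """Count vbuckets per state in captured `vbucketctl <host> list` output."""
    counts = Counter()
    for line in listing.splitlines():
        match = _STATE_RE.search(line)
        if match:
            # Only the first state word on a line is counted.
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample; the real line layout of vbucketctl may differ.
sample = "vbucket 0 active\nvbucket 1 dead\nvbucket 2 dead\n"
print(dict(count_states(sample)))
```

Running this against the saved output of each of the three hosts would show directly how many vbuckets are dead on each node, not only how many are active.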

      The following is the log snippet from the web UI console:

      Bucket "default" loaded on node 'ns_1@10.2.1.51' in 354 seconds. ns_memcached001 13:08:51 - Thu Nov 18, 2010
      Control connection to memcached on 'ns_1@10.2.1.51' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary, stats_recv, 4}, {mc_client_binary, stats, 4}, {ns_memcached, handle_call, 3}, {gen_server, handle_msg, 5}, {proc_lib, init_p_do_apply, 3}]} ns_memcached004 13:02:56 - Thu Nov 18, 2010
      Port server memcached on node 'ns_1@10.2.1.51' exited with status 136. Restarting. Messages: Backfilling token for eq_tapq:anon_1 went invalid. Stopping backfill.
      Backfilling token for eq_tapq:anon_1 went invalid. Stopping backfill.
      Backfilling token for eq_tapq:anon_1 went invalid. Stopping backfill. ns_port_server000 13:02:56 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 70. Restarting. Messages: Connecting to {Sock 10.2.1.53:11210}
      Authenticating towards: {Sock 10.2.1.53:11210}
      Authenticated towards: {Sock 10.2.1.53:11210} (repeated 19 times) ns_port_server000 13:01:28 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 78. Restarting. Messages: Connecting to {Sock 10.2.1.54:11210}
      Failed to connect to host: Failed to connect to [10.2.1.54:11210] (repeated 12 times) ns_port_server000 13:00:52 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 70. Restarting. Messages: Authenticating towards: {Sock 10.2.1.51:11210}
      Authenticated towards: {Sock 10.2.1.51:11210}
      Downstream connection closed.. shutdown upstream (repeated 6 times) ns_port_server000 13:00:52 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 70. Restarting. Messages: Connecting to {Sock 10.2.1.53:11210}
      Authenticating towards: {Sock 10.2.1.53:11210}
      Authenticated towards: {Sock 10.2.1.53:11210} ns_port_server000 12:55:39 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 74. Restarting. Messages: An error occured on the downstream connection..
      Downstream connection closed.. shutdown upstream
      Had 360 pending messages at exit. ns_port_server000 12:55:34 - Thu Nov 18, 2010
      Node 'ns_1@10.2.1.53' saw that node 'ns_1@10.2.1.54' came up. ns_node_disco004 12:55:33 - Thu Nov 18, 2010
      Node 'ns_1@10.2.1.54' saw that node 'ns_1@10.2.1.53' came up. ns_node_disco004 12:55:32 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 70. Restarting. Messages: Authenticating towards: {Sock 10.2.1.51:11210}
      Authenticated towards: {Sock 10.2.1.51:11210}
      Downstream connection closed.. shutdown upstream ns_port_server000 12:55:31 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 78. Restarting. Messages: Connecting to {Sock 10.2.1.54:11210}
      Failed to connect to host: Failed to connect to [10.2.1.54:11210] ns_port_server000 12:55:31 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 74. Restarting. Messages: An error occured on the downstream connection..
      Downstream connection closed.. shutdown upstream
      Had 1270 pending messages at exit. ns_port_server000 12:55:31 - Thu Nov 18, 2010
      Port server memcached on node 'ns_1@10.2.1.54' exited with status 137. Restarting. Messages: sqlite error: SQL logic error or missing database
      sqlite error: SQL logic error or missing database
      sqlite error: SQL logic error or missing database ns_port_server000 12:55:31 - Thu Nov 18, 2010
      Control connection to memcached on 'ns_1@10.2.1.54' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary, stats_recv, 4}, {mc_client_binary, stats, 4}, {ns_memcached, handle_call, 3}, {gen_server, handle_msg, 5}, {proc_lib, init_p_do_apply, 3}]} ns_memcached004 12:55:31 - Thu Nov 18, 2010
      Node 'ns_1@10.2.1.54' saw that node 'ns_1@10.2.1.51' came up. ns_node_disco004 12:55:29 - Thu Nov 18, 2010
      Node 'ns_1@10.2.1.51' saw that node 'ns_1@10.2.1.54' came up. ns_node_disco004 12:55:29 - Thu Nov 18, 2010
      Node 'ns_1@10.2.1.51' saw that node 'ns_1@10.2.1.54' went down. ns_node_disco005 12:49:15 - Thu Nov 18, 2010
      Node 'ns_1@10.2.1.53' saw that node 'ns_1@10.2.1.54' went down. ns_node_disco005 12:49:14 - Thu Nov 18, 2010
      Bucket "default" loaded on node 'ns_1@10.2.1.51' in 393 seconds. ns_memcached001 12:27:27 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.54' exited with status 70. Restarting. Messages: Connecting to {Sock 10.2.1.54:11210}
      Authenticating towards: {Sock 10.2.1.54:11210}
      Authenticated towards: {Sock 10.2.1.54:11210} (repeated 1 times) ns_port_server000 12:26:43 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.54' exited with status 70. Restarting. Messages: Authenticating towards: {Sock 10.2.1.54:11210}
      Authenticated towards: {Sock 10.2.1.54:11210}
      Downstream connection closed.. shutdown upstream (repeated 14 times) ns_port_server000 12:26:43 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 78. Restarting. Messages: Connecting to {Sock 10.2.1.51:11210}
      Failed to connect to host: Failed to connect to [10.2.1.51:11210] (repeated 15 times) ns_port_server000 12:26:28 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 70. Restarting. Messages: Connecting to {Sock 10.2.1.53:11210}
      Authenticating towards: {Sock 10.2.1.53:11210}
      Authenticated towards: {Sock 10.2.1.53:11210} (repeated 2 times) ns_port_server000 12:26:28 - Thu Nov 18, 2010
      Membase Server has started on web port 8091 on node 'ns_1@10.2.1.51'. menelaus_app001 12:20:53 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 74. Restarting. Messages: An error occured on the downstream connection..
      Downstream connection closed.. shutdown upstream
      Had 173 pending messages at exit. ns_port_server000 12:20:52 - Thu Nov 18, 2010
      Control connection to memcached on 'ns_1@10.2.1.53' disconnected: {{badmatch,
      {error,
      timeout}},
      [{mc_client_binary, cmd_binary_vocal_recv, 5}, {mc_client_binary, delete_vbucket, 2},{ns_memcached,handle_call,3}, {gen_server, handle_msg, 5}, {proc_lib, init_p_do_apply, 3}]} (repeated 8 times) ns_memcached004 12:11:28 - Thu Nov 18, 2010
      Bucket "default" loaded on node 'ns_1@10.2.1.53' in 0 seconds. (repeated 8 times) ns_memcached001 12:11:28 - Thu Nov 18, 2010
      Bucket "default" loaded on node 'ns_1@10.2.1.53' in 0 seconds. ns_memcached001 12:05:29 - Thu Nov 18, 2010
      Control connection to memcached on 'ns_1@10.2.1.53' disconnected: {{badmatch,
      {error,
      timeout}},
      [{mc_client_binary, cmd_binary_vocal_recv, 5}, {mc_client_binary, delete_vbucket, 2}, {ns_memcached, handle_call, 3},{gen_server,handle_msg,5}, {proc_lib, init_p_do_apply, 3}]} ns_memcached004 12:05:29 - Thu Nov 18, 2010
      Bucket "default" loaded on node 'ns_1@10.2.1.54' in 2698 seconds. ns_memcached001 11:21:45 - Thu Nov 18, 2010
      Bucket "default" loaded on node 'ns_1@10.2.1.53' in 657 seconds. ns_memcached001 10:59:50 - Thu Nov 18, 2010
      Control connection to memcached on 'ns_1@10.2.1.53' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary, stats_recv, 4}, {mc_client_binary, stats, 4}, {ns_memcached, handle_call, 3}, {gen_server, handle_msg, 5}, {proc_lib, init_p_do_apply, 3}]} ns_memcached004 10:48:53 - Thu Nov 18, 2010
      Port server memcached on node 'ns_1@10.2.1.53' exited with status 136. Restarting. Messages: Backfilling token for eq_tapq:anon_21 went invalid. Stopping backfill.
      Backfilling token for eq_tapq:anon_21 went invalid. Stopping backfill.
      Backfilling token for eq_tapq:anon_21 went invalid. Stopping backfill. ns_port_server000 10:48:53 - Thu Nov 18, 2010
      Bucket "default" loaded on node 'ns_1@10.2.1.51' in 167 seconds. ns_memcached001 10:45:10 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 70. Restarting. Messages: Authenticating towards: {Sock 10.2.1.53:11210}
      Authenticated towards: {Sock 10.2.1.53:11210}
      Downstream connection closed.. shutdown upstream (repeated 18 times) ns_port_server000 10:42:28 - Thu Nov 18, 2010
      Port server memcached on node 'ns_1@10.2.1.51' exited with status 136. Restarting. Messages: Backfilling token for eq_tapq:anon_2 went invalid. Stopping backfill.
      Backfilling token for eq_tapq:anon_2 went invalid. Stopping backfill.
      Backfilling token for eq_tapq:anon_2 went invalid. Stopping backfill. ns_port_server000 10:42:21 - Thu Nov 18, 2010
      Control connection to memcached on 'ns_1@10.2.1.51' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary, stats_recv, 4}, {mc_client_binary, stats, 4}, {ns_memcached, handle_call, 3}, {gen_server, handle_msg, 5}, {proc_lib, init_p_do_apply, 3}]} ns_memcached004 10:42:21 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 78. Restarting. Messages: Connecting to {Sock 10.2.1.54:11210}
      Failed to connect to host: Failed to connect to [10.2.1.54:11210] (repeated 9 times) ns_port_server000 10:41:52 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 70. Restarting. Messages: Authenticating towards: {Sock 10.2.1.51:11210}
      Authenticated towards: {Sock 10.2.1.51:11210}
      Downstream connection closed.. shutdown upstream (repeated 9 times) ns_port_server000 10:41:52 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 70. Restarting. Messages: Connecting to {Sock 10.2.1.53:11210}
      Authenticating towards: {Sock 10.2.1.53:11210}
      Authenticated towards: {Sock 10.2.1.53:11210} ns_port_server000 10:36:47 - Thu Nov 18, 2010
      Control connection to memcached on 'ns_1@10.2.1.54' disconnected: {{badmatch,
      {error,
      closed}},
      [{mc_client_binary, stats_recv, 4}, {mc_client_binary, stats, 4}, {ns_memcached, handle_call, 3}, {gen_server, handle_msg, 5}, {proc_lib, init_p_do_apply, 3}]} ns_memcached004 10:36:47 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 70. Restarting. Messages: Authenticating towards: {Sock 10.2.1.53:11210}
      Authenticated towards: {Sock 10.2.1.53:11210}
      Downstream connection closed.. shutdown upstream ns_port_server000 10:36:47 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 74. Restarting. Messages: Failed to read from stream: Connection reset by peer
      An error occured on the downstream connection..
      Downstream connection closed.. shutdown upstream ns_port_server000 10:36:47 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 70. Restarting. Messages: Authenticating towards: {Sock 10.2.1.51:11210}
      Authenticated towards: {Sock 10.2.1.51:11210}
      Downstream connection closed.. shutdown upstream ns_port_server000 10:36:47 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 78. Restarting. Messages: Connecting to {Sock 10.2.1.54:11210}
      Failed to connect to host: Failed to connect to [10.2.1.54:11210] ns_port_server000 10:36:47 - Thu Nov 18, 2010
      Port server vbucketmigrator on node 'ns_1@10.2.1.51' exited with status 74. Restarting. Messages: Failed to read from stream: Connection reset by peer
      An error occured on the downstream connection..
      Downstream connection closed.. shutdown upstream ns_port_server000 10:36:47 - Thu Nov 18, 2010
      Port server memcached on node 'ns_1@10.2.1.54' exited with status 136. Restarting. Messages: 35: FATAL: The engine does not support tap
      35: FATAL: The engine does not support tap
      35: FATAL: The engine does not support tap ns_port_server000 10:36:47 - Thu Nov 18, 2010
      Bucket "default" loaded on node 'ns_1@10.2.1.54' in 3270 seconds. ns_memcached001 10:22:41 - Thu Nov 18, 2010
      Bucket "default" loaded on node 'ns_1@10.2.1.53' in 832 seconds. ns_memcached001 09:42:22 - Thu Nov 18, 2010
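To make the restart pattern in the log above easier to read, the sketch below tallies the "exited with status N" messages per process and node, and attaches a tentative meaning to each status. The statuses 70, 74 and 78 equal the BSD `sysexits.h` values EX_SOFTWARE, EX_IOERR and EX_CONFIG, and 136 and 137 would correspond to 128 + signal number for SIGFPE (8) and SIGKILL (9) under the usual shell convention. Whether vbucketmigrator and the port server actually follow these conventions is an assumption, not something this report confirms. The names `describe_exit` and `tally_exits` are hypothetical helpers.

```python
import re
from collections import Counter

# Values from BSD <sysexits.h>; that vbucketmigrator uses them is an assumption.
SYSEXITS = {70: "EX_SOFTWARE", 74: "EX_IOERR", 78: "EX_CONFIG"}
# Common Linux signal numbers; 128 + N conventionally means "killed by signal N".
SIGNALS = {8: "SIGFPE", 9: "SIGKILL"}

EXIT_RE = re.compile(
    r"Port server (\w+) on node '([^']+)' exited with status (\d+)"
)


def describe_exit(status: int) -> str:
    """Return a tentative label for an exit status seen in the log."""
    if status in SYSEXITS:
        return SYSEXITS[status]
    if status > 128 and status - 128 in SIGNALS:
        return "killed by " + SIGNALS[status - 128]
    return "unknown"


def tally_exits(log_text: str) -> Counter:
    """Count exit messages per (process, node, status) triple."""
    return Counter(
        (proc, node, int(status))
        for proc, node, status in EXIT_RE.findall(log_text)
    )


# Two messages copied from the log above, trimmed to the matching part.
log = (
    "Port server memcached on node 'ns_1@10.2.1.54' exited with status 137. Restarting.\n"
    "Port server vbucketmigrator on node 'ns_1@10.2.1.53' exited with status 70. Restarting.\n"
)
for (proc, node, status), n in tally_exits(log).items():
    print(proc, node, status, describe_exit(status), n)
```

The tally counts log messages only; it does not expand the "(repeated N times)" suffixes, so the real number of restarts is higher than the counts it reports.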

            People

              trond Trond Norbye
              chiyoung Chiyoung Seo (Inactive)