Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-6638

beam.smp ( erl_bestfit_alloc.c:435 ) segmentation fault when rebalancing a new node to a cluster with 2 buckets and 1 design doc per bucket

    XMLWordPrintable

Details

    Description

      Cluster information:

      • Load 9 million items to both buckets
      • Cluster has 2 buckets, default (3GB) and saslbucket (3GB)
      • Each bucket has one doc and 2 views for each doc (default d1 and saslbucket d11)
      • Create cluster with 10 nodes installed couchbase server 2.0.0-1708

      10.3.121.13
      10.3.121.14
      10.3.121.15
      10.3.121.16
      10.3.121.17
      10.3.121.20
      10.3.121.22
      10.3.121.24
      10.3.121.25
      10.3.121.23

      • Data path /data
      • View path /data

      Doing rebalanc in, out, failover.
      Last failed was rebalance add 2 nodes, 23 and 24 into cluster.
      During rebalance, node 23 suddenly down. Rebalance failed as expected.
      Then later, memcached on node 13 was killed and restart.

      [ns_server:debug,2012-09-12T17:06:05.705,ns_1@10.3.121.13:<0.28110.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription

      {ns_config_events,<0.28109.33>} exited with reason {timeout,
      {gen_server,
      call,
      [ns_node_disco,
      nodes_wanted]}}
      [ns_server:debug,2012-09-12T17:06:05.712,ns_1@10.3.121.13:<0.28166.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_config_events,<0.28165.33>} exited with reason shutdown
      [ns_server:error,2012-09-12T17:06:05.718,ns_1@10.3.121.13:<0.29417.33>:ns_memcached:verify_report_long_call:274]call {stats,<<>>} took too long: 5881903 us
      [ns_server:debug,2012-09-12T17:06:05.720,ns_1@10.3.121.13:<0.28132.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_node_disco_events,<0.28130.33>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:05.732,ns_1@10.3.121.13:<0.28110.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_config_events,<0.28109.33>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:05.733,ns_1@10.3.121.13:<0.28166.33>:ns_pubsub:do_subscribe_link:149]Deleting

      {ns_config_events,<0.28165.33>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:05.739,ns_1@10.3.121.13:<0.5306.465>:compaction_daemon:do_chain_compactors:500]Got exit signal from parent:

      {'EXIT',<0.4974.465>,shutdown}

      [ns_server:debug,2012-09-12T17:06:05.740,ns_1@10.3.121.13:<0.28132.33>:ns_pubsub:do_subscribe_link:149]Deleting

      {ns_node_disco_events,<0.28130.33>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:05.740,ns_1@10.3.121.13:ns_node_disco:ns_node_disco:init:103]Initting ns_node_disco with ['ns_1@10.3.121.15','ns_1@10.3.121.16',
      'ns_1@10.3.121.17','ns_1@10.3.121.22',
      'ns_1@10.3.121.20','ns_1@10.3.121.14',
      'ns_1@10.3.121.25','ns_1@10.3.121.24']
      [ns_server:debug,2012-09-12T17:06:05.741,ns_1@10.3.121.13:<0.28129.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription

      {ns_config_events,<0.28128.33>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:05.743,ns_1@10.3.121.13:<0.19269.466>:janitor_agent:new_style_query_vbucket_states_loop:111]Exception from query_vbucket_states of "saslbucket":'ns_1@10.3.121.23'
      {'EXIT',{{nodedown,'ns_1@10.3.121.23'},
      {gen_server,call,
      [{'janitor_agent-saslbucket','ns_1@10.3.121.23'},
      query_vbucket_states,infinity]}}}
      [ns_server:debug,2012-09-12T17:06:05.745,ns_1@10.3.121.13:<0.5309.465>:compaction_daemon:do_chain_compactors:500]Got exit signal from parent: {'EXIT',<0.5306.465>,shutdown}
      [ns_server:debug,2012-09-12T17:06:05.746,ns_1@10.3.121.13:<0.28126.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_config_events,<0.28125.33>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:05.761,ns_1@10.3.121.13:<0.19269.466>:janitor_agent:new_style_query_vbucket_states_loop_next_step:116]Waiting for "saslbucket" on 'ns_1@10.3.121.23'
      [ns_server:debug,2012-09-12T17:06:05.897,ns_1@10.3.121.13:<0.28129.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_config_events,<0.28128.33>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:05.898,ns_1@10.3.121.13:<0.14047.466>:compaction_daemon:do_spawn_view_index_compactor:742]Got exit signal from parent:

      {'EXIT',<0.5309.465>,shutdown}

      [ns_server:debug,2012-09-12T17:06:05.903,ns_1@10.3.121.13:ns_cookie_manager:ns_cookie_manager:do_cookie_sync:115]ns_cookie_manager do_cookie_sync
      [ns_server:debug,2012-09-12T17:06:05.903,ns_1@10.3.121.13:<0.28126.33>:ns_pubsub:do_subscribe_link:149]Deleting

      {ns_config_events,<0.28125.33>}

      event handler: ok
      [couchdb:info,2012-09-12T17:06:05.906,ns_1@10.3.121.13:<0.5719.436>:couch_log:info:39]Set view `default`, replica group `_design/d1`, canceling compaction (pid <0.18999.466>)
      [ns_server:debug,2012-09-12T17:06:06.000,ns_1@10.3.121.13:ns_cookie_manager:ns_cookie_manager:do_cookie_save:152]saving cookie to "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie"
      [ns_server:debug,2012-09-12T17:06:06.002,ns_1@10.3.121.13:<0.28122.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription

      {ns_stats_event,<0.28121.33>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.009,ns_1@10.3.121.13:<0.28122.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.28121.33>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:06.040,ns_1@10.3.121.13:<0.28120.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription

      {ns_tick_event,<0.28119.33>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.044,ns_1@10.3.121.13:<0.28648.34>:single_bucket_sup:top_loop:28]Delegating exit {'EXIT',<0.28117.33>,shutdown} to child supervisor: <0.28649.34>

      [ns_server:debug,2012-09-12T17:06:06.047,ns_1@10.3.121.13:ns_cookie_manager:ns_cookie_manager:do_cookie_save:154]attempted to save cookie to "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie": ok
      [ns_server:debug,2012-09-12T17:06:06.074,ns_1@10.3.121.13:<0.28120.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_tick_event,<0.28119.33>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:06.074,ns_1@10.3.121.13:<0.19276.466>:ns_node_disco:do_nodes_wanted_updated_fun:202]ns_node_disco: nodes_wanted updated: ['ns_1@10.3.121.13','ns_1@10.3.121.14',
      'ns_1@10.3.121.15','ns_1@10.3.121.16',
      'ns_1@10.3.121.17','ns_1@10.3.121.20',
      'ns_1@10.3.121.22','ns_1@10.3.121.23',
      'ns_1@10.3.121.24','ns_1@10.3.121.25'], with cookie: oetfisqvqrrxvaat
      [ns_server:debug,2012-09-12T17:06:06.079,ns_1@10.3.121.13:<0.28689.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription

      {ns_stats_event,<0.28688.34>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.080,ns_1@10.3.121.13:<0.28687.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_tick_event,<0.28686.34>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.083,ns_1@10.3.121.13:<0.28689.34>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.28688.34>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:06.084,ns_1@10.3.121.13:<0.28687.34>:ns_pubsub:do_subscribe_link:149]Deleting

      {ns_tick_event,<0.28686.34>}

      event handler: ok
      [stats:warn,2012-09-12T17:06:06.087,ns_1@10.3.121.13:<0.29425.33>:stats_collector:latest_tick:201]Dropped 7 ticks
      [rebalance:info,2012-09-12T17:06:06.091,ns_1@10.3.121.13:<0.25334.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [rebalance:info,2012-09-12T17:06:06.120,ns_1@10.3.121.13:<0.23919.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [ns_server:debug,2012-09-12T17:06:06.120,ns_1@10.3.121.13:<0.1936.453>:ns_pubsub:do_subscribe_link:149]Deleting

      {master_activity_events_ingress,<0.28014.33>}

      event handler:

      {error, module_not_found}

      [rebalance:info,2012-09-12T17:06:06.128,ns_1@10.3.121.13:<0.23192.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [rebalance:info,2012-09-12T17:06:06.147,ns_1@10.3.121.13:<0.22947.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [rebalance:info,2012-09-12T17:06:06.201,ns_1@10.3.121.13:<0.21996.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [rebalance:info,2012-09-12T17:06:06.218,ns_1@10.3.121.13:<0.11912.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [rebalance:info,2012-09-12T17:06:06.223,ns_1@10.3.121.13:<0.10815.457>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [ns_server:debug,2012-09-12T17:06:06.228,ns_1@10.3.121.13:<0.28050.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription

      {buckets_events,<0.28049.33>} exited with reason {shutdown,
      {gen_server,
      call,
      ['ns_vbm_new_sup-default',
      which_children,
      infinity]}}
      [ns_server:debug,2012-09-12T17:06:06.239,ns_1@10.3.121.13:<0.28050.33>:ns_pubsub:do_subscribe_link:149]Deleting {buckets_events,<0.28049.33>}

      event handler: ok
      [user:info,2012-09-12T17:06:06.245,ns_1@10.3.121.13:'ns_memcached-default':ns_memcached:terminate:625]Shutting down bucket "default" on 'ns_1@10.3.121.13' for server shutdown
      [ns_server:info,2012-09-12T17:06:06.261,ns_1@10.3.121.13:'ns_memcached-default':ns_memcached:terminate:636]This bucket shutdown is not due to bucket deletion. Doing nothing
      [ns_server:debug,2012-09-12T17:06:06.276,ns_1@10.3.121.13:<0.28651.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription

      {ns_config_events,<0.28650.34>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.276,ns_1@10.3.121.13:<0.28663.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {mc_couch_events,<0.28650.34>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.276,ns_1@10.3.121.13:<0.28662.34>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_node_disco_events,<0.28650.34>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.276,ns_1@10.3.121.13:<0.28648.34>:single_bucket_sup:top_loop:24]per-bucket supervisor for "default" died with reason shutdown

      [ns_server:debug,2012-09-12T17:06:06.279,ns_1@10.3.121.13:<0.28651.34>:ns_pubsub:do_subscribe_link:149]Deleting {ns_config_events,<0.28650.34>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:06.281,ns_1@10.3.121.13:<0.28663.34>:ns_pubsub:do_subscribe_link:149]Deleting

      {mc_couch_events,<0.28650.34>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:06.283,ns_1@10.3.121.13:<0.28662.34>:ns_pubsub:do_subscribe_link:149]Deleting

      {ns_node_disco_events,<0.28650.34>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:06.283,ns_1@10.3.121.13:<0.29393.33>:single_bucket_sup:top_loop:28]Delegating exit

      {'EXIT',<0.28117.33>,shutdown}

      to child supervisor: <0.29394.33>

      [ns_server:debug,2012-09-12T17:06:06.299,ns_1@10.3.121.13:<0.29428.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription

      {ns_stats_event,<0.29427.33>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.300,ns_1@10.3.121.13:<0.29426.33>:ns_pubsub:do_subscribe_link:134]Parent process of subscription {ns_tick_event,<0.29425.33>} exited with reason shutdown
      [ns_server:debug,2012-09-12T17:06:06.305,ns_1@10.3.121.13:<0.29428.33>:ns_pubsub:do_subscribe_link:149]Deleting {ns_stats_event,<0.29427.33>}

      event handler: ok
      [ns_server:debug,2012-09-12T17:06:06.307,ns_1@10.3.121.13:<0.29426.33>:ns_pubsub:do_subscribe_link:149]Deleting

      {ns_tick_event,<0.29425.33>}

      event handler: ok
      [rebalance:info,2012-09-12T17:06:06.310,ns_1@10.3.121.13:<0.11978.466>:ebucketmigrator_srv:do_confirm_sent_messages:655]Got close ack!

      [ns_server:info,2012-09-12T17:06:06.323,ns_1@10.3.121.13:ns_port_memcached:ns_port_server:log:169]memcached<0.28094.33>: Thu Sep 13 00:06:06.117323 3: TAP (Consumer) eq_tapq:anon_509 - disconnected
      memcached<0.28094.33>: Thu Sep 13 00:06:06.123981 3: TAP (Consumer) eq_tapq:anon_1652 - disconnected
      memcached<0.28094.33>: Thu Sep 13 00:06:06.144392 3: TAP (Consumer) eq_tapq:anon_480 - disconnected
      memcached<0.28094.33>: Thu Sep 13 00:06:06.151955 3: TAP (Consumer) eq_tapq:anon_488 - disconnected
      memcached<0.28094.33>: Thu Sep 13 00:06:06.203038 3: TAP (Consumer) eq_tapq:anon_504 - disconnected
      memcached<0.28094.33>: Thu Sep 13 00:06:06.221683 3: TAP (Consumer) eq_tapq:anon_719 - disconnected
      memcached<0.28094.33>: Thu Sep 13 00:06:06.227307 3: TAP (Consumer) eq_tapq:anon_499 - disconnected

      memcached log from node 13 at the time it was killed

      Thu Sep 13 00:06:07.606021 3: Schedule cleanup of "eq_tapq:anon_20"
      Thu Sep 13 00:06:07.606058 3: Schedule cleanup of "eq_tapq:anon_34"
      Thu Sep 13 00:06:07.606083 3: Schedule cleanup of "eq_tapq:anon_36"
      Thu Sep 13 00:06:07.606115 3: Schedule cleanup of "eq_tapq:anon_40"
      Thu Sep 13 00:06:07.606142 3: Schedule cleanup of "eq_tapq:anon_730"
      Thu Sep 13 00:06:07.606165 3: Schedule cleanup of "eq_tapq:anon_2425"
      Thu Sep 13 00:06:21.096648 3: Had to wait 13 s for shutdown
      Thu Sep 13 00:06:31.465158 3: Shutting down tap connections!
      Thu Sep 13 00:06:31.465199 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.15"
      Thu Sep 13 00:06:31.465278 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.16"
      Thu Sep 13 00:06:31.465357 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.22"
      Thu Sep 13 00:06:31.465405 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.17"
      Thu Sep 13 00:06:31.465414 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.15 - Clear the tap queues by force
      Thu Sep 13 00:06:31.465441 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.20"
      Thu Sep 13 00:06:31.465471 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.14"
      Thu Sep 13 00:06:31.465492 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.25"
      Thu Sep 13 00:06:31.465517 3: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.121.24"
      Thu Sep 13 00:06:31.465604 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.16 - Clear the tap queues by force
      Thu Sep 13 00:06:31.467071 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.22 - Clear the tap queues by force
      Thu Sep 13 00:06:31.467363 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.20 - Clear the tap queues by force
      Thu Sep 13 00:06:31.467606 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.24 - Clear the tap queues by force
      Thu Sep 13 00:06:31.467710 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.25 - Clear the tap queues by force
      Thu Sep 13 00:06:31.467917 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.17 - Clear the tap queues by force
      Thu Sep 13 00:06:31.468102 3: TAP (Producer) eq_tapq:replication_ns_1@10.3.121.14 - Clear the tap queues by force

      Link to diags of all nodes https://s3.amazonaws.com/packages.couchbase/diag-logs/orange/201209/10nodes-1708-memcached-killed-after-reb-failed-20120912.tgz

      Link to atop of node 13 https://s3.amazonaws.com/packages.couchbase/atop-files/orange/201209/atop-node13-20120912

      Attachments

        No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

          People

            damien damien (Inactive)
            thuan Thuan Nguyen
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Gerrit Reviews

                PagerDuty