Couchbase Server / MB-7865

swap rebalance 1.8.1 -> 2.0.1: addNode error 500 reason: unknown /Connection attempt from disallowed node


Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major
    • Affects Version: 2.0.1
    • Fix Version: 2.0.1
    • Component: ns_server
    • Security Level: Public
    • Labels: None
    • Environment: centos-64, 1.8.1-942-rel -> 2.0.1-170-rel

    Description

      ./testrunner -i /tmp/upgrade.ini get-logs=False,upgrade_version=2.0.1-170-rel,initial_vbuckets=64 -t newupgradetests.MultiNodesUpgradeTests.online_upgrade_swap_rebalance,initial_version=1.8.1-942-rel,standard_buckets=1,items=500000,max_verify=1000,GROUP=1_8;ONLINE

      steps:
      1. 2 nodes with 1.8.1-942: 10.3.3.14 & 10.3.3.11
      2 buckets * 500K items
      2. install 2 nodes with 2.0.1-170-rel: 10.3.3.13 & 10.3.3.16

      3. [2013-03-04 13:32:37,568] - [rest_client:721] INFO - adding remote node @10.3.3.13:8091 to this cluster @10.3.3.11:8091
      [2013-03-04 13:32:39,028] - [rest_client:804] INFO - rebalance params : password=password&ejectedNodes=ns_1%4010.3.3.11&user=Administrator&knownNodes=ns_1%4010.3.3.11%2Cns_1%4010.3.3.14%2Cns_1%4010.3.3.13
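      The rebalance params in the log line above are URL-encoded. Decoding them (a small stdlib-only sketch, not part of the test framework) shows the cluster ejecting the old node ns_1@10.3.3.11 while all three nodes remain known:

```python
from urllib.parse import parse_qs

# Query string copied from the rest_client log line above.
raw = ("password=password&ejectedNodes=ns_1%4010.3.3.11&user=Administrator"
       "&knownNodes=ns_1%4010.3.3.11%2Cns_1%4010.3.3.14%2Cns_1%4010.3.3.13")

# parse_qs percent-decodes values, turning %40 into '@' and %2C into ','.
params = {k: v[0] for k, v in parse_qs(raw).items()}
known = params["knownNodes"].split(",")
ejected = params["ejectedNodes"].split(",")

print(known)    # ['ns_1@10.3.3.11', 'ns_1@10.3.3.14', 'ns_1@10.3.3.13']
print(ejected)  # ['ns_1@10.3.3.11']
```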

      4. [2013-03-04 13:36:55,544] - [newupgradetests:407] INFO - Swap rebalance: rebalance out 1 old version nodes, rebalance in 1 2.0 Nodes
      [2013-03-04 13:36:55,545] - [basetestcase:147] INFO - sleep for 10 secs. ...
      [2013-03-04 13:37:05,691] - [rest_client:819] INFO - /diag/eval status: True content: 'ns_1@10.3.3.13' command: node(global:whereis_name(ns_orchestrator))
      [2013-03-04 13:37:05,799] - [newupgradetests:420] INFO - 2.0 Node 10.3.3.13 becomes the master
      [2013-03-04 13:37:06,558] - [task:242] INFO - adding node 10.3.3.16:8091 to cluster
      [2013-03-04 13:37:06,559] - [rest_client:721] INFO - adding remote node @10.3.3.16:8091 to this cluster @10.3.3.14:8091
      [2013-03-04 13:37:36,569] - [rest_client:578] ERROR - http://10.3.3.14:8091/controller/addNode error 500 reason: unknown ["Unexpected server error, request logged."]
      [2013-03-04 13:37:36,571] - [rest_client:741] ERROR - add_node error : ["Unexpected server error, request logged."]
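      The failing call is a POST to /controller/addNode on 10.3.3.14, asking it to pull 10.3.3.16 into the cluster. A rough stdlib sketch of that request (field names follow the public Couchbase REST API; this is not the actual rest_client code, and the credentials are the test defaults):

```python
import base64
import urllib.parse
import urllib.request

def build_add_node_request(cluster_ip, new_node_ip,
                           user="Administrator", password="password"):
    """Build the POST that adds new_node_ip to the cluster at cluster_ip.
    Passing the result to urllib.request.urlopen() would perform the call."""
    body = urllib.parse.urlencode({
        "hostname": f"{new_node_ip}:8091",
        "user": user,
        "password": password,
    }).encode()
    req = urllib.request.Request(
        f"http://{cluster_ip}:8091/controller/addNode", data=body)
    # The endpoint requires cluster-admin basic auth.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req

req = build_add_node_request("10.3.3.14", "10.3.3.16")
```

      In the failure above, the server accepted this request but answered 500 once its internal 30-second budget for the add expired.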

      on 10.3.3.14:

      [ns_server:debug] [2013-03-04 13:41:32] [ns_1@10.3.3.14:<0.14189.0>:ns_node_disco:do_nodes_wanted_updated_fun:182] ns_node_disco: nodes_wanted updated: ['ns_1@10.3.3.13','ns_1@10.3.3.14',
      'ns_1@10.3.3.16'], with cookie: ffswxwgaaiayftjf
      [ns_server:debug] [2013-03-04 13:41:32] [ns_1@10.3.3.14:<0.14189.0>:ns_node_disco:do_nodes_wanted_updated_fun:188] ns_node_disco: nodes_wanted pong: ['ns_1@10.3.3.13','ns_1@10.3.3.14',
      'ns_1@10.3.3.16'], with cookie: ffswxwgaaiayftjf
      [menelaus:warn] [2013-03-04 13:41:33] [ns_1@10.3.3.14:<0.14035.0>:menelaus_web:loop:334] Server error during processing: ["web request failed",
      {path,"/controller/addNode"},
      {type,exit},
      {what,
       {timeout,
        {gen_server,call,
         [ns_cluster,
          {add_node,"10.3.3.16",8091,
           {"Administrator","password"}},
          30000]}}},
      {trace,
       [{gen_server,call,3},
        {ns_cluster,add_node,3},
        {menelaus_web,handle_add_node,1},
        {menelaus_web,loop,3},
        {mochiweb_http,headers,5},
        {proc_lib,init_p_do_apply,3}]}]
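      The {timeout,{gen_server,call,[ns_cluster,...,30000]}} exit means the web handler gave up after 30 s while ns_cluster kept processing; the later "add_node(...) -> {ok,'ns_1@10.3.3.16'}" line at 13:41:36 shows the join actually completed after the 500 had already been sent. A hypothetical Python analogue of that caller/worker split (nothing here is ns_server code):

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Single worker standing in for the ns_cluster gen_server process,
# which handles one request at a time from its mailbox.
executor = ThreadPoolExecutor(max_workers=1)

def slow_add_node():
    time.sleep(0.2)              # stands in for the slow join handshake
    return "ns_1@10.3.3.16"

caller_saw = None
future = executor.submit(slow_add_node)
try:
    # Analogue of gen_server:call(ns_cluster, {add_node, ...}, 30000):
    # the caller abandons the wait even though the work continues.
    future.result(timeout=0.05)
except TimeoutError:
    caller_saw = "timeout"       # this is what surfaced as the HTTP 500

# The worker still finishes, mirroring the later {ok,'ns_1@10.3.3.16'} reply.
eventual = future.result()
```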
      [ns_server:debug] [2013-03-04 13:41:33] [ns_1@10.3.3.14:ns_bucket_worker:ns_bucket_sup:update_childs:91] Stopping child for dead bucket: {{per_bucket_sup,"default"},
      <0.630.0>,supervisor,
      [single_bucket_sup]}

      [ns_server:debug] [2013-03-04 13:41:33] [ns_1@10.3.3.14:ns_config_log:ns_config_log:log_common:111] config change:
      buckets ->
      [{configs,[{"standard_bucket0",
      [{map,[]},
      {fastForwardMap,[]},
      {num_replicas,1},
      {ram_quota,736100352},
      {auth_type,none},
      {moxi_port,11214},
      {type,membase},
      {num_vbuckets,64},
      {servers,['ns_1@10.3.3.14','ns_1@10.3.3.13']}]}]}]
      [ns_server:debug] [2013-03-04 13:41:33] [ns_1@10.3.3.14:<0.630.0>:single_bucket_sup:top_loop:28] Delegating exit {'EXIT',<0.575.0>,shutdown} to child supervisor: <0.631.0>

      [ns_server:debug] [2013-03-04 13:41:33] [ns_1@10.3.3.14:<0.643.0>:ns_pubsub:do_subscribe_link:132] Parent process exited with reason shutdown
      [ns_server:debug] [2013-03-04 13:41:33] [ns_1@10.3.3.14:<0.641.0>:ns_pubsub:do_subscribe_link:132] Parent process exited with reason shutdown
      [ns_server:debug] [2013-03-04 13:41:33] [ns_1@10.3.3.14:<0.641.0>:ns_pubsub:do_subscribe_link:145] Deleting {ns_pubsub,#Ref<0.0.0.4996>} event handler: ok
      [ns_server:debug] [2013-03-04 13:41:33] [ns_1@10.3.3.14:<0.643.0>:ns_pubsub:do_subscribe_link:145] Deleting {ns_pubsub,#Ref<0.0.0.5024>} event handler: ok
      [rebalance:warn] [2013-03-04 13:41:33] [ns_1@10.3.3.14:<0.14009.0>:ebucketmigrator_srv:do_confirm_sent_messages:337] Got error while trying to read close ack:{error,closed}

      [ns_server:debug] [2013-03-04 13:41:33] [ns_1@10.3.3.14:ns_config_isasl_sync:ns_config_isasl_sync:writeSASLConf:133] Writing isasl passwd file: "/opt/couchbase/var/lib/couchbase/data/isasl.pw"
      [user:info] [2013-03-04 13:41:33] [ns_1@10.3.3.14:'ns_memcached-default':ns_memcached:terminate:467] Shutting down bucket "default" on 'ns_1@10.3.3.14' for deletion
      [ns_server:info] [2013-03-04 13:41:34] [ns_1@10.3.3.14:ns_port_memcached:ns_port_server:log:166] memcached<0.570.0>: TAP (Consumer) eq_tapq:anon_167 - disconnected
      memcached<0.570.0>: Schedule cleanup of "eq_tapq:anon_167"
      memcached<0.570.0>: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.13 - disconnected, keep alive for 300 seconds
      memcached<0.570.0>: Shutting down tap connections!
      memcached<0.570.0>: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.11"
      memcached<0.570.0>: Schedule cleanup of "eq_tapq:replication_ns_1@10.3.3.13"
      memcached<0.570.0>: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.11 - Clear the tap queues by force
      memcached<0.570.0>: TAP (Producer) eq_tapq:replication_ns_1@10.3.3.13 - Clear the tap queues by force

      [cluster:debug] [2013-03-04 13:41:36] [ns_1@10.3.3.14:ns_cluster:ns_cluster:do_add_node_engaged_inner:479] Reply from complete_join on "10.3.3.16:8091": {ok,[]}

      [cluster:debug] [2013-03-04 13:41:36] [ns_1@10.3.3.14:ns_cluster:ns_cluster:handle_call:120] add_node("10.3.3.16", 8091, ..) -> {ok,'ns_1@10.3.3.16'}

      [ns_server:info] [2013-03-04 13:41:36] [ns_1@10.3.3.14:ns_port_memcached:ns_port_server:log:166] memcached<0.570.0>: Total memory in memoryDeallocated() >= GIGANTOR !!! Disable the memory tracker...

      [ns_server:debug] [2013-03-04 13:41:37] [ns_1@10.3.3.14:'ns_memcached-default':ns_storage_conf:delete_bucket_db_directory:384] Result of deleting db directory: "/opt/couchbase/var/lib/couchbase/data/default-data": ok
      [ns_server:debug] [2013-03-04 13:41:37] [ns_1@10.3.3.14:<0.630.0>:single_bucket_sup:top_loop:24] per-bucket supervisor for "default" died with reason shutdown



      on 10.3.3.16:

      [ns_doctor:error,2013-03-04T13:41:30.557,ns_1@10.3.3.16:<0.420.0>:ns_doctor:get_nodes:156]Error attempting to get nodes: {exit,
      {timeout,{gen_server,call,[ns_doctor,get_nodes]}}}
      [error_logger:error,2013-03-04T13:41:30.559,ns_1@10.3.3.16:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Generic server ns_doctor terminating
      ** Last message in was {get_node,'ns_1@10.3.3.16'}
      ** When Server state == {state,{dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},
      undefined,undefined,undefined}
      ** Reason for termination ==
      ** {badarg,[{dict,fetch,
      ['ns_1@10.3.3.16',
      {dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}]},
      {ns_doctor,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}

      [ns_doctor:error,2013-03-04T13:41:30.559,ns_1@10.3.3.16:<0.420.0>:ns_doctor:get_node:165]Error attempting to get node 'ns_1@10.3.3.16': {exit,
      {{badarg,
      [{dict,fetch,
      ['ns_1@10.3.3.16',
      {dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}]},
      {ns_doctor,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]},
      {gen_server,call,
      [ns_doctor,{get_node,'ns_1@10.3.3.16'}]}}}
      [error_logger:error,2013-03-04T13:41:30.615,ns_1@10.3.3.16:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
      =========================CRASH REPORT=========================
      crasher:
      initial call: ns_doctor:init/1
      pid: <0.1431.0>
      registered_name: ns_doctor
      exception exit: {badarg,
      [{dict,fetch,
      ['ns_1@10.3.3.16',
      {dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[], [],[]},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
      [],[]}}}]},
      {ns_doctor,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}
      in function gen_server:terminate/6
      ancestors: [ns_server_sup,ns_server_cluster_sup,<0.60.0>]
      messages: []
      links: [<0.346.0>,<0..34.0>,<0.57.0>]
      dictionary: []
      trap_exit: false
      status: running
      heap_size: 377
      stack_size: 24
      reductions: 1113
      neighbours:

      [error_logger:error,2013-03-04T13:41:30.616,ns_1@10.3.3.16:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,ns_server_sup}
      Context: child_terminated
      Reason: {badarg,
      [{dict,fetch,
      ['ns_1@10.3.3.16',
      {dict,0,16,16,8,80,48,
      {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[], []},
      {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
      []}}}]},
      {ns_doctor,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}
      Offender: [{pid,<0.1431.0>},
      {name,ns_doctor},
      {mfargs,{ns_doctor,start_link,[]}},
      {restart_type,permanent},
      {shutdown,1000},
      {child_type,worker}]


      [error_logger:error,2013-03-04T13:41:31.253,ns_1@10.3.3.16:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,ns_bucket_sup}
      Context: shutdown_error
      Reason: normal
      Offender: [{pid,<0.467.0>},
      {name,buckets_observing_subscription},
      {mfargs,{ns_bucket_sup,subscribe_on_config_events,[]}},
      {restart_type,permanent},
      {shutdown,1000},
      {child_type,worker}]


      [error_logger:error,2013-03-04T13:41:31.850,ns_1@10.3.3.16:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Connection attempt from disallowed node 'ns_1@10.3.3.14' **
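      The disallowed-node line is emitted by the Erlang VM itself: once a node has been given an explicit allow list (net_kernel:allow/1), distribution connections from any node outside it are refused. A toy Python model of that gate; the NodeGate class and its methods are invented for illustration:

```python
class NodeGate:
    """Toy model of the Erlang VM's node allow list (names invented)."""

    def __init__(self):
        self.allowed = None          # None means any node may connect

    def allow(self, nodes):
        # Like net_kernel:allow/1: from now on, only these nodes connect.
        self.allowed = set(nodes)

    def accept(self, node):
        if self.allowed is not None and node not in self.allowed:
            return f"** Connection attempt from disallowed node '{node}' **"
        return "ok"

gate = NodeGate()
# Hypothetical mid-join state: 10.3.3.16 only allows itself for the moment.
gate.allow(["ns_1@10.3.3.16"])
print(gate.accept("ns_1@10.3.3.14"))   # refused, as in the log above
```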

      [ns_server:error,2013-03-04T13:41:32.533,ns_1@10.3.3.16:ns_heart<0.1557.0>:ns_heart:grab_samples_loading_tasks:328]Failed to grab samples loader tasks: {exit,
      {noproc,
      {gen_server,call,
      [samples_loader_tasks,get_tasks,2000]}},
      [{gen_server,call,3},
      {ns_heart,grab_samples_loading_tasks,0},
      {ns_heart,current_status,0},
      {ns_heart,handle_info,2},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}
      [ns_server:error,2013-03-04T13:41:32.600,ns_1@10.3.3.16:ns_heart<0.1557.0>:ns_heart:grab_samples_loading_tasks:328]Failed to grab samples loader tasks: {exit,
      {noproc,
      {gen_server,call,
      [samples_loader_tasks,get_tasks,2000]}},
      [{gen_server,call,3},
      {ns_heart,grab_samples_loading_tasks,0},
      {ns_heart,current_status,0},
      {ns_heart,handle_call,3},
      {gen_server,handle_msg,5},
      {proc_lib,init_p_do_apply,3}]}
      [error_logger:error,2013-03-04T13:41:47.228,ns_1@10.3.3.16:error_logger<0.6.0>:ale_error_logger_handler:log_report:72]
      =========================SUPERVISOR REPORT=========================
      Supervisor: {local,ns_bucket_sup}
      Context: shutdown_error
      Reason: normal
      Offender: [{pid,<0.1639.0>},
      {name,buckets_observing_subscription},
      {mfargs,{ns_bucket_sup,subscribe_on_config_events,[]}},
      {restart_type,permanent},
      {shutdown,1000},
      {child_type,worker}]

      [error_logger:error,2013-03-04T13:41:47.540,ns_1@10.3.3.16:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Connection attempt from disallowed node 'ns_1@10.3.3.14' **

      [error_logger:error,2013-03-04T13:41:47.546,ns_1@10.3.3.16:error_logger<0.6.0>:ale_error_logger_handler:log_msg:76]** Connection attempt from disallowed node 'ns_1@10.3.3.14' **


          People

            Assignee: Andrei Baranouski
            Reporter: Andrei Baranouski
