Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-7674

cluster got reply from wanted node 30 secs before but... Join completion call failed This could be due to an incorrect host/port combination or a firewall in place between the servers...

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.1
    • Fix Version/s: 3.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None

      Description

      build 2.0.1-146-rel
      http://qa.hq.northscale.net/job/centos-64-2.0-rebalance-regressions/176/consoleFull
      /testrunner -i /tmp/rebalance_regression.ini get-logs=True,wait_timeout=100,get-cbcollect-info=True -t swaprebalance.SwapRebalanceBasicTests.do_test,replica=1,num-buckets=2,num-swap=2,swap-orchestrator=True,keys-count=1000000

      [2013-02-01 13:26:37,318] - [rest_client:712] INFO - adding remote node @10.3.121.94:8091 to this cluster @10.3.121.93:8091
      [2013-02-01 13:27:07,860] - [rest_client:578] ERROR - http://10.3.121.93:8091/controller/addNode error 400 reason: unknown ["Join completion call failed. Timeout connecting to \"10.3.121.94\" on port \"8091\". This could be due to an incorrect host/port combination or a firewall in place between the servers."]

      both nodes must be available, there were no problems before in the same suite

      server logs:

      [cluster:debug,2013-02-01T13:30:05.114,ns_1@10.3.121.93:ns_cluster<0.279.0>:ns_cluster:verify_otp_connectivity:398]port_please("ns_1", "10.3.121.94") = 21100

      [views:debug,2013-02-01T13:30:05.115,ns_1@10.3.121.93:mc_couch_events<0.18607.120>:capi_set_view_manager:handle_mc_couch_event:529]Got set_vbucket event for bucket-0/869. Updated state: active (0)
      [ns_server:debug,2013-02-01T13:30:05.116,ns_1@10.3.121.93:<0.21074.120>:mc_connection:do_notify_vbucket_update:112]Signaled mc_couch_event:

      {set_vbucket,"bucket-0",869,active,0}

      [ns_server:debug,2013-02-01T13:30:05.117,ns_1@10.3.121.93:capi_set_view_manager-bucket-0<0.21041.120>:capi_set_view_manager:handle_info:377]Usable vbuckets:
      [997,965,933,901,869,984,952,920,888,1016,971,939,907,875,1003,990,958,926,
      894,1022,977,945,913,881,1009,996,964,932,900,983,951,919,887,1015,970,938,
      906,874,1002,989,957,925,893,1021,976,944,912,880,1008,995,963,931,899,982,
      950,918,886,1014,969,937,905,873,1001,988,956,924,892,1020,975,943,911,879,
      1007,994,978,962,946,930,914,898,882,1010,981,949,917,885,1013,968,936,904,
      872,1000,987,955,923,891,1019,974,942,910,878,1006,993,961,929,897,980,948,
      916,884,1012,999,967,935,903,871,986,954,922,890,1018,973,941,909,877,1005,
      992,960,928,896,979,947,915,883,1011,998,966,934,902,870,985,953,921,889,
      1017,972,940,908,876,1004,991,959,927,895,1023]
      [ns_server:debug,2013-02-01T13:30:05.118,ns_1@10.3.121.93:couch_stats_reader-bucket-0<0.21094.120>:couch_stats_reader:vbuckets_aggregation_loop:126]Failed to open vbucket: 376 (

      {not_found,no_db_file}). Ignoring
      [ns_server:debug,2013-02-01T13:30:05.121,ns_1@10.3.121.93:couch_stats_reader-bucket-1<0.21124.120>:couch_stats_reader:vbuckets_aggregation_loop:126]Failed to open vbucket: 359 ({not_found,no_db_file}

      ). Ignoring
      [cluster:info,2013-02-01T13:30:05.121,ns_1@10.3.121.93:ns_cluster<0.279.0>:ns_cluster:node_add_transaction:503]Started node add transaction by adding node 'ns_1@10.3.121.94' to nodes_wanted

      [ns_server:debug,2013-02-01T13:30:05.122,ns_1@10.3.121.93:ns_config_rep<0.18536.120>:ns_config_rep:do_push_keys:317]Replicating some config keys ([nodes_wanted,

      {node,'ns_1@10.3.121.94',membership}]..)
      [ns_server:debug,2013-02-01T13:30:05.122,ns_1@10.3.121.93:capi_set_view_manager-bucket-0<0.21041.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:debug,2013-02-01T13:30:05.123,ns_1@10.3.121.93:capi_set_view_manager-bucket-1<0.21068.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:debug,2013-02-01T13:30:05.127,ns_1@10.3.121.93:ns_config_events<0.331.0>:ns_node_disco_conf_events:handle_event:44]ns_node_disco_conf_events config on nodes_wanted
      [ns_server:debug,2013-02-01T13:30:05.127,ns_1@10.3.121.93:capi_set_view_manager-bucket-0<0.21041.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:debug,2013-02-01T13:30:05.129,ns_1@10.3.121.93:capi_set_view_manager-bucket-1<0.21068.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:debug,2013-02-01T13:30:05.129,ns_1@10.3.121.93:capi_set_view_manager-bucket-0<0.21041.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:debug,2013-02-01T13:30:05.129,ns_1@10.3.121.93:mb_master<0.18561.120>:mb_master:update_peers:509]List of peers has changed from ['ns_1@10.3.121.93'] to ['ns_1@10.3.121.93',
      'ns_1@10.3.121.94']
      [ns_server:debug,2013-02-01T13:30:05.129,ns_1@10.3.121.93:ns_config_log<0.340.0>:ns_config_log:log_common:111]config change:
      nodes_wanted ->
      ['ns_1@10.3.121.93','ns_1@10.3.121.94']
      [ns_server:debug,2013-02-01T13:30:05.145,ns_1@10.3.121.93:ns_cookie_manager<0.278.0>:ns_cookie_manager:do_cookie_sync:115]ns_cookie_manager do_cookie_sync
      [ns_server:debug,2013-02-01T13:30:05.147,ns_1@10.3.121.93:capi_set_view_manager-bucket-1<0.21068.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:debug,2013-02-01T13:30:05.147,ns_1@10.3.121.93:ns_config_log<0.340.0>:ns_config_log:log_common:111]config change:{node,'ns_1@10.3.121.94',membership}

      ->
      inactiveAdded
      [ns_server:debug,2013-02-01T13:30:05.148,ns_1@10.3.121.93:ns_cookie_manager<0.278.0>:ns_cookie_manager:do_cookie_save:152]saving cookie to "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie"
      [ns_server:debug,2013-02-01T13:30:05.165,ns_1@10.3.121.93:couch_stats_reader-bucket-0<0.21094.120>:couch_stats_reader:vbuckets_aggregation_loop:126]Failed to open vbucket: 377 (

      {not_found,no_db_file}). Ignoring
      [ns_server:debug,2013-02-01T13:30:05.166,ns_1@10.3.121.93:capi_set_view_manager-bucket-0<0.21041.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:debug,2013-02-01T13:30:05.166,ns_1@10.3.121.93:capi_set_view_manager-bucket-1<0.21068.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [views:debug,2013-02-01T13:30:05.167,ns_1@10.3.121.93:mc_couch_events<0.18607.120>:capi_set_view_manager:handle_mc_couch_event:529]Got set_vbucket event for bucket-1/558. Updated state: active (0)
      [ns_server:debug,2013-02-01T13:30:05.167,ns_1@10.3.121.93:<0.21110.120>:mc_connection:do_notify_vbucket_update:112]Signaled mc_couch_event: {set_vbucket,"bucket-1",558,active,0}
      [ns_server:debug,2013-02-01T13:30:05.168,ns_1@10.3.121.93:ns_cookie_manager<0.278.0>:ns_cookie_manager:do_cookie_save:154]attempted to save cookie to "/opt/couchbase/var/lib/couchbase/couchbase-server.cookie": ok
      [ns_server:debug,2013-02-01T13:30:05.169,ns_1@10.3.121.93:<0.25111.120>:ns_node_disco:do_nodes_wanted_updated_fun:202]ns_node_disco: nodes_wanted updated: ['ns_1@10.3.121.93','ns_1@10.3.121.94'], with cookie: nlnnkrmuayuibshk
      [cluster:debug,2013-02-01T13:30:05.167,ns_1@10.3.121.93:ns_cluster<0.279.0>:ns_cluster:do_add_node_engaged_inner:471]Posting the following to complete_join on "10.3.121.94:8091":
      {struct,
      [{<<"targetNode">>,'ns_1@10.3.121.94'},
      {availableStorage,
      {struct,
      [{hdd,
      [{struct,
      [{path,<<"/">>},
      {sizeKBytes,13167336},
      {usagePercent,59}]},
      {struct,
      [{path,<<"/dev">>},
      {sizeKBytes,2021356},
      {usagePercent,1}]},
      {struct,
      [{path,<<"/dev/shm">>},
      {sizeKBytes,2029332},
      {usagePercent,0}]},
      {struct,
      [{path,<<"/var/run">>},
      {sizeKBytes,2029332},
      {usagePercent,1}]},
      {struct,
      [{path,<<"/var/lock">>},
      {sizeKBytes,2029332},
      {usagePercent,0}]},
      {struct,
      [{path,<<"/boot">>},
      {sizeKBytes,233191},
      {usagePercent,41}]}]}]}},
      {memoryQuota,2113},
      {storageTotals,
      {struct,
      [{ram,
      {struct,
      [{total,4156071936},
      {quotaTotal,2215641088},
      {quotaUsed,1476395008},
      {used,2653765632},
      {usedByData,62268064}]}},
      {hdd,
      {struct,
      [{total,13483352064},
      {quotaTotal,13483352064},
      {used,7955177717},
      {usedByData,7069},
      {free,5528174347}]}}]}},
      {storage,
      {struct,
      [{ssd,[]},
      {hdd,
      [{struct,
      [{path,<<"/opt/couchbase/var/lib/couchbase/data">>},
      {index_path, <<"/opt/couchbase/var/lib/couchbase/data">>},
      {quotaMb,none},
      {state,ok}]}]}]}},
      {systemStats,
      {struct,
      [{cpu_utilization_rate,52.722772277227726},
      {swap_total,1073737728},
      {swap_used,5218304}]}},
      {interestingStats,
      {struct,
      [{couch_docs_actual_disk_size,7069},
      {couch_docs_data_size,0},
      {couch_views_actual_disk_size,0},
      {couch_views_data_size,0},
      {curr_items,0},
      {curr_items_tot,0},
      {mem_used,62268064},
      {vb_replica_curr_items,0}]}},
      {uptime,<<"38632">>},
      {memoryTotal,4156071936},
      {memoryFree,1502306304},
      {mcdMemoryReserved,3170},
      {mcdMemoryAllocated,3170},
      {couchApiBase,<<"http://10.3.121.93:8092/">>},
      {otpNode,<<"ns_1@10.3.121.93">>},
      {otpCookie,<<"nlnnkrmuayuibshk">>},
      {clusterMembership,<<"active">>},
      {status,<<"healthy">>},
      {thisNode,true},
      {hostname,<<"10.3.121.93:8091">>},
      {clusterCompatibility,131072},
      {version,<<"2.0.1-146-rel-enterprise">>},
      {os,<<"x86_64-unknown-linux-gnu">>},
      {ports,{struct,[{proxy,11211},{direct,11210}]}}]}

      [ns_server:debug,2013-02-01T13:30:05.170,ns_1@10.3.121.93:couch_stats_reader-bucket-1<0.21124.120>:couch_stats_reader:vbuckets_aggregation_loop:126]Failed to open vbucket: 360 ({not_found,no_db_file}

      ). Ignoring

      ...................................................................
      and then

      [cluster:debug,2013-02-01T13:30:35.518,ns_1@10.3.121.93:ns_cluster<0.279.0>:ns_cluster:do_add_node_engaged_inner:478]Reply from complete_join on "10.3.121.94:8091":
      {error,rest_error,
      <<"Timeout connecting to \"10.3.121.94\" on port \"8091\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,
      {error,timeout}}

      [cluster:error,2013-02-01T13:30:35.519,ns_1@10.3.121.93:ns_cluster<0.279.0>:ns_cluster:node_add_transaction:508]Add transaction of 'ns_1@10.3.121.94' failed because of {error,complete_join,
      <<"Join completion call failed. Timeout connecting to \"10.3.121.94\" on port \"8091\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,
      {error,rest_error,
      <<"Timeout connecting to \"10.3.121.94\" on port \"8091\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,

      {error,timeout}}}

      [cluster:debug,2013-02-01T13:30:35.520,ns_1@10.3.121.93:ns_cluster<0.279.0>:ns_cluster:shun:235]Shunning 'ns_1@10.3.121.94'
      [ns_server:debug,2013-02-01T13:30:35.520,ns_1@10.3.121.93:capi_set_view_manager-bucket-0<0.21041.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [ns_server:debug,2013-02-01T13:30:35.520,ns_1@10.3.121.93:capi_set_view_manager-bucket-1<0.21068.120>:capi_set_view_manager:handle_info:349]doing replicate_newnodes_docs
      [cluster:debug,2013-02-01T13:30:35.521,ns_1@10.3.121.93:ns_cluster<0.279.0>:ns_cluster:handle_call:121]add_node("10.3.121.94", 8091, ..) -> {error,complete_join,
      <<"Join completion call failed. Timeout connecting to \"10.3.121.94\" on port \"8091\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,
      {error,rest_error,
      <<"Timeout connecting to \"10.3.121.94\" on port \"8091\". This could be due to an incorrect host/port combination or a firewall in place between the servers.">>,
      {error,timeout}

      }}

      [ns_server:debug,2013-02-01T13:30:35.521,ns_1@10.3.121.93:ns_config_rep<0.18536.120>:ns_config_rep:do_push_keys:317]Replicating some config keys ([nodes_wanted]..)
      [user:info,2013-02-01T13:30:35.541,ns_1@10.3.121.93:<0.24468.120>:ns_cluster:add_node:74]Failed to add node 10.3.121.94:8091 to cluster. Join completion call failed. Timeout connecting to "10.3.121.94" on port "8091". This could be due to an incorrect host/port combination or a firewall in place between the servers.

      No reviews matched the request. Check your Options in the drop-down menu of this sections header.

        Activity

        Show
        andreibaranouski Andrei Baranouski added a comment - - edited Aliaksey, it's very similar to the problem MB-7635 https://s3.amazonaws.com/bugdb/jira/MB-7674/f33daca6-5b1b-4f46-b882-c9ec9754f61b-10.3.121.92-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/f33daca6-5b1b-4f46-b882-c9ec9754f61b-10.3.121.93-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/f33daca6-5b1b-4f46-b882-c9ec9754f61b-10.3.121.94-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/f33daca6-5b1b-4f46-b882-c9ec9754f61b-10.3.121.95-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/f33daca6-5b1b-4f46-b882-c9ec9754f61b-10.3.121.96-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/f33daca6-5b1b-4f46-b882-c9ec9754f61b-10.3.121.97-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/f33daca6-5b1b-4f46-b882-c9ec9754f61b-10.3.121.98-diag.txt.gz
        Hide
        farshid Farshid Ghods (Inactive) added a comment -

        Andrei

        Is this 100% reproducible ?

        Show
        farshid Farshid Ghods (Inactive) added a comment - Andrei Is this 100% reproducible ?
        Hide
        andreibaranouski Andrei Baranouski added a comment -

        do not think so, probably, it is a rare case

        Show
        andreibaranouski Andrei Baranouski added a comment - do not think so, probably, it is a rare case
        Hide
        Aliaksey Artamonau Aliaksey Artamonau added a comment -

        memcached on node .94 took more than 30 seconds to terminate:

        [ns_server:debug,2013-02-01T13:30:05.226,ns_1@10.3.121.94:ns_port_memcached<0.4082.35>:ns_port_server:terminate:143]Sending 'shutdown' to port
        [ns_server:info,2013-02-01T13:30:05.455,ns_1@10.3.121.94:ns_port_memcached<0.4082.35>:ns_port_server:log:171]memcached<0.4082.35>: EOL on stdin. Initiating shutdown
        ....
        [ns_server:info,2013-02-01T13:30:40.454,ns_1@10.3.121.94:ns_port_memcached<0.4082.35>:ns_port_server:handle_info:104]Port server memcached exited with status 0

        All buckets had been deleted prior to this.

        Show
        Aliaksey Artamonau Aliaksey Artamonau added a comment - memcached on node .94 took more than 30 seconds to terminate: [ns_server:debug,2013-02-01T13:30:05.226,ns_1@10.3.121.94:ns_port_memcached<0.4082.35>:ns_port_server:terminate:143] Sending 'shutdown' to port [ns_server:info,2013-02-01T13:30:05.455,ns_1@10.3.121.94:ns_port_memcached<0.4082.35>:ns_port_server:log:171] memcached<0.4082.35>: EOL on stdin. Initiating shutdown .... [ns_server:info,2013-02-01T13:30:40.454,ns_1@10.3.121.94:ns_port_memcached<0.4082.35>:ns_port_server:handle_info:104] Port server memcached exited with status 0 All buckets had been deleted prior to this.
        Hide
        chiyoung Chiyoung Seo added a comment -

        For bug distributions in the engine team.

        Show
        chiyoung Chiyoung Seo added a comment - For bug distributions in the engine team.
        Hide
        xiaoqin Xiaoqin Ma (Inactive) added a comment -

        Andrei,
        Can try it again? I want to see the log when the test succeeds, so I can compare the two logs see what are the differences are. Thanks!

        Show
        xiaoqin Xiaoqin Ma (Inactive) added a comment - Andrei, Can try it again? I want to see the log when the test succeeds, so I can compare the two logs see what are the differences are. Thanks!
        Show
        andreibaranouski Andrei Baranouski added a comment - - edited Xiaoqin, the issue is not reproduced at the latest 2.0.1 build, for example - 2.0.1-170-rel http://qa.hq.northscale.net/view/2.0.1/job/centos-64-2.0-rebalance-regressions/191/ logs from succeed test: https://s3.amazonaws.com/bugdb/jira/MB-7674/10.3.121.93-8091-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/10.3.121.94-8091-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/10.3.121.95-8091-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/10.3.121.96-8091-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/10.3.121.97-8091-diag.txt.gz https://s3.amazonaws.com/bugdb/jira/MB-7674/10.3.121.98-8091-diag.txt.gz
        Hide
        xiaoqin Xiaoqin Ma (Inactive) added a comment -

        As it is not reproducible in the latest 2.0.1 build. Will close it for now. If it happens again in the lated build. Feel free to reopen it.

        Show
        xiaoqin Xiaoqin Ma (Inactive) added a comment - As it is not reproducible in the latest 2.0.1 build. Will close it for now. If it happens again in the lated build. Feel free to reopen it.

          People

          • Assignee:
            xiaoqin Xiaoqin Ma (Inactive)
            Reporter:
            andreibaranouski Andrei Baranouski
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes