Couchbase Server / MB-7651

[system test] "Failed to grab remote bucket info from any of known nodes" when creating remote XDCR setup

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: None
    • Security Level: Public
    • Labels:
    • Environment:
      centos 5.8 64bit

      Description

      Environment:

      • Source:
        Each node has a 4-core CPU and 4GB RAM.
        Install Couchbase Server 2.0.0-1976 on a 2-node cluster.
        The cluster is addressed by hostname (not IP); one node uses the default data path and the other a custom data path.
        Create 2 buckets: one default (2GB) with 1 replica and one SASL (1.1GB) bucket with 2 replicas.
      • Destination:
        Each node has a 4-core CPU and 4GB RAM.
        Install Couchbase Server 2.0.0-1976 on a 2-node cluster.
        The cluster is addressed by IP; one node uses the default data path and the other a custom data path.
        Create 2 buckets: one default (2GB) with 1 replica and one SASL (1.1GB) bucket with 2 replicas.

      Both buckets are empty.
      From the source cluster, set up XDCR replication to the destination cluster. Within a few minutes of setting up XDCR, errors show up in the "Ongoing Replications" section:

      2013-01-30 17:55:25 - Error replicating vbucket 958: {badmatch, {error, all_nodes_failed, <<"Failed to grab remote bucket info from any of known nodes">>}}
      2013-01-30 17:55:20 - Error replicating vbucket 831: {badmatch, {error, all_nodes_failed, <<"Failed to grab remote bucket info from any of known nodes">>}}
      2013-01-30 17:55:20 - Error replicating vbucket 538: {badmatch, {error, all_nodes_failed, <<"Failed to grab remote bucket info from any of known nodes">>}}
      2013-01-30 17:55:19 - Error replicating vbucket 479: {badmatch, {error, all_nodes_failed, <<"Failed to grab remote bucket info from any of known nodes">>}}
      2013-01-30 17:55:19 - Error replicating vbucket 448: {badmatch, {error, all_nodes_failed, <<"Failed to grab remote bucket info from any of known nodes">>}}
      2013-01-30 17:55:19 - Error replicating vbucket 303: {badmatch, {error, all_nodes_failed, <<"Failed to grab remote bucket info from any of known nodes">>}}
      2013-01-30 17:55:15 - Error replicating vbucket 719: {badmatch, {error, all_nodes_failed, <<"Failed to grab remote bucket info from any of known nodes">>}}
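
      For context, a minimal Erlang sketch of how a {badmatch, ...} of this shape arises (hypothetical module and function names, not the actual ns_server code): the replicator pattern-matches for a success tuple, and the {error, all_nodes_failed, ...} return from the remote-info fetch fails that match.

      %% Minimal sketch with hypothetical names; not the actual ns_server code.
      -module(xdcr_badmatch_sketch).
      -export([replicate_vbucket/1]).

      get_remote_bucket_info() ->
          %% Stand-in for the real remote lookup: every known destination
          %% node failed to answer, so an error tuple comes back.
          {error, all_nodes_failed,
           <<"Failed to grab remote bucket info from any of known nodes">>}.

      replicate_vbucket(VBucket) ->
          %% The pattern on the left only matches a success tuple, so the
          %% error tuple raises {badmatch, {error, all_nodes_failed, ...}}.
          {ok, BucketInfo} = get_remote_bucket_info(),
          {VBucket, BucketInfo}.

      Calling xdcr_badmatch_sketch:replicate_vbucket(958) exits with exactly the {badmatch, {error, all_nodes_failed, <<"...">>}} reason logged above.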

      Link to collect_info of the source nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_1/201301/2nodes-200GA-source-xdcr-badmatch-20130130-183047.tgz

      Link to collect_info of the destination nodes: https://s3.amazonaws.com/packages.couchbase/collect_info/2_0_1/201301/2nodes-200GA-destination-xdcr-badmatch-20130130-183136.tgz


        Activity

        Aliaksey Artamonau added a comment -

        On the destination cluster I see a bunch of timeouts around the same time the replication was created:

        [menelaus:warn,2013-01-30T17:37:04.312,ns_1@10.3.3.9:<0.21026.229>:menelaus_web:loop:430]Server error during processing: ["web request failed",
          {path,"/pools/default"},
          {type,exit},
          {what,
            {timeout,
              {gen_server,call,[ns_node_disco,nodes_wanted]}}},
          {trace,
            [{gen_server,call,2},
             {ns_cluster_membership,active_nodes,1},
             {ns_storage_conf,cluster_storage_info,0},
             {menelaus_web,build_pool_info,4},
             {menelaus_web,handle_pool_info,2},
             {menelaus_web,loop,3},
             {mochiweb_http,headers,5},
             {proc_lib,init_p_do_apply,3}]}]

        [menelaus:warn,2013-01-30T17:37:04.296,ns_1@10.3.3.9:<0.21086.229>:menelaus_web:loop:430]Server error during processing: ["web request failed",
          {path,"/pools/default"},
          {type,exit},
          {what,
            {timeout,
              {gen_server,call,[ns_node_disco,nodes_wanted]}}},
          {trace,
            [{gen_server,call,2},
             {menelaus_web,build_pool_info,4},
             {menelaus_web,handle_pool_info,2},
             {menelaus_web,loop,3},
             {mochiweb_http,headers,5},
             {proc_lib,init_p_do_apply,3}]}]

        [menelaus:warn,2013-01-30T17:37:04.313,ns_1@10.3.3.9:<0.21012.229>:menelaus_web:loop:430]Server error during processing: ["web request failed",
          {path,"/pools/default"},
          {type,exit},
          {what,
            {timeout,
              {gen_server,call,[ns_node_disco,nodes_wanted]}}},
          {trace,
            [{gen_server,call,2},
             {ns_cluster_membership,active_nodes,1},
             {ns_storage_conf,cluster_storage_info,0},
             {menelaus_web,build_pool_info,4},
             {menelaus_web,handle_pool_info,2},
             {menelaus_web,loop,3},
             {mochiweb_http,headers,5},
             {proc_lib,init_p_do_apply,3}]}]

        And this is a Linux box, where we have not seen this issue recently. Can we test whether this is reproducible? Please also collect CPU-delay info while reproducing.
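
        For reference, a minimal sketch of the failure mode behind these traces (the wrapper below is illustrative, not the actual ns_server code): gen_server:call/2 uses a default 5-second timeout, and when the server (here ns_node_disco) does not reply in time, the caller exits with {timeout,{gen_server,call,[Server,Request]}}, which is exactly the {what,{timeout,...}} term that menelaus_web logs.

        %% Illustrative only: this call exits after the default 5s timeout
        %% if ns_node_disco is too busy (e.g. CPU-starved) to reply,
        %% producing the exit reason seen in the traces above.
        nodes_wanted() ->
            try
                gen_server:call(ns_node_disco, nodes_wanted)
            catch
                exit:{timeout, {gen_server, call, _}} = Reason ->
                    %% Scheduling delays on a loaded node can cause this,
                    %% hence the request for CPU-delay data.
                    {error, Reason}
            end.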

        Ketaki Gangal added a comment -

        Hi Tony,

        On the next run, can you repro this? If yes, please add
        CPU-delay information while reproducing.

        -Ketaki

        Aliaksey Artamonau added a comment -

        MB-7697
        Ketaki Gangal added a comment -

        Hi Tony,

        Are these failing on 2.0 or 2.0.1?

        -Ketaki

        Thuan Nguyen added a comment -

        On 2.0.0


          People

          • Assignee:
            Thuan Nguyen
          • Reporter:
            Thuan Nguyen
          • Votes:
            0
          • Watchers:
            4

            Dates

            • Created:
              Updated:
              Resolved:

              Gerrit Reviews

              There are no open Gerrit changes