Couchbase Server / MB-4773

Rebalance failing on latest 2.0 build due to wait_for_memcached failures (ns_rebalancer:wait_for_memcached)

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0-developer-preview-4
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Centos 64 bit. 6 node cluster

      2.0.0r-651-g02c3154

      Description

      The latest test run failed with rebalance failures. This is a regression from previous test runs. No core files were generated.

      http://qa.hq.northscale.net/job/centos-64-view-tests/317/

      From the logs
      [rebalance:info] [2012-02-07 2:01:50] [ns_1@10.1.2.30:<0.4598.3>:ns_rebalancer:wait_for_memcached:296] Waiting for ['ns_1@10.1.2.30']
      [user:info] [2012-02-07 2:01:50] [ns_1@10.1.2.30:<0.317.0>:ns_orchestrator:handle_info:234] Rebalance exited with reason {wait_for_memcached_failed, "default", ['ns_1@10.1.2.30']}
      [ns_server:info] [2012-02-07 2:01:50] [ns_1@10.1.2.30:ns_log:ns_log:handle_cast:115] suppressing duplicate log ns_orchestrator:2("Rebalance exited with reason {wait_for_memcached_failed,\"default\",\n ['ns_1@10.1.2.30']}\n") because it's been seen 1 times in the past 254.737081 secs (last seen 254.737081 secs ago
      [ns_server:info] [2012-02-07 2:01:50] [ns_1@10.1.2.30:ns_config_events:ns_config_log:handle_event:60] config change:
      counters ->
      [{rebalance_fail,2},
       {rebalance_start,11},
       {rebalance_success,9}]
      [ns_server:info] [2012-02-07 2:01:50] [ns_1@10.1.2.30:ns_config_events:ns_node_disco_conf_events:handle_event:56] ns_node_disco_conf_events config all
      [ns_server:info] [2012-02-07 2:01:50] [ns_1@10.1.2.30:ns_config_rep:ns_config_rep:handle_info:181] Pushing config
      [ns_server:info] [2012-02-07 2:01:50] [ns_1@10.1.2.30:ns_config_rep:ns_config_rep:handle_info:183] Pushing config done
      [ns_server:info] [2012-02-07 2:01:50] [ns_1@10.1.2.30:ns_config_events:ns_config_log:handle_event:60] config change:
      rebalance_status ->
      {none,<<"Rebalance failed. See logs for detailed reason. You can try rebalance again.">>}
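      The counters in the config change above give a quick sanity check on how often rebalance has failed on this cluster. The following is a minimal sketch, not part of ns_server: the helper name `parse_counters` is hypothetical, and it assumes the counters term is a flat list of `{name,integer}` pairs as shown in the log.

```python
import re
from typing import Dict


def parse_counters(text: str) -> Dict[str, int]:
    """Extract {name,count} pairs from an Erlang-style counters list.

    Assumes a flat list of {atom,integer} tuples; whitespace and line
    breaks between the tuples are tolerated.
    """
    return {name: int(value)
            for name, value in re.findall(r"\{(\w+),\s*(\d+)\}", text)}


# Counters copied from the config change in the log excerpt.
LOG_SNIPPET = """
counters ->
[{rebalance_fail,2},
 {rebalance_start,11},
 {rebalance_success,9}]
"""

counters = parse_counters(LOG_SNIPPET)

# Every started rebalance should have ended in success or failure.
finished = counters["rebalance_fail"] + counters["rebalance_success"]
unaccounted = counters["rebalance_start"] - finished

print(counters)
print("rebalances neither succeeded nor failed:", unaccounted)
```

      With the values from this report, the 11 started rebalances are fully accounted for by 9 successes and 2 failures.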

        Activity

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Damien, this looks exactly like the shutdown race we discussed yesterday.

        ns_memcached is stuck deleting one of the databases during bucket deletion.

        {<0.20901.2>,
        [{registered_name,[]},
        {status,waiting},
        {initial_call,{proc_lib,init_p,5}},
        {backtrace,
        [<<"Program counter: 0x00002b89521a8ef0 (gen:do_call/4 + 576)">>,
        <<"CP: 0x0000000000000000 (invalid)">>,
        <<"arity = 0">>,<<>>,
        <<"0x00002aaaaf7bd1a0 Return addr 0x00002b8952226498 (gen_server:call/3 + 128)">>,
        <<"y(0) #Ref<0.0.101.52421>">>,
        <<"y(1) 'ns_1@10.1.2.30'">>,<<"y(2) []">>,
        <<"y(3) infinity">>,
        <<"y(4) {delete,<<10 bytes>>,[]}">>,
        <<"y(5) '$gen_call'">>,<<"y(6) <0.161.0>">>,
        <<>>,
        <<"0x00002aaaaf7bd1e0 Return addr 0x00002aaaaea06290 (ns_storage_conf:delete_database/1 + 72)">>,

        This is part of 10.1.2.30's diag.

        Then look at:

        {<0.28202.2>,
        [{registered_name,[]},
        {status,waiting},
        {initial_call,{proc_lib,init_p,5}},
        {backtrace,
        [<<"Program counter: 0x00002aaaac4c7a18 (couch_util:shutdown_sync/1 + 352)">>,
        <<"CP: 0x0000000000000000 (invalid)">>,
        <<"arity = 0">>,<<>>,
        <<"0x00002aaaae939a08 Return addr 0x00002aaaadf092e0 (couch_db:terminate/2 + 160)">>,
        <<"y(0) #Ref<0.0.101.52424>">>,<<"y(1) []">>,
        <<"y(2) Catch 0x00002aaaac4c7b30 (couch_util:shutdown_sync/1 + 632)">>,
        <<"y(3) []">>,<<>>,
        <<"0x00002aaaae939a30 Return addr 0x00002b895222c788 (gen_server:terminate/6 + 184)">>,
        <<>>,
        <<"0x00002aaaae939a38 Return addr 0x00002b89521adfe8 (proc_lib:init_p_do_apply/3 + 56)">>,
        <<"y(0) []">>,
        <<"(1) {db,<0.28202.2>,<0.28203.2>,<0.32369.2>,<<16 bytes>>,<0.28197.2>,<0.28204.2>,{db_h">>,
        <<"y(2) couch_db">>,
        <<"y(3) {'EXIT',<0.161.0>,shutdown}">>,
        <<"y(4) <0.28202.2>">>,<<"y(5) shutdown">>,
        <<"y(6) Catch 0x00002b895222c788 (gen_server:terminate/6 + 184)">>,
        <<>>,
        <<"0x00002aaaae939a78 Return addr 0x000000000088e318 (<terminate process normally>)">>,
        <<"y(0) Catch 0x00002b89521ae008 (proc_lib:init_p_do_apply/3 + 88)">>,
        <<>>]},
        {error_handler,error_handler},
        {garbage_collection,
        [{min_bin_vheap_size,46368},
        {min_heap_size,233},
        {fullsweep_after,0},
        {minor_gcs,0}]},
        {heap_size,987},
        {total_heap_size,987},
        {links,[]},
        {memory,9184},
        {message_queue_len,1},
        {reductions,2868},
        {trap_exit,true}]},
        {<0.28203.2>,
        [{registered_name,[]},
        {status,waiting},
        {initial_call,{proc_lib,init_p,5}},
        {backtrace,
        [<<"Program counter: 0x00002b89521a8ef0 (gen:do_call/4 + 576)">>,
        <<"CP: 0x0000000000000000 (invalid)">>,
        <<"arity = 0">>,<<>>,
        <<"0x00002aaaabcda110 Return addr 0x00002b8952226498 (gen_server:call/3 + 128)">>,
        <<"y(0) #Ref<0.0.101.52434>">>,
        <<"y(1) 'ns_1@10.1.2.30'">>,<<"y(2) []">>,
        <<"y(3) infinity">>,
        <<"(4) {db_updated,{db,<0.28202.2>,<0.28203.2>,nil,<<16 bytes>>,<0.32412.2>,<0.32420.2>,{">>,
        <<"y(5) '$gen_call'">>,
        <<"y(6) <0.28202.2>">>,<<>>,
        <<"0x00002aaaabcda150 Return addr 0x00002aaaadf0fb50 (couch_db_updater:handle_call/3 + 5368)">>,
        <<"y(0) infinity">>,
        <<"(1) {db_updated,{db,<0.28202.2>,<0.28203.2>,nil,<<16 bytes>>,<0.32412.2>,<0.32420.2>,{">>,
        <<"y(2) <0.28202.2>">>,
        <<"y(3) Catch 0x00002b8952226498 (gen_server:call/3 + 128)">>,
        <<>>,
        <<"0x00002aaaabcda178 Return addr 0x00002b895222a618 (gen_server:handle_msg/5 + 272)">>,
        <<"y(0) []">>,<<"y(1) []">>,
        <<"y(2) <0.32412.2>">>,
        <<"(3) {db,<0.28202.2>,<0.28203.2>,<0.32369.2>,<<16 bytes>>,<0.28197.2>,<0.28204.2>,{db_h">>,
        <<"y(4) []">>,
        <<"y(5) \"/opt/couchbase/var/lib/couchdb/default/27.couch.1\"">>,
        <<"y(6) <<10 bytes>>">>,
        <<"y(7) \"/opt/couchbase/var/lib/couchdb/default/27.couch.2\"">>,
        <<"y(8) []">>,<<"y(9) <0.28197.2>">>,
        <<"(10) {db,<0.28202.2>,<0.28203.2>,nil,<<16 bytes>>,<0.32412.2>,<0.32420.2>,{db_header,8,">>,
        <<"y(11) []">>,
        <<"y(12) \"/opt/couchbase/var/lib/couchdb\"">>,
        <<>>,
        <<"0x00002aaaabcda1e8 Return addr 0x00002b89521adfe8 (proc_lib:init_p_do_apply/3 + 56)">>,
        <<"y(0) couch_db_updater">>,
        <<"(1) {db,<0.28202.2>,<0.28203.2>,<0.32369.2>,<<16 bytes>>,<0.28197.2>,<0.28204.2>,{db_h">>,
        <<"y(2) <0.28203.2>">>,
        <<"y(3) <0.28202.2>">>,
        <<"y(4) {compact_done,\"/opt/couchbase/var/lib/couchdb/default/27.couch.1.compact\"}">>,
        <<"y(5) {<0.32369.2>,#Ref<0.0.101.52370>}">>,
        <<"y(6) Catch 0x00002b895222a618 (gen_server:handle_msg/5 + 272)">>,
        <<>>,
        <<"0x00002aaaabcda228 Return addr 0x000000000088e318 (<terminate process normally>)">>,
        <<"y(0) Catch 0x00002b89521ae008 (proc_lib:init_p_do_apply/3 + 88)">>,
        <<>>]},

        {error_handler,error_handler},
        {garbage_collection,
        [{min_bin_vheap_size,46368},
        {min_heap_size,233},
        {fullsweep_after,0},
        {minor_gcs,0}]},
        {heap_size,1597},
        {total_heap_size,1597},
        {links,[<0.32369.2>]},
        {memory,14104},
        {message_queue_len,1},
        {reductions,12954},
        {trap_exit,true}]},
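
        Reading the three dumps above as a wait-for graph makes the cycle easier to see. The sketch below is only an illustration and is not code from ns_server or CouchDB. Two edges are read directly from the dumps (the callee pid in the y(6) slot of the gen:do_call frames). The other two are assumptions, marked as such in the comments: that <0.161.0> is waiting for <0.28202.2> to exit after sending it 'EXIT' shutdown, and that the couch_util:shutdown_sync call in <0.28202.2> targets its updater <0.28203.2>. The function name `find_cycle` is hypothetical.

```python
from typing import Dict, List, Optional

# Wait-for edges: blocked pid -> pid it is blocked on.
WAITS_ON: Dict[str, str] = {
    # From the dump: gen_server:call with {delete,...}, callee in y(6).
    "<0.20901.2>": "<0.161.0>",
    # ASSUMPTION: <0.161.0> sent {'EXIT',_,shutdown} and waits for the exit.
    "<0.161.0>": "<0.28202.2>",
    # ASSUMPTION: shutdown_sync in couch_db:terminate targets the updater.
    "<0.28202.2>": "<0.28203.2>",
    # From the dump: gen_server:call with {db_updated,...}, callee in y(6).
    "<0.28203.2>": "<0.28202.2>",
}


def find_cycle(waits_on: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow wait-for edges from `start`; return the cycle if one is reached.

    Each process waits on at most one other process, so a simple walk
    that remembers visited pids is enough.
    """
    path: List[str] = []
    seen_at: Dict[str, int] = {}
    node: Optional[str] = start
    while node is not None:
        if node in seen_at:
            return path[seen_at[node]:]
        seen_at[node] = len(path)
        path.append(node)
        node = waits_on.get(node)
    return None


cycle = find_cycle(WAITS_ON, "<0.20901.2>")
print("deadlock cycle:", cycle)
```

        Under those assumptions, the walk from the stuck ns_memcached caller ends in a two-process cycle between <0.28202.2> and <0.28203.2>: one is waiting for the other to exit, while the other is blocked in an infinite-timeout call back to the first.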

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Changing to blocker because it's a deadlock.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Filipe fixed it with this change: http://review.couchbase.org/13032


          People

          • Assignee:
            FilipeManana Filipe Manana (Inactive)
            Reporter:
            karan Karan Kumar (Inactive)