  1. Couchbase Server
  2. MB-4635

rebalancing nodes getting stuck in 1->6 or 2->1 with 100k items

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0-developer-preview-3
    • Fix Version/s: 2.0-developer-preview-4
    • Component/s: couchbase-bucket, ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:
      build 498

      Description

      Diags are attached.

      Rebalance got stuck twice in the same test run. The first time, I stopped it after 10 minutes.

      The diag was taken while rebalance was still stuck during the second attempt.


        Activity

        farshid Farshid Ghods (Inactive) added a comment -

        Errors I see in the diag:

        [ns_server:warn] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep_merger:ns_config_rep:do_merge:276] config cas failed. Retrying
        [ns_server:warn] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep_merger:ns_config_rep:do_merge:276] config cas failed. Retrying
        [ns_server:info] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_events:ns_node_disco_conf_events:handle_event:56] ns_node_disco_conf_events config all
        [ns_server:warn] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep_merger:ns_config_rep:do_merge:276] config cas failed. Retrying
        [ns_server:info] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep:ns_config_rep:handle_info:181] Pushing config
        [ns_server:warn] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep_merger:ns_config_rep:do_merge:276] config cas failed. Retrying
        [ns_server:info] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep:ns_config_rep:handle_info:183] Pushing config done
        [ns_server:warn] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep_merger:ns_config_rep:do_merge:276] config cas failed. Retrying
        [ns_server:warn] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep_merger:ns_config_rep:do_merge:276] config cas failed. Retrying
        [ns_server:warn] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep_merger:ns_config_rep:do_merge:276] config cas failed. Retrying
        [ns_server:warn] [2012-01-12 12:37:50] [ns_1@10.1.6.149:ns_config_rep_merger:ns_config_rep:do_merge:276] config cas failed. Retrying

        [couchdb:info] [2012-01-12 13:51:42] [ns_1@10.1.6.149:'capi_set_view_manager-default':couch_log:error:42] MC daemon: Error opening vb 11 in <<"default">>: {not_found,
        no_db_file}

        farshid Farshid Ghods (Inactive) added a comment -

        I think memcached crashed.

        More errors here:

        Port server memcached on node 'ns_1@10.1.6.152' exited with status 134. Restarting. Messages: Trying to connect to mccouch: "localhost:11213"
        Connected to mccouch: "localhost:11213"
        Trying to connect to mccouch: "localhost:11213"
        Connected to mccouch: "localhost:11213"
        Extension support isn't implemented in this version of bucket_engine
        Preloaded 0 keys (with metadata)
        The second phase of warmup took 7344 (us).
        memcached: objectregistry.cc:104: static void ObjectRegistry::onDeleteItem(Item*): Assertion `stats.memOverhead.get() < ((size_t)1<<(sizeof(size_t)*8-1))' failed.
        ns_port_server000 ns_1@10.1.6.152 13:57:59 - Thu Jan 12, 2012
        Port server memcached on node 'ns_1@10.1.6.154' exited with status 134. Restarting. Messages: Trying to connect to mccouch: "localhost:11213"
        Connected to mccouch: "localhost:11213"
        Trying to connect to mccouch: "localhost:11213"
        Connected to mccouch: "localhost:11213"
        Extension support isn't implemented in this version of bucket_engine
        Preloaded 0 keys (with metadata)
        The second phase of warmup took 2826 (us).
        memcached: objectregistry.cc:84: static void ObjectRegistry::onDeleteQueuedItem(QueuedItem*): Assertion `stats.memOverhead.get() < ((size_t)1<<(sizeof(size_t)*8-1))' failed.
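
        Both assertions above require stats.memOverhead to stay below half of the size_t range. One plausible reading, not confirmed anywhere in this ticket, is that the check exists to catch an unsigned counter that was decremented below zero and wrapped around. The sketch below models that arithmetic in Python, assuming a 64-bit size_t; the helper names are hypothetical and are not taken from the memcached or ep-engine sources.

```python
# Model of the check `value < ((size_t)1 << (sizeof(size_t)*8 - 1))`.
# Assumption: size_t is 64 bits wide. All names here are illustrative.
SIZE_T_BITS = 64
SIZE_T_MASK = (1 << SIZE_T_BITS) - 1
ASSERT_LIMIT = 1 << (SIZE_T_BITS - 1)  # half of the unsigned range


def sub_size_t(counter, amount):
    """Subtract the way an unsigned size_t does: wrap modulo 2**64."""
    return (counter - amount) & SIZE_T_MASK


def passes_assertion(counter):
    """Return True when the counter would satisfy the assertion."""
    return counter < ASSERT_LIMIT


overhead = 100
overhead = sub_size_t(overhead, 60)   # normal decrement, stays small
healthy = passes_assertion(overhead)

wrapped = sub_size_t(overhead, 64)    # removes more than was ever added
broken = passes_assertion(wrapped)    # wrapped value lands in the upper half
```

        If this reading is right, the crash would point at unbalanced accounting (an overhead subtracted that was never added, or subtracted twice) rather than at genuinely high memory use.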

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        By the way, the "config cas failed" message is very likely harmless. We use optimistic concurrency when merging config changes, so a failed CAS simply means we will retry and likely succeed.
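
        The optimistic-concurrency pattern described here can be sketched as a read, merge, compare-and-swap loop. This is a minimal Python illustration under stated assumptions, not the ns_server implementation (which is Erlang); ConfigStore, merge_with_retry, and the version counter are hypothetical names introduced for the example.

```python
import threading


class ConfigStore:
    """Hypothetical in-memory config holder with compare-and-swap."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._version = 0

    def read(self):
        """Return the current value together with its version."""
        with self._lock:
            return self._value, self._version

    def cas(self, expected_version, new_value):
        """Store new_value only if nobody has written since expected_version."""
        with self._lock:
            if self._version != expected_version:
                return False  # lost the race; the caller should retry
            self._value = new_value
            self._version += 1
            return True


def merge_with_retry(store, remote, merge, max_attempts=100):
    """Merge remote into the store, retrying whenever the CAS fails."""
    for attempt in range(1, max_attempts + 1):
        local, version = store.read()
        merged = merge(local, remote)
        if store.cas(version, merged):
            return attempt
        # A failed CAS only means the config changed underneath us.
        # Re-read and merge again; this is the harmless "Retrying" case.
    raise RuntimeError("config cas kept failing")
```

        In this model a failed CAS loses no data: the next iteration re-reads the newer value and merges on top of it, which is why a burst of retry warnings is expected while many config changes arrive at once.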

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        At least part of the issue was one or more races in the set view manager. Aliaksey is continuing work here.


          People

          • Assignee:
            Aliaksey Artamonau
          • Reporter:
            farshid Farshid Ghods (Inactive)
