  1. Couchbase Server
  2. MB-8126

request /pools/default/rebalanceProgress error timed out (cluster hangs)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 2.1.0
    • Fix Version/s: 2.1.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None

      Description

      I see 2 tests with the same symptoms:
      1. http://qa.hq.northscale.net/job/ubuntu-64-2.0-upgrade/108/consoleFull
      Rebalance exited with reason {important_nodes_went_down,
      {ns_node_disco_events,
      ['ns_1@10.3.3.19','ns_1@10.3.3.24',
      'ns_1@10.3.3.26','ns_1@10.3.3.27'],
      ['ns_1@10.3.3.19','ns_1@10.3.3.24',
      'ns_1@10.3.3.27']}}
      ns_orchestrator002 ns_1@10.3.3.24 00:14:51 - Thu Apr 18, 2013

      See MB-8127; it may have the same cause.

      2. http://qa.hq.northscale.net/job/ubuntu-64-2.0-biXDCR-all/303/console
      Rebalance completed successfully in this run

      2013-04-19 05:19:12 | ERROR | MainProcess | Cluster_Thread | [rest_client._rebalance_progress] unable to reach the host @ 10.3.121.56
      2013-04-19 05:21:23 | ERROR | MainProcess | Cluster_Thread | [rest_client._http_request] socket error while connecting to http://10.3.121.56:8091/pools/default/rebalanceProgress error timed out

      andrey@baranouski:~/repository/testrunner$ curl -u admin:password 'http://Administrator:Password@10.3.121.156:8091/pools/default/rebalanceProgress'
      curl: (7) couldn't connect to host

      P.S. In both cases the UI and the cluster hang.
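
The log lines and the curl output above show two different failure modes against the same endpoint: a request that times out and a connection that cannot be established. A minimal sketch of how a client could tell them apart when polling `/pools/default/rebalanceProgress` follows. It is not the testrunner's `rest_client` code; the function names `classify_failure` and `fetch_progress`, the 10-second default timeout, and the injectable `opener` parameter are assumptions made for illustration. Only the path and port 8091 come from the report.

```python
import base64
import json
import socket
import urllib.error
import urllib.request


def classify_failure(exc):
    """Map an exception raised while opening a URL to a short label."""
    # URLError wraps the underlying socket error in .reason;
    # errors raised directly by the socket layer have no such attribute.
    reason = getattr(exc, "reason", exc)
    if isinstance(reason, socket.timeout):
        return "timed out"
    if isinstance(reason, ConnectionRefusedError):
        return "connection refused"
    return "other error"


def fetch_progress(host, user, password, timeout=10.0,
                   opener=urllib.request.urlopen):
    """Return (label, payload) for one rebalanceProgress request.

    label is "ok" on success, otherwise the result of classify_failure.
    The opener argument is hypothetical and exists so the function can be
    exercised without a live cluster.
    """
    url = "http://%s:8091/pools/default/rebalanceProgress" % host
    request = urllib.request.Request(url)
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    request.add_header("Authorization", "Basic " + token)
    try:
        with opener(request, timeout=timeout) as response:
            return "ok", json.loads(response.read().decode("utf-8"))
    except OSError as exc:  # URLError and socket.timeout are both OSErrors
        return classify_failure(exc), None
```

A caller could log the label next to the host, so that a hung node (timeout) is not confused with a node that is down or a mistyped address (connection refused).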


        Activity

        andreibaranouski Andrei Baranouski added a comment -

        https://s3.amazonaws.com/bugdb/jira/MB-8126/9c507c5f/10.3.121.56-4192013-532-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-8126/9c507c5f/10.3.121.57-4192013-532-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-8126/9c507c5f/10.3.121.58-4192013-533-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-8126/9c507c5f/10.3.121.59-4192013-534-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-8126/9c507c5f/10.3.121.60-4192013-535-diag.zip
        https://s3.amazonaws.com/bugdb/jira/MB-8126/9c507c5f/10.3.121.61-4192013-535-diag.zip

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Very possibly something serious. But I'd like to try to complete rebalance progress work today.

        Aliaksey Artamonau Aliaksey Artamonau added a comment -

        Need live cluster for debugging.

        andreibaranouski Andrei Baranouski added a comment -

        MB-8163 is blocking me from providing more info or reproducing this.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Weird. MB-8163 is about mixed cluster rebalance. How can mixed cluster rebalance block this?

        andreibaranouski Andrei Baranouski added a comment -

        I saw the errors on http://qa.hq.northscale.net/job/ubuntu-64-2.0-upgrade/108/consoleFull (online upgrade, mixed cluster).

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        We've found a few processes stuck on global name resolution. This could be related to the lack of -hidden on the babysitter node. Will post a fix for the latter soon, and let's hope that's 'it'.

        Aliaksey Artamonau Aliaksey Artamonau added a comment -

        http://review.couchbase.org/26090

        maria Maria McDuff (Inactive) added a comment -

        Please verify and close.


          People

          • Assignee:
            andreibaranouski Andrei Baranouski
            Reporter:
            andreibaranouski Andrei Baranouski