Couchbase Server / MB-7923

[windows] Rebalance failed due to reason not_all_nodes_are_ready_yet as memcached exited on one node

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.1.0
    • Component/s: ns_server
    • Security Level: Public
    • Labels: None
    • Environment: 2.0.1-179-rel

      Description

      Rebalance failed due to reason not_all_nodes_are_ready_yet as memcached exited on one node.

      Test to reproduce:

      ./testrunner -i vm-list.ini -t swaprebalance.SwapRebalanceBasicTests.do_test,replica=1,num-buckets=2,num-swap=2,swap-orchestrator=True,GROUP=P1

      Error received:

      [2013-03-18 03:37:42,330] - [rest_client:913] ERROR - {u'status': u'none', u'errorMessage': u'Rebalance failed. See logs for detailed reason. You can try rebalance again.'} - rebalance failed
      [2013-03-18 03:37:42,330] - [rest_client:914] INFO - Latest logs from UI:
      [2013-03-18 03:37:42,378] - [rest_client:915] ERROR - {u'node': u'ns_1@10.142.174.97', u'code': 2, u'text': u"Rebalance exited with reason {not_all_nodes_are_ready_yet,\n ['ns_1@10.131.33.89']}\n", u'shortText': u'message', u'module': u'ns_orchestrator', u'tstamp': 1363577863726, u'type': u'info'}
      [2013-03-18 03:37:42,379] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 0, u'text': u"Port server memcached on node 'ns_1@10.131.33.89' exited with status 71. Restarting. Messages: Mon Mar 18 03:37:43.505136 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:43.505136 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:43.505136 Coordinated Universal Time 3: failed to listen on TCP port 11210: No error", u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1363577863526, u'type': u'info'}

      [2013-03-18 03:37:42,379] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 0, u'text': u"Port server memcached on node 'ns_1@10.131.33.89' exited with status 71. Restarting. Messages: Mon Mar 18 03:37:38.278834 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:38.278834 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:38.278834 Coordinated Universal Time 3: failed to listen on TCP port 11210: No error", u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1363577858300, u'type': u'info'}

      [2013-03-18 03:37:42,379] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 0, u'text': u"Port server memcached on node 'ns_1@10.131.33.89' exited with status 71. Restarting. Messages: Mon Mar 18 03:37:33.114936 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:33.114936 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:33.114936 Coordinated Universal Time 3: failed to listen on TCP port 11210: No error", u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1363577853136, u'type': u'info'}

      [2013-03-18 03:37:42,379] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 0, u'text': u"Port server memcached on node 'ns_1@10.131.33.89' exited with status 71. Restarting. Messages: Mon Mar 18 03:37:27.936385 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:27.936385 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:27.937385 Coordinated Universal Time 3: failed to listen on TCP port 11210: No error", u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1363577847969, u'type': u'info'}

      [2013-03-18 03:37:42,379] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 1, u'text': u"Service memcached exited on node 'ns_1@10.131.33.89' in 0.22s\n", u'shortText': u'port exited too soon after restart', u'module': u'supervisor_cushion', u'tstamp': 1363577842806, u'type': u'warning'}

      [2013-03-18 03:37:42,379] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 0, u'text': u"Port server memcached on node 'ns_1@10.131.33.89' exited with status 71. Restarting. Messages: Mon Mar 18 03:37:22.777869 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:22.778869 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:22.778869 Coordinated Universal Time 3: failed to listen on TCP port 11210: No error", u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1363577842806, u'type': u'info'}

      [2013-03-18 03:37:42,379] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 1, u'text': u"Service memcached exited on node 'ns_1@10.131.33.89' in 0.20s\n", u'shortText': u'port exited too soon after restart', u'module': u'supervisor_cushion', u'tstamp': 1363577837582, u'type': u'warning'}

      [2013-03-18 03:37:42,380] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 0, u'text': u"Port server memcached on node 'ns_1@10.131.33.89' exited with status 71. Restarting. Messages: Mon Mar 18 03:37:17.551347 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:17.551347 Coordinated Universal Time 3: bind(): No error\nMon Mar 18 03:37:17.551347 Coordinated Universal Time 3: failed to listen on TCP port 11210: No error", u'shortText': u'message', u'module': u'ns_port_server', u'tstamp': 1363577837582, u'type': u'info'}

      [2013-03-18 03:37:42,380] - [rest_client:915] ERROR - {u'node': u'ns_1@10.131.33.89', u'code': 1, u'text': u"Service memcached exited on node 'ns_1@10.131.33.89' in 0.16s\n", u'shortText': u'port exited too soon after restart', u'module': u'supervisor_cushion', u'tstamp': 1363577832372, u'type': u'warning'}

      ERROR

      The logs have rolled over and are no longer available for this timestamp. I will try to reproduce the failure and attach logs.
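
      With the server logs gone, the quoted UI entries are the only timing evidence. The sketch below is not part of the original report. It assumes the `tstamp` fields are milliseconds since the Unix epoch (an assumption, though it agrees with the wall-clock times embedded in the messages) and computes the gap between successive ns_port_server "exited with status 71" entries. The helper names are hypothetical.

```python
from datetime import datetime, timezone

# 'tstamp' values copied from the ns_port_server entries quoted above,
# oldest first. Treating them as epoch milliseconds is an assumption.
PORT_SERVER_TSTAMPS_MS = [
    1363577837582,
    1363577842806,
    1363577847969,
    1363577853136,
    1363577858300,
    1363577863526,
]


def to_utc(tstamp_ms):
    """Convert a millisecond epoch timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(tstamp_ms / 1000, tz=timezone.utc)


def restart_gaps_seconds(tstamps_ms):
    """Return the gaps, in seconds, between consecutive timestamps."""
    return [(later - earlier) / 1000
            for earlier, later in zip(tstamps_ms, tstamps_ms[1:])]


if __name__ == "__main__":
    for ts in PORT_SERVER_TSTAMPS_MS:
        print(to_utc(ts).isoformat())
    print(restart_gaps_seconds(PORT_SERVER_TSTAMPS_MS))
```

      Under that assumption, each gap comes out slightly above five seconds, which suggests memcached was being restarted on a fixed cadence and failing to bind port 11210 every time.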


        Activity

        siri Sriram Melkote added a comment - - edited

        Deep, it looks like the old memcached process didn't exit completely, probably stuck in a zombie state, thereby preventing the new one from starting up. A similar bug, MB-5388, was fixed; perhaps that fix was not effective on Windows. I've reassigned this to you. If it reproduces, can you please capture logs and assign it back to me? Thanks.
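
        If the failure reproduces, one quick check of this hypothesis is to see whether port 11210 can be bound while memcached is supposedly down. The snippet below is a minimal sketch, not part of the original comment or of testrunner. The function name `can_bind` and the default host are assumptions chosen for illustration. It only reports whether the port is currently taken; it does not identify which process holds it.

```python
import socket

MEMCACHED_PORT = 11210  # port named in the "failed to listen" messages


def can_bind(port, host="0.0.0.0"):
    """Return True if a new TCP socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            # No SO_REUSEADDR on purpose: we want the bind to fail
            # whenever another socket already owns the address.
            sock.bind((host, port))
        except OSError:
            return False
        return True


if __name__ == "__main__":
    state = "free" if can_bind(MEMCACHED_PORT) else "in use"
    print("TCP port %d is %s" % (MEMCACHED_PORT, state))
```

        Run on the affected node between restart attempts, an "in use" result would be consistent with a leftover memcached process still holding the port.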

        maria Maria McDuff (Inactive) added a comment -

        Deep, please update this bug today, 4/30 (your local time). We need to know if this is still an issue. Thanks.

        deepkaran.salooja Deepkaran Salooja added a comment -

        Not able to repro again after repeated attempts.


          People

          • Assignee: Deepkaran Salooja (deepkaran.salooja)
          • Reporter: Deepkaran Salooja (deepkaran.salooja)
          • Votes: 0
          • Watchers: 3
