  1. Couchbase Server
  2. MB-7771

[RN 2.0.1] Online upgrade 2.0.0 -> 2.0.1: rebalance exited with {{{linked_process_died,<18357.6680.0>,normal}, {gen_server,call, ['capi_set_view_manager-sasl', {set_vbucket_states,

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: None
    • Security Level: Public
    • Flagged:
      Release Note

      Description

      Build 160

      performed test to verify MB-7571

      2.0.0-1976 nodes
      ip:10.3.121.112
      ip:10.3.121.113
      ip:10.3.121.114

      2.0.1-160 nodes
      ip:10.3.121.115
      ip:10.3.121.116

      Starting rebalance, KeepNodes = ['ns_1@10.3.121.112','ns_1@10.3.121.115',
      'ns_1@10.3.121.116'], EjectNodes = ['ns_1@10.3.121.113',
      'ns_1@10.3.121.114']

      Rebalance exited with reason
      {{{linked_process_died,<18357.6680.0>,normal},
      {gen_server,call,
      ['capi_set_view_manager-sasl',
      {set_vbucket_states,

      https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.112-8091-diag.txt.gz
      https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.116-2182013-1018-diag.zip
      https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.114-2182013-1051-diag.zip
      https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.115-2182013-113-diag.zip
      https://s3.amazonaws.com/bugdb/jira/MB-7571/10.3.121.113-2182013-1056-diag.zip

      When the rebalance failed, the resident ratio was 65-75%.


        Activity

        farshid Farshid Ghods (Inactive) added a comment -

        Does the rebalance succeed on a second attempt?

        Is this a new test that you are running?

        Is this a regression?

        farshid Farshid Ghods (Inactive) added a comment -

        Alk,

        I am still waiting for Andrei to update the ticket with more info.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Massive erlang timeouts.

        No signs of heavy swapping.

        Double check your environment again.

        andreibaranouski Andrei Baranouski added a comment -

        Created an automation test for it: http://review.couchbase.org/#/c/24787/
        It was not reproduced with 500K*3.
        Try it with a larger amount of data.

        farshid Farshid Ghods (Inactive) added a comment -

        I saw the same issue with just 1M items during a swap rebalance. It failed IMMEDIATELY; I then retried over and over, and eventually it made some progress.

        Jin,

        I highly recommend that engineers try this scenario on EC2 or other well-sized environments and see the failures. QE has reported many of these issues. In this example I did not see a rebalance failure on a 2.0-only cluster, but easily saw it on a mixed cluster.

        farshid Farshid Ghods (Inactive) added a comment -

        I am moving this bug out of the 2.0.1 release because it has been seen many times and we know that timeouts can occur, but I am making it a documentation issue.

        kzeller kzeller added a comment -

        Added to RN 2.0.1 as Known Issue:

        <para>
        If you perform an online upgrade from Couchbase Server 2.0.0 to 2.0.1, rebalance may exit with the error:
        </para>
        <programlisting>
        {{{linked_process_died,<18357.6680.0>,normal},
        {gen_server,call,
        ['capi_set_view_manager-sasl',
        {set_vbucket_states,
        </programlisting>

        TimSmith Tim Smith (Inactive) added a comment -

        Karen, the release note should mention what to do in this case. If the rebalance fails due to this problem, the correct procedure is to wait 5 minutes and then restart the rebalance. It may require several restarts until the rebalance completes 100%.
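
The wait-and-restart procedure described in this comment can be sketched as a small retry loop. This is only an illustration under stated assumptions: `rebalance_with_retries`, `start_rebalance`, `max_attempts`, and the injectable `sleep` are hypothetical names that do not come from the ticket or from any Couchbase tool, and the caller is assumed to supply a `start_rebalance` callable that triggers a rebalance by whatever means they use and returns True once it completes 100%. The 5-minute default wait comes from the comment; the cap of 5 attempts is an arbitrary choice.

```python
import time
from typing import Callable


def rebalance_with_retries(
    start_rebalance: Callable[[], bool],  # hypothetical: True when rebalance completes
    wait_seconds: float = 300.0,          # 5 minutes, as recommended in the comment
    max_attempts: int = 5,                # assumption: arbitrary upper bound
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry a rebalance, waiting between failed attempts.

    Returns the 1-based number of the attempt that succeeded.
    Raises RuntimeError if every attempt fails.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if start_rebalance():
            return attempt
        # Wait only if another attempt will follow.
        if attempt < max_attempts:
            sleep(wait_seconds)
    raise RuntimeError(
        f"rebalance did not complete after {max_attempts} attempts"
    )


# Demo with a fake rebalance that fails twice and then succeeds.
# The fake sleep records the waits instead of actually pausing.
outcomes = iter([False, False, True])
recorded_waits = []
attempts_needed = rebalance_with_retries(
    lambda: next(outcomes), sleep=recorded_waits.append
)
```

Passing `sleep` as a parameter keeps the sketch testable without real 5-minute pauses; in real use the default `time.sleep` applies.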


          People

          • Assignee: kzeller
          • Reporter: Andrei Baranouski