Couchbase Server / MB-7799

[system test]: swap rebalance in online upgrade from 1.8.1 to 2.0.1-160 failed due to timeout (Erlang crash dump detected)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.0.1
    • Fix Version/s: 2.0.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
    • Environment: Windows Server 2008 R2 64-bit

      Description

      Online upgrade on Windows from 1.8.1 to 2.0.1-160.

      Create a 4-node cluster with version 1.8.1 (system has 8 GB RAM, 6 GB used for buckets).
      Load 10 M items into the default bucket and 2 M items into the sasl bucket.
      Create a 2.0.1-160 node and do a swap rebalance to remove one 1.8.1 node from the cluster.
      The rebalance failed due to a timeout on the node with build 2.0.1.

      Rebalance exited with reason {timeout,
          {gen_server,call,
              [{'ns_memcached-sasl','ns_1@win-1605.hq.couchbase.com'},
               {set_vbucket,217,replica},
               180000]}}
      ns_orchestrator002 ns_1@win-1605.hq.couchbase.com 15:51:07 - Wed Feb 20, 2013
      Node 'ns_1@10.3.2.75' saw that node 'ns_1@win-1605.hq.couchbase.com' came up. ns_node_disco004 ns_1@10.3.2.75 15:21:26 - Wed Feb 20, 2013
      Node 'ns_1@10.3.2.77' saw that node 'ns_1@win-1605.hq.couchbase.com' came up. ns_node_disco004 ns_1@10.3.2.77 15:21:23 - Wed Feb 20, 2013
      Node 'ns_1@10.3.2.76' saw that node 'ns_1@win-1605.hq.couchbase.com' came up. ns_node_disco004 ns_1@10.3.2.76 15:21:22 - Wed Feb 20, 2013
      Started node add transaction by adding node 'ns_1@win-1605.hq.couchbase.com' to nodes_wanted ns_cluster000 ns_1@10.3.2.76 15:21:21 - Wed Feb 20, 2013
      Node 'ns_1@10.3.121.182' saw that node 'ns_1@win-1605.hq.couchbase.com' came up. ns_node_disco004 ns_1@10.3.121.182 15:21:14 - Wed Feb 20, 2013
      Current master is older (before 2.0.1) and I'll try to takeover (repeated 1 times) mb_master000 ns_1@win-1605.hq.couchbase.com 15:20:57 - Wed Feb 20, 2013
      Bucket "sasl" rebalance appears to be swap rebalance ns_vbucket_mover000 ns_1@win-1605.hq.couchbase.com 15:20:23 - Wed Feb 20, 2013
      Bucket "sasl" loaded on node 'ns_1@win-1605.hq.couchbase.com' in 0 seconds. ns_memcached001 ns_1@win-1605.hq.couchbase.com 15:20:21 - Wed Feb 20, 2013
      Started rebalancing bucket sasl ns_rebalancer000 ns_1@win-1605.hq.couchbase.com 15:20:20 - Wed Feb 20, 2013
      Deleting old data files of bucket "default" ns_storage_conf000 ns_1@win-1605.hq.couchbase.com 15:20:10 - Wed Feb 20, 2013
      Starting rebalance, KeepNodes = ['ns_1@win-1605.hq.couchbase.com',
      'ns_1@10.3.2.76','ns_1@10.3.121.182',
      'ns_1@10.3.2.75'], EjectNodes = ['ns_1@10.3.2.77']
      ns_orchestrator004 ns_1@win-1605.hq.couchbase.com 15:20:10 - Wed Feb 20, 2013
      Haven't heard from a higher priority node or a master, so I'm taking over. mb_master000 ns_1@win-1605.hq.couchbase.com 15:20:10 - Wed Feb 20, 2013
      Node ns_1@win-1605.hq.couchbase.com joined cluster ns_cluster003 ns_1@win-1605.hq.couchbase.com 15:19:58 - Wed Feb 20, 2013
      Couchbase Server has started on web port 8091 on node 'ns_1@win-1605.hq.couchbase.com'. menelaus_sup001 ns_1@win-1605.hq.couchbase.com 15:19:58 - Wed Feb 20, 2013
      Current master is older (before 2.0.1) and I'll try to takeover mb_master000 ns_1@win-1605.hq.couchbase.com 15:19:58 - Wed Feb 20, 2013
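
      The event log above is listed newest first. As a rough aid for reading the timeline, the sketch below parses two timestamps copied from it and computes how long the rebalance ran before it exited. The helper name `parse_event_time` is hypothetical (it is not part of any Couchbase tool), and the format string is an assumption based only on the entries shown here.

```python
from datetime import datetime

# Timestamp layout assumed from the log entries above,
# e.g. "15:51:07 - Wed Feb 20, 2013".
EVENT_TIME_FORMAT = "%H:%M:%S - %a %b %d, %Y"


def parse_event_time(text):
    """Parse the trailing timestamp of a log entry (hypothetical helper)."""
    return datetime.strptime(text, EVENT_TIME_FORMAT)


# Both values are copied from the log: the "Starting rebalance" entry
# and the "Rebalance exited" entry.
rebalance_started = parse_event_time("15:20:10 - Wed Feb 20, 2013")
rebalance_exited = parse_event_time("15:51:07 - Wed Feb 20, 2013")

elapsed = rebalance_exited - rebalance_started
print(elapsed)  # 0:30:57
```

      By this reading, the rebalance ran for roughly 31 minutes before the timeout was reported.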

      Will add collect info files soon
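
      The rebalance exit reason in the description has the standard shape of an Erlang `gen_server:call` timeout: the list holds the call target (a process name and node), the request, and the timeout in milliseconds. The sketch below pulls those fields out of the message text. The regular expression and the function name `summarize_timeout` are illustrative assumptions tailored to this one message; this is not a general Erlang term parser.

```python
import re

# Exit reason copied from the description, joined onto one line.
REASON = (
    "{timeout,{gen_server,call,"
    "[{'ns_memcached-sasl','ns_1@win-1605.hq.couchbase.com'},"
    "{set_vbucket,217,replica},180000]}}"
)

# Pattern written for this specific message only (assumption).
PATTERN = re.compile(
    r"\{'(?P<server>[^']+)',\s*'(?P<node>[^']+)'\},\s*"
    r"\{set_vbucket,\s*(?P<vbucket>\d+),\s*(?P<state>\w+)\},\s*"
    r"(?P<timeout_ms>\d+)\]"
)


def summarize_timeout(reason):
    """Extract the call target, request, and timeout from the reason text."""
    match = PATTERN.search(reason)
    if match is None:
        raise ValueError("reason does not look like a set_vbucket call timeout")
    info = match.groupdict()
    info["vbucket"] = int(info["vbucket"])
    # gen_server:call takes its timeout in milliseconds.
    info["timeout_s"] = int(info.pop("timeout_ms")) / 1000
    return info


summary = summarize_timeout(REASON)
print(summary)
```

      Read this way, the `ns_memcached-sasl` process on the new 2.0.1 node did not answer a request to set vBucket 217 to replica within 180 seconds.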

      Attachment: ss_2013-02-26_at_12.15.59 AM.png (45 kB)

        Activity

        Aliaksey Artamonau added a comment - http://review.couchbase.org/24882

        Aliaksey Artamonau added a comment -

        We merged the fix. Please rerun when new build is ready.

        Jin Lim (Inactive) added a comment -

        Will mark this resolved as the fix is available; please close it after verification.

        Thuan Nguyen added a comment -

        Verified online upgrade from 1.8.1-945 to 2.0.1-168; swap rebalance passed. All 1.8.1 nodes were swapped out with 2.0.1-168 nodes. Memory usage of beam.smp was stable.
        I will close this bug.

        Thuan Nguyen added a comment -

        The cluster used to verify this bug ran Ubuntu 11.04 64-bit in EC2 with 15 GB RAM and a 4-core CPU.


          People

          • Assignee: Thuan Nguyen
          • Reporter: Thuan Nguyen