Couchbase Server / MB-7383

Active item resident ratio drops significantly when adding a 2.0 node to a 1.8.1 cluster for upgrade (SASL bucket)

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0
    • Fix Version/s: 2.0.1
    • Component/s: couchbase-bucket
    • Security Level: Public
    • Labels:

      Description

      scenario:
      Add 2 x 2.0 nodes to a 3-node cluster with 2 buckets (default and sasl, where the active resident ratio is 60%). A few minutes after the upgrade process begins, the active item resident ratio on 10.3.2.43, which is an existing 1.8.1 node, drops from 60% -> 58 -> 48 -> 38 -> 15 -> 5, and then we decided to stop the rebalance.

      I grabbed diags from the node whose resident ratio dropped from 63 percent to 5 percent.
      https://s3.amazonaws.com/bugdb/jira/systemtest/resident-ratio-drop-2853bb89.zip

      Please open a bug ASAP and mention this there. Assign the bug to Chiyoung, in the hope that he or Mike can take a look at the cluster.

      [stats:debug] [2012-12-08 17:43:52]
      active item resident ratio is at 63% and everything looks normal

      and then at
      [stats:debug] [2012-12-08 17:45:32]
      vb_active_perc_mem_resident 58

      and at
      [stats:debug] [2012-12-08 17:47:12]
      vb_active_perc_mem_resident 49

      and at
      [stats:debug] [2012-12-08 18:03:51]
      vb_active_perc_mem_resident 35

      What happened between 17:45 and 17:47, or between 17:56 and 18:03, that pushed the resident ratio this low?
      Whatever it is, there is some combination of 1.8.1 and 2.0 that is causing the issue.
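
To narrow down when the ratio fell fastest, the samples can be pulled out of a diag log and compared pairwise. The following is a minimal sketch, not part of the original report. It assumes the log layout matches the excerpt above (a `[stats:debug] [timestamp]` header followed by a `vb_active_perc_mem_resident <value>` line); the function names `resident_ratio_samples` and `drops` are hypothetical. Real diag files may hold stats for several buckets under one header, so the input may need filtering to a single bucket first.

```python
import re
from datetime import datetime

# Assumed layout, taken only from the excerpt in this ticket:
#   [stats:debug] [2012-12-08 17:45:32]
#   vb_active_perc_mem_resident 58
HEADER = re.compile(r"\[stats:debug\] \[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")
STAT = re.compile(r"\bvb_active_perc_mem_resident\s+(\d+(?:\.\d+)?)")


def resident_ratio_samples(lines):
    """Return (timestamp, ratio) pairs, pairing each stat line with the latest header."""
    samples = []
    current = None
    for line in lines:
        header = HEADER.search(line)
        if header:
            current = datetime.strptime(header.group(1), "%Y-%m-%d %H:%M:%S")
            continue
        stat = STAT.search(line)
        if stat and current is not None:
            samples.append((current, float(stat.group(1))))
    return samples


def drops(samples):
    """Return (start, end, delta, points_per_minute) for each pair of consecutive samples."""
    result = []
    for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
        minutes = (t1 - t0).total_seconds() / 60
        rate = (r1 - r0) / minutes if minutes else 0.0
        result.append((t0, t1, r1 - r0, rate))
    return result


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python resident_ratio.py diag.log
    with open(sys.argv[1], errors="replace") as fh:
        for start, end, delta, rate in drops(resident_ratio_samples(fh)):
            print(f"{start} -> {end}: {delta:+.1f} points ({rate:+.2f}/min)")
```

Applied to the three stat lines quoted above, this reports a much steeper fall between 17:45:32 and 17:47:12 than between 17:47:12 and 18:03:51, which suggests looking at the first interval for the triggering event.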

      collect info from other nodes:


        Activity

        farshid Farshid Ghods (Inactive) created issue -
        farshid Farshid Ghods (Inactive) added a comment -

        existing 1.8.1 node : https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.122-1292012-1553-diag.zip
        2.0 node : https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.41-1292012-161-diag.zip
        existing 1.8.1 node : https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.43-1292012-1551-diag.zip
        existing 1.8.1 node : https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.47-1292012-1558-diag.zip
        farshid Farshid Ghods (Inactive) made changes -
        Field Original Value New Value
        Summary active item resident ratio drop significantly when adding a 2.0 node to 1.8.1 cluster for upgrade active item resident ratio drop significantly when adding a 2.0 node to 1.8.1 cluster for upgrade ( sasl bucket )
        Labels 2.0.0-hotfix
        Fix Version/s 2.0.1 [ 10399 ]
        Fix Version/s 2.0 [ 10114 ]
        Priority Major [ 3 ] Blocker [ 1 ]
        Component/s couchbase-bucket [ 10173 ]
        farshid Farshid Ghods (Inactive) added a comment - 2.0 node : 10.3.2.85 https://s3.amazonaws.com/bugdb/jira/MB-7383/10.3.2.85-1292012-1616-diag.zip
        farshid Farshid Ghods (Inactive) made changes -
        Assignee Chiyoung Seo [ chiyoung ]
        chiyoung Chiyoung Seo added a comment -

        Farshid,

        This might be caused by the memory leak bug in 1.8.1. Can you please test it with the latest 1.8.1 patch (build 943?)?

        farshid Farshid Ghods (Inactive) added a comment - - edited

        from the email

        Chisheng,

        Let's patch the ep.so file with the one available from the 1.8.1-943-rel build and apply it to all nodes in this cluster, then add another 2.0 node and rebalance the cluster again.

        Please keep chiyoung in the loop after the experiment.

        farshid Farshid Ghods (Inactive) added a comment -

        QE will update the ticket when results are available

        farshid Farshid Ghods (Inactive) made changes -
        Assignee Chiyoung Seo [ chiyoung ] Chisheng Hong [ chisheng ]
        mikew Mike Wiederhold made changes -
        Sprint Status Current Sprint
        Chisheng Chisheng Hong (Inactive) added a comment -

        Cannot reproduce this on an EC2 cluster running CentOS: https://github.com/couchbaselabs/couchbase-qe-docs/blob/master/system-tests/pine-cluster/12-10-2012.txt
        The problem in the previous test was caused by slow disk speed: https://github.com/couchbaselabs/couchbase-qe-docs/blob/master/system-tests/pine-cluster/12-08-2012.txt
        Chisheng Chisheng Hong (Inactive) made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Cannot Reproduce [ 5 ]
        mikew Mike Wiederhold made changes -
        Sprint Status Current Sprint
        raju Raju Suravarjjala added a comment -

        Bulk closing all invalid bugs that are duplicate, user error, invalid. Please feel free to reopen them if you feel otherwise

        raju Raju Suravarjjala made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            Chisheng Chisheng Hong (Inactive)
            Reporter:
            farshid Farshid Ghods (Inactive)
          • Votes:
            0
            Watchers:
            2
