  1. Couchbase Server
  2. MB-5546

Increasing the default timeouts on ns_server to avoid rebalance failures due to ep-engine stats timeout issues in large cluster or clusters where some nodes are actively using swap

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 1.8.1-release-candidate
    • Fix Version/s: 1.8.1
    • Component/s: ns_server
    • Security Level: Public
    • Labels:
      None
    • Environment:
      Windows small/large cluster
      Linux small/large cluster

      Bucket 1, default
      vbuckets 1024
      RAM 18.7G
      Nodes 4 ( 2 form the base-cluster)
      Items Setup for 20M items

      Description

      Related issues:
      MB-5360
      MB-5352

      We have multiple bugs related to timeouts being hit in ns_server:
      1) When nodes are in swap
      2) On Windows, even on a small cluster.

      This bug recommends increasing the default timeouts.

      We used the following timeouts (in milliseconds) for most of the parameters. This is not an all-in-one solution, but it should cover the basic scenarios.
      ns_memcached_outer, 60000
      ns_memcached_open_checkpoint, 60000
      ns_memcached_outer_heavy, 60000
      ns_memcached_outer_very_heavy, 120000
      ns_memcached_connected, 10000
      ebucketmigrator_connect, 60000
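
      For quick reference, the values above can be written as a lookup table. This is only a minimal Python sketch, assuming the values are milliseconds (consistent with the 30000 in the error traces and the "120 sec" fixes in the summary). The names RECOMMENDED_TIMEOUTS_MS, to_seconds and summarize are hypothetical and are not part of ns_server.

```python
# Timeout values exactly as listed in this report, assumed to be milliseconds.
# The dictionary name and the helpers below are illustrative, not ns_server APIs.
RECOMMENDED_TIMEOUTS_MS = {
    "ns_memcached_outer": 60000,
    "ns_memcached_open_checkpoint": 60000,
    "ns_memcached_outer_heavy": 60000,
    "ns_memcached_outer_very_heavy": 120000,
    "ns_memcached_connected": 10000,
    "ebucketmigrator_connect": 60000,
}


def to_seconds(timeout_ms):
    """Convert a millisecond timeout to seconds."""
    return timeout_ms / 1000.0


def summarize(timeouts_ms):
    """Return one 'name: N s' line per timeout, sorted by name."""
    return [
        f"{name}: {to_seconds(ms):g} s"
        for name, ms in sorted(timeouts_ms.items())
    ]


if __name__ == "__main__":
    for line in summarize(RECOMMENDED_TIMEOUTS_MS):
        print(line)
```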

      Summary of some error messages and the fixes that worked:
      1) Rebalance exited with reason {exited,
      {'EXIT',<0.22700.12>,
      {timeout,
      {gen_server,call,
      [{'ns_memcached-default','ns_1@10.3.2.81'},
      {stats,<<"tap">>},
      30000]}}}}

      Fix: Adjust timeout value - 120 sec - ns_memcached_outer_very_heavy
      2) Rebalance exited with reason {exited,
      {replicator_died,

      Fix: Adjust timeout value - 120 sec - ns_memcached_outer_heavy

      3) Rebalance exited with reason {exited,
      {'EXIT',<0.24287.15>,
      {timeout,
      {gen_server,call,
      [{'ns_memcached-default','ns_1@10.3.2.81'},
      {stats,<<"tap">>},
      30000]}}}}

      Fix: Adjust timeout to 120 sec
      4) Rebalance exited with reason {{change_filter_failed,
      {'EXIT',
      {timeout,

      Fix : Adjust timeout values -
      ebucketmigrator_connect 120 secs
      ns_memcached_connected 1 sec
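
      When triaging failures like the ones above, it can help to pull the timeout that actually fired out of the exit reason. The following is a minimal Python sketch, not an ns_server tool. It assumes the reason contains the standard gen_server:call timeout shape {timeout,{gen_server,call,[Server,Request,Timeout]}} with an integer timeout in milliseconds as the last argument. The function name extract_call_timeout_ms is hypothetical.

```python
import re

# Matches the argument list of a gen_server:call timeout exit reason:
#   {timeout,{gen_server,call,[Server,Request,Timeout]}}
# Whitespace and line breaks between elements are tolerated.
CALL_TIMEOUT_RE = re.compile(
    r"\{timeout,\s*\{gen_server,\s*call,\s*\[(?P<args>.*)\]\}\}",
    re.DOTALL,
)


def extract_call_timeout_ms(reason):
    """Return the call timeout in ms from an exit reason string, or None.

    None is returned when the reason has no gen_server:call timeout in it
    (for example a truncated trace) or when the last argument is not an
    integer (for example a two-argument call with no explicit timeout).
    """
    match = CALL_TIMEOUT_RE.search(reason)
    if match is None:
        return None
    last_arg = match.group("args").rsplit(",", 1)[-1].strip()
    return int(last_arg) if last_arg.isdigit() else None


if __name__ == "__main__":
    reason = (
        "{exited,{'EXIT',<0.22700.12>,{timeout,{gen_server,call,"
        "[{'ns_memcached-default','ns_1@10.3.2.81'},"
        "{stats,<<\"tap\">>},30000]}}}}"
    )
    print(extract_call_timeout_ms(reason))
```

      For the first trace in the summary this reports the 30000 ms timeout that was hit, which can then be compared with the raised values listed earlier.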


        Activity

        karan Karan Kumar (Inactive) created issue -
        alkondratenko Aleksey Kondratenko (Inactive) made changes -
        Field Original Value New Value
        Assignee Aleksey Kondratenko [ alkondratenko ]
        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        I'm a little reluctant to change the ns_memcached_connected timeout. It's the timeout we use when asking whether ns_memcached is alive. When it expires, it just marks the bucket as not quite healthy without failing anything. So raising this timeout has some effects on autofailover and other things, which is something I don't want to do.

        alkondratenko Aleksey Kondratenko (Inactive) added a comment -

        Timeouts were bumped in a commit merged into branch-181 and merged up to master, except, as noted above, the ns_memcached_connected timeout.

        karan Karan Kumar (Inactive) added a comment -

        Thanks Alk.
        Duly noted the concerns.
        http://review.couchbase.org/#change,17230

        karan Karan Kumar (Inactive) made changes -
        Status Open [ 1 ] Resolved [ 5 ]
        Resolution Fixed [ 1 ]
        thuan Thuan Nguyen added a comment -

        Integrated in github-ns-server-2-0 #374 (See http://qa.hq.northscale.net/job/github-ns-server-2-0/374/)
        MB-5546: raised some timeouts to cope with some paging (Revision 0998b9c92a78185eae31dcbdd55ad92e07e0e6a8)

        Result = SUCCESS
        Aliaksey Artamonau :
        Files :

        • src/ns_memcached.erl
        • src/ebucketmigrator_srv.erl
        dipti Dipti Borkar made changes -
        Sprint Status Current Sprint
        Sprint Priority 0
        farshid Farshid Ghods (Inactive) made changes -
        Summary: changed from "Increasing the default timeouts on ns_server" to "Increasing the default timeouts on ns_server to avoid rebalance failures due to ep-engine stats timeout issues in large cluster or clusters where some nodes are actively using swap"
        farshid Farshid Ghods (Inactive) made changes -
        Status Resolved [ 5 ] Closed [ 6 ]

          People

          • Assignee:
            alkondratenko Aleksey Kondratenko (Inactive)
            Reporter:
            karan Karan Kumar (Inactive)
          • Votes:
            0
            Watchers:
            1
