Couchbase Server / MB-33875

Auto failover time increased by 500ms/600ms in MH


Details

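    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 6.5.0
    • Fix Version/s: 6.5.0
    • Component/s: ns_server
    • Environment: build 6.5.0-2864, hestia cluster, CentOS 7
    • Triage: Untriaged
    • Is this a Regression?: Yes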

    Description

      After the following change in ns_server, the failover time increased from 110ms to 200+ ms:

      https://github.com/couchbase/ns_server/commit/bed1e6ea9959c99e9cf4cb6942a236be4c7702d1

      See logs attached.

      There was another failover time increase between builds 2864 and 2943, so the total regression is about 500/600ms. I'll update the ticket as soon as I track that one down as well.

      http://showfast.sc.couchbase.com/#/timeline/Linux/reb/failover/all

       


          Activity

            Wayne Siu (wayne) added a comment:

            Ajit Yagaty [X]
            Just checking. Can you let us know if you have an estimate of when you may look at this ticket? Thanks.

            Ajit Yagaty [X] (ajit.yagaty, Inactive) added a comment:

            Abhijeeth Nuthan - Can you please take a look at this?

            Abhijeeth Nuthan (Abhijeeth.Nuthan) added a comment:

            Apart from fixing MB-34378, we can try to pipeline the set_vbucket requests as an optimization.
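
For readers unfamiliar with the term, the pipelining idea mentioned above can be illustrated with a simple latency model. This is only a sketch under stated assumptions: the function names, the request count, and the timing values are hypothetical and do not come from ns_server or from measurements on this ticket. It assumes the server handles requests one at a time and that, when pipelined, only a single network round trip is exposed.

```python
def sequential_time_us(n_requests: int, rtt_us: int, service_us: int) -> int:
    """Each request is sent only after the previous response has arrived."""
    return n_requests * (rtt_us + service_us)


def pipelined_time_us(n_requests: int, rtt_us: int, service_us: int) -> int:
    """All requests are sent back to back, so only one round trip is exposed."""
    if n_requests == 0:
        return 0
    return rtt_us + n_requests * service_us


if __name__ == "__main__":
    # Hypothetical values for illustration only, not measured on any cluster.
    n, rtt, service = 100, 500, 100
    print(sequential_time_us(n, rtt, service))  # 60000
    print(pipelined_time_us(n, rtt, service))   # 10500
```

In this model the saving grows with the number of requests and with the round-trip time, and it shrinks when the per-request service time dominates.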

            Artem Stemkovski (artem) added a comment:

            After http://review.couchbase.org/#/c/117246/, performance improved. Now we see a failover time of around 200ms (vs. 135ms before the durability-related changes).

            All set_vbucket calls together take around 66ms, so we decided that trying to improve this number with pipelining is not worth the additional code. Therefore, closing the bug and abandoning the pipelining-related commits.
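
The trade-off in this comment can be checked with a little arithmetic. The sketch below uses only the three figures quoted in the comment (200ms, 135ms, 66ms); the constant and function names are made up for illustration. It computes an upper bound, since pipelining could at best shrink the set_vbucket phase and could not remove it entirely.

```python
FAILOVER_AFTER_FIX_MS = 200  # failover time reported after review 117246
FAILOVER_BASELINE_MS = 135   # reported before the durability-related changes
SET_VBUCKET_TOTAL_MS = 66    # all set_vbucket calls together


def best_case_total_ms(total_ms: float, phase_ms: float) -> float:
    """Lower bound on the total time if the given phase cost nothing at all."""
    return total_ms - phase_ms


def phase_share_percent(total_ms: float, phase_ms: float) -> float:
    """Share of the total time spent in the given phase, in percent."""
    return phase_ms / total_ms * 100.0


if __name__ == "__main__":
    best = best_case_total_ms(FAILOVER_AFTER_FIX_MS, SET_VBUCKET_TOTAL_MS)
    share = phase_share_percent(FAILOVER_AFTER_FIX_MS, SET_VBUCKET_TOTAL_MS)
    print(f"set_vbucket share of failover time: {share:.1f}%")  # 33.0%
    print(f"unreachable best case: {best} ms")                  # 134 ms
```

Even in this unreachable best case the total would only return to roughly the 135ms baseline, so any realistic pipelining gain would be a fraction of 66ms.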

            Korrigan Clark (korrigan.clark, Inactive) added a comment:

            Dave Finlay, is this an acceptable regression? Failover time is up to 200ms from 120ms, a 66% increase.
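
The ticket quotes three different pre-regression baselines (110ms in the description, 135ms and 120ms in the comments), so the size of the regression depends on which one is used. The following sketch computes the relative increase to 200ms for each; the helper name is hypothetical and the inputs are the figures quoted in this ticket.

```python
def percent_increase(before_ms: float, after_ms: float) -> float:
    """Relative increase from before_ms to after_ms, in percent."""
    return (after_ms - before_ms) / before_ms * 100.0


if __name__ == "__main__":
    after_ms = 200
    for baseline_ms in (110, 120, 135):
        change = percent_increase(baseline_ms, after_ms)
        print(f"{baseline_ms} ms -> {after_ms} ms: +{change:.1f}%")
    # 110 ms -> 200 ms: +81.8%
    # 120 ms -> 200 ms: +66.7%
    # 135 ms -> 200 ms: +48.1%
```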

            Dave Finlay (dfinlay) added a comment (edited):

            Hi Korry - yes, this is fine given the extra work that's happening and the fact that this is a relatively small percentage change in the overall failover time. Thanks.

            People

              korrigan.clark Korrigan Clark (Inactive)
              oleksandr.gyryk Alex Gyryk (Inactive)
              Votes: 0
              Watchers: 10
