  1. Couchbase Server
  2. MB-23074

Performance issues when running Couchbase Server on CentOS 7.3 with kernel 3.10.0-514.6

Details

    Description

      We upgraded our performance clusters to CentOS 7.3 a few days ago.
      Unfortunately, that upgrade caused a lot of trouble:

      • There was a ~60% performance drop in DGM cases.
      • KV latency in non-DGM cases became more inconsistent.

      I started analyzing the most basic case with the initial data load. I noticed that the drain rate became more choppy on 3 boxes (see screenshot) while one server was working just fine.

      I tried to examine IO performance using standalone benchmarks but I didn't manage to find anything interesting. Only read and write performance of Couchbase Server was affected.

      Eventually I noticed a tiny difference between those boxes. The "bad" machines had kernel 3.10.0-514.6.2 and the "good" machine had 3.10.0-514.2.2. A few experiments confirmed that the upgrade from *.514.2.2 to *.514.6.2 caused all those problems.
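
      The kernel check described above can be sketched as a small script to run on each box. This is only an illustration, not the procedure used in the investigation: the helper names `parse_kernel_release` and `classify` are hypothetical, the `.el7.x86_64` suffix in the docstring example is an assumed release-string format, and the only releases it labels are the two named in this report.

```python
import platform
import re

# Kernel releases named in this report; anything else is treated as unknown.
KNOWN_BAD = (3, 10, 0, 514, 6, 2)
KNOWN_GOOD = (3, 10, 0, 514, 2, 2)


def parse_kernel_release(release):
    """Return the leading numeric fields of a release string as a tuple.

    Assumed input shape: '3.10.0-514.6.2.el7.x86_64'.
    """
    fields = []
    for token in re.split(r"[.-]", release):
        if not token.isdigit():
            break  # stop at the first non-numeric field such as 'el7'
        fields.append(int(token))
    return tuple(fields)


def classify(release):
    """Label a release 'bad', 'good', or 'unknown' using this report only."""
    version = parse_kernel_release(release)
    if version == KNOWN_BAD:
        return "bad"
    if version == KNOWN_GOOD:
        return "good"
    return "unknown"


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r` on Linux.
    release = platform.release()
    print(release, classify(release))
```

      Any release other than the two observed ones is reported as "unknown" rather than guessed, since the report does not say which other kernels are affected.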

      I downgraded our servers all the way to 3.10.0-317 and relaxed, until I started working with a setup provided by one of our partners. That setup runs RHEL 7.3 with 3.10.0-514.6.2, and I am supposed to run some heavy DGM workloads on it...

      RHEL/CentOS is a very conservative distribution, so who knows how long this issue will remain open. I think we had better find out what exactly happened before other people start hitting the same problem.

            People

              wayne Wayne Siu
              pavelpaulau Pavel Paulau (Inactive)