Couchbase Server / MB-28457

Replication is less efficient on 5.5.0-1970

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical
    • Fix Version/s: 5.5.0
    • Affects Version/s: 5.5.0
    • Component/s: couchbase-bucket
    • Environment:
      Cluster: hebe_kv
      OS: CentOS 7
      CPU: E5-2680 v3 (48 vCPU)
      Memory: 64GB
      Disk: Samsung Pro 850
    • Triage: Untriaged
    • Is this a Regression?: Yes

    Description

      Test env and scenario:
      3 nodes, 1 replica
      20M items in the bucket, 1M ops/sec (50/50 R/W) ongoing
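
      For scale, the workload above can be turned into rough per-node write rates. This is a back-of-the-envelope sketch, not a measurement: it assumes load and vBuckets are spread evenly across the 3 nodes, and the function name is illustrative only.

```python
def per_node_write_rates(total_ops_per_sec, write_fraction, nodes, replicas):
    """Return (active_writes, replica_writes) per node per second.

    Assumption: vBuckets and client load are spread evenly across nodes.
    """
    writes = total_ops_per_sec * write_fraction
    # Each node owns 1/nodes of the active vBuckets.
    active_per_node = writes / nodes
    # Every write is also copied to `replicas` other nodes.
    replica_per_node = writes * replicas / nodes
    return active_per_node, replica_per_node


# Scenario from this ticket: 1M ops/sec, 50/50 R/W, 3 nodes, 1 replica.
active, replica = per_node_write_rates(1_000_000, 0.5, nodes=3, replicas=1)
print(f"active writes/node/sec:  {active:,.0f}")
print(f"replica writes/node/sec: {replica:,.0f}")
```

      Under these assumptions each node has to send and receive roughly the same number of replicated mutations per second as it accepts from clients, so any slowdown in draining the replication queue accumulates quickly.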

       

      Despite a similar replication rate, the replication queue on 5.5.0-1970 grows much faster than on 5.5.0-1969,
      causing overall performance degradation due to low-memory scenarios such as DGM.
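
      One way to quantify "grows much faster" is to fit a slope to sampled queue sizes for each build. The sketch below is a generic least-squares fit; the sample values are placeholders chosen for illustration and are not taken from the test runs in this ticket.

```python
def growth_rate(samples):
    """Least-squares slope of (time_sec, queue_items) samples, in items/sec."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_q = sum(q for _, q in samples) / n
    num = sum((t - mean_t) * (q - mean_q) for t, q in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


# Placeholder samples, NOT measurements from this ticket.
build_a = [(0, 0), (10, 1_000), (20, 2_000)]
build_b = [(0, 0), (10, 5_000), (20, 10_000)]
print(growth_rate(build_a), growth_rate(build_b))
```

      Comparing the two slopes over the same test phase gives a single number per build, which is easier to track across builds than a visual comparison of graphs.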

       

      Changes in 5.5.0-1970:

      [+] 4fa4905 MB-26021 [6/6]: Limit #checkpoint items flushed in a single batch
      https://github.com/couchbase/kv_engine/commit/4fa490526120424e82227b431ec0bb84b487ed37

      [+] 90c76d4 MB-26021 [5/6]: Set max_checkpoints=100 & chk_max_items=10000
      https://github.com/couchbase/kv_engine/commit/90c76d4f0d99ef68ff5adb2fb667a4e20383a728
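
      To get a feel for the values in the second commit title, the sketch below multiplies them out. It assumes, based only on the parameter names, that chk_max_items caps the items in one checkpoint and max_checkpoints caps the checkpoints held per vBucket; how kv_engine actually enforces these settings may differ, and the helper name is hypothetical.

```python
# Values taken from the commit title above.
MAX_CHECKPOINTS = 100
CHK_MAX_ITEMS = 10_000


def checkpoint_item_ceiling(max_checkpoints, chk_max_items):
    """Upper bound on items held in checkpoints for one vBucket.

    Assumption: both settings act as hard per-vBucket limits.
    """
    return max_checkpoints * chk_max_items


ceiling = checkpoint_item_ceiling(MAX_CHECKPOINTS, CHK_MAX_ITEMS)
print(f"{ceiling:,} items per vBucket")
```

      If that reading is right, the configured ceiling is large relative to the 20M items in the test bucket, so these limits alone would not stop a replication backlog from growing.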

       

      Server logs:
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hebe-tmp-32/172.23.100.204.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hebe-tmp-32/172.23.100.205.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-hebe-tmp-32/172.23.100.206.zip

       

      5.5.0-1969 versus 5.5.0-1970, replication queue:

      All stats:
      http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=hebe_550-1979_access_e9ad&snapshot=hebe_550-1911_access_f673

       

       

      A similar comparison using pillowfight test results:

      http://cbmonitor.sc.couchbase.com/reports/html/?snapshot=ares_550-1911_access_8d15&snapshot=ares_550-1979_access_6601&label=5.5.0-1911&label=5.5.0-1979

      Logs from the 2-node pillowfight test:

      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-ares-7547/172.23.133.13.zip
      https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-ares-7547/172.23.133.14.zip

       

      Attachments

        1. 2054-ops-limit-100k.png
          90 kB
          Dave Rigby
        2. 2054-ops-old_cfg.png
          88 kB
          Dave Rigby
        3. 7d953b61958286f86d6bacb30330e47f.png
          74 kB
          Dave Rigby
        4. replication comparison 1969 to 1970.png
          57 kB
          Alex Gyryk
        5. Screen Shot 2018-03-02 at 17.02.21.png
          60 kB
          Dave Rigby
        6. Screen Shot 2018-03-08 at 21.30.28.png
          86 kB
          Dave Rigby
        7. Screen Shot 2018-03-08 at 21.31.21.png
          92 kB
          Dave Rigby
        8. Screen Shot 2018-03-12 at 12.27.51.png
          90 kB
          Dave Rigby

          People

            Assignee: Alex Gyryk (Inactive)
            Reporter: Alex Gyryk (Inactive)