  1. Couchbase Server
  2. MB-42761

Slow KV requests to some nodes


Details

    • Bug
    • Resolution: Not a Bug
    • Major
    • 6.6.1
    • 6.6.1
    • couchbase-bucket
    • None
    • 6.6.1-9182 and 6.6.1-9183
    • Triaged
    • Centos 64-bit
    • 1
    • No

    Description

      Script to Repro

      ./testrunner -i /tmp/win10-bucket-ops.ini rerun=False -t volumetests.test_system_orchestrator_heartbeats_and_timeouts.volume.test_volume_MB_41562,nodes_init=7,initial_load=3000000,replicas=2
      

       
      This is a new system test written to test MB-41562.

      Steps to Repro
      1. Create a 7 node cluster as shown below
      ----------------------------------------------
      Nodes           Services   Status
      ----------------------------------------------
      172.23.105.175  kv         Cluster node
      172.23.106.233  None       <--- IN ---
      172.23.106.236  ['kv']     <--- IN ---
      172.23.106.238  ['kv']     <--- IN ---
      172.23.106.250  ['kv']     <--- IN ---
      172.23.106.251  ['index']  <--- IN ---
      172.23.121.74   ['n1ql']   <--- IN ---
      ----------------------------------------------

      2. Set non-default orchestrator heartbeats and timeouts.

      2020-11-01 22:12:51,023 | test  | INFO    | MainThread | [test_system_orchestrator_heartbeats_and_timeouts:test_volume_MB_41562:639] Step 1: Set Non default orchestrator heartbeats and timeouts
       
      curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({mb_master, heartbeat_interval}, 500).'
      curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({mb_master, timeout_interval_count}, 3).'
      curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({leader_lease_acquire_worker, lease_time}, 5000).'
      curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({leader_lease_acquire_worker, lease_grace_time}, 2000).'
      curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({leader_lease_acquire_worker, lease_renew_after}, 500).'
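
The five commands above differ only in the config key and value, so the request bodies can be generated from one table. The following is a minimal sketch in Python, assuming the same `/diag/eval` endpoint and credentials as the curl commands. `SETTINGS`, `diag_eval_body`, and `apply_settings` are hypothetical names used for illustration and are not part of testrunner.

```python
import base64
import urllib.request

# (config section, key, value), copied from the curl commands above.
SETTINGS = [
    ("mb_master", "heartbeat_interval", 500),
    ("mb_master", "timeout_interval_count", 3),
    ("leader_lease_acquire_worker", "lease_time", 5000),
    ("leader_lease_acquire_worker", "lease_grace_time", 2000),
    ("leader_lease_acquire_worker", "lease_renew_after", 500),
]


def diag_eval_body(section, key, value):
    """Build the Erlang expression that is sent as the POST body."""
    return "ns_config:set({%s, %s}, %d)." % (section, key, value)


def apply_settings(base_url="http://localhost:9000",
                   user="Administrator", password="asdasd"):
    """POST every setting to /diag/eval. Needs a live node, so it is
    defined here but not called."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    for section, key, value in SETTINGS:
        req = urllib.request.Request(
            base_url + "/diag/eval",
            data=diag_eval_body(section, key, value).encode(),
            headers={"Authorization": "Basic " + token},
        )
        urllib.request.urlopen(req).close()
```

Keeping the values in one list makes it easier to compare a run's settings against the defaults when reading the logs.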
      

      3. Do the initial data load and start running N1QL queries in the background.
      2020-11-13 18:49:55,753 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
      ------------------------------------------------------------------------------------------
      Bucket   Type     Replicas  Durability  TTL  Items    RAM Quota    RAM Used    Disk Used
      ------------------------------------------------------------------------------------------
      bucket1  membase  2         none        0    3000000  24950865920  1825757816  1774463492
      bucket2  membase  2         none        0    3000000  24950865920  1794652760  1595313108
      bucket3  membase  2         none        0    3000000  24950865920  1798160104  1669257546
      bucket4  membase  2         none        0    3000000  24950865920  1802871744  1839075769
      ------------------------------------------------------------------------------------------

      4. Do a rebalance in
      2020-11-13 18:49:58,676 | test | INFO | pool-1-thread-3 | [table_view:display:72] Rebalance Overview
      ------------------------------------------
      Nodes           Services  Status
      ------------------------------------------
      172.23.105.175  kv        Cluster node
      172.23.106.250  kv        Cluster node
      172.23.106.236  kv        Cluster node
      172.23.106.251  index     Cluster node
      172.23.106.233  kv        Cluster node
      172.23.106.238  kv        Cluster node
      172.23.121.74   n1ql      Cluster node
      172.23.121.78   None      <--- IN ---
      ------------------------------------------

      5. Find the orchestrator node, kill the babysitter on the orchestrator, do a hard failover, start couchbase-server, set delta recovery, and rebalance. This step is repeated 5 times in a loop.

      6. Do a rebalance out
      2020-11-13 21:29:57,996 | test | INFO | pool-1-thread-10 | [table_view:display:72] Rebalance Overview
      ------------------------------------------
      Nodes           Services  Status
      ------------------------------------------
      172.23.105.175  kv        Cluster node
      172.23.121.78   [u'kv']   --- OUT --->
      172.23.106.250  kv        Cluster node
      172.23.106.236  kv        Cluster node
      172.23.106.251  index     Cluster node
      172.23.106.233  kv        Cluster node
      172.23.106.238  kv        Cluster node
      172.23.121.74   n1ql      Cluster node
      ------------------------------------------

      7. Flush all the buckets.

      Repeat steps 3-7 multiple times.
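
The loop in step 5 can be sketched roughly as follows. This is an illustration of the sequence only, not the test's actual code: `cycle_orchestrator`, `RecordingCluster`, and every method name on the cluster object are hypothetical, and the stand-in class just records calls instead of touching a cluster.

```python
def cycle_orchestrator(cluster, iterations=5):
    """Run the step 5 sequence `iterations` times; return the nodes cycled."""
    cycled = []
    for _ in range(iterations):
        node = cluster.find_orchestrator()
        cluster.kill_babysitter(node)      # stops couchbase-server on the node
        cluster.hard_failover(node)
        cluster.start_server(node)
        cluster.set_delta_recovery(node)
        cluster.rebalance()
        cycled.append(node)
    return cycled


class RecordingCluster:
    """Stand-in that records the call order. It always reports the same
    orchestrator; on a real cluster the orchestrator can move after a failover."""

    def __init__(self, orchestrator):
        self.orchestrator = orchestrator
        self.calls = []

    def find_orchestrator(self):
        self.calls.append("find_orchestrator")
        return self.orchestrator

    def kill_babysitter(self, node):
        self.calls.append("kill_babysitter")

    def hard_failover(self, node):
        self.calls.append("hard_failover")

    def start_server(self, node):
        self.calls.append("start_server")

    def set_delta_recovery(self, node):
        self.calls.append("set_delta_recovery")

    def rebalance(self):
        self.calls.append("rebalance")


if __name__ == "__main__":
    stub = RecordingCluster("172.23.105.175")
    cycle_orchestrator(stub)
    print(stub.calls[:6])
```

Looking up the orchestrator inside the loop matters because each iteration is meant to target whichever node currently holds that role.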

      I ran cbcollect to share the logs with the ns_server team and saw the following line in the nutshell output.

      • See CBSE-4320 for details about checkpoint usage.*

      I wanted to check whether there is anything of concern here.

      Attachments

        1. 175-mem.png (641 kB)
        2. 175-rr.png (437 kB)

          People

            Balakumaran.Gopal Balakumaran Gopal
            Balakumaran.Gopal Balakumaran Gopal