Details
- Type: Bug
- Resolution: Not a Bug
- Priority: Major
- Affects Version: 6.6.1
- Fix Version: None
- Build: 6.6.1-9182 and 6.6.1-9183
- Triage: Triaged
- Environment: Centos 64-bit
- Occurrences: 1
- Is this a Regression?: No
Description
Script to Repro
./testrunner -i /tmp/win10-bucket-ops.ini rerun=False -t volumetests.test_system_orchestrator_heartbeats_and_timeouts.volume.test_volume_MB_41562,nodes_init=7,initial_load=3000000,replicas=2
This is a new system test written to test MB-41562.
Steps to Repro
1. Create a 7 node cluster as shown below
----------------------------------------------
Nodes          | Services  | Status          |
----------------------------------------------
172.23.105.175 | kv        | Cluster node    |
172.23.106.233 | None      | <--- IN ---     |
172.23.106.236 | ['kv']    | <--- IN ---     |
172.23.106.238 | ['kv']    | <--- IN ---     |
172.23.106.250 | ['kv']    | <--- IN ---     |
172.23.106.251 | ['index'] | <--- IN ---     |
172.23.121.74  | ['n1ql']  | <--- IN ---     |
----------------------------------------------
2. Set non default orchestrator heartbeats and timeouts.
2020-11-01 22:12:51,023 | test | INFO | MainThread | [test_system_orchestrator_heartbeats_and_timeouts:test_volume_MB_41562:639] Step 1: Set Non default orchestrator heartbeats and timeouts

curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({mb_master, heartbeat_interval}, 500).'
curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({mb_master, timeout_interval_count}, 3).'
curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({leader_lease_acquire_worker, lease_time}, 5000).'
curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({leader_lease_acquire_worker, lease_grace_time}, 2000).'
curl http://localhost:9000/diag/eval -u Administrator:asdasd -d 'ns_config:set({leader_lease_acquire_worker, lease_renew_after}, 500).'
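The five settings above can be applied in one loop. A minimal dry-run sketch, assuming the same localhost:9000 endpoint and Administrator:asdasd credentials as the commands above (set DRY_RUN=0 to actually issue the requests):

```shell
#!/bin/sh
# Sketch: apply the orchestrator heartbeat/timeout tunables via /diag/eval.
# Assumptions: cluster_run node at localhost:9000, credentials
# Administrator:asdasd (both taken from the curl commands above).
# With DRY_RUN=1 (default) the curl commands are only printed.
DRY_RUN=${DRY_RUN:-1}
HOST="http://localhost:9000"
AUTH="Administrator:asdasd"

apply_setting() {
    # $1 = ns_config key tuple, $2 = value
    payload="ns_config:set({$1}, $2)."
    if [ "$DRY_RUN" = "1" ]; then
        echo "curl $HOST/diag/eval -u $AUTH -d '$payload'"
    else
        curl "$HOST/diag/eval" -u "$AUTH" -d "$payload"
    fi
}

apply_setting "mb_master, heartbeat_interval" 500
apply_setting "mb_master, timeout_interval_count" 3
apply_setting "leader_lease_acquire_worker, lease_time" 5000
apply_setting "leader_lease_acquire_worker, lease_grace_time" 2000
apply_setting "leader_lease_acquire_worker, lease_renew_after" 500
```

Note that /diag/eval settings are per-node; the test applies them before the workload starts.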
3. Do initial Data load and start running n1ql queries in the background.
2020-11-13 18:49:55,753 | test | INFO | MainThread | [table_view:display:72] Bucket statistics
--------------------------------------------------------------------------------------------------
Bucket  | Type    | Replicas | Durability | TTL | Items   | RAM Quota   | RAM Used   | Disk Used  |
--------------------------------------------------------------------------------------------------
bucket1 | membase | 2        | none       | 0   | 3000000 | 24950865920 | 1825757816 | 1774463492 |
bucket2 | membase | 2        | none       | 0   | 3000000 | 24950865920 | 1794652760 | 1595313108 |
bucket3 | membase | 2        | none       | 0   | 3000000 | 24950865920 | 1798160104 | 1669257546 |
bucket4 | membase | 2        | none       | 0   | 3000000 | 24950865920 | 1802871744 | 1839075769 |
--------------------------------------------------------------------------------------------------
4. Do a rebalance in
2020-11-13 18:49:58,676 | test | INFO | pool-1-thread-3 | [table_view:display:72] Rebalance Overview
-----------------------------------------------
Nodes          | Services | Status            |
-----------------------------------------------
172.23.105.175 | kv       | Cluster node      |
172.23.106.250 | kv       | Cluster node      |
172.23.106.236 | kv       | Cluster node      |
172.23.106.251 | index    | Cluster node      |
172.23.106.233 | kv       | Cluster node      |
172.23.106.238 | kv       | Cluster node      |
172.23.121.74  | n1ql     | Cluster node      |
172.23.121.78  | None     | <--- IN ---       |
-----------------------------------------------
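The rebalance-in above can be reproduced by hand with couchbase-cli. A dry-run sketch, assuming default port 8091 and the Administrator:asdasd credentials from the earlier curl commands (the actual test drives this through testrunner, not these commands):

```shell
#!/bin/sh
# Sketch of step 4 (rebalance-in): add 172.23.121.78 as a data node and
# rebalance. Port 8091 and the credentials are assumptions carried over
# from the commands above. DRY_RUN=1 (default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
CLUSTER="172.23.105.175:8091"
NEW_NODE="172.23.121.78:8091"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi
}

# Add the incoming node with the data (kv) service, then rebalance.
run couchbase-cli server-add -c "$CLUSTER" -u Administrator -p asdasd \
    --server-add "$NEW_NODE" \
    --server-add-username Administrator --server-add-password asdasd \
    --services data
run couchbase-cli rebalance -c "$CLUSTER" -u Administrator -p asdasd
```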
5. Find the orchestrator node, kill the babysitter on the orchestrator, do a hard failover, start couchbase-server, start delta recovery, and rebalance. This step is repeated 5 times in a loop.
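The failover/recovery loop in step 5 can be sketched as follows. This is a dry-run illustration only: the node address, credentials, ssh access, and the find_orchestrator helper are assumptions (the real test discovers the orchestrator through the REST/diag interface and runs five iterations programmatically):

```shell
#!/bin/sh
# Sketch of step 5: per iteration, kill the babysitter on the current
# orchestrator, hard-failover it, restart couchbase-server, mark it for
# delta recovery, and rebalance. DRY_RUN=1 (default) prints commands.
DRY_RUN=${DRY_RUN:-1}
CLUSTER="172.23.105.175:8091"
AUTH="-u Administrator -p asdasd"   # unquoted on use, so it word-splits

run() { if [ "$DRY_RUN" = "1" ]; then echo "$@"; else "$@"; fi; }

find_orchestrator() {
    # Hypothetical placeholder: the real test queries the cluster to find
    # the current orchestrator node; hardcoded here for the sketch.
    echo "172.23.106.236"
}

i=1
while [ "$i" -le 5 ]; do
    node="$(find_orchestrator)"
    run ssh "root@$node" "pkill -9 -f ns_babysitter"   # kill babysitter
    run couchbase-cli failover -c "$CLUSTER" $AUTH \
        --server-failover "$node:8091" --hard
    run ssh "root@$node" "systemctl start couchbase-server"
    run couchbase-cli recovery -c "$CLUSTER" $AUTH \
        --server-recovery "$node:8091" --recovery-type delta
    run couchbase-cli rebalance -c "$CLUSTER" $AUTH
    i=$((i + 1))
done
```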
6. Do a rebalance out
2020-11-13 21:29:57,996 | test | INFO | pool-1-thread-10 | [table_view:display:72] Rebalance Overview
-----------------------------------------------
Nodes          | Services | Status            |
-----------------------------------------------
172.23.105.175 | kv       | Cluster node      |
172.23.121.78  | kv       | --- OUT --->      |
172.23.106.250 | kv       | Cluster node      |
172.23.106.236 | kv       | Cluster node      |
172.23.106.251 | index    | Cluster node      |
172.23.106.233 | kv       | Cluster node      |
172.23.106.238 | kv       | Cluster node      |
172.23.121.74  | n1ql     | Cluster node      |
-----------------------------------------------
7. Flush all the buckets.
Repeat steps 3-7 multiple times.
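Step 7 (flush all the buckets) can be sketched against the bucket flush REST endpoint. A dry-run sketch, assuming flush is enabled on each bucket and reusing the Administrator:asdasd credentials from above:

```shell
#!/bin/sh
# Sketch of step 7: flush the four test buckets via the REST API
# (POST .../controller/doFlush). Host and credentials are assumptions
# carried over from earlier steps; DRY_RUN=1 (default) only prints.
DRY_RUN=${DRY_RUN:-1}
HOST="http://172.23.105.175:8091"
AUTH="Administrator:asdasd"

flush_bucket() {
    url="$HOST/pools/default/buckets/$1/controller/doFlush"
    if [ "$DRY_RUN" = "1" ]; then
        echo "curl -X POST -u $AUTH $url"
    else
        curl -X POST -u "$AUTH" "$url"
    fi
}

for b in bucket1 bucket2 bucket3 bucket4; do
    flush_bucket "$b"
done
```

Flush fails with an error unless flushEnabled is set on the bucket, which is why the test creates the buckets with flush enabled.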
I ran cbcollect to share the logs with the ns_server team, and saw the following line in the collected logs:
- See CBSE-4320 for details about checkpoint usage.
Wanted to check whether there is anything of concern here.