Details
- Bug
- Resolution: Duplicate
- Major
- 7.6.0
- 7.6.0-1805
- Untriaged
- Linux x86_64
- 0
- Yes
Description
Steps:
- 3 node KV cluster with one magma bucket
+---------------+--------+-----------+----------+----------------------+
| Nodes | CPU | Mem_total | Mem_free | Swap_mem_used |
+---------------+--------+-----------+----------+----------------------+
| 172.23.108.69 | 15.549 | 4.03 GiB | 3.07 GiB | 23.25 MiB / 4.24 GiB |
| 172.23.108.67 | 27.050 | 4.03 GiB | 3.04 GiB | 20.50 MiB / 4.24 GiB |
| 172.23.108.68 | 15.749 | 4.03 GiB | 3.05 GiB | 4.75 MiB / 4.24 GiB |
+---------------+--------+-----------+----------+----------------------+
+---------+-------------------+----------+-------+-----------------------+-----------+
| Bucket | Type / Storage | Replicas | Items | RAM Quota / Used | Disk Used |
+---------+-------------------+----------+-------+-----------------------+-----------+
| default | couchbase / magma | 2 | 3322 | 9.37 GiB / 299.40 MiB | 37.21 MiB |
+---------+-------------------+----------+-------+-----------------------+-----------+
- Induce failure on one of the nodes (.68) to trigger auto-failover
Observation:
Just after failover starts, a 'Janitor cleanup failed' error is logged for the node on which the failure was induced (.68).
Apart from this error, failover completes successfully as expected.
Logs:
[rebalance:error,2023-11-28T02:12:57.287-08:00,ns_1@172.23.108.67:<0.9458.848>:failover:janitor_buckets:615] Janitor cleanup of ["default"] failed after failover of ['ns_1@172.23.108.68']:
{error, {badmatch, false},
 [{leader_activities, start_activity, 6,
   [{file, "src/leader_activities.erl"}, {line, 185}]},
  {leader_activities, run_activity, 6,
   [{file, "src/leader_activities.erl"}, {line, 141}]},
  {ns_janitor, run_buckets_cleanup_activity, 3,
   [{file, "src/ns_janitor.erl"}, {line, 86}]},
  {ns_janitor, cleanup_buckets, 2,
   [{file, "src/ns_janitor.erl"}, {line, 78}]},
  {failover, janitor_buckets, 2,
   [{file, "src/failover.erl"}, {line, 597}]},
  {failover, janitor_membase_buckets_group, 2,
   [{file, "src/failover.erl"}, {line, 324}]},
  {lists, flatmap_1, 2,
   [{file, "lists.erl"}, {line, 1335}]},
  {failover, handle_buckets_failover, 2,
   [{file, "src/failover.erl"}, {line, 369}]}]}
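When checking other builds for the same failure, it can help to pull the reporting node, buckets, and failed-over nodes out of each matching log line. The following is a minimal sketch only: the regular expression is inferred from the single header line quoted above, and the names `HEADER`, `SAMPLE`, and `parse_janitor_failure` are hypothetical helpers, not part of ns_server or TAF.

```python
import re

# Header pattern inferred from the one log line in this report (an assumption;
# other ns_server log lines may be laid out differently).
HEADER = re.compile(
    r"\[(?P<category>\w+):(?P<level>\w+),"
    r"(?P<timestamp>[^,]+),"
    r"(?P<node>[^:]+):"
    r".*?\]\s*Janitor cleanup of (?P<buckets>\[.*?\]) failed "
    r"after failover of (?P<nodes>\[.*?\])"
)

# Matches the contents of single- or double-quoted names inside an Erlang list.
QUOTED = re.compile(r"[\"']([^\"']+)[\"']")

# The header line from this report, used as sample input.
SAMPLE = (
    "[rebalance:error,2023-11-28T02:12:57.287-08:00,"
    "ns_1@172.23.108.67:<0.9458.848>:failover:janitor_buckets:615] "
    "Janitor cleanup of [\"default\"] failed after failover of "
    "['ns_1@172.23.108.68']:"
)


def parse_janitor_failure(line):
    """Return the fields of a janitor-cleanup failure line, or None if no match."""
    match = HEADER.search(line)
    if match is None:
        return None
    return {
        "level": match.group("level"),
        "timestamp": match.group("timestamp"),
        "node": match.group("node"),  # node that logged the error
        "buckets": QUOTED.findall(match.group("buckets")),
        "failed_over": QUOTED.findall(match.group("nodes")),
    }


if __name__ == "__main__":
    print(parse_janitor_failure(SAMPLE))
```

Run over a collected log file line by line, this separates the node that logged the error (.67 here) from the node that was failed over (.68).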
TAF test:
guides/gradlew --refresh-dependencies testrunner -P jython=/opt/jython/bin/jython -P 'args=-i node.ini -p get-cbcollect-info=False,skip_cluster_reset=False,skip_collections_cleanup=True -t failover.AutoFailoverTests.AutoFailoverTests.test_autofailover,timeout=5,num_node_failures=1,nodes_init=3,failover_action=stop_server,num_items=10000,transaction_timeout=150,atomicity=True,durability=MAJORITY,replicas=2'
Issue not seen on 7.6.0-1767
Issue Links
- is caused by MB-59662 Failover incomplete issues: Failover couldn't complete on some nodes (Closed)