Couchbase Server / MB-49512

[Magma] - Cleaning up of the cluster fails with "Rebalance exited with reason {buckets_shutdown_wait_failed"


Details

    • Triaged
    • Centos 64-bit
    • 1
    • No
    • KV 2021-Dec, KV 2022-Feb, KV March-22

    Description

      Script to Repro

      This can happen in the tearDown part of any test. In the tearDown method we drop all the buckets and remove all the nodes from the cluster. This fails as shown below.
      

      172.23.120.206 10:05:03 PM 11 Nov, 2021 ( 2021-11-11T22:05:03.228-08:00 )

      Rebalance exited with reason {buckets_shutdown_wait_failed,
      [{'ns_1@172.23.120.206',
      {'EXIT',
      {old_buckets_shutdown_wait_failed,
      ["-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000"]}}}]}.
      Rebalance Operation Id = dbb8d76ebc02c654f2c23fbbabac68e9
      

      Even the retried rebalance failed.
      172.23.120.206 10:06:28 PM 11 Nov, 2021

      Rebalance exited with reason {buckets_shutdown_wait_failed,
      [{'ns_1@172.23.120.206',
      {'EXIT',
      {old_buckets_shutdown_wait_failed,
      ["-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000"]}}}]}.
      Rebalance Operation Id = 1754a1e783a53551c9f546338cebb3d7
      

      Based on the failures it does look like the previously dropped bucket took longer than expected to be deleted.

      172.23.104.186 10:03:32 PM 11 Nov, 2021

      Shutting down bucket "-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000" on 'ns_1@172.23.104.186' for deletion
      

      Maybe we need to figure out a way to disable the rebalance button until the bucket is fully deleted.

      cbcollect_info attached.

      Attachments

        1. 172.23.100.38.zip
          24.32 MB
        2. 172.23.100.39.zip
          28.18 MB
        3. consoleText_MB-49512_rerun.txt
          320 kB
        4. consoleText_MB-49512_run2_2211.txt
          3.08 MB
        5. screenshot-1.png
          36 kB
        6. Screenshot 2022-02-26 at 4.19.57 PM.png
          294 kB
        7. UI_MB-49512.png
          595 kB


          Activity

            Balakumaran.Gopal Balakumaran Gopal created issue -
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Field Original Value New Value
            Assignee Balakumaran Gopal [ balakumaran.gopal ] Daniel Owen [ owend ]
            owend Daniel Owen made changes -
            Rank Ranked higher
            owend Daniel Owen made changes -
            Assignee Daniel Owen [ owend ] James Harrison [ james.harrison ]
            james.harrison James Harrison made changes -
            Sprint KV 2021-Dec [ 1906 ]
            owend Daniel Owen made changes -
            Sprint KV 2021-Dec [ 1906 ]
            owend Daniel Owen made changes -
            Rank Ranked higher
            owend Daniel Owen made changes -
            Assignee James Harrison [ james.harrison ] Daniel Owen [ owend ]
            richard.demellow Richard deMellow made changes -
            Assignee Daniel Owen [ owend ] Richard deMellow [ richard.demellow ]
            owend Daniel Owen made changes -
            Sprint KV 2021-Dec [ 1906 ]
            owend Daniel Owen made changes -
            Rank Ranked higher
            richard.demellow Richard deMellow made changes -
            Attachment screenshot-1.png [ 171995 ]

            richard.demellow Richard deMellow added a comment -

            Looking at node 172.23.120.206 at 2021-11-11T22:03 we see the shutdown request come in and we start it on time. We progress quickly up to the "Shut down the bucket" log message, after which there is a long gap:

            2021-11-11T22:03:32.120181-08:00 INFO 45: Delete bucket [-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000]. Notifying engine
            2021-11-11T22:03:32.127597-08:00 INFO 45: Delete bucket [-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000]. Engine ready for shutdown
            2021-11-11T22:03:32.127608-08:00 INFO 45: Delete bucket [-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000]. Wait for 4 clients to disconnect
            2021-11-11T22:03:32.127916-08:00 INFO 45: Delete bucket [-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000]. Shut down the bucket
            2021-11-11T22:07:18.983387-08:00 INFO 45: Delete bucket [-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000]. Clean up allocated resources
            2021-11-11T22:07:18.983425-08:00 INFO 45: Delete bucket [-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000] complete
            

            Between these two log messages we're shutting down ep-engine for the bucket:

                LOG_INFO("{}: Delete bucket [{}]. Shut down the bucket",
                         connection_id,
                         name);
                bucket.destroyEngine(force);
             
                LOG_INFO("{}: Delete bucket [{}]. Clean up allocated resources ",
            

            Looking at the logs we see that ~4.5 minutes were spent deinitializing one of the magma shards.

            2021-11-11T22:03:32.127922-08:00 INFO (-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) Shutting down dcp connections!
            2021-11-11T22:03:32.127972-08:00 INFO (-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) Attempting to stop flusher:0
            ..
            2021-11-11T22:03:32.127984-08:00 INFO (-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) Attempting to stop flusher:3
            2021-11-11T22:03:32.127986-08:00 INFO (-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) Stopping bg fetchers
            2021-11-11T22:03:32.128613-08:00 INFO (-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) KVBucket::deinitialize forceShutdown:true
            2021-11-11T22:03:32.357658-08:00 INFO (-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) MagmaKVStore: 0 deinitializing
            2021-11-11T22:07:00.530267-08:00 INFO [(-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) magma_0]Closing magma db (thread pool refcount 4)
            2021-11-11T22:07:00.530300-08:00 INFO [(-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) magma_0/kvstore-0/rev-000000001]KVStore::Shutdown
            2021-11-11T22:07:00.580500-08:00 INFO [(-6AT-Evkts1eHVShDkwV6uJIF5j5BxpFu2DwiLTw0PnB0bYy-33-378000) magma_0/kvstore-4/rev-000000001]KVStore::Shutdown
            

            Looking at MagmaKVStore::deinitialize() we see that the time must be spent in the Sync() method, as the KVStore::Shutdown messages are triggered by the call to Close().

            void MagmaKVStore::deinitialize() {
                logger->info("MagmaKVStore: {} deinitializing", configuration.getShardId());
             
                magma->Sync(true);
             
                // Close shuts down all of the magma background threads (compaction is the
                // one that we care about here). The compaction callbacks require the magma
                // instance to exist so we must do this before we reset it.
                magma->Close();
             
                // Flusher should have already been stopped so it should be safe to destroy
                // the magma instance now
                magma.reset();
             
                logger->info("MagmaKVStore: {} deinitialized", configuration.getShardId());
            }
            

            Looking at the disk queue items we don't see anything in them, so we're not trying to flush anything to disk, which makes sense given the flushers stopped quickly.

            Conclusion from KV-Engine point of view
            From a KV point of view ep-engine is shutting down fine but we're waiting on Magma to sync shard 0. I'm going to re-assign this over to the Magma team to investigate why the Sync is taking so long for one shard.

            richard.demellow Richard deMellow made changes -
            Component/s couchbase-bucket [ 10173 ]
            Component/s storage-engine [ 10175 ]
            richard.demellow Richard deMellow made changes -
            Assignee Richard deMellow [ richard.demellow ] John Liang [ jliang ]
            richard.demellow Richard deMellow made changes -
            Triage Untriaged [ 10351 ] Triaged [ 10350 ]
            richard.demellow Richard deMellow made changes -
            Assignee John Liang [ jliang ] Sarath Lakshman [ sarath ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Assignee Sarath Lakshman [ sarath ] Srinath Duvuru [ srinath.duvuru ]
            srinath.duvuru Srinath Duvuru made changes -
            Assignee Srinath Duvuru [ srinath.duvuru ] Apaar Gupta [ apaar.gupta ]
            apaar.gupta Apaar Gupta added a comment - - edited

            Sync of Magma_0 took 268 sec, Magma_1 took 14 sec and Magma_3,4 under a second.

            Magma::Sync(true) flushes all memtables and creates checkpoints for all magma instances sequentially. It also waits for pending flushes in the background flushers to complete since there can only be one pending/queued flush per kvstore. This wait is what is causing the Sync to take ages for the first shard since the Flusher threadpool is shared and it is having to wait for flusher tasks from other shards to also complete. That would also explain why the time taken is reducing from the first Sync to the last.

            Unfortunately I am unable to verify how long the flushes are taking because the magma histograms are gone once the bucket is deleted. But I was able to reproduce a similar pattern (first shard taking ages and subsequent syncs taking less time).

            Let me see what can be done to make sure Sync's flush does not wait, but I don't think it will be an easy fix.
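
            To illustrate the shape of the problem, here is a minimal sketch (hypothetical names only, not the actual magma API) of how a shared flusher pool makes the first shard's sequential Sync absorb almost all of the waiting:

            // Hypothetical illustration: Sync(true) on a shard must wait for the
            // flushes already queued for it on the shared flusher pool before it
            // can create checkpoints. Because the pool is shared, the first shard
            // synced ends up waiting behind work queued by every shard; by the
            // time later shards are synced the pool has mostly drained.
            #include <future>
            #include <vector>

            struct Shard {
                std::vector<std::future<void>> pendingFlushes; // queued on the shared pool

                void Sync() {
                    for (auto& flush : pendingFlushes) {
                        flush.wait(); // the first shard pays nearly all of this cost
                    }
                    // ... flush memtables and create a checkpoint ...
                }
            };

            void deinitializeAllShards(std::vector<Shard>& shards) {
                for (auto& shard : shards) {
                    shard.Sync(); // sequential, one shard's KVStore at a time
                }
            }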

            srinath.duvuru Srinath Duvuru made changes -
            Priority Major [ 3 ] Critical [ 2 ]

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.1.0-2133 contains magma commit a6db4ed with commit message:
            MB-49512 lsm: Allow for scheduled LSMTree flushes to be cancelled

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.1.0-2133 contains magma commit 5c35c0f with commit message:
            MB-49512 magma: Add a cleanup function to worker Tasks

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.1.0-2133 contains magma commit c41478b with commit message:
            MB-49512 magma: Add task groups to ThreadPool

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.1.0-2135 contains magma commit 094c0a3 with commit message:
            MB-49512 magma: Cancel all pending tasks during shutdown

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.1.0-2170 contains magma commit 1f377a1 with commit message:
            MB-49512 magma: Create checkpoints during Magma::Close()

            build-team Couchbase Build Team added a comment -

            Build couchbase-server-7.1.0-2184 contains kv_engine commit 296897d with commit message:
            MB-49512: Remove magma checkpoint creation during shutdown

            apaar.gupta Apaar Gupta added a comment -

            This issue should be fixed; we added code to mitigate tasks waiting on tasks from other magma instances during shutdown. The shutdown times of all shards should be comparable now.

            If there is still an issue, we will have to raise the timeout value, since magma has a lot of files and pending tasks to clean up on shutdown.
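
            As a rough sketch of the idea behind the commit messages above (hypothetical names, not the real magma ThreadPool API): pending work in the shared pool is tagged with a group per magma instance, and shutdown cancels that group's still-queued tasks instead of waiting for them to drain.

            #include <deque>
            #include <functional>
            #include <mutex>

            // Illustration only: a shared pool whose queued tasks carry a group id,
            // so one instance's shutdown can drop its pending work in a single pass
            // rather than queueing behind tasks belonging to other instances.
            class GroupedThreadPool {
            public:
                using GroupId = int;

                void schedule(GroupId group, std::function<void()> fn) {
                    std::lock_guard<std::mutex> guard(lock);
                    pending.push_back({group, std::move(fn)});
                }

                // Called when a magma instance shuts down: cancel its queued tasks.
                void cancelGroup(GroupId group) {
                    std::lock_guard<std::mutex> guard(lock);
                    std::deque<Entry> keep;
                    for (auto& entry : pending) {
                        if (entry.group != group) {
                            keep.push_back(std::move(entry));
                        }
                    }
                    pending.swap(keep);
                }

            private:
                struct Entry {
                    GroupId group;
                    std::function<void()> fn;
                };
                std::mutex lock;
                std::deque<Entry> pending;
            };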

            apaar.gupta Apaar Gupta made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Resolved [ 5 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Attachment consoleText_MB-49512_run2_2211.txt [ 176391 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            apaar.gupta Apaar Gupta added a comment -

            We made changes to magma's shutdown to reduce the time spent waiting on pending tasks. I ran grep -r -e "deinitializing" -e "deinitialized" * on all four nodes and looked at the first and last messages of magma shutdown on all the nodes.
            On all the nodes we see

            cbcollect_info_ns_1@172.23.105.206_20220203-041554/memcached.log:2022-02-02T20:16:58.309641-08:00 INFO (bucket2) MagmaKVStore: 0 deinitializing
            cbcollect_info_ns_1@172.23.105.206_20220203-041554/memcached.log:2022-02-02T20:17:00.117316-08:00 INFO (bucket2) MagmaKVStore: 7 deinitialized
            cbcollect_info_ns_1@172.23.105.206_20220203-041554/memcached.log:2022-02-02T20:17:03.370625-08:00 INFO (default) MagmaKVStore: 0 deinitializing
            cbcollect_info_ns_1@172.23.105.206_20220203-041554/memcached.log:2022-02-02T20:17:05.056566-08:00 INFO (default) MagmaKVStore: 7 deinitialized
             
            cbcollect_info_ns_1@172.23.105.36_20220203-041554/memcached.log:2022-02-02T20:16:17.028810-08:00 INFO (bucket2) MagmaKVStore: 0 deinitializing
            cbcollect_info_ns_1@172.23.105.36_20220203-041554/memcached.log:2022-02-02T20:16:19.070000-08:00 INFO (bucket2) MagmaKVStore: 7 deinitialized
            cbcollect_info_ns_1@172.23.105.36_20220203-041554/memcached.log:2022-02-02T20:16:23.042844-08:00 INFO (default) MagmaKVStore: 0 deinitializing
            cbcollect_info_ns_1@172.23.105.36_20220203-041554/memcached.log:2022-02-02T20:16:24.849667-08:00 INFO (default) MagmaKVStore: 7 deinitialized
            

            This indicates 172.23.105.206 took around 2 sec and 172.23.105.36 under a second per bucket (we have two buckets) to shut down its magma instances. It looks like the magma shutdown time issue has been resolved.

            I am not very sure why the rebalance is timing out. Assigning to kv-team to take a look.

            apaar.gupta Apaar Gupta made changes -
            Assignee Apaar Gupta [ apaar.gupta ] Daniel Owen [ owend ]
            owend Daniel Owen made changes -
            Component/s couchbase-bucket [ 10173 ]
            owend Daniel Owen made changes -
            Rank Ranked lower
            owend Daniel Owen made changes -
            Assignee Daniel Owen [ owend ] Ben Huddleston [ ben.huddleston ]
            owend Daniel Owen made changes -
            Sprint KV 2021-Dec [ 1906 ] KV 2021-Dec, KV 2022-Feb [ 1906, 2002 ]
            owend Daniel Owen made changes -
            Rank Ranked higher

            ben.huddleston Ben Huddleston added a comment -

            Last bucket shutdown is slow because we're running through the list of scheduled compaction tasks.

            2022-02-02T20:11:02.019733-08:00 INFO 3076: Delete bucket [bucket1]. Shut down the bucket
            ...
            2022-02-02T20:11:02.020675-08:00 INFO (bucket1) KVBucket::deinitialize forceShutdown:true]
            ...
            2022-02-02T20:11:02.022445-08:00 INFO (bucket1) Compaction of vb:56 done (ok). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:100,deleted:100, collections_erased:100, size/items/tombstones/purge_seqno pre{15
            1651, 100, 209, 0}, post{16483, 0, 109, 0}
            2022-02-02T20:11:02.023649-08:00 WARNING (No Engine) Slow scheduling for AuxIO task 'Compact DB file 453' on thread AuxIoPool2. Schedule overhead: 17 s
            2022-02-02T20:11:02.023674-08:00 INFO (bucket2) MagmaKVStore::compactDBInternal: vb:453 purge_before_ts:1643602261 purge_before_seq:0 drop_deletes:false retain_erroneous_tombstones:true
            2022-02-02T20:11:02.023737-08:00 INFO (bucket1) Compaction of vb:435 done (ok). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:100,deleted:0, collections_erased:100, size/items/tombstones/purge_seqno pre{143
            459, 100, 109, 0}, post{16483, 0, 109, 0}
            ...
            2022-02-02T20:17:53.289950-08:00 WARNING (No Engine) Slow scheduling for AuxIO task 'Compact DB file 672' on thread AuxIoPool6. Schedule overhead: 382 s
            

            And we didn't log "EventuallyPersistentEngine::destroyInner(): Completed deinitialize", so by the looks of it we're still waiting on tasks to finish.


            ben.huddleston Ben Huddleston added a comment -

            Collection dropping a minute earlier probably scheduled all of these:

            2022-02-02T20:10:10.338734-08:00 INFO (default) drop collection:id:0x100, scope:0x10, manifest:0x103
            

            drigby Dave Rigby added a comment -

            Note that CompactTask appears to have completeBeforeShutdown set to false, so we shouldn't schedule any additional tasks to run (which are not already running) when a bucket is deleted.

            However, a Compact task could potentially take a long time to complete once it starts running...


            ben.huddleston Ben Huddleston added a comment -

            We could have a lot in the running state if we have exceeded the 5 second execution delay, as the running state also includes tasks that would run but are waiting on workers to run them, right?

            Also, what actually stops us from scheduling a new task during shutdown? At this point we'll be polling in unregisterTaskable() for numTasksForOwner to be 0 but we won't have removed the taskable yet.

            drigby Dave Rigby added a comment -

            Yes, many could be running - at least up to the number of threads in the given thread pool.

            In terms of what stops scheduling - see http://src.couchbase.org/source/xref/trunk/kv_engine/executor/folly_executorpool.cc#538 where we immediately call cancel on any task not marked as "completeBeforeShutdown". That ultimately sets the GlobalTask's state to TASK_DEAD, which is checked just before we actually run it on the CPU pool here: http://src.couchbase.org/source/xref/trunk/kv_engine/executor/folly_executorpool.cc#163

            So ultimately we should only be waiting for all tasks which were already running at the point the bucket was deleted to finish; no more should run.
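
            A rough sketch of the mechanism described above (hypothetical names, not the real FollyExecutorPool code): unregistering a taskable cancels every task not marked completeBeforeShutdown, and the worker checks for the dead state just before running the task body, so only tasks already executing have to finish.

            #include <atomic>
            #include <vector>

            enum class TaskState { Runnable, Dead };

            struct Task {
                std::atomic<TaskState> state{TaskState::Runnable};
                bool completeBeforeShutdown = false;
                void cancel() { state = TaskState::Dead; } // analogous to TASK_DEAD
                void execute() { /* task body, e.g. a compaction */ }
            };

            // On bucket deletion: cancel anything that doesn't have to finish first.
            void unregisterTaskable(std::vector<Task*>& tasks) {
                for (auto* task : tasks) {
                    if (!task->completeBeforeShutdown) {
                        task->cancel();
                    }
                }
            }

            // Worker thread: tasks cancelled while still queued are skipped; only
            // tasks already running when the bucket was deleted run to completion.
            void workerRun(Task& task) {
                if (task.state == TaskState::Dead) {
                    return;
                }
                task.execute();
            }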


            ben.huddleston Ben Huddleston added a comment -

            Thanks for the explanation. What I'm seeing in these logs is dozens or hundreds of compactions running after we have started the deinitialization of the KVBucket. I'll keep digging into it.

            ben.huddleston Ben Huddleston added a comment - - edited

            I need to confirm this with a unit test tomorrow, but I believe I have a theory as to what is happening here. We are waiting to unregister the taskable for bucket1, but it's bucket2 and the default bucket that are stuck running our compactions. I suspect that the AUX IO pool is full of pending compaction tasks that are due to run immediately and the run of the task to cancel it is at the back of that queue. The next question would be how we could end up with so many compaction tasks running, as we're supposed to limit the number that can run concurrently and any more should be snoozed indefinitely. That is the case when we create a new compaction task and we already have more than our allocated running tasks, but it looks like if we find a task already existing for this vBucket then we'll re-snooze it with the collection drop compaction execution delay of 5 seconds (rather than re-snooze it indefinitely if it was already snoozed indefinitely). I suspect that this also leads to saturation of the AUX IO threads with compaction tasks regardless of shutdown issues, but I need to confirm that too.

            Additionally, given that we've now moved the compaction tasks to run on the AUX IO pool, should we still be sizing the number of concurrent compaction tasks as a percentage of our number of writer threads? If writers > AUX IO threads then we could starve other AUX IO tasks.
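
            A simplified sketch of the suspected scheduling bug (hypothetical names and structure, my own reading of the behaviour described above): the concurrency limit is only honoured when a new compaction task is created, while rescheduling an existing snoozed task blindly applies the short collection-drop delay, letting it run past the limit.

            #include <limits>
            #include <map>

            constexpr double kSnoozeForever = std::numeric_limits<double>::max();
            constexpr double kCollectionDropDelaySecs = 5.0;

            struct PendingCompaction {
                double sleepSecs = 0.0; // when the task becomes runnable
            };

            struct CompactionScheduler {
                size_t concurrencyLimit;
                size_t running = 0;
                std::map<int, PendingCompaction> byVbucket;

                void scheduleCompaction(int vbid) {
                    auto it = byVbucket.find(vbid);
                    if (it == byVbucket.end()) {
                        // New task: the concurrency limit is respected here...
                        PendingCompaction task;
                        task.sleepSecs = (running < concurrencyLimit)
                                                 ? kCollectionDropDelaySecs
                                                 : kSnoozeForever;
                        byVbucket[vbid] = task;
                        return;
                    }
                    // ...but an existing task, even one snoozed indefinitely because
                    // of the limit, is re-snoozed with the short delay; once the
                    // delay expires it runs regardless of how many are running.
                    it->second.sleepSecs = kCollectionDropDelaySecs;
                }
            };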


            ben.huddleston Ben Huddleston added a comment -

            The tasks stats show a bunch of compaction tasks with negative sleep (they should have already run) and 8 running concurrently (the number of AUX IO threads), so I think that the above is likely.

            14522    5  R   default    -05:36.71   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f023893d0  Compact DB file 701
            14523    5  R   default    -05:38.73   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f0a31d750  Compact DB file 191
            14524    5  R   default    -05:38.96   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f0254f710  Compact DB file 830
            14525    5  R   default    -05:39.46   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f025529d0  Compact DB file 819
            14526    5  R   default    -05:41.45   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02552810  Compact DB file 162
            14527    5  R   default    -05:41.52   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02553450  Compact DB file 353
            14528    5  R   default    -05:41.59   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02553990  Compact DB file 832
            14529    5  R   default    -05:42.58   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02552d50  Compact DB file 709
            14530    5  R   default    -05:43.44   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02553df0  Compact DB file 588
            14531    5  R   default    -05:43.54   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02554a30  Compact DB file 351
            14532    5  R   default    -05:43.64   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02555830  Compact DB file 827
            14533    5  R   default    -05:45.02   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02553a70  Compact DB file 194
            14534    5  R   default    -05:46.11   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02553fb0  Compact DB file 361
            14535    5  R   default    -05:46.70   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02555f30  Compact DB file 150
            14536    5  R   default    -05:46.90   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02554b10  Compact DB file 144
            14537    5  R   default    -05:48.20   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02572c50  Compact DB file 692
            14538    5  R   default    -05:48.54   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f025546b0  Compact DB file 202
            14539    5  R   default    -05:49.00   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02552570  Compact DB file 199
            14540    5  R   default    -05:49.10   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02555e50  Compact DB file 835
            14541    5  R   default    -05:50.92   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573270  Compact DB file 158
            14542    5  R   default    -05:51.08   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f30156350  Compact DB file 789
            14543    5  R   default    -05:51.31   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02553ed0  Compact DB file 553
            14544    5  R   default    -05:52.16   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02572010  Compact DB file 152
            14545    5  R   default    -05:52.57   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02572d30  Compact DB file 700
            14546    5  R   default    -05:52.83   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f037db330  Compact DB file 147
            14547    5  R   default    -05:53.03   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02342450  Compact DB file 359
            14548    5  R   default    -05:53.31   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02572e10  Compact DB file 350
            14549    5  R   default    -05:54.43   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02553d10  Compact DB file 354
            14551    5  R   default    -05:55.44   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573510  Compact DB file 821
            14552    5  R   default    -05:55.75   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02554db0  Compact DB file 155
            14553    5  R   default    -05:57.03   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573b30  Compact DB file 708
            14554    5  R   default    -05:58.14   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f0233f890  Compact DB file 198
            14555    5  R   default    -05:59.22   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f0254fe10  Compact DB file 561
            14556    5  R   default    -05:59.55   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f025729b0  Compact DB file 160
            14557    5  R   default    -05:59.84   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f089f0850  Compact DB file 163
            14558    5  R   default    -06:00.09   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f037dac30  Compact DB file 358
            14559    5  R   default    -06:00.50   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f031b7eb0  Compact DB file 551
            14560    5  R   default    -06:00.76   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02555910  Compact DB file 362
            14561    5  R   default    -06:00.94   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f037db6b0  Compact DB file 788
            14562    5  R   default    -06:01.76   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f037d6d30  Compact DB file 829
            14563    5  R   default    -06:03.37   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f025728d0  Compact DB file 601
            14564    5  R   default    -06:03.86   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02572b70  Compact DB file 394
            14565    5  R   default    -06:03.87   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573970  Compact DB file 195
            14566    5  R   default    -06:04.62   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f025730b0  Compact DB file 192
            14567    5  R   default    -06:04.71   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f037daa70  Compact DB file 559
            14568    5  R   default    -06:05.62   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02552e30  Compact DB file 203
            14569    5  R   default    -06:06.31   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02574770  Compact DB file 550
            14570    5  R   default    -06:06.69   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02555c90  Compact DB file 617
            14571    5  R   default    -06:07.50   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573190  Compact DB file 989
            14572    5  R   default    -06:08.03   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573c10  Compact DB file 820
            14573    5  R   default    -06:08.43   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f30156eb0  Compact DB file 200
            14574    5  R   default    -06:08.48   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02574cb0  Compact DB file 599
            14575    5  R   default    -06:10.97   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f301582d0  Compact DB file 554
            14576    5  R   default    -06:11.07   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f025720f0  Compact DB file 625
            14577    5  R   default    -06:11.13   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573cf0  Compact DB file 149
            14578    5  R   default    -06:11.30   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573f90  Compact DB file 352
            14579    5  R   default    -06:11.62   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573a50  Compact DB file 558
            14580    5  R   default    -06:12.37   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f038e7670  Compact DB file 828
            14581    5  R   default    -06:12.40   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02574850  Compact DB file 355
            14582    5  R   default    -06:13.49   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f08c346b0  Compact DB file 615
            14583    5  R   default    -06:13.58   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f08b4b730  Compact DB file 598
            14584    5  R   default    -06:14.22   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f08b4c450  Compact DB file 157
            14585    5  R   default    -06:16.19   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02550a50  Compact DB file 602
            14586    5  R   default    -06:16.72   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f0a176030  Compact DB file 988
            14587    5  R   default    -06:17.04   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f03ddf650  Compact DB file 360
            14588    5  R   default    -06:17.59   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573eb0  Compact DB file 363
            14589    5  R   default    -06:18.56   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f03de0290  Compact DB file 633
            14590    5  R   default    -06:18.69   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02574f50  Compact DB file 614
            14591    5  R   default    -06:19.23   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02573350  Compact DB file 197
            14592    5  R   default    -06:20.75   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f301558d0  Compact DB file 623
            14593    5  R   default    -06:21.60   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f0361c2b0  Compact DB file 552
            14594    5  R   default    -06:21.69   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02574150  Compact DB file 618
            14595    5  R   default    -06:21.73   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f03de0450  Compact DB file 148
            14596    5  R   default    -06:22.14   0:00.00   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f0a198150  Compact DB file 641
            14597    5  R   default      0:00.00  *0:00.03   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02342ed0  Compact DB file 631
            14598    5  R   default      0:00.00  *0:00.91   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f038e1fd0  Compact DB file 349
            14599    5  R   default      0:00.00  *0:00.94   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02eae070  Compact DB file 156
            14600    5  R   default      0:00.00  *0:01.16   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f09d42db0  Compact DB file 622
            14601    5  R   default      0:00.00  *0:02.33   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f03de0610  Compact DB file 626
            14603    5  R   default      0:00.00  *0:03.46   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f30157770  Compact DB file 560
            14609    5  R   default      0:00.00  *0:04.97   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02574690  Compact DB file 634
            14612    5  R   default      0:00.00  *0:06.40   0:00.00        0  AuxIO   CompactVBucketTask                   0x00007f8f02575570  Compact DB file 630
            


            ben.huddleston Ben Huddleston added a comment -

            Confirmed with a unit test that rescheduling a Compaction task that was snoozing due to the concurrency limit reschedules without regard to the concurrency limit, meaning that we can run more compaction tasks than intended.
            drigby Dave Rigby added a comment -

            Good spot Ben Huddleston. Something similar came up a week or two ago when I was looking at the ns_server scheduled compactions exceeding the limit - see the Slack discussion here: https://couchbase.slack.com/archives/CFJDXSGUA/p1643205220328800

            I agree we should fix this - the compaction concurrency limit should not be exceeded at any point, to avoid impacting Flusher latency and, in turn, front-end write latency.

            However, even if we address the issue you found (not obeying the compaction concurrency limit), I think we could still run into problems with not being able to cancel tasks. For example, consider a scenario with 3 buckets, each of which has been told to compact (either by ns_server or due to a collection drop). Even if we assume that we change the current concurrency max:

                        const int maxConcurrentWriterTasks = std::min(
                                ExecutorPool::get()->getNumWriters(), vbMap.getNumShards());
             
                        // Calculate how many compaction tasks we will permit. We always
                        // allow at least one (see `if (handle->size() > 1)` above,
                        // then we limit to a fraction of the available WriterTasks,
                        // however imposing an upper bound so there is at least 1
                        // Writer task slot available for other tasks (i.e. Flusher).
                        const int maxConcurrentCompactTasks = std::min(
                                int(maxConcurrentWriterTasks * compactionMaxConcurrency),
                                maxConcurrentWriterTasks - 1);
            

            Say we include getNumAuxIO() in the min calculation for maxConcurrentWriterTasks so we end up only using at most half of the AuxIO threads for Compaction tasks. However, the limit is only per-Bucket - so with 3 buckets we are already "over-subscribed", and could end up with all AuxIO threads running compaction tasks, and unable to wake (as part of cancel) any other ones.
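            For illustration, a sketch of the kind of adjustment being described here, reusing the names from the snippet above plus the getNumAuxIO() accessor; the change that was actually merged may differ:

                const int maxConcurrentBGTasks =
                        std::min(std::min(ExecutorPool::get()->getNumWriters(),
                                          ExecutorPool::get()->getNumAuxIO()),
                                 vbMap.getNumShards());

                // As before, only use a fraction of those slots for compaction and
                // keep at least one free for other tasks (i.e. the Flusher).
                const int maxConcurrentCompactTasks = std::min(
                        int(maxConcurrentBGTasks * compactionMaxConcurrency),
                        maxConcurrentBGTasks - 1);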

            I think the fundamental problem here is once a CompactionTask starts running, there is no way to cancel it - that's what we ideally want to happen, not to waste a bunch of time waiting for unnecessary compaction to complete.


            ben.huddleston Ben Huddleston added a comment -

            > However, the limit is only per-Bucket - so with 3 buckets we are already "over-subscribed", and could end up with all AuxIO threads running compaction tasks, and unable to wake (as part of cancel) any other ones.

            Indeed, we could still saturate the AuxIO pool with enough (3) buckets and hit a similar problem. We could perhaps limit running compactors to be process-wide rather than bucket-wide, but that would limit performance in some situations.
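            As a rough illustration of the process-wide variant (a hypothetical type, not existing kv_engine code), every bucket's compaction task could be made to acquire a shared limiter before running:

                #include <atomic>

                class GlobalCompactionLimiter {
                public:
                    explicit GlobalCompactionLimiter(int limit) : limit(limit) {
                    }

                    // Returns true if the caller may run a compaction now; the caller
                    // must call release() once its compaction has finished (or been
                    // cancelled).
                    bool tryAcquire() {
                        int current = running.load();
                        while (current < limit) {
                            if (running.compare_exchange_weak(current, current + 1)) {
                                return true;
                            }
                        }
                        return false;
                    }

                    void release() {
                        running.fetch_sub(1);
                    }

                private:
                    const int limit;
                    std::atomic<int> running{0};
                };

            A CompactVBucketTask that fails tryAcquire() would snooze and retry, much as it does today for the per-bucket limit; the trade-off, as noted above, is that one bucket's compaction backlog can then delay another bucket's.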

            > I think the fundamental problem here is once a CompactionTask starts running, there is no way to cancel it - that's what we ideally want to happen, not to waste a bunch of time waiting for unnecessary compaction to complete.

            I think that that's one part of it, certainly that work is unnecessary if the bucket in question is going away, but we'd also have to consider what other buckets are doing. If we have a large AuxIO backlog then task cancels are going to take some time to be processed. We could perhaps use priorities which folly seems to support to make this slightly better (task cancels could be re-scheduled with highest priority) but we would still have to wait for one other task to finish to start cancelling.

            On the topic of cancelling the task, we could plumb something down to the compaction callbacks and have them abort the compaction, which feels like it might be good enough.
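            A minimal sketch of that idea (hypothetical names, and not necessarily how the eventual fix was implemented): the bucket sets a flag when it starts shutting down, and the per-item callback that the KVStore invokes during compaction checks it so the compaction can be aborted early:

                #include <atomic>

                enum class CompactionCallbackStatus { Continue, Abort };

                struct CompactionCancelContext {
                    // Owned by the bucket; set to true when the bucket begins shutdown.
                    std::atomic<bool> shuttingDown{false};
                };

                CompactionCallbackStatus onCompactionItem(const CompactionCancelContext& ctx) {
                    if (ctx.shuttingDown.load(std::memory_order_relaxed)) {
                        // The KVStore unwinds and reports the compaction as cancelled.
                        return CompactionCallbackStatus::Abort;
                    }
                    // ... normal expiry / purge / dropped-collection processing ...
                    return CompactionCallbackStatus::Continue;
                }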

            drigby Dave Rigby added a comment -

            > However, the limit is only per-Bucket - so with 3 buckets we are already "over-subscribed", and could end up with all AuxIO threads running compaction tasks, and unable to wake (as part of cancel) any other ones.
            > Indeed, we could still saturate the AuxIO pool with enough (3) buckets and hit a similar problem. We could perhaps limit running compactors to be process-wide rather than bucket-wide, but that would limit performance in some situations.

            We could; however then you're back to triggering similar things with Backfill / AccessScanner / all the other AuxIO tasks. Ultimately it's a contention problem - if some tasks have low-latency requirements, and others don't (and have potentially long runtimes) - then the tasks with low latency requirements can get stuck behind the long runtime ones.

            This is the same problem as we had with the Flusher/Compactor on Writer threads - but arguably more severe - as the Flusher tasks are running all the time (not just bucket shutdown) and hence latency times are always important.

            > I think the fundamental problem here is once a CompactionTask starts running, there is no way to cancel it - that's what we ideally want to happen, not to waste a bunch of time waiting for unnecessary compaction to complete.
            > I think that that's one part of it, certainly that work is unnecessary if the bucket in question is going away, but we'd also have to consider what other buckets are doing.

            That's a fair point - and related to the fact that we perform userspace co-operative scheduling, where we cannot pre-empt already running tasks to do something else in the meantime, then return to the original task.

            > If we have a large AuxIO backlog then task cancels are going to take some time to be processed. We could perhaps use priorities which folly seems to support to make this slightly better (task cancels could be re-scheduled with highest priority) but we would still have to wait for one other task to finish to start cancelling.

            Priorities could help, but as you highlight you still have the problem that if a task is already running, it doesn't matter what its priority is vs the one(s) waiting.

            Probably the "best" way to solve this particular cancellation problem would be to make it possible to somehow cancel tasks (for buckets being deleted) without having to wait to run() them on their thread pool. (I don't immediately know how to do that, but if we could, I think it would solve all of these issues.)

            IIRC, we did allow that for CB3ExecutorPool, but I changed it with FollyExecutorPool, for a combination of reasons:

            1. The readyQueue which holds all tasks ready to run is simply the tasks queued against FollyExecutorPool::readerPool/writerPool etc. - and once items are queued I don't believe they can be removed without the threadPool attempting to run them.
            2. To simplify task cancellation - have the CPU threadPool always cancel tasks, as even if you allow tasks to be cancelled before they are added to the CPU pools, you also have to handle the cases where (a) they are queued on the CPU pool but not yet run or (b) they are already running.

            Note part of the issue here is that the tasks are already on the CPU pool (i.e. we have decided that N Compaction tasks should run "now", but there are only A < N AuxIO threads for them to run on, so there are a non-zero number of tasks already on the CPU pool queue) which we cannot remove without finding a thread to run them.

            drigby Dave Rigby added a comment -

            On the topic of cross-Bucket limiting of how many tasks of a given type run at once (we limit Compaction tasks to only ever using some fraction of the threads given they are long-running and cannot really be paused) - we could look at using https://github.com/facebook/folly/blob/main/folly/executors/MeteredExecutor.h which I think would allow this kind of thing.
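            To make the idea concrete, a generic sketch of a "metered" front-queue (this is not folly::MeteredExecutor's actual interface, just the shape of the idea): tasks from any bucket are parked in a queue we own, and at most maxInFlight of them are handed to the underlying pool at a time:

                #include <deque>
                #include <functional>
                #include <mutex>
                #include <utility>

                class MeteredQueue {
                public:
                    // submitToPool forwards work to the real thread pool (assumed to
                    // run it asynchronously, e.g. a CPU thread pool's add()).
                    MeteredQueue(std::function<void(std::function<void()>)> submitToPool,
                                 int maxInFlight)
                        : submitToPool(std::move(submitToPool)), maxInFlight(maxInFlight) {
                    }

                    void add(std::function<void()> task) {
                        std::lock_guard<std::mutex> guard(lock);
                        pending.push_back(std::move(task));
                        maybeSubmitLocked();
                    }

                private:
                    // Called with `lock` held.
                    void maybeSubmitLocked() {
                        while (inFlight < maxInFlight && !pending.empty()) {
                            auto task = std::move(pending.front());
                            pending.pop_front();
                            ++inFlight;
                            submitToPool([this, task = std::move(task)] {
                                task();
                                std::lock_guard<std::mutex> guard(lock);
                                --inFlight;
                                maybeSubmitLocked();
                            });
                        }
                    }

                    std::function<void(std::function<void()>)> submitToPool;
                    std::mutex lock;
                    std::deque<std::function<void()>> pending;
                    int inFlight = 0;
                    const int maxInFlight;
                };

            Routing every bucket's CompactVBucketTasks through one such queue in front of the AuxIO pool would give the cross-bucket cap being discussed here, while short-running tasks could continue to go straight to the pool.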


            ben.huddleston Ben Huddleston added a comment -

            > Probably the "best" way to solve this particular cancellation problem would be to make it possible to somehow cancel tasks (for buckets being deleted) without having to wait to run() them on their thread pool.

            I had the same thought, and I agree, but it's definitely a harder problem to solve. I'll investigate it some more and see what we can do.

            drigby Dave Rigby added a comment -

            (One final thought - prior to Collections there wasn't an issue with multiple Buckets having compaction tasks outstanding - as ns_server always scheduled compaction serially across each bucket in turn. It is the introduction of collections and having kv-engine directly trigger compaction which exposes this issue.)

            drigby Dave Rigby added a comment -

            One final final thought - I think this is a problem exclusive to the CompactVBucketTask - every other task is either short-running (say under 1s), or can be pause-resumed such that a single run() call is short-running (e.g. Warmup tasks). As such, it raises the question of whether this is something we should try to solve generally (e.g. allowing tasks to be cancelled without running in their CPU pool), or specifically for Compaction?


            ben.huddleston Ben Huddleston added a comment -

            MB-48872 looks to be very similar to the issue that we've been discussing. It looks like rollback can cause these issues too, but that looks to have been worsened by priorities, which I believe aren't currently implemented in the folly executor pool. It's a similar issue though, so it might be worth trying to come up with a more general solution.

            owend Daniel Owen made changes -
            Component/s storage-engine [ 10175 ]

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2241 contains kv_engine commit 280622f with commit message: MB-49512 : Obey concurrent compaction limit when rescheduling

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2242 contains kv_engine commit f9c4a69 with commit message: MB-49512 : Use min of AuxIO and Writer threads to calc compaction limit

            ben.huddleston Ben Huddleston added a comment -

            > Probably the "best" way to solve this particular cancellation problem would be to make it possible to somehow cancel tasks (for buckets being deleted) without having to wait to run() them on their thread pool.

            This is looking to be pretty difficult to solve. There are two ways we could try to make this work with the current implementation: 1) somehow remove all the cancelled tasks from the CPUThreadPoolExecutor, or 2) allow the Taskable to go away with the Task still in the queue. I investigated 1) first by subclassing the CPUThreadPoolExecutor to get at the underlying queue, with the idea that during shutdown we'd pull everything out of the queue, check if it's for a bucket that's going away, then put back anything else. That fell apart though, because the queue contains Folly's final version of a std::function<void()> rather than some class object that we could subclass to get the proxy/task info out of the thing we stored in the queue. Looking then at idea 2), we end up with a bit of a nightmare when it comes to managing TaskProxy objects, in particular with respect to the futurePool scheduling, which holds raw pointers to TaskProxies.

            Limiting the threads which can compact cross-bucket would fix this particular issue so will investigate that now.

            drigby Dave Rigby added a comment -

            re: the Folly queues, one approach (similar to other folly Executor subclasses) might be to interpose our own queue in front of the underlying folly one; which could store a pair of say (Bucket, Folly::Function). Then you could check this "pre-queue" on shutdown and discard anything matching the bucket. Essentially we would have a separate "readyQueue" which is still in "Couchbase-land" (and which we can manipulate at will); the underlying Folly Executor would only dequeue off this when it was actually ready to run.

            This might end up being quite invasive, but it appeals to me more than the alternatives as we solve the problem of already queued tasks "having" to be run on a thread pool before they can be cancelled, for all Task types, without having to code anything for them independently.

            I agree that we'd struggle with getting rid of the Taskable before its Tasks - given that Tasks are allowed to hold references to the Taskable / sub-objects.

            In terms of limiting stuff cross-bucket - I'm not sure that's a good route to go down - while we could do that, I suspect we could end up with similar problems with Warmup / Rollback etc - and then you can end up in the situation where you are not actually using the resources you want to use (i.e. the entire thread pool).


            ben.huddleston Ben Huddleston added a comment -

            Got a work in progress for this that seems to work, by putting the tasks in a queue that we own and having all the work we schedule on the actual executor pool just pop a task from our queue and run it.
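            Very roughly, the shape of that approach looks like the following (hypothetical types; the real change lives in FollyExecutorPool and its TaskProxy machinery):

                #include <algorithm>
                #include <deque>
                #include <memory>
                #include <mutex>

                struct Task {
                    virtual ~Task() = default;
                    virtual void run() = 0;
                    virtual const void* taskable() const = 0; // identifies the owning bucket
                };

                class OwnedReadyQueue {
                public:
                    void schedule(std::shared_ptr<Task> task) {
                        {
                            std::lock_guard<std::mutex> guard(lock);
                            queue.push_back(std::move(task));
                        }
                        // cpuPool.add([this] { runOne(); }); // the pool only sees thunks
                    }

                    // Executed by the thunk on a CPU pool thread.
                    void runOne() {
                        std::shared_ptr<Task> task;
                        {
                            std::lock_guard<std::mutex> guard(lock);
                            if (queue.empty()) {
                                return; // task was removed, e.g. its bucket was deleted
                            }
                            task = std::move(queue.front());
                            queue.pop_front();
                        }
                        task->run();
                    }

                    // Called during bucket shutdown: drop queued tasks for that bucket
                    // without ever having to run them on a CPU pool thread.
                    void unregisterTaskable(const void* taskable) {
                        std::lock_guard<std::mutex> guard(lock);
                        queue.erase(std::remove_if(queue.begin(),
                                                   queue.end(),
                                                   [taskable](const std::shared_ptr<Task>& t) {
                                                       return t->taskable() == taskable;
                                                   }),
                                    queue.end());
                    }

                private:
                    std::mutex lock;
                    std::deque<std::shared_ptr<Task>> queue;
                };

            Because the folly CPU pool only ever sees the "pop one task and run it" thunks, unregistering a bucket can simply erase its queued tasks from the owned queue - no CPU pool thread has to pick them up just to cancel them. This matches the shape of the later commits ("FollyExecutorPool use custom queue for actual work" and "Remove tasks from custom cpuPool queue on unregister"), though the code above is only illustrative.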

            owend Daniel Owen made changes -
            Due Date 24/Feb/22

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2280 contains kv_engine commit d9a9d3f with commit message: MB-49512 : Move logging and common executor code to GlobalTask::execute

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2301 contains kv_engine commit cb836e2 with commit message: MB-49512 : Cancel compactions during shutdown

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2301 contains kv_engine commit 568d2c0 with commit message: MB-49512 : Join cpuPool threads before reset

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2302 contains kv_engine commit 47cdb1c with commit message: MB-49512 : Wait for flusher in test multiple vb compactions

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2302 contains kv_engine commit bcbb76e with commit message: MB-49512 : Avoid deadlock in cancel_can_schedule test

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2310 contains kv_engine commit 8fa87af with commit message: MB-49512 : Don't run STItemPagerTest for nexus

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2310 contains kv_engine commit 5ae473d with commit message: MB-49512 : Split DurabilityEPBucketTest into smaller suites

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2311 contains kv_engine commit e2309d3 with commit message: MB-49512 : Unregister taskable in PoolThreadsAreRegisteredWithPhosphor
            drigby Dave Rigby made changes -
            Link This issue relates to MB-50988 [ MB-50988 ]
            ben.huddleston Ben Huddleston made changes -
            Status Reopened [ 4 ] In Progress [ 3 ]

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2320 contains kv_engine commit fdc7f2d with commit message: MB-49512 : engine_testapp use std::cout over printf for Running...

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2320 contains kv_engine commit cdbd3b5 with commit message: MB-49512 : Always flush stdout on engine_testapp result

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2320 contains kv_engine commit fe6f4ca with commit message: MB-50941 : Revert " MB-49512 : Obey concurrent compaction limit when rescheduling"

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2328 contains kv_engine commit 8572934 with commit message: MB-49512 : FollyExecutorPool - allow scheduler thread re-entrancy [1/2]

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2331 contains kv_engine commit 84d6594 with commit message: MB-49512 : Reset tasks on scheduler thread [2/2]
            ben.huddleston Ben Huddleston made changes -
            Link This issue is duplicated by MB-48872 [ MB-48872 ]

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2339 contains kv_engine commit 2d34894 with commit message: MB-49512 : FollyExecutorPool use custom queue for actual work

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2340 contains kv_engine commit 6fc9835 with commit message: MB-49512 : Remove tasks from custom cpuPool queue on unregister
            ben.huddleston Ben Huddleston made changes -
            Assignee Ben Huddleston [ ben.huddleston ] Balakumaran Gopal [ balakumaran.gopal ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Resolved [ 5 ]

            Balakumaran.Gopal Balakumaran Gopal added a comment -

            We have 3 jobs running to verify the fix:

            http://qe-jenkins.sc.couchbase.com/job/temp_rebalance_even/175/console
            http://qe-jenkins.sc.couchbase.com/job/temp_rebalance_magma/252/console
            http://qe-jenkins.sc.couchbase.com/job/temp_rebalance_magma_win/51/console

            temp_rebalance_even is the most important, because it was on this setup that we were able to reproduce the issue 100/100 times. So far I have seen it happen only 3/70 times in that particular job and 0 times in the other 2 jobs.

            Balakumaran.Gopal Balakumaran Gopal made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Assignee Balakumaran Gopal [ balakumaran.gopal ] Daniel Owen [ owend ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Attachment UI_MB-49512.png [ 178114 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            owend Daniel Owen made changes -
            Assignee Daniel Owen [ owend ] Ben Huddleston [ ben.huddleston ]

            ben.huddleston Ben Huddleston added a comment -

            Latest logs show that we are slow between "Shut down the bucket" and "Clean up allocated resources". 15 seconds is normal though, and I believe that the timeout should be 60 seconds per https://github.com/couchbase/ns_server/blob/master/src/ns_rebalancer.erl#L42.

            2022-02-22T04:42:25.712728-08:00 INFO 1312: Delete bucket [bucket-3]. Notifying engine
            2022-02-22T04:42:25.715244-08:00 INFO 1312: Delete bucket [bucket-3]. Engine ready for shutdown
            2022-02-22T04:42:25.715255-08:00 INFO 1312: Delete bucket [bucket-3]. Wait for 4 clients to disconnect
            2022-02-22T04:42:25.716086-08:00 INFO 1312: Delete bucket [bucket-3]. Shut down the bucket
            2022-02-22T04:42:41.266252-08:00 INFO 1312: Delete bucket [bucket-3]. Clean up allocated resources
            2022-02-22T04:42:41.266384-08:00 INFO 1312: Delete bucket [bucket-3] complete
            

            The majority of our 15 second shutdown was spent shutting down KVStores in magma. We also have some noisy logs to address; it's probably not worth logging cancelled compactions, for example:

            2022-02-22T04:42:25.756320-08:00 WARNING (bucket-3) MagmaKVStore::compactDBInternal CompactKVStore vb:509 CID:<ud>cid:0x6d:</ud> failed status:Cancelled: Shutting down
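            For example, a guard along these lines would quieten those messages (illustrative names only, not the actual MagmaKVStore code):

                #include <spdlog/spdlog.h>
                #include <string>

                enum class CompactResult { Success, Cancelled, Failed };

                void logCompactionResult(spdlog::logger& logger,
                                         const std::string& vbid,
                                         CompactResult result) {
                    if (result == CompactResult::Cancelled) {
                        return; // expected during bucket shutdown; not worth a WARNING
                    }
                    if (result == CompactResult::Failed) {
                        logger.warn("compactDBInternal CompactKVStore {} failed", vbid);
                    }
                }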
            

            Approximately a minute after this ns_server threw out the error:

            [rebalance:error,2022-02-22T04:43:55.847-08:00,ns_1@172.23.136.156:<0.4431.5>:ns_rebalancer:do_wait_buckets_shutdown:192]Failed to wait deletion of some buckets on some nodes: [{'ns_1@172.23.136.156',
                                                                     {'EXIT',
                                                                      {old_buckets_shutdown_wait_failed,
                                                                       ["bucket-3"]}}}]
            

            Probably worth somebody from the ns_server team taking a look at this; it looks like the shutdown completed in a timely (enough) manner in kv_engine.


            ben.huddleston Ben Huddleston added a comment -

            Dave Finlay/Steve Watanabe, could somebody from the ns_server team please take a look at the latest set of logs here? ns_server appears to time out during a rebalance waiting for old buckets to be cleaned up, but from the kv_engine logs they should have been deleted a minute prior.

            ben.huddleston Ben Huddleston made changes -
            Assignee Ben Huddleston [ ben.huddleston ] Dave Finlay [ dfinlay ]

            build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2363 contains kv_engine commit 773e09e with commit message: MB-49512 : Remove commented out include
            steve.watanabe Steve Watanabe made changes -
            Assignee Dave Finlay [ dfinlay ] Steve Watanabe [ steve.watanabe ]

            steve.watanabe Steve Watanabe added a comment -

            The code is waiting 60 seconds for the ns_memcached-bucket-3 process to terminate... but it takes 67 seconds to terminate.

            [ns_server:debug,2022-02-22T04:42:55.845-08:00,ns_1@172.23.136.156:<0.4440.5>:ns_rebalancer:local_buckets_shutdown_loop:143]Waiting until the following old bucket instances are gone: ["bucket-3"]
             
            [rebalance:error,2022-02-22T04:43:55.847-08:00,ns_1@172.23.136.156:<0.4431.5>:ns_rebalancer:do_wait_buckets_shutdown:192]Failed to wait deletion of some buckets on some nodes: [{'ns_1@172.23.136.156',
                                                                     {'EXIT',
                                                                      {old_buckets_shutdown_wait_failed,
                                                                       ["bucket-3"]}}}]
            [ns_server:debug,2022-02-22T04:44:02.658-08:00,ns_1@172.23.136.156:ns_memcached-bucket-3<0.5119.0>:ns_memcached:terminate:832]Terminated.
            

            During that time we get a slow bucket stop which prints backtraces of the processes. The ns_memcached-bucket-3 process is deleting files associated with the bucket.

             [{backtrace,[<<"Program counter: 0x00000236eeec1788 (erpc:call/5 + 1616)">>,
                          <<>>,<<"0x0000023682cd59f0 []">>,<<>>,
                          <<"0x0000023682cd59f8 infinity">>,<<>>,
                          <<"0x0000023682cd5a00 []">>,<<>>,
                          <<"0x0000023682cd5a08 #Ref<0.433470769.1614282754.49484>">>,
                          <<>>,<<"0x0000023682cd5a10 -576460752303423481">>,<<>>,
                          <<"0x0000023682cd5a18 #Ref<0.433470769.1614282754.49483>">>,
                          <<>>,
                          <<"0x0000023682cd5a20 Return addr 0x00000236eb3667ec (rpc:call/5 + 244)">>,
                          <<"y(0)     infinity">>,<<"y(1)     []">>,
                          <<"y(2)     [{delete_databases_and_files,\"bucket-3\"}]">>,
                          <<"y(3)     handle_rpc">>,<<"y(4)     ns_couchdb_api">>,
                          <<"y(5)     'couchdb_ns_1@cb.local'">>,
                          <<"y(6)     Catch 0x00000236eb366863 (rpc:call/5 + 363)">>,<<>>,
                          <<"0x0000023682cd5a60 Return addr 0x00000236eeeb77e0 (ns_couchdb_api:rpc_couchdb_node/4 + 168)">>,
                          <<"y(0)     []">>,<<"y(1)     undefined">>,
                          <<"y(2)     {delete_databases_and_files,\"bucket-3\"}">>,<<>>,
                          <<"0x0000023682cd5a80 Return addr 0x00000236eeeb6b1c (ns_couchdb_api:delete_databases_and_files/1 + 132)">>,
                          <<"y(0)     []">>,<<"y(1)     \"bucket-3\"">>,<<>>,
                          <<"0x0000023682cd5a98 Return addr 0x00000236ef0548fc (ns_memcached:delete_bucket/4 + 724)">>,
                          <<"y(0)     []">>,<<"y(1)     []">>,<<"y(2)     []">>,
                          <<"y(3)     []">>,<<"y(4)     ok">>,<<"y(5)     []">>,<<>>,
                          <<"0x0000023682cd5ad0 Return addr 0x00000236ef0540b0 (ns_memcached:terminate/2 + 240)">>,
                          <<"y(0)     []">>,<<"y(1)     []">>,<<"y(2)     \"bucket-3\"">>,
                          <<"y(3)     Catch 0x00000236ef05416f (ns_memcached:terminate/2 + 431)">>,
                          <<>>,
                          <<"0x0000023682cd5af8 Return addr 0x00000236eb220a00 (gen_server:try_terminate/3 + 360)">>,
                          <<"y(0)     []">>,<<"y(1)     []">>,<<"y(2)     []">>,
                          <<"y(3)     Catch 0x00000236eb220a5a (gen_server:try_terminate/3 + 450)">>,
                          <<>>,
                          <<"0x0000023682cd5b20 Return addr 0x00000236eb223224 (gen_server:terminate/10 + 196)">>,
                          <<"y(0)     []">>,<<"y(1)     []">>,<<"y(2)     []">>,
                          <<"(3)     {state,0,0,0,{[],[]},{[],[]},{[],[]},warmed,{1645,532539,512000},\"bucket-3\",[collections,json],#Port<0.439>,[{<0">>,
                          <<"y(4)     ns_memcached">>,
                          <<"y(5)     {'EXIT',<0.5118.0>,shutdown}">>,
                          <<"y(6)     undefined">>,<<"y(7)     'ns_memcached-bucket-3'">>,
                          <<"y(8)     shutdown">>,
                          <<"(9)     [{gen_server,decode_msg,9,[{file,\"gen_server.erl\"},{line,481}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl">>,
                          <<"y(10)    shutdown">>,<<"y(11)    exit">>,<<>>,
                          <<"0x0000023682cd5b88 Return addr 0x00000236eb147478 (proc_lib:init_p_do_apply/3 + 208)">>,
                          <<"y(0)     []">>,<<"y(1)     []">>,
                          <<"y(2)     Catch 0x00000236eb147498 (proc_lib:init_p_do_apply/3 + 240)">>,
                          <<>>,
                          <<"0x0000023682cd5ba8 Return addr 0x00000236eb071bfc (unknown function)">>,
                          <<>>,
                          <<"0x0000023682cd5bb0 Return addr 0x00000236eb071c00 (<terminate process normally>)">>,
                          <<>>]},
              {messages,[<<"check_config">>,<<"check_config">>,<<"check_config">>,
                         <<"check_config">>,<<"check_config">>,<<"check_config">>,
                         <<"check_config">>,<<"check_config">>,<<"check_config">>,
                         <<"check_config">>]},
              {dictionary,[{'$initial_call',{ns_memcached,init,1}},
                           {'$ancestors',[<0.5118.0>,'single_bucket_kv_sup-bucket-3',
                                          ns_bucket_sup,ns_bucket_worker_sup,
                                          ns_server_sup,ns_server_nodes_sup,<0.270.0>,
                                          ns_server_cluster_sup,root_sup,<0.143.0>]}]},
              {registered_name,'ns_memcached-bucket-3'},
            

            This is running on a Windows VM

            ===Checker results for 'WIN-HS1RJUPNDMD / 'ns_1@172.23.136.156''===
            [info]  Collection Time     : 2022-02-22 04:56:46
            [warn]  OS Registered Org   : Unable to determine
            [info]  OS Name             : Microsoft Windows Server 2019 Datacenter
            [info]  OS Version          : Microsoft Windows Server 2019 Datacenter
            [info]  HW Platform         : HVM domU
            [BAD]   CB Version          : 7.1.0-2349-enterprise - Unsupported build
            

            The deletion of the files associated with bucket-3 is taking too long. This appears similar to MB-45877.

            steve.watanabe Steve Watanabe made changes -
            Assignee Steve Watanabe [ steve.watanabe ] Ben Huddleston [ ben.huddleston ]
            steve.watanabe Steve Watanabe made changes -
            Environment Enterprise Edition 7.1.0 build 1694 ‧ Enterprise Edition 7.1.0 build 1694 ‧
            ben.huddleston Ben Huddleston made changes -
            Assignee Ben Huddleston [ ben.huddleston ] Balakumaran Gopal [ balakumaran.gopal ]

            Balakumaran.Gopal Balakumaran Gopal added a comment -

            Ben Huddleston - Could you please explain what is needed from me for this bug? I have already attached the repros from both the Windows and Linux runs. Is there something else you are looking for?


            ben.huddleston Ben Huddleston added a comment -

            > I am pretty sure we can repro this on 1 bucket on linux runs too. Let me give it a shot.

            Sorry Balakumaran Gopal, I thought by this you meant that you were trying to run this on linux. Are the logs you linked the same test on linux machines?

            Balakumaran.Gopal Balakumaran Gopal added a comment - - edited

            Ya Ben Huddleston.

            1. Windows - Single bucket - We have cbcollect
            2. Linux - Multi bucket but on Magma boxes - We have cbcollect + Live cluster if you need.
            3. Linux - Single bucket but on regression boxes (slower than the magma boxes on which this bug was filed) - This run is in progress.

            Hopefully that clarifies things. I am running 3 now. Will update once I hit it.

            The following logs are for 2.
            Supportal
            http://supportal.couchbase.com/snapshot/e467c957c865e617994c62381f018423::0

            s3://cb-customers-secure/mb-49512_rerun_post_fix_repro_linux/2022-02-23/collectinfo-2022-02-22t170313-ns_1@172.23.100.34.zip
            s3://cb-customers-secure/mb-49512_rerun_post_fix_repro_linux/2022-02-23/collectinfo-2022-02-22t170313-ns_1@172.23.100.35.zip
            s3://cb-customers-secure/mb-49512_rerun_post_fix_repro_linux/2022-02-23/collectinfo-2022-02-22t170313-ns_1@172.23.100.36.zip
            s3://cb-customers-secure/mb-49512_rerun_post_fix_repro_linux/2022-02-23/collectinfo-2022-02-22t170313-ns_1@172.23.105.164.zip
            s3://cb-customers-secure/mb-49512_rerun_post_fix_repro_linux/2022-02-23/collectinfo-2022-02-22t170313-ns_1@172.23.105.206.zip
            s3://cb-customers-secure/mb-49512_rerun_post_fix_repro_linux/2022-02-23/collectinfo-2022-02-22t170313-ns_1@172.23.106.177.zip
            s3://cb-customers-secure/mb-49512_rerun_post_fix_repro_linux/2022-02-23T04/collectinfo-2022-02-22t170313-ns_1@172.23.100.34.zip


            Thanks Balakumaran Gopal. For this run we're waiting to disconnect compaction clients.

            2022-02-22T08:58:40.528824-08:00 INFO 2985: Delete bucket [bucket1]. Notifying engine
            2022-02-22T08:58:40.530570-08:00 INFO 2985: Delete bucket [bucket1]. Engine ready for shutdown
            2022-02-22T08:58:40.530591-08:00 INFO 2985: Delete bucket [bucket1]. Wait for 10 clients to disconnect
            2022-02-22T09:00:41.208924-08:00 INFO 2985: Delete bucket [bucket1]. Still waiting: 4 clients connected: {"22288":{"agent_name":"regular","bucket_index":2,"clustermap":{"epoch":0,"revno":0},"connection":"0x00007f1fb8db7680","connection_id
            ":"127.0.0.1:51268","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":51268} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <ud>@ns_server</ud>) ]","engine_storage":"0x00007f206c057000","ewouldblock":tru
            e,"packet":{"bodylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":3},"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"d
            cp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"max_reqs_per_event":20,"max_sched_time":"53244","min_sched_time":"689","nevents":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":5126
            8}","priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"sendqueue":{"last":1082731650301306,"size":0,"term":false},"socket":22288,"sockname":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","t
            hread":4,"total_cpu_time":"4153140819","total_queued_send":12749537,"total_recv":547037,"total_send":12749537,"type":"normal","user":{"domain":"local","name":"@ns_server"},"yields":0},"23247":{"agent_name":"regular","bucket_index":2,"clus
            termap":{"epoch":0,"revno":0},"connection":"0x00007f207c1c9600","connection_id":"127.0.0.1:60652","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":60652} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <
            ud>@ns_server</ud>) ]","engine_storage":"0x00007f206c057000","ewouldblock":true,"packet":{"bodylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":0}
            ,"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"max_reqs_per_event":20,"max_sched_time":"49329","min_sched_time":"991","n
            events":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":60652}","priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"sendqueue":{"last":1082735234030157,"size":0,"term":false},"socket":23247,"sock
            name":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","thread":2,"total_cpu_time":"391642092","total_queued_send":1569430,"total_recv":38525,"total_send":1569430,"type":"normal","user":{"domain":"local","name":"@ns_s
            erver"},"yields":0},"27852":{"agent_name":"regular","bucket_index":2,"clustermap":{"epoch":0,"revno":0},"connection":"0x00007f207c0c0a00","connection_id":"127.0.0.1:33826","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.
            0.0.1\",\"port\":33826} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <ud>@ns_server</ud>) ]","engine_storage":"0x00007f206c057000","ewouldblock":true,"packet":{"bodylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keyle
            n":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":1},"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,
            "max_reqs_per_event":20,"max_sched_time":"47040","min_sched_time":"1103","nevents":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":33826}","priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"send
            queue":{"last":1082734699042133,"size":0,"term":false},"socket":27852,"sockname":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","thread":0,"total_cpu_time":"948942413","total_queued_send":3654740,"total_recv":70897,
            "total_send":3654740,"type":"normal","user":{"domain":"local","name":"@ns_server"},"yields":0},"70":{"agent_name":"regular","bucket_index":2,"clustermap":{"epoch":0,"revno":0},"connection":"0x00007f207c0c0280","connection_id":"127.0.0.1:6
            0480","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":60480} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <ud>@ns_server</ud>) ]","engine_storage":"0x00007f206c057000","ewouldblock":true,"packet":{"b
            odylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":2},"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"dcp_no_value":f
            alse,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"max_reqs_per_event":20,"max_sched_time":"23917","min_sched_time":"667","nevents":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":60480}","priority"
            :"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"sendqueue":{"last":1082734976915686,"size":0,"term":false},"socket":70,"sockname":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","thread":2,"total_c
            pu_time":"3511137765","total_queued_send":11567945,"total_recv":471627,"total_send":11567945,"type":"normal","user":{"domain":"local","name":"@ns_server"},"yields":0}}
            

            I suspect that my change to abort compactions during shutdown is not notifying the client, but is instead trying to reschedule the task, which won't be run because we're shutting down.
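
            As a minimal, self-contained sketch of this suspicion (illustrative names only, not the actual kv_engine code): the important behavioural difference is whether the compaction task notifies the blocked cookie when it observes shutdown, or merely asks to be rescheduled; a rescheduled task never runs once the bucket is shutting down, so the client is never counted as disconnected.

            {code:cpp}
            // Toy model of the reschedule-vs-notify decision described above.
            // Names (Cookie, CompactionTask, notifyIoComplete) are illustrative.
            #include <functional>
            #include <iostream>

            struct Cookie {
                int id;
                bool notified = false;
            };

            struct CompactionTask {
                Cookie* cookie;

                // Returns true if the task asks to be rescheduled.
                bool run(bool shuttingDown,
                         const std::function<void(Cookie&)>& notifyIoComplete) {
                    if (shuttingDown) {
                        // Desired behaviour: tell the frontend the request is
                        // finished (aborted) so the connection can be released.
                        // The buggy behaviour would be to return true here
                        // ("run me later") without notifying, leaving the
                        // cookie blocked forever.
                        notifyIoComplete(*cookie);
                        return false;
                    }
                    // ... real compaction work would happen here ...
                    notifyIoComplete(*cookie);
                    return false;
                }
            };

            int main() {
                Cookie c{1};
                CompactionTask task{&c};
                bool reschedule = task.run(/*shuttingDown=*/true, [](Cookie& ck) {
                    ck.notified = true;
                    std::cout << "notified cookie " << ck.id << "\n";
                });
                std::cout << "reschedule=" << std::boolalpha << reschedule
                          << ", cookie notified=" << c.notified << "\n";
            }
            {code}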

            ben.huddleston Ben Huddleston added a comment

            Sure Ben Huddleston. I do have runs scheduled for point no 3. Will keep the bug updated once I hit it.

            Balakumaran.Gopal Balakumaran Gopal added a comment

            This is similar to my theory but subtly different. We cancel compaction tasks during engine destruction, but before we do that we wait for clients to disconnect, so at that point we are still waiting for the compactions to run. We need to move the compaction cancelling to the cancel_all_operations_in_ewb_state() phase of destruction.
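
            The ordering is the key point here. A rough model of the two orderings (simplified, with made-up names rather than the real memcached delete-bucket code) shows why cancelling only during engine destruction can never unblock the wait:

            {code:cpp}
            // Toy model (not the real code paths) of the ordering problem:
            // if compactions are only cancelled during engine destruction,
            // which runs *after* "wait for clients to disconnect", the wait
            // never finishes because the compaction cookies are still blocked
            // in would-block state. Cancelling in the EWB-cancel phase, which
            // runs before the wait, lets the wait complete.
            #include <iostream>
            #include <vector>

            struct Cookie { bool blockedInEwb = true; };

            static void cancelEwbOperations(std::vector<Cookie>& cookies) {
                for (auto& c : cookies) {
                    c.blockedInEwb = false; // notify: client can now disconnect
                }
            }

            static bool waitForClientsToDisconnect(const std::vector<Cookie>& cookies,
                                                   int maxIterations) {
                for (int i = 0; i < maxIterations; ++i) {
                    bool anyBlocked = false;
                    for (const auto& c : cookies) {
                        anyBlocked |= c.blockedInEwb;
                    }
                    if (!anyBlocked) {
                        return true; // all clients gone
                    }
                    // the real system sleeps and logs "Still waiting: N clients"
                }
                return false; // timed out, as seen in the memcached logs
            }

            static bool deleteBucket(bool cancelBeforeWait) {
                std::vector<Cookie> compactionCookies(4); // e.g. pending COMPACT_DB ops
                if (cancelBeforeWait) {
                    cancelEwbOperations(compactionCookies); // the proposed fix
                }
                bool ok = waitForClientsToDisconnect(compactionCookies, 100);
                // Engine destruction would cancel compactions here, but by then
                // it is too late to unblock the wait above.
                return ok;
            }

            int main() {
                std::cout << "cancel during engine destruction only: "
                          << (deleteBucket(false) ? "shutdown ok" : "stuck waiting") << "\n";
                std::cout << "cancel during EWB-cancel phase:        "
                          << (deleteBucket(true) ? "shutdown ok" : "stuck waiting") << "\n";
            }
            {code}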

            ben.huddleston Ben Huddleston added a comment
            owend Daniel Owen made changes -
            Sprint KV 2021-Dec, KV 2022-Feb [ 1906, 2002 ] KV 2021-Dec [ 1906 ]
            owend Daniel Owen made changes -
            Rank Ranked lower
            ben.huddleston Ben Huddleston made changes -
            Due Date 24/Feb/22 25/Feb/22
            ben.huddleston Ben Huddleston made changes -
            Assignee Balakumaran Gopal [ balakumaran.gopal ] Ben Huddleston [ ben.huddleston ]
            drigby Dave Rigby made changes -
            Sprint KV 2021-Dec [ 1906 ] KV 2021-Dec, KV 2022-Feb [ 1906, 2002 ]
            drigby Dave Rigby made changes -
            Rank Ranked higher

            Build couchbase-server-7.1.0-2378 contains kv_engine commit e6d4ba8 with commit message:
            MB-49512: Return enum status from KVStore::compactDB

            build-team Couchbase Build Team added a comment
            ben.huddleston Ben Huddleston made changes -
            Labels functional-test magma functional-test magma releasenote
            ben.huddleston Ben Huddleston added a comment - - edited

            Description for release notes:

            Summary: The per-KVStore stat "failure_compaction" (e.g. "rw_0:failure_compaction") has now been consolidated into a single bucket-wide stat named "ep_compaction_failed".
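
            Conceptually the change looks like the following sketch (illustrative code, not the actual EPStats/KVStore implementation): a compaction failure on any shard now increments one shared bucket-level counter rather than a per-shard rw_<n>:failure_compaction stat.

            {code:cpp}
            // Sketch of aggregating per-shard compaction failures into a single
            // bucket-wide counter; names and structure are illustrative only.
            #include <atomic>
            #include <cstddef>
            #include <iostream>

            struct BucketStats {
                std::atomic<std::size_t> compactionFailed{0}; // "ep_compaction_failed"
            };

            struct KVStoreShard {
                BucketStats& bucketStats;

                void onCompactionResult(bool success) {
                    if (!success) {
                        // Previously a per-shard rw_<n>:failure_compaction stat
                        // would have been bumped; now all shards share one counter.
                        bucketStats.compactionFailed.fetch_add(1, std::memory_order_relaxed);
                    }
                }
            };

            int main() {
                BucketStats stats;
                KVStoreShard shards[] = {{stats}, {stats}, {stats}, {stats}};
                shards[0].onCompactionResult(false);
                shards[3].onCompactionResult(false);
                std::cout << "ep_compaction_failed=" << stats.compactionFailed.load() << "\n";
            }
            {code}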


            Build couchbase-server-7.1.0-2379 contains kv_engine commit c30171b with commit message:
            MB-49512: Move compaction failure stat out of KVStore

            build-team Couchbase Build Team added a comment

            Build couchbase-server-7.1.0-2386 contains kv_engine commit b32189c with commit message:
            MB-49512: Drop abort compaction log to debug

            build-team Couchbase Build Team added a comment

            Build couchbase-server-7.1.0-2386 contains kv_engine commit 36beef8 with commit message:
            MB-49512: Don't treat aborted compactions as failures

            build-team Couchbase Build Team added a comment
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Attachment consoleText_MB-49512_rerun.txt [ 178436 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            ben.huddleston Ben Huddleston made changes -
            Due Date 25/Feb/22 01/Mar/22

            Build couchbase-server-7.1.0-2398 contains kv_engine commit 27275ed with commit message:
            MB-49512: Cancel compactions during EWB cancel

            build-team Couchbase Build Team added a comment
            ben.huddleston Ben Huddleston made changes -
            Assignee Ben Huddleston [ ben.huddleston ] Balakumaran Gopal [ balakumaran.gopal ]
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            Balakumaran.Gopal Balakumaran Gopal added a comment - - edited

            Status update

            Have 2 jobs running
            http://172.23.121.80/job/temp_rebalance_even/203/console
            http://172.23.121.80/job/temp_rebalance_magma_win/67/console

            However, a weekly run where hundreds of magma jobs are run would be a better way to validate this.

            Balakumaran.Gopal Balakumaran Gopal made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Assignee Balakumaran Gopal [ balakumaran.gopal ] Ben Huddleston [ ben.huddleston ]
            ben.huddleston Ben Huddleston added a comment - - edited

            Test was also run on a Windows VM.

            Host Name:                 WIN-HS1RJUPNDMD
            OS Name:                   Microsoft Windows Server 2019 Datacenter
            OS Version:                10.0.17763 N/A Build 17763
            OS Manufacturer:           Microsoft Corporation
            

            Interesting logs:

            [ns_server:debug,2022-03-01T23:17:22.115-08:00,ns_1@172.23.136.195:<0.32166.4>:ns_rebalancer:local_buckets_shutdown_loop:143]Waiting until the following old bucket instances are gone: ["bucket-1"]
            

            2022-03-01T23:17:25.250975-08:00 INFO 1352: Delete bucket [bucket-1]. Clean up allocated resources
            2022-03-01T23:17:25.251117-08:00 INFO 1352: Delete bucket [bucket-1] complete
            

            [ns_server:debug,2022-03-01T23:17:25.251-08:00,ns_1@172.23.136.195:ns_memcached-bucket-1<0.6295.0>:ns_memcached:delete_bucket:887]Proceeding into vbuckets dbs deletions
            [rebalance:error,2022-03-01T23:18:22.116-08:00,ns_1@172.23.136.195:<0.32155.4>:ns_rebalancer:do_wait_buckets_shutdown:192]Failed to wait deletion of some buckets on some nodes: [{'ns_1@172.23.136.195',
                                                                     {'EXIT',
                                                                      {old_buckets_shutdown_wait_failed,
                                                                       ["bucket-1"]}}}]
            

            Looks like it's slow to delete files again.

            ben.huddleston Ben Huddleston made changes -
            Assignee Ben Huddleston [ ben.huddleston ] Balakumaran Gopal [ balakumaran.gopal ]
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Resolution Fixed [ 1 ]
            Status Resolved [ 5 ] Reopened [ 4 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Assignee Balakumaran Gopal [ balakumaran.gopal ] Ben Huddleston [ ben.huddleston ]

            Thanks Balakumaran Gopal. This occurrence wasn't caused by slow disks; rather, it was similar to the earlier issue where waiting cookies were blocking shutdown.

            2022-03-02T12:06:49.052207-08:00 INFO 709: Delete bucket [bucket1]. Notifying engine
            2022-03-02T12:06:49.053474-08:00 INFO 709: Delete bucket [bucket1]. Engine ready for shutdown
            2022-03-02T12:06:49.053490-08:00 INFO 709: Delete bucket [bucket1]. Wait for 8 clients to disconnect
            2022-03-02T12:08:49.605986-08:00 INFO 709: Delete bucket [bucket1]. Still waiting: 1 clients connected: {"34097":{"agent_name":"regular","bucket_index":2,"clustermap":{"epoch":0,"revno":0},"connection":"0x00007f2c3819f900","connection_id":"127.0.0.1:56402","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":56402} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <ud>@ns_server</ud>) ]","engine_storage":"0x00007f2bf40f9000","ewouldblock":true,"packet":{"bodylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":240},"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"max_reqs_per_event":20,"max_sched_time":"94796","min_sched_time":"725","nevents":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":56402}","priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"sendqueue":{"last":357063307164346,"size":0,"term":false},"socket":34097,"sockname":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","thread":1,"total_cpu_time":"1326254748","total_queued_send":3193022,"total_recv":265059,"total_send":3193022,"type":"normal","user":{"domain":"local","name":"@ns_server"},"yields":0}}
            2022-03-02T12:09:49.871400-08:00 INFO 709: Delete bucket [bucket1]. Still waiting: 1 clients connected: {"34097":{"agent_name":"regular","bucket_index":2,"clustermap":{"epoch":0,"revno":0},"connection":"0x00007f2c3819f900","connection_id":"127.0.0.1:56402","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":56402} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <ud>@ns_server</ud>) ]","engine_storage":"0x00007f2bf40f9000","ewouldblock":true,"packet":{"bodylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":240},"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"max_reqs_per_event":20,"max_sched_time":"94796","min_sched_time":"725","nevents":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":56402}","priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"sendqueue":{"last":357063307164346,"size":0,"term":false},"socket":34097,"sockname":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","thread":1,"total_cpu_time":"1326254748","total_queued_send":3193022,"total_recv":265059,"total_send":3193022,"type":"normal","user":{"domain":"local","name":"@ns_server"},"yields":0}}
            2022-03-02T12:10:50.200004-08:00 INFO 709: Delete bucket [bucket1]. Still waiting: 1 clients connected: {"34097":{"agent_name":"regular","bucket_index":2,"clustermap":{"epoch":0,"revno":0},"connection":"0x00007f2c3819f900","connection_id":"127.0.0.1:56402","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":56402} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <ud>@ns_server</ud>) ]","engine_storage":"0x00007f2bf40f9000","ewouldblock":true,"packet":{"bodylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":240},"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"max_reqs_per_event":20,"max_sched_time":"94796","min_sched_time":"725","nevents":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":56402}","priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"sendqueue":{"last":357063307164346,"size":0,"term":false},"socket":34097,"sockname":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","thread":1,"total_cpu_time":"1326254748","total_queued_send":3193022,"total_recv":265059,"total_send":3193022,"type":"normal","user":{"domain":"local","name":"@ns_server"},"yields":0}}
            2022-03-02T12:11:50.501228-08:00 INFO 709: Delete bucket [bucket1]. Still waiting: 1 clients connected: {"34097":{"agent_name":"regular","bucket_index":2,"clustermap":{"epoch":0,"revno":0},"connection":"0x00007f2c3819f900","connection_id":"127.0.0.1:56402","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":56402} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <ud>@ns_server</ud>) ]","engine_storage":"0x00007f2bf40f9000","ewouldblock":true,"packet":{"bodylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":240},"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"max_reqs_per_event":20,"max_sched_time":"94796","min_sched_time":"725","nevents":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":56402}","priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"sendqueue":{"last":357063307164346,"size":0,"term":false},"socket":34097,"sockname":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","thread":1,"total_cpu_time":"1326254748","total_queued_send":3193022,"total_recv":265059,"total_send":3193022,"type":"normal","user":{"domain":"local","name":"@ns_server"},"yields":0}}
            2022-03-02T12:12:50.739003-08:00 INFO 709: Delete bucket [bucket1]. Still waiting: 1 clients connected: {"34097":{"agent_name":"regular","bucket_index":2,"clustermap":{"epoch":0,"revno":0},"connection":"0x00007f2c3819f900","connection_id":"127.0.0.1:56402","cookies":[{"aiostat":"would block","connection":"[ {\"ip\":\"127.0.0.1\",\"port\":56402} - {\"ip\":\"127.0.0.1\",\"port\":11209} (System, <ud>@ns_server</ud>) ]","engine_storage":"0x00007f2bf40f9000","ewouldblock":true,"packet":{"bodylen":24,"cas":0,"datatype":"raw","extlen":24,"key":"<ud></ud>","keylen":0,"magic":"ClientRequest","opaque":0,"opcode":"COMPACT_DB","vbucket":240},"refcount":0}],"datatype":"raw","dcp":false,"dcp_deleted_user_xattr":false,"dcp_no_value":false,"dcp_xattr_aware":false,"features":["tcp nodelay"],"internal":true,"max_reqs_per_event":20,"max_sched_time":"94796","min_sched_time":"725","nevents":19,"parent_port":11209,"peername":"{\"ip\":\"127.0.0.1\",\"port\":56402}","priority":"Medium","protocol":"memcached","refcount":1,"sasl_enabled":true,"sendqueue":{"last":357063307164346,"size":0,"term":false},"socket":34097,"sockname":"{\"ip\":\"127.0.0.1\",\"port\":11209}","ssl":false,"state":"running","thread":1,"total_cpu_time":"1326254748","total_queued_send":3193022,"total_recv":265059,"total_send":3193022,"type":"normal","user":{"domain":"local","name":"@ns_server"},"yields":0}}
            

            With my latest change, any compaction that is running at this point will see that the clients are pending disconnection and abort, but this failed to take into account that the compaction may not currently be running. The compactions could actually be stuck in the AuxIO queue behind compactions for the other buckets (I believe that is the case here, as the default bucket has a lot of compaction tasks).

            12927    5  R   default    -0:27.62   0:00.00   0:00.00       0  AuxIO   CompactVBucketTask                   0x00007f2b73c81c30  Compact DB file 19
            12931    5  R   default    -0:56.14   0:00.00   0:00.00       0  AuxIO   CompactVBucketTask                   0x00007f2b5c77e310  Compact DB file 0
            ... (repeated for almost every vBucket) ...
            13845    5  R   default   -05:31.05   0:00.00   0:00.00       0  AuxIO   CompactVBucketTask                   0x00007f2b5cecf0f0  Compact DB file 979
            13858    5  R   default   -05:38.30   0:00.00   0:00.00       0  AuxIO   CompactVBucketTask                   0x00007f2b5cecf630  Compact DB file 987
            

            So it seems that it's not enough to wait for the compaction to run here in order to abort it; because other buckets could be running compactions and saturating the AuxIO pool, we need to be more proactive in disconnecting these connections. Perhaps we should notify all of the cookies at this point instead and allow the ExecutorPool cleanup method to deal with the extra tasks.
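
            A sketch of that more proactive approach (illustrative names, not the actual EPEngine/ExecutorPool API): during the EWB-cancel phase, walk the cookies blocked on COMPACT_DB and notify them immediately, leaving any CompactVBucketTasks still queued on the AuxIO pool to be discarded later by the executor-pool cleanup.

            {code:cpp}
            // Sketch only: notify every blocked compaction cookie right away
            // instead of waiting for its task to reach the front of a saturated
            // AuxIO queue. Names are illustrative.
            #include <cstdint>
            #include <iostream>
            #include <unordered_map>

            enum class Status { Success, Aborted };

            struct Cookie { uint16_t vbid; };

            class PendingCompactions {
            public:
                void add(uint16_t vbid, Cookie* cookie) {
                    pending[vbid] = cookie;
                }

                // Called from the EWB-cancel phase of bucket deletion.
                template <typename NotifyFn>
                void cancelAll(NotifyFn notifyIoComplete) {
                    for (auto& entry : pending) {
                        notifyIoComplete(*entry.second, Status::Aborted);
                    }
                    pending.clear(); // tasks left in the AuxIO queue become no-ops
                }

            private:
                std::unordered_map<uint16_t, Cookie*> pending;
            };

            int main() {
                Cookie c0{0}, c1{1}, c240{240};
                PendingCompactions compactions;
                compactions.add(0, &c0);
                compactions.add(1, &c1);
                compactions.add(240, &c240);

                compactions.cancelAll([](Cookie& c, Status s) {
                    std::cout << "notify cookie for vb:" << c.vbid
                              << " status=" << (s == Status::Aborted ? "aborted" : "ok")
                              << "\n";
                });
            }
            {code}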

            ben.huddleston Ben Huddleston added a comment
            ben.huddleston Ben Huddleston made changes -
            Due Date 01/Mar/22 02/Mar/22
            ben.huddleston Ben Huddleston made changes -
            Due Date 02/Mar/22 04/Mar/22

            Build couchbase-server-7.1.0-2427 contains kv_engine commit 369c193 with commit message:
            MB-49512: Disconnect compaction cookies on EWB cancel immediately

            build-team Couchbase Build Team added a comment

            Build couchbase-server-7.2.0-1002 contains kv_engine commit 369c193 with commit message:
            MB-49512: Disconnect compaction cookies on EWB cancel immediately

            build-team Couchbase Build Team added a comment
            ben.huddleston Ben Huddleston made changes -
            Assignee Ben Huddleston [ ben.huddleston ] Balakumaran Gopal [ balakumaran.gopal ]
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]
            wayne Wayne Siu made changes -
            Link This issue causes MB-51132 [ MB-51132 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Assignee Balakumaran Gopal [ balakumaran.gopal ] Ben Huddleston [ ben.huddleston ]
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]

            The first set of logs showed that we had spent 5 minutes or so waiting for the KVBucket deinitialization by the time the logs were collected. It's likely that we were still in the unregisterTaskable call in the ExecutorPool.

            The second set of logs is probably more interesting: the same situation, but after 6 minutes we "unblocked" and made progress.

            2022-03-07T11:45:55.639841+00:00 INFO (bucket2) KVBucket::deinitialize forceShutdown:true
            2022-03-07T11:45:55.639924+00:00 INFO (bucket2) Compaction of vb:517 done (0). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:14,deleted:16, collections_erased:115, size/items/tombstones/purge_seqno pre{360448, 14, 0, 0}, post{266240, 0, 0, 0}
            2022-03-07T11:45:56.293427+00:00 INFO (bucket2) Compaction of vb:806 done (1). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:16,deleted:4, collections_erased:44, size/items/tombstones/purge_seqno pre{184320, 22, 0, 0}, post{0, 0, 0, 0}
            2022-03-07T11:45:56.380545+00:00 INFO (bucket2) Compaction of vb:164 done (1). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:12,deleted:0, collections_erased:33, size/items/tombstones/purge_seqno pre{454656, 60, 0, 0}, post{0, 0, 0, 0}
            2022-03-07T11:45:56.392765+00:00 INFO (bucket2) Compaction of vb:664 done (1). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:56,deleted:0, collections_erased:39, size/items/tombstones/purge_seqno pre{643072, 105, 0, 0}, post{0, 0, 0, 0}
            2022-03-07T11:45:56.453551+00:00 INFO (bucket2) Compaction of vb:525 done (1). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:0,deleted:0, collections_erased:25, size/items/tombstones/purge_seqno pre{368640, 45, 0, 0}, post{0, 0, 0, 0}
            2022-03-07T11:45:56.483648+00:00 INFO (bucket2) Compaction of vb:475 done (1). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:0,deleted:0, collections_erased:28, size/items/tombstones/purge_seqno pre{307200, 14, 0, 0}, post{0, 0, 0, 0}
            2022-03-07T11:45:56.511478+00:00 INFO (bucket2) Compaction of vb:768 done (1). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:0,deleted:0, collections_erased:23, size/items/tombstones/purge_seqno pre{798720, 120, 0, 0}, post{0, 0, 0, 0}
            2022-03-07T11:45:56.561981+00:00 INFO (bucket2) Compaction of vb:846 done (1). purged tombstones:0, prepares:0, prepareBytes:0 collection_items_erased:alive:0,deleted:0, collections_erased:17, size/items/tombstones/purge_seqno pre{630784, 120, 0, 0}, post{0, 0, 0, 0}
            2022-03-07T11:51:58.370756+00:00 INFO (bucket2) MagmaKVStore: 0 deinitializing
            

            I suspect that a task is executing while we are calling CancellableCPUExecutor::removeTasksForTaskable(), and when it finishes it puts a reset-task-pointer task into the cpuPool queue, which we then have to wait for before we can destroy the bucket. Looks like an issue with the fix for MB-51132.
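
            A simplified model of that suspected race (not the real CancellableCPUExecutor code): removeTasksForTaskable() can only drop queued work, so a task that is already executing can still enqueue one more cleanup item after the queue has been cleared, and the taskable shutdown then has to wait for the pool to consume it.

            {code:cpp}
            // Toy model only; names and structure are illustrative.
            #include <deque>
            #include <functional>
            #include <iostream>
            #include <string>
            #include <utility>

            struct CpuPool {
                std::deque<std::pair<std::string, std::function<void()>>> queue;

                // Drops queued-but-not-yet-running work; cannot touch a task
                // that is already executing on a worker thread.
                void removeTasksForTaskable(const std::string& taskable) {
                    std::erase_if(queue, [&](const auto& item) {
                        return item.first == taskable;
                    });
                }

                void drain() {
                    while (!queue.empty()) {
                        auto item = std::move(queue.front());
                        queue.pop_front();
                        item.second();
                    }
                }
            };

            int main() {
                CpuPool pool;
                pool.queue.push_back(
                        {"bucket2", [] { std::cout << "queued compaction (never runs)\n"; }});

                // unregisterTaskable(bucket2) starts: clear bucket2's queued work.
                pool.removeTasksForTaskable("bucket2");

                // Meanwhile a bucket2 task that was *already running* finishes and
                // schedules its cleanup ("reset task ptr") back onto the pool...
                pool.queue.push_back(
                        {"bucket2", [] { std::cout << "late reset-task-ptr item\n"; }});

                // ...so shutdown still has to wait for the pool to get to it.
                pool.drain();
            }
            {code}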

            ben.huddleston Ben Huddleston added a comment
            ben.huddleston Ben Huddleston made changes -
            Due Date 04/Mar/22 09/Mar/22
            owend Daniel Owen made changes -
            Sprint KV 2021-Dec, KV 2022-Feb [ 1906, 2002 ] KV 2021-Dec, KV 2022-Feb, KV Post-Neo 2 [ 1906, 2002, 2050 ]

            Build couchbase-server-7.2.0-1010 contains kv_engine commit b0bde22 with commit message:
            MB-49512: Make Taskable::isShutdown() const

            build-team Couchbase Build Team added a comment

            Build couchbase-server-7.2.0-1010 contains kv_engine commit e0128e5 with commit message:
            MB-49512: Remove redundant if statement

            build-team Couchbase Build Team added a comment
            owend Daniel Owen made changes -
            Due Date 09/Mar/22 10/Mar/22

            Build couchbase-server-7.1.0-2455 contains kv_engine commit 0517833 with commit message:
            MB-49512: Make Taskable::isShutdown() const

            build-team Couchbase Build Team added a comment

            Build couchbase-server-7.1.0-2456 contains kv_engine commit f74b76b with commit message:
            MB-49512: Reset task ptr on scheduler thread during taskable shutdown

            build-team Couchbase Build Team added a comment
            ben.huddleston Ben Huddleston made changes -
            Assignee Ben Huddleston [ ben.huddleston ] Balakumaran Gopal [ balakumaran.gopal ]
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Resolved [ 5 ]

            2 runs are currently in progress. Even then, a weekly run over the weekend, where a couple of thousand jobs are run, would be a better way to validate this.

            http://172.23.121.80/job/temp_rebalance_magma_1/35/console
            http://172.23.121.80/job/temp_rebalance_magma/292/console

            Balakumaran.Gopal Balakumaran Gopal added a comment

            Validated this on 7.1.0-2475.
            Logs : http://greenboard.sc.couchbase.com/#!/server/7.1.0/2475?platforms=CENTOS&bucket_storage=COUCH&features=MAGMA

            Did not find a single occurrence of this issue. Marking this closed.

            Balakumaran.Gopal Balakumaran Gopal added a comment
            Balakumaran.Gopal Balakumaran Gopal made changes -
            Status Resolved [ 5 ] Closed [ 6 ]
            pavithra.mahamani Pavithra Mahamani (Inactive) made changes -
            Attachment 172.23.100.38.zip [ 179628 ]

            I am still seeing this issue in 7.1.0-2475 in the XDCR functional tests.

            rebalance_report_20220314T223148.json:
            {"stageInfo":{"data":{"startTime":"2022-03-14T15:30:48.099-07:00","completedTime":false,"timeTaken":60280}},"rebalanceId":"966b008ec2d7d181ce4f40571d563618","nodesInfo":{"active_nodes":["ns_1@172.23.100.38","ns_1@172.23.100.39"],"keep_nodes":["ns_1@172.23.100.38"],"eject_nodes":["ns_1@172.23.100.39"],"delta_nodes":[],"failed_nodes":[]},"masterNode":"ns_1@172.23.100.38","startTime":"2022-03-14T15:30:48.098-07:00","completedTime":"2022-03-14T15:31:48.379-07:00","timeTaken":60281,"completionMessage":"Rebalance exited with reason {buckets_shutdown_wait_failed,\n                              [{'ns_1@172.23.100.38',\n                                {'EXIT',\n                                 {old_buckets_shutdown_wait_failed,\n                                  [\"default\"]}}}]}."}
            

            Logs: 172.23.100.38.zip , 172.23.100.39.zip

            pavithra.mahamani Pavithra Mahamani (Inactive) added a comment
            pavithra.mahamani Pavithra Mahamani (Inactive) made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]
            pavithra.mahamani Pavithra Mahamani (Inactive) made changes -
            Assignee Balakumaran Gopal [ balakumaran.gopal ] Ben Huddleston [ ben.huddleston ]
            ritam.sharma Ritam Sharma made changes -
            Labels functional-test magma releasenote affects-neo-testing functional-test magma releasenote
            pavithra.mahamani Pavithra Mahamani (Inactive) made changes -
            Attachment 172.23.100.39.zip [ 179629 ]

            Looks to be a different root cause for this issue, Pavithra Mahamani.

            2022-03-14T19:38:26.828070-07:00 INFO 62: Delete bucket [default]. Notifying engine
            2022-03-14T19:38:26.829941-07:00 INFO 62: Delete bucket [default]. Engine ready for shutdown
            2022-03-14T19:38:26.829959-07:00 INFO 62: Delete bucket [default]. Wait for 6 clients to disconnect
            2022-03-14T19:38:26.830542-07:00 INFO 62: Delete bucket [default]. Shut down the bucket
            2022-03-14T19:38:26.831182-07:00 INFO (default) KVBucket::deinitialize forceShutdown:true
            2022-03-14T19:39:33.410204-07:00 INFO (default) MagmaKVStore: 0 deinitializing
            2022-03-14T19:39:33.410240-07:00 INFO [(default) magma_0]Closing magma db (thread pool refcount 8)
            ...
            2022-03-14T19:41:44.954724-07:00 INFO [(default) magma_0]Completed closing magma db
            2022-03-14T19:41:44.955078-07:00 INFO (default) MagmaKVStore: 0 deinitialized
            2022-03-14T19:41:44.955092-07:00 INFO (default) MagmaKVStore: 1 deinitializing
            2022-03-14T19:41:44.955097-07:00 INFO [(default) magma_1]Closing magma db (thread pool refcount 7)
            ...
            2022-03-14T19:43:44.957415-07:00 INFO [(default) magma_1]Completed closing magma db
            2022-03-14T19:43:44.957819-07:00 INFO (default) MagmaKVStore: 1 deinitialized
            2022-03-14T19:43:44.957831-07:00 INFO (default) MagmaKVStore: 2 deinitializing
            2022-03-14T19:43:44.957835-07:00 INFO [(default) magma_2]Closing magma db (thread pool refcount 6)
            ...
            2022-03-14T19:45:54.011065-07:00 INFO [(default) magma_2]Completed closing magma db
            2022-03-14T19:45:54.011488-07:00 INFO (default) MagmaKVStore: 2 deinitialized
            2022-03-14T19:45:54.011502-07:00 INFO (default) MagmaKVStore: 3 deinitializing
            2022-03-14T19:45:54.011506-07:00 INFO [(default) magma_3]Closing magma db (thread pool refcount 5)
            ...
            2022-03-14T19:46:07.993157-07:00 INFO [(default) magma_3/kvstore-147/rev-000000001]KVStore::Shutdown
            

            The cleanup in kv_engine looks to be prompt, but the shutdown of the magma shards is rather slow: in the logs above magma_0, magma_1 and magma_2 each took roughly two minutes to close, which far exceeds the ~60 seconds the rebalance waited before failing. Sarath Lakshman, it is probably worth somebody on the magma team taking a look at the latest set of logs.

            ben.huddleston Ben Huddleston added a comment
            ben.huddleston Ben Huddleston made changes -
            Assignee Ben Huddleston [ ben.huddleston ] Sarath Lakshman [ sarath ]
            ben.huddleston Ben Huddleston made changes -
            Component/s couchbase-bucket [ 10173 ]
            Component/s storage-engine [ 10175 ]
            sarath Sarath Lakshman made changes -
            Assignee Sarath Lakshman [ sarath ] Apaar Gupta [ apaar.gupta ]

            Closing, will file a new ticket based on Ben Huddleston's comment that the root cause is not the same for the latest occurrence.

            pavithra.mahamani Pavithra Mahamani (Inactive) added a comment
            pavithra.mahamani Pavithra Mahamani (Inactive) made changes -
            Resolution Fixed [ 1 ]
            Status Reopened [ 4 ] Closed [ 6 ]
            pavithra.mahamani Pavithra Mahamani (Inactive) made changes -
            Link This issue relates to MB-51477 [ MB-51477 ]

            Build couchbase-server-7.2.0-1021 contains kv_engine commit f74b76b with commit message:
            MB-49512: Reset task ptr on scheduler thread during taskable shutdown

            build-team Couchbase Build Team added a comment

            Build couchbase-server-7.2.0-1021 contains kv_engine commit 0517833 with commit message:
            MB-49512: Make Taskable::isShutdown() const

            build-team Couchbase Build Team added a comment

            People

              Assignee: apaar.gupta Apaar Gupta
              Reporter: Balakumaran.Gopal Balakumaran Gopal