Details
Bug
Resolution: Unresolved
Major
6.0.5, 6.5.2, 6.6.5, 7.0.4, 7.1.1
None
Untriaged
1
Unknown
Description
This issue is very similar to MB-47267. During delta node recovery, ns_server imposes a 1-minute timeout for the janitor to find that Buckets have been created on the incoming node(s). As part of Bucket initialization we schedule (but do not wait for) the Warmup tasks, which drive a lot of disk IO. With many Buckets, the disk work required to initialize a given Bucket can become slow enough to hit this 1-minute timeout, because the disk has to service the warmup of the other Buckets at the same time as the initialization of that Bucket. If we saw this with a single Bucket we would probably put it down to a slow disk, but it has been observed that Warmup of other Buckets increases the time it takes to initialize any given Bucket.
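The failure mode above is a poll-with-deadline: the janitor keeps checking until every expected Bucket exists on the incoming node, and gives up after 1 minute. The following is a minimal Python model of that check under stated assumptions, not ns_server's actual (Erlang) code; `wait_for_buckets`, `ready_buckets` and the 1-second poll interval are hypothetical. The clock and sleep are injectable so the timeout can be exercised without waiting a real minute.

```python
import time
from typing import Callable, Iterable, Set


def wait_for_buckets(
    expected: Iterable[str],
    ready_buckets: Callable[[], Set[str]],
    timeout_s: float = 60.0,  # the 1-minute janitor timeout described above
    poll_interval_s: float = 1.0,  # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if every expected Bucket is reported ready before the deadline.

    Hypothetical model only: ready_buckets() stands in for whatever query the
    janitor makes against the incoming node.
    """
    wanted = set(expected)
    deadline = clock() + timeout_s
    while True:
        # Success as soon as every expected Bucket has been created.
        if wanted <= ready_buckets():
            return True
        # Otherwise fail once the deadline has passed; slow Bucket
        # initialization (e.g. disk contention with Warmup) lands here.
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)
```

With a fake clock, a node whose last Bucket becomes ready at t=30s passes, while one whose last Bucket only becomes ready at t=90s (because its initialization is competing with other Buckets' Warmup IO) causes the check to return False at t=60s.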
Potential solution
- We could perhaps stop scheduling the Warmup tasks during Bucket initialization and instead expose an API that lets ns_server schedule warmup. ns_server could then start warmup on all Buckets once it finds that all Buckets have been created (kv_engine does not know how many Buckets will be created).
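The proposed split of responsibilities can be sketched as a toy model. This is a hedged illustration only: `ToyKvEngine`, `create_bucket`, `start_warmup` and `recover_node` are hypothetical names, not existing kv_engine or ns_server APIs, and real Bucket creation is asynchronous, which this sketch ignores. The point it shows is the ordering: Bucket creation no longer kicks off Warmup, and only the side that knows the full set of Buckets (ns_server) triggers warmup once every Bucket exists.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set, Tuple


@dataclass
class ToyKvEngine:
    """Toy stand-in for kv_engine; every name here is hypothetical."""

    buckets: Set[str] = field(default_factory=set)
    warming: Set[str] = field(default_factory=set)
    events: List[Tuple[str, str]] = field(default_factory=list)

    def create_bucket(self, name: str) -> None:
        # Proposed behaviour: Bucket initialization no longer schedules Warmup,
        # so creating a Bucket does not compete with other Buckets' warmup IO.
        self.buckets.add(name)
        self.events.append(("create", name))

    def start_warmup(self, name: str) -> None:
        # Hypothetical new API that ns_server would call explicitly.
        if name not in self.buckets:
            raise KeyError(f"bucket {name!r} has not been created")
        if name not in self.warming:  # idempotent: schedule warmup at most once
            self.warming.add(name)
            self.events.append(("warmup", name))


def recover_node(engine: ToyKvEngine, bucket_names: Iterable[str]) -> None:
    """ns_server side: create every Bucket first, then start warmup on all of them."""
    names = list(bucket_names)
    for name in names:
        engine.create_bucket(name)
    # kv_engine cannot know how many Buckets are coming; ns_server can, so it
    # decides when creation is complete and warmup may begin.
    for name in names:
        engine.start_warmup(name)
```

After `recover_node(engine, ["a", "b", "c"])`, the event log contains all three `create` events before any `warmup` event, which is the property the janitor's 1-minute check depends on. Making `start_warmup` idempotent keeps a retried call from ns_server harmless.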