In MB-47387 (magma buckets slow to open), one cause of the rebalance failure is that the janitor_agent uses the wrong stat to determine whether a bucket is ready for DCP connections. Here are some applicable snippets from that ticket.
Steve: Is it possible memcached is returning the vbucket stats when it hasn't completed all its warmup activities (which means it'll return Etmpfail for dcp connections)?
Ben: It is indeed. https://github.com/couchbase/kv_engine/blob/master/engines/ep/src/warmup.h#L112-L219 gives a good description of the various phases of warmup. After the PopulateVBucketMap phase vBucket stats should be retrievable. However, it's not until the Done state that Dcp Consumers are creatable. The bulk of warmup (certainly for couchstore buckets) is going to be in some of the data loading phases done after PopulateVBucketMap. The changes that I mentioned earlier didn't change this though. They changed it so that after we enable traffic (mutations etc.) we also have to wait for all warmup threads to finish before we accept a Dcp Consumer (to prevent a race condition). (Side note: the comment on the isFinishedLoading() function allowing creation of DcpConsumers is out of date, it should be isComplete() now. Will update.)
Steve: Is there a way for ns_server to determine, via Stats, the bucket is ready to have dcp connections created successfully?
Aliaksei: https://github.com/couchbase/ns_server/blob/master/src/ns_memcached.erl#L1358.
and that code is:
has_started_inner({ok, WarmupStats}) ->
    case lists:keyfind(<<"ep_warmup_thread">>, 1, WarmupStats) of
        {_, <<"complete">>} ->
            true;
        {_, V} when is_binary(V) ->
            false
    end.
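To make the behaviour of that check easier to reason about, here is a minimal standalone sketch of the same lookup. The module name warmup_sketch, the function warmup_state/1, and the returned atoms are hypothetical and not part of ns_server. The <<"running">> value used below is only a sample of a non-complete value. The sketch also makes one case explicit: lists:keyfind/3 returns false when the key is absent, and the function quoted above has no clause for that. It does not settle whether a "complete" ep_warmup_thread is the right signal for DCP readiness, which is the open question in this ticket.

```erlang
-module(warmup_sketch).
-export([warmup_state/1]).

%% Hypothetical helper (not ns_server code): classify the
%% ep_warmup_thread entry of a warmup stats list.
-spec warmup_state([{binary(), binary()}]) ->
          complete | in_progress | unknown.
warmup_state(WarmupStats) ->
    case lists:keyfind(<<"ep_warmup_thread">>, 1, WarmupStats) of
        %% Same condition has_started_inner/1 treats as "started".
        {_, <<"complete">>} ->
            complete;
        %% Any other binary value means warmup is still in progress.
        {_, V} when is_binary(V) ->
            in_progress;
        %% lists:keyfind/3 returns false when the key is missing.
        false ->
            unknown
    end.
```

For example, warmup_sketch:warmup_state([{<<"ep_warmup_thread">>, <<"running">>}]) returns in_progress, and an empty stats list returns unknown instead of raising a case_clause error.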