Couchbase Server / MB-49191

ns_server starting dcp connections before warmup complete

Details

    • 1
    • KV 2022-Jan

    Description

      In MB-47387 (magma buckets slow to open), one cause of the rebalance failure is the janitor_agent using the wrong stat to determine bucket readiness for dcp connections. Here are some applicable snippets from that ticket.

      Steve: Is it possible memcached is returning the vbucket stats when it has not completed all its warmup activities (which means it'll return Etmpfail for dcp connections)?

      Ben: It is indeed. https://github.com/couchbase/kv_engine/blob/master/engines/ep/src/warmup.h#L112-L219 gives a good description of the various phases of warmup. After the PopulateVBucketMap phase, vBucket stats should be retrievable. However, it's not until the Done state that Dcp Consumers are creatable. The bulk of warmup (certainly for couchstore buckets) is going to be in some of the data loading phases done after PopulateVBucketMap. The changes that I mentioned earlier didn't change this though. They changed it so that after we enable traffic (mutations etc.) we also have to wait for all warmup threads to finish before we accept a Dcp Consumer (to prevent a race condition). (Side note: the comment on the isFinishedLoading() function about allowing creation of DcpConsumers is out of date; it should be isComplete() now. Will update.)

      Steve: Is there a way for ns_server to determine, via Stats, the bucket is ready to have dcp connections created successfully? 

      Aliaksei: https://github.com/couchbase/ns_server/blob/master/src/ns_memcached.erl#L1358. 

      and that code is

      has_started_inner({ok, WarmupStats}) ->
          case lists:keyfind(<<"ep_warmup_thread">>, 1, WarmupStats) of
              {_, <<"complete">>} ->
                  true;
              {_, V} when is_binary(V) ->
                  false
          end.
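
The check quoted above can be sketched outside ns_server as follows. This is a minimal illustration in Python rather than the Erlang that ns_server actually uses, and it is not the ns_server implementation. The names `dcp_ready`, `wait_until_dcp_ready` and `fetch_warmup_stats` are hypothetical, and the stat value "running" is used only as an example of a non-complete value. It assumes a kv_engine build in which `ep_warmup_thread` reports "complete" only once warmup is fully complete, which is what the commit referenced in this ticket is described as doing. Unlike the quoted clause, the sketch treats a missing stat as "not ready" instead of failing.

```python
import time
from typing import Callable, Mapping, Optional

# Hypothetical type: a callable returning the warmup stats as a dict,
# or None if the stats could not be fetched.
StatsFetcher = Callable[[], Optional[Mapping[str, str]]]


def dcp_ready(warmup_stats: Optional[Mapping[str, str]]) -> bool:
    """Return True only when ep_warmup_thread is exactly "complete".

    A missing stats map or a missing key counts as "not ready".
    """
    if warmup_stats is None:
        return False
    return warmup_stats.get("ep_warmup_thread") == "complete"


def wait_until_dcp_ready(
    fetch_warmup_stats: StatsFetcher,
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the warmup stats until the bucket is ready or the timeout expires.

    The stats are always checked at least once, even with timeout_s=0.
    """
    deadline = clock() + timeout_s
    while True:
        if dcp_ready(fetch_warmup_stats()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A caller would pass its own stats-fetching function and create DCP connections only after `wait_until_dcp_ready` returns True. Taking the fetcher, sleep and clock as parameters keeps the sketch free of any assumption about how the stats are actually retrieved.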
      

        Activity

          ben.huddleston Ben Huddleston added a comment - Created MB-50266 to refactor the code further to re-use the same code used for stats in Warmup::addStats for the stats also in EventuallyPersistentEngine::doEngineStatsLowCardinality().
          ben.huddleston Ben Huddleston added a comment (edited) - Both changes have been merged. I think that this can be resolved but will let Steve Watanabe do it as there is a pending ns_server change.

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2011 contains kv_engine commit 0fec938 with commit message: MB-49191: Use Warmup::isComplete() for ep_engine ep_warmup_thread

          build-team Couchbase Build Team added a comment - Build couchbase-server-7.1.0-2011 contains kv_engine commit f2fd0ff with commit message: MB-49191: Make addStat in warmup.cc a lambda function

          steve.watanabe Steve Watanabe added a comment - Resolving ticket. Will reopen or create a new ticket should the issue reoccur.

          People

            steve.watanabe Steve Watanabe