Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-28468

Blacklist FTS dcp logging due to FTS repeatedly attempting to setup DCP streams to non-existing vBuckets

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 5.5.0
    • Fix Version/s: 5.5.0
    • Component/s: couchbase-bucket
    • Labels:
    • Environment:
      5.5.0-2017; CentOS longevity (see MB-28453).
    • Triage:
      Untriaged
    • Is this a Regression?:
      Unknown

      Description

      When investigating MB-28453, I found that the memcached logs wrapped early (after 4H) due to the volume of log messages; and MB-28456 was opened to investigate.

      Amongst legitimate excessive logging by the memcached process (which we are fixing now); there were also "legitimate" warnings where FTS was attempting to Stream non-existenct vBuckets:

      $ grep "WARNING (default) DCP (Producer) eq_dcpq:fts" memcached.log |grep "Stream request failed because this vbucket doesn't exist" | cut -f 2- -d ' ' |wc -l
         61282
      

      i.e. 61,282 instances over 4 hours; average of ~255 per minute.

      This seems to be pointing at a problem in FTS - I wouldn't expect it would need to poll for Streams this way (should be driven by the vBucket map).

      Could you take a look and see if this is a problem on the FTS side?

        Attachments

          Issue Links

          For Gerrit Dashboard: MB-28468
          # Subject Branch Project Status CR V

            Activity

            Hide
            build-team Couchbase Build Team added a comment -

            Build couchbase-server-6.0.0-1074 contains kv_engine commit 2955fca7dd63f39cc84300b22e3cd485f722454d with commit message:
            MB-28468: Reduce log level for FTS DCP streams
            https://github.com/couchbase/kv_engine/commit/2955fca7dd63f39cc84300b22e3cd485f722454d

            Show
            build-team Couchbase Build Team added a comment - Build couchbase-server-6.0.0-1074 contains kv_engine commit 2955fca7dd63f39cc84300b22e3cd485f722454d with commit message: MB-28468 : Reduce log level for FTS DCP streams https://github.com/couchbase/kv_engine/commit/2955fca7dd63f39cc84300b22e3cd485f722454d
            Hide
            apiravi Aruna Piravi (Inactive) added a comment -

            I do not understand this fix completely. How do we know if we have blacklisted the logging at all?

            [root@localhost logs]# grep "WARNING (default) DCP (Producer) eq_dcpq:fts" memcached.log* |grep "Stream request failed because this vbucket doesn't exist" | cut -f 2- -d ' ' |wc -l
            22
            [root@localhost logs]# grep "WARNING (default) DCP (Producer) eq_dcpq:fts" memcached.log* | cut -f 2- -d ' ' |wc -l
            82
            

            Say we blacklisted the logging of fts dcp streams around the last message -

            memcached.log.000001.txt:2018-06-20T15:50:45.151688Z WARNING (default) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_54820232-6c56d931 - (vb 344) Stream request failed because this vbucket doesn't exist
            

            I still see WARNING messages from eq_dcpq:fts even after 15:50:45-

            ady marked as dead
            memcached.log.000001.txt:2018-06-20T15:50:53.267629Z WARNING (default) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 331) Cannot close stream because stream is already marked as dead
            memcached.log.000001.txt:2018-06-20T15:50:53.268935Z WARNING (default) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 238) Cannot close stream because stream is already marked as dead
            memcached.log.000001.txt:2018-06-20T15:50:53.269079Z WARNING (default) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 326) Cannot close stream because stream is already marked as dead
            memcached.log.000001.txt:2018-06-20T15:50:53.269324Z WARNING (default) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 269) Cannot close stream because stream is already marked as dead
            memcached.log.000001.txt:2018-06-20T15:50:55.653005Z WARNING (default) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 248) Cannot close stream because stream is already marked as dead
            

            The above is from node .139, logs below -

            https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2018-06-21T223700-ns_1%40172.23.106.139.zip
            https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2018-06-21T223700-ns_1%40172.23.106.175.zip
            https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2018-06-21T223700-ns_1%40172.23.106.176.zip

            Show
            apiravi Aruna Piravi (Inactive) added a comment - I do not understand this fix completely. How do we know if we have blacklisted the logging at all? [root@localhost logs]# grep "WARNING (default) DCP (Producer) eq_dcpq:fts" memcached.log* |grep "Stream request failed because this vbucket doesn't exist" | cut -f 2- -d ' ' |wc -l 22 [root@localhost logs]# grep "WARNING (default) DCP (Producer) eq_dcpq:fts" memcached.log* | cut -f 2- -d ' ' |wc -l 82 Say we blacklisted the logging of fts dcp streams around the last message - memcached.log. 000001 .txt: 2018 - 06 -20T15: 50 : 45 .151688Z WARNING ( default ) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_54820232-6c56d931 - (vb 344 ) Stream request failed because this vbucket doesn't exist I still see WARNING messages from eq_dcpq:fts even after 15:50:45- ady marked as dead memcached.log. 000001 .txt: 2018 - 06 -20T15: 50 : 53 .267629Z WARNING ( default ) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 331 ) Cannot close stream because stream is already marked as dead memcached.log. 000001 .txt: 2018 - 06 -20T15: 50 : 53 .268935Z WARNING ( default ) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 238 ) Cannot close stream because stream is already marked as dead memcached.log. 000001 .txt: 2018 - 06 -20T15: 50 : 53 .269079Z WARNING ( default ) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 326 ) Cannot close stream because stream is already marked as dead memcached.log. 000001 .txt: 2018 - 06 -20T15: 50 : 53 .269324Z WARNING ( default ) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 269 ) Cannot close stream because stream is already marked as dead memcached.log. 000001 .txt: 2018 - 06 -20T15: 50 : 55 .653005Z WARNING ( default ) DCP (Producer) eq_dcpq:fts:default_index_7bc3e7d6b67feceb_f4e0a48a-232fd257 - (vb 248 ) Cannot close stream because stream is already marked as dead The above is from node .139, logs below - https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2018-06-21T223700-ns_1%40172.23.106.139.zip https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2018-06-21T223700-ns_1%40172.23.106.175.zip https://s3.amazonaws.com/cb-engineering/Aruna/collectinfo-2018-06-21T223700-ns_1%40172.23.106.176.zip
            Hide
            drigby Dave Rigby added a comment -

            This is due to the fact that we only blacklist log messages which at the stream level (i.e. ActiveStream / PassiveStream objects); as those are the most chatty; we generally don't blacklist log messages for the overall producer.

            This is essentially because the original blacklisting mechanism was added for views; which essentially reconnects every 10s to every vBucket (and produced a huge amount of per-stream logging). We only included log messages which the view engine produced in normal operation (stream request; backfilling, etc) in the set of log messages which can be optionally disabled.

            The instance above ("Cannot close stream because stream is already marked as dead") comes from the DcpProducer function closeStream; i.e. it's a management operation at the connection level:

            ENGINE_ERROR_CODE DcpProducer::closeStream(uint32_t opaque, uint16_t vbucket) {
            ...
                } else {
                    if (!stream->isActive()) {
                        LOG(EXTENSION_LOG_WARNING, "%s (vb %d) Cannot close stream because "
                            "stream is already marked as dead", logHeader(), vbucket);
            

            Note it's using the global logger LOG() and not the per-connection ConnHandler::logger; as such it is always printed irrespective of if it's from FTS or not.

            Given that the original MB-28456 did highlight a huge number of messages which currently use global logging ("Stream request failed because this vbucket doesn't exist"); we currently arn't blacklisting those, and should probably expand the set of log messages which can be blacklisted to include that one (plus others).

            However, FTS seem to have addressed the excessive logging (given you only saw 82 messages in the last comment); I would suggest not making any further changes in Vulcan given where we are in the development process. (Also given that disabling logging isn't ideal as it makes debugging harder, I suggest we stick with the set of blacklisted logs we currently have).

            Show
            drigby Dave Rigby added a comment - This is due to the fact that we only blacklist log messages which at the stream level (i.e. ActiveStream / PassiveStream objects); as those are the most chatty; we generally don't blacklist log messages for the overall producer. This is essentially because the original blacklisting mechanism was added for views; which essentially reconnects every 10s to every vBucket (and produced a huge amount of per-stream logging). We only included log messages which the view engine produced in normal operation (stream request; backfilling, etc) in the set of log messages which can be optionally disabled. The instance above ("Cannot close stream because stream is already marked as dead") comes from the DcpProducer function closeStream ; i.e. it's a management operation at the connection level: ENGINE_ERROR_CODE DcpProducer::closeStream(uint32_t opaque, uint16_t vbucket) { ... } else { if (!stream->isActive()) { LOG(EXTENSION_LOG_WARNING, "%s (vb %d) Cannot close stream because " "stream is already marked as dead" , logHeader(), vbucket); Note it's using the global logger LOG() and not the per-connection ConnHandler::logger ; as such it is always printed irrespective of if it's from FTS or not. Given that the original MB-28456 did highlight a huge number of messages which currently use global logging ("Stream request failed because this vbucket doesn't exist"); we currently arn't blacklisting those, and should probably expand the set of log messages which can be blacklisted to include that one (plus others). However , FTS seem to have addressed the excessive logging (given you only saw 82 messages in the last comment); I would suggest not making any further changes in Vulcan given where we are in the development process. (Also given that disabling logging isn't ideal as it makes debugging harder, I suggest we stick with the set of blacklisted logs we currently have).
            Hide
            apiravi Aruna Piravi (Inactive) added a comment -

            thanks Dave Rigby, these logs are not from longevity test that the MB originated from.

            Assigning this Arunkumar Senthilnathan to check against logs from last run longevity test that contains this fix.

            Arunkumar Senthilnathan, pls just run

            $ grep "WARNING (default) DCP (Producer) eq_dcpq:fts" memcached.log |grep "Stream request failed because this vbucket doesn't exist" | cut f 2 -d ' ' |wc -l

            and see if the count is much lower. Thanks!

            Show
            apiravi Aruna Piravi (Inactive) added a comment - thanks Dave Rigby , these logs are not from longevity test that the MB originated from. Assigning this Arunkumar Senthilnathan to check against logs from last run longevity test that contains this fix. Arunkumar Senthilnathan , pls just run $ grep "WARNING (default) DCP (Producer) eq_dcpq:fts" memcached.log |grep "Stream request failed because this vbucket doesn't exist" | cut f 2 -d ' ' |wc -l and see if the count is much lower. Thanks!
            Hide
            arunkumar Arunkumar Senthilnathan added a comment -

            Count returned as 0 from the longevity cluster against RC1 run - closing this issue

            supportal: https://supportal.couchbase.com/snapshot/84f9764289331b1d4d1d133b40ada944::0

            Show
            arunkumar Arunkumar Senthilnathan added a comment - Count returned as 0 from the longevity cluster against RC1 run - closing this issue supportal: https://supportal.couchbase.com/snapshot/84f9764289331b1d4d1d133b40ada944::0

              People

              Assignee:
              arunkumar Arunkumar Senthilnathan
              Reporter:
              drigby Dave Rigby
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Gerrit Reviews

                  There are no open Gerrit changes

                    PagerDuty