Couchbase Server / MB-19503

Pauses in DCP stream sending


Details


    Description

      There can be huge pauses between creating and closing a DCP stream, although there are not many mutations (a single one) to be sent. It happens in 4.1.1 but not in 4.1.0 (for details about which commits exactly, see below).

      I reproduced the issue mentioned on the forum [1]. I'm well aware that we don't support such undersized setups (1 core, 1 GB RAM), but I think we should check whether these observations point to a real issue or are expected due to the recent changes (I fear it might also show up in a bigger context, so it's valuable to have such a reduced, reproducible way of showing it).

      It could be a view-engine issue not receiving the mutation, but I suspect ep-engine. I was able to bisect it down to 3 possible commits that might have introduced it.

      Given the nature of the commits, it's probably https://github.com/couchbase/ep-engine/commit/87869fd39dc4e2795d51554b549990a44aa38943

      Here's how to reproduce it:

      • Get a VM with 1 core and 1 GB RAM (2 GB is also OK; you need that much to compile Couchbase)
      • Set the view update daemon to check every second and trigger after a single mutation:

            curl -X POST 'http://Administrator:asdasd@10.142.200.101:8091/settings/viewUpdateDaemon' -d 'updateInterval=1000&updateMinChanges=1&replicaUpdateMinChanges=1'
        

      • Add a design document:

            curl "http://Administrator:asdasd@10.142.200.101:8092/default/_design/foo" -X PUT -H 'Content-Type: application/json' -d '{"views":{"bar":{"map":"function (doc, meta) {\n emit(meta.id, null);\n}"}}}'
        

      • Watch the indexing time via:

            tail -f var/lib/couchbase/logs/couchdb.log | grep 'Indexing time'

      • Run the attached `slowload.py` script (our Python SDK is needed to run it)
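      Rather than eyeballing the tail output, the slow runs can be collected programmatically. A minimal sketch (the regex and the 1-second threshold are my assumptions, not part of the attached script):

```python
import re

# Matches couchdb.log lines like "Indexing time: 7.736 seconds"
INDEXING_RE = re.compile(r"Indexing time:\s+([0-9.]+) seconds")

def slow_index_runs(lines, threshold=1.0):
    """Return the indexing times (in seconds) that exceed `threshold`.

    `lines` can be any iterable of couchdb.log lines, e.g. an open file.
    """
    times = []
    for line in lines:
        m = INDEXING_RE.search(line)
        if m:
            times.append(float(m.group(1)))
    return [t for t in times if t > threshold]
```

      Feeding it `open('var/lib/couchbase/logs/couchdb.log')` would list only the multi-second outliers.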

      You will see that the indexing time is normally in the hundreds-of-milliseconds range, but sometimes it takes several seconds (6 to 7 seconds on my machine). You can stop the script and have a look at the logs.

      When you look at the couchdb log and search for the high indexing time, you'll find something like this:

      [couchdb:info,2016-05-04T13:56:24.516Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]Updater reading changes from active partitions to update main set view group `_design/foo` from set `default`
      [couchdb:info,2016-05-04T13:56:24.523Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 48 from sequence 72 to 73
      [couchdb:info,2016-05-04T13:56:24.524Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 90 from sequence 85 to 86
      [couchdb:info,2016-05-04T13:56:32.149Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 261 from sequence 119 to 120
      [couchdb:info,2016-05-04T13:56:32.151Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 339 from sequence 69 to 70
      [couchdb:info,2016-05-04T13:56:32.153Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 514 from sequence 119 to 120
      [couchdb:info,2016-05-04T13:56:32.155Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 596 from sequence 69 to 70
      [couchdb:info,2016-05-04T13:56:32.156Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 701 from sequence 61 to 62
      [couchdb:info,2016-05-04T13:56:32.158Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 823 from sequence 72 to 73
      [couchdb:info,2016-05-04T13:56:32.159Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 861 from sequence 85 to 86
      [couchdb:info,2016-05-04T13:56:32.160Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]Updater for main set view group `_design/foo`, set `default`, read a total of 1 changes
      [couchdb:info,2016-05-04T13:56:32.245Z,couchdb_ns_1@127.0.0.1:<0.1039.0>:couch_log:info:41]Updater checkpointing set view `default` update for main group `_design/foo`
      [couchdb:info,2016-05-04T13:56:32.249Z,couchdb_ns_1@127.0.0.1:<0.295.0>:couch_log:info:41]Set view `default`, main (prod) group `_design/foo`, updater finished
      Indexing time: 7.736 seconds
      Blocked time:  0.000 seconds
      Inserted IDs:  9
      Deleted IDs:   0
      Inserted KVs:  9
      Deleted KVs:   0
      Cleaned KVs:   0
      # seqs done:   9
      

      Between the snapshot marker for partition 90 and the one for partition 261 (btw: partitions == vBuckets) there's a gap of several seconds:

      [couchdb:info,2016-05-04T13:56:24.524Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 90 from sequence 85 to 86
      [couchdb:info,2016-05-04T13:56:32.149Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 261 from sequence 119 to 120
      
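      The size of such a gap can be computed from the log headers themselves. A sketch, assuming the timestamp is always the second comma-separated field of the header (as in `[couchdb:info,2016-05-04T13:56:24.524Z,...]`):

```python
from datetime import datetime

def couchdb_ts(line):
    """Parse the timestamp out of a couchdb.log header like
    [couchdb:info,2016-05-04T13:56:24.524Z,couchdb_ns_1@...]."""
    stamp = line.split(",")[1]  # second comma-separated field
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")

def gap_seconds(earlier_line, later_line):
    """Seconds elapsed between two log lines."""
    return (couchdb_ts(later_line) - couchdb_ts(earlier_line)).total_seconds()
```

      For the two lines above it reports a gap of roughly 7.6 seconds.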

      If you now look at the memcached log messages for partition 261:

      $ grep 'vb 261' var/lib/couchbase/logs/memcached.log.12.txt|tail -n 3
      2016-05-04T13:56:24.525384Z WARNING (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/foo (prod/main) - (vb 261) Creating stream with start seqno 119 and end seqno 120
      2016-05-04T13:56:24.525420Z WARNING (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/foo (prod/main) - (vb 261) stream created!
      2016-05-04T13:56:32.149689Z WARNING (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/foo (prod/main) - (vb 261) Stream closing, 0 items sent from backfill phase, 1 items sent from memory phase, 120 was last seqno sent, reason: The stream ended due to all items being streamed
      

      You see that there's also a long gap between the creation and the closing of the stream.
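      The create-to-close pause can be extracted the same way from memcached.log, where the timestamp is the first token (with microsecond precision). A sketch, assuming the lines were pre-filtered per vBucket as in the grep above:

```python
from datetime import datetime

def memcached_ts(line):
    """Timestamp is the first whitespace-separated token,
    e.g. 2016-05-04T13:56:24.525384Z."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%fZ")

def stream_lifetime(lines):
    """Seconds between 'Creating stream' and 'Stream closing' in a
    per-vBucket log excerpt (e.g. the output of grep 'vb 261')."""
    created = closed = None
    for line in lines:
        if "Creating stream" in line:
            created = memcached_ts(line)
        elif "Stream closing" in line:
            closed = memcached_ts(line)
    if created is not None and closed is not None:
        return (closed - created).total_seconds()
    return None
```

      For the vb 261 excerpt above this yields about 7.6 seconds between stream creation and close, matching the gap seen on the view-engine side.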

      [1]: https://forums.couchbase.com/t/4-1-0-ee-vs-4-1-1-ee-indexer-too-slow/8054



            People

              jwalker (Jim Walker)
              vmx (Volker Mische)

