Couchbase Server / MB-19503

Pauses in DCP stream sending


Details


    Description

      There can be huge pauses between creating and closing a DCP stream, although there are not many mutations (a single one) to be sent. It happens in 4.1.1 but not in 4.1.0 (for details about which commits exactly, see below).

      I reproduced the issue mentioned on the forum [1]. I'm well aware that we don't support such undersized setups (1 core, 1 GB RAM), but I think we should check whether these observations point to a real issue or are expected due to the recent changes (I fear it might also show up in a bigger context, so it's valuable to have such a reduced, reproducible way of showing it).

      It could be a view-engine issue not receiving the mutation, but I suspect ep-engine. I was able to bisect it down to 3 possible commits that might have introduced it.

      Given the nature of the commits, it's probably https://github.com/couchbase/ep-engine/commit/87869fd39dc4e2795d51554b549990a44aa38943

      Here's how to reproduce it:

      • Get a VM with 1 core and 1 GB RAM (2 GB is also OK; you need that much to compile Couchbase)
      • Set the view update daemon to check every second and trigger after a single mutation:

            curl -X POST 'http://Administrator:asdasd@10.142.200.101:8091/settings/viewUpdateDaemon' -d 'updateInterval=1000&updateMinChanges=1&replicaUpdateMinChanges=1'
        

      • Add a design document:

            curl "http://Administrator:asdasd@10.142.200.101:8092/default/_design/foo" -X PUT -H 'Content-Type: application/json' -d '{"views":{"bar":{"map":"function (doc, meta) {\n emit(meta.id, null);\n}"}}}'
        

      • Watch the indexing time via:

            tail -f var/lib/couchbase/logs/couchdb.log | grep 'Indexing time'

      • Run the attached `slowload.py` script (our Python SDK is needed to run it)
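      Rather than eyeballing the tail output, the slow runs can be collected programmatically. A minimal sketch (the regex and the 1-second threshold are my assumptions, not part of the attached script):

```python
import re

# Matches couchdb.log lines like "Indexing time: 7.736 seconds"
INDEXING_RE = re.compile(r"Indexing time:\s+([0-9.]+) seconds")

def slow_index_runs(lines, threshold=1.0):
    """Return the indexing times (in seconds) that exceed `threshold`.

    `lines` can be any iterable of couchdb.log lines, e.g. an open file.
    """
    times = []
    for line in lines:
        m = INDEXING_RE.search(line)
        if m:
            times.append(float(m.group(1)))
    return [t for t in times if t > threshold]
```

      Feeding it `open('var/lib/couchbase/logs/couchdb.log')` would list only the multi-second outliers.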

      You will see that the indexing time is normally in the hundreds-of-milliseconds range, but sometimes it takes several seconds (6 to 7 seconds on my machine). You can stop the script and have a look at the logs.

      When you look at the couchdb log and search for the high indexing time, you'll find something like this:

      [couchdb:info,2016-05-04T13:56:24.516Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]Updater reading changes from active partitions to update main set view group `_design/foo` from set `default`
      [couchdb:info,2016-05-04T13:56:24.523Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 48 from sequence 72 to 73
      [couchdb:info,2016-05-04T13:56:24.524Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 90 from sequence 85 to 86
      [couchdb:info,2016-05-04T13:56:32.149Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 261 from sequence 119 to 120
      [couchdb:info,2016-05-04T13:56:32.151Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 339 from sequence 69 to 70
      [couchdb:info,2016-05-04T13:56:32.153Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 514 from sequence 119 to 120
      [couchdb:info,2016-05-04T13:56:32.155Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 596 from sequence 69 to 70
      [couchdb:info,2016-05-04T13:56:32.156Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 701 from sequence 61 to 62
      [couchdb:info,2016-05-04T13:56:32.158Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 823 from sequence 72 to 73
      [couchdb:info,2016-05-04T13:56:32.159Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (on-disk) for partition 861 from sequence 85 to 86
      [couchdb:info,2016-05-04T13:56:32.160Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]Updater for main set view group `_design/foo`, set `default`, read a total of 1 changes
      [couchdb:info,2016-05-04T13:56:32.245Z,couchdb_ns_1@127.0.0.1:<0.1039.0>:couch_log:info:41]Updater checkpointing set view `default` update for main group `_design/foo`
      [couchdb:info,2016-05-04T13:56:32.249Z,couchdb_ns_1@127.0.0.1:<0.295.0>:couch_log:info:41]Set view `default`, main (prod) group `_design/foo`, updater finished
      Indexing time: 7.736 seconds
      Blocked time:  0.000 seconds
      Inserted IDs:  9
      Deleted IDs:   0
      Inserted KVs:  9
      Deleted KVs:   0
      Cleaned KVs:   0
      # seqs done:   9
      

      Between the snapshot marker for partition 90 and the one for partition 261 (btw: partitions == vBuckets) there's a gap of several seconds:

      [couchdb:info,2016-05-04T13:56:24.524Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 90 from sequence 85 to 86
      [couchdb:info,2016-05-04T13:56:32.149Z,couchdb_ns_1@127.0.0.1:<0.1040.0>:couch_log:info:41]set view `default`, main (prod) group `_design/foo`: received a snapshot marker (in-memory) for partition 261 from sequence 119 to 120
      
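      The size of such a gap can be computed from the log headers themselves. A sketch, assuming the timestamp is always the second comma-separated field of the header (as in `[couchdb:info,2016-05-04T13:56:24.524Z,...]`):

```python
from datetime import datetime

def couchdb_ts(line):
    """Parse the timestamp out of a couchdb.log header like
    [couchdb:info,2016-05-04T13:56:24.524Z,couchdb_ns_1@...]."""
    stamp = line.split(",")[1]  # second comma-separated field
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")

def gap_seconds(earlier_line, later_line):
    """Seconds elapsed between two log lines."""
    return (couchdb_ts(later_line) - couchdb_ts(earlier_line)).total_seconds()
```

      For the two lines above it reports a gap of roughly 7.6 seconds.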

      If you now look at the memcached log messages for partition 261:

      $ grep 'vb 261' var/lib/couchbase/logs/memcached.log.12.txt|tail -n 3
      2016-05-04T13:56:24.525384Z WARNING (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/foo (prod/main) - (vb 261) Creating stream with start seqno 119 and end seqno 120
      2016-05-04T13:56:24.525420Z WARNING (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/foo (prod/main) - (vb 261) stream created!
      2016-05-04T13:56:32.149689Z WARNING (default) DCP (Producer) eq_dcpq:mapreduce_view: default _design/foo (prod/main) - (vb 261) Stream closing, 0 items sent from backfill phase, 1 items sent from memory phase, 120 was last seqno sent, reason: The stream ended due to all items being streamed
      

      You see that there's also a long gap between the creation and the closing of the stream.
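      The create-to-close pause can be extracted the same way from memcached.log, where the timestamp is the first token (with microsecond precision). A sketch, assuming the lines were pre-filtered per vBucket as in the grep above:

```python
from datetime import datetime

def memcached_ts(line):
    """Timestamp is the first whitespace-separated token,
    e.g. 2016-05-04T13:56:24.525384Z."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%fZ")

def stream_lifetime(lines):
    """Seconds between 'Creating stream' and 'Stream closing' in a
    per-vBucket log excerpt (e.g. the output of grep 'vb 261')."""
    created = closed = None
    for line in lines:
        if "Creating stream" in line:
            created = memcached_ts(line)
        elif "Stream closing" in line:
            closed = memcached_ts(line)
    if created is not None and closed is not None:
        return (closed - created).total_seconds()
    return None
```

      For the vb 261 excerpt above this yields about 7.6 seconds between stream creation and close, matching the gap seen on the view-engine side.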

      [1]: https://forums.couchbase.com/t/4-1-0-ee-vs-4-1-1-ee-indexer-too-slow/8054



            People

              jwalker (Jim Walker)
              vmx (Volker Mische)

