Details
- Bug
- Resolution: Fixed
- Test Blocker
- 3.0
- Security Level: Public
- None
- Untriaged
- Unknown
- June 30 - July 18
Description
Related to the view bug ticket MB-10490.
For views, we open a single connection and reuse it for all tasks, such as gathering stats and streaming mutations. At any given time we use this connection to request only one stream, but it is simultaneously used for querying stats.
Scenario:
1. Create a Couchbase node with 1024 vbuckets and insert 10240 documents (no duplicates)
2. Create a default view and publish it
3. Create another Couchbase node with 1024 vbuckets
4. Add the second node to the cluster and rebalance
On the second node, when building the index, the view engine requests a stream for each vbucket and tries to read mutations. However, after a few streams, or even on the first stream, the following is observed: the open stream request (seq 0 - x) succeeds and returns the failover log, but instead of receiving mutations 0 to x followed by stream_end, I receive only a snapshot_marker. No more mutations arrive for that stream and it gets stuck.
Please apply the attached couchdb patch (which sets the Erlang UPR client timeout to infinity) before reproducing the issue.
Some debug logs with comments are also attached. Please refer to streams.txt for the sequence of operations and to ops.txt for the responses coming from the server.
A packet trace of the UPR traffic (port 12002) is also attached. It was captured during a different repro run than the one the debug logs correspond to.
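To make the symptom easier to spot when reading a log such as ops.txt, the check below is a minimal sketch of how the messages seen on one vbucket stream could be classified. The message names are plain strings taken from the terms used in this ticket, and `classify_stream` is a hypothetical helper, not part of the view engine or of any UPR client.

```python
from typing import List

# Hypothetical message kinds, named after the terms used in this ticket.
STREAM_OPENED = "stream_opened"      # open stream succeeded (failover log returned)
SNAPSHOT_MARKER = "snapshot_marker"
MUTATION = "mutation"
STREAM_END = "stream_end"


def classify_stream(events: List[str]) -> str:
    """Classify the messages observed on one vbucket stream so far."""
    if not events or events[0] != STREAM_OPENED:
        return "not_opened"
    if events[-1] == STREAM_END:
        return "complete"
    mutation_count = sum(1 for e in events if e == MUTATION)
    if SNAPSHOT_MARKER in events and mutation_count == 0:
        # The case reported here: a marker arrives, then nothing else.
        return "stuck_after_snapshot_marker"
    return "incomplete"
```

For example, a stream whose observed messages are `[STREAM_OPENED, SNAPSHOT_MARKER]` and nothing more falls into the `stuck_after_snapshot_marker` case, which matches the behaviour described above.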
Attachments
Issue Links
- blocks
  - MB-10490 Simple-test Rebalance failure with badmatch on couch_set_view_group (Closed)
  - MB-10548 Views tests failing with error "vbucket_stream_already_exists" while querying (Closed)
  - MB-10730 Rebalance exited with reason "bulk_set_vbucket_state_failed" during rebalance+views test (Closed)
- duplicates
  - MB-10490 Simple-test Rebalance failure with badmatch on couch_set_view_group (Closed)
- is duplicated by
  - MB-10879 Rebalance fails sporadically on employee dataset test (make simple-test) (Closed)
  - MB-10910 Rebalance with views after failover fails due to "wait_checkpoint_persisted_failed" (Closed)
- is triggering
  - MB-10908 beam.smp RSS grows to 50GB during delta recovery causing OOM killer invocation and rebalance failure (Closed)
- relates to
  - MB-10772 During rebalance, getting timeout for the UPR stream. (Closed)