Details

Type: Bug
Resolution: Fixed
Priority: Blocker
Fix Version/s: 5.5.0
Affects Version/s: 5.5.0
Component/s: secondary-index
Labels:
- system-test

Triage: Untriaged
Link to Log File, atop/blg, CBCollectInfo, Core dump:

https://s3.amazonaws.com/bugdb/jira/systest_reb_out/collectinfo-2018-03-16T172713-ns_1%40172.23.104.16.zip
https://s3.amazonaws.com/bugdb/jira/systest_reb_out/collectinfo-2018-03-16T172713-ns_1%40172.23.104.17.zip
https://s3.amazonaws.com/bugdb/jira/systest_reb_out/collectinfo-2018-03-16T172713-ns_1%40172.23.104.18.zip
https://s3.amazonaws.com/bugdb/jira/systest_reb_out/collectinfo-2018-03-16T172713-ns_1%40172.23.104.19.zip
https://s3.amazonaws.com/bugdb/jira/systest_reb_out/collectinfo-2018-03-16T172713-ns_1%40172.23.104.21.zip
https://s3.amazonaws.com/bugdb/jira/systest_reb_out/collectinfo-2018-03-16T172713-ns_1%40172.23.104.23.zip
https://s3.amazonaws.com/bugdb/jira/systest_reb_out/collectinfo-2018-03-16T172713-ns_1%40172.23.104.25.zip

Is this a Regression?: No

Description

Build: 5.5.0-2211

In the system test for secondary indexing, the following steps were performed:
1. 6-node cluster: 2 kv, 1 query, and 3 indexer nodes.
2. 4 buckets with 4 indexes on each of them, including 1 partitioned index.
3. Start constant kv ops.
4. Start constant queries, including aggregate pushdown queries.
5. Leave the system idle for a few minutes.
6. Rebalance in another indexer node.
7. Rebalance out another indexer node.

A failure was observed twice at Step 7: the indexer on the node added in Step 6 crashes. The diag logs show the following error.

Service 'indexer' exited with status 134. Restarting. Messages:

/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/select.go:423 +0x1235 fp=0xc465b97b88 sp=0xc465b97928
runtime.selectgo(0xc465b97c38)
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/select.go:238 +0x1c fp=0xc465b97bb0 sp=0xc465b97b88
github.com/couchbase/indexing/secondary/indexer.(*Rebalancer).tokenMergeOrReady.func1(0xc4211df600, 0xc49dfcd994, 0x24, 0xc4bf0a5880, 0x20, 0xc4bf0a58a0, 0x20, 0xc4bf0a58c0, 0x20, 0xc4bf0a58e0, ...)
/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/rebalancer.go:788 +0x480 fp=0xc465b97d00 sp=0xc465b97bb0
runtime.goexit()
/home/couchbase/.cbdepscache/exploded/x86_64/go-1.7.6/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc465b97d08 sp=0xc465b97d00
created by github.com/couchbase/indexing/secondary/indexer.(*Rebalancer).tokenMergeOrReady
/home/couchbase/jenkins/workspace/couchbase-server-unix/goproj/src/github.com/couchbase/indexing/secondary/indexer/rebalancer.go:829 +0x279

[goport(/opt/couchbase/bin/indexer)] 2018/03/16 10:22:54 child process exited with status 134

One observation: although the UI logs showed the message "Rebalance completed successfully" for Step 6, overall progress remained stuck at 99.4% for more than 2 minutes after that message.

cbcollectinfo attached.

The cluster is currently available for debugging if needed. It may be repurposed over the weekend.
http://172.23.104.18:8091/

People

Assignee: John Liang
Reporter: Mihir Kamdar (Inactive)
Votes: 0
Watchers: 6

Dates

Created: 16/Mar/18 10:34 AM
Updated: 21/Mar/18 2:40 AM
Resolved: 20/Mar/18 10:35 AM

Gerrit Reviews

There are no open Gerrit changes

There are 2 closed Gerrit changes:

MB-28750: Merge partitions of specific buckets after flush: Gerrit Review:

MB-28750: Do not clear out snapshot map when cloning: Gerrit Review:
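The Gerrit changes themselves are not reproduced in this ticket. Purely as a generic illustration of the idea named in the second title (cloning a map without disturbing the original), the sketch below copies entries into a fresh map and leaves the source untouched. The `snapshotMap` type and its key and value types are hypothetical and are not taken from the indexer code.

```go
package main

import "fmt"

// snapshotMap is a hypothetical stand-in for a per-index snapshot table:
// index instance ID -> snapshot timestamp. Not the indexer's real type.
type snapshotMap map[uint64]int64

// cloneSnapshots returns an independent copy of src. The source map is only
// read, never cleared or modified, so existing holders of src keep seeing
// their entries.
func cloneSnapshots(src snapshotMap) snapshotMap {
	dst := make(snapshotMap, len(src))
	for id, ts := range src {
		dst[id] = ts
	}
	return dst
}

func main() {
	orig := snapshotMap{1: 100, 2: 200}
	cp := cloneSnapshots(orig)
	cp[3] = 300 // mutating the clone must not affect the original
	fmt.Println("original entries:", len(orig), "clone entries:", len(cp))
}
```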

System test : Indexer crash when rebalancing out another indexer node
