[Backport MB-52790 to 7.0.5] - perf tests stuck due to failed cbindex

Description

While running experiments on the aether cluster for https://couchbasecloud.atlassian.net/browse/MB-51755#icft=MB-51755, I noticed that cbindex fails to execute the index build and the test gets stuck.

For example, in aether/2081:

12:10:25 2022-06-28T23:40:25 [INFO] Running: /opt/couchbase/bin/cbindex -auth=Administrator:password -server 172.23.110.72:8091 -type build -indexes bucket-1:myindex
12:10:25
12:10:25 [172.23.110.53] run: /opt/couchbase/bin/cbindex -auth=Administrator:password -server 172.23.110.72:8091 -type build -indexes bucket-1:myindex
12:10:25 [172.23.110.53] out: 2022-06-28T23:40:25.726-07:00 [Error] PeerPipe.doRecieve() : ecounter error when received mesasage from Peer 172.23.110.72:9100.  Error = EOF. Kill Pipe.
12:10:25 [172.23.110.53] out: 2022-06-28T23:40:25.727-07:00 [Error] FollowerSyncProxy.receiveAndUpdateAcceptedEpoch(): Error encountered = Server Error : SyncProxy.listen(): channel closed. Terminate
12:10:25 [172.23.110.53] out: 2022-06-28T23:40:25.727-07:00 [Error] WatcherServer.runOnce() : Watcher fail to synchronized with peer 172.23.110.72:9100
12:10:25 [172.23.110.53] out: Index building for: []
12:10:26 [172.23.110.53] out:
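Note that cbindex still printed "Index building for: []", i.e. it reported an empty build list instead of failing outright. A perf harness could detect that line and fail fast rather than wait for an index that will never build. The sketch below is hypothetical (the function name is made up for illustration) and assumes a non-empty list is printed space-separated inside the brackets; only the empty form "[]" is confirmed by the log above.

```go
package main

import (
	"fmt"
	"strings"
)

// marker is the text cbindex printed in the aether/2081 console output.
const marker = "Index building for: "

// indexesBeingBuilt (hypothetical helper) scans cbindex output for the
// "Index building for:" line and returns the names inside the brackets.
// ok is false when no such line is present.
// ASSUMPTION: non-empty lists look like "[idx1 idx2]" (space-separated).
func indexesBeingBuilt(output string) (names []string, ok bool) {
	for _, line := range strings.Split(output, "\n") {
		i := strings.Index(line, marker)
		if i < 0 {
			continue
		}
		list := strings.TrimSpace(line[i+len(marker):])
		list = strings.TrimSuffix(strings.TrimPrefix(list, "["), "]")
		return strings.Fields(list), true
	}
	return nil, false
}

func main() {
	out := "12:10:25 [172.23.110.53] out: Index building for: []\n"
	names, ok := indexesBeingBuilt(out)
	if ok && len(names) == 0 {
		// Empty build list: treat as a failure instead of polling forever.
		fmt.Println("cbindex did not start any index build")
	}
}
```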

In the indexer log at the same timestamp:

2022-06-28T23:40:25.724-07:00 [Error] PeerListener.handleConnection error in authfn Protocol Error : IndexManager:ServerAuth: Expect message Request, Receive message FollowerInfo for conn 172.23.110.72:9100:172.23.110.53:46254
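The indexer-side error says that ServerAuth expected the first message on the new peer connection to be a Request but received FollowerInfo. The toy model below illustrates that mismatch. It is a hypothetical sketch, not the indexer's code: the types and functions are invented, and the client-side rule (skip the auth Request while the cluster version is still uninitialised) is an assumption inferred from the later watcher:ClientAuth warning and the "initialise indexer internal version" fix.

```go
package main

import "fmt"

// MsgKind is a hypothetical stand-in for the message types named in the log.
type MsgKind string

const (
	Request      MsgKind = "Request"
	FollowerInfo MsgKind = "FollowerInfo"
)

// serverAuth models the check implied by the indexer log: the first
// message on a peer connection must be an auth Request.
func serverAuth(first MsgKind) error {
	if first != Request {
		return fmt.Errorf("expect message %s, receive message %s", Request, first)
	}
	return nil
}

// clientFirstMessage is HYPOTHETICAL: it assumes the watcher sends the auth
// Request only once it knows the cluster version, and otherwise goes
// straight to FollowerInfo.
func clientFirstMessage(clusterVer string) MsgKind {
	if clusterVer == "" { // version not yet initialised
		return FollowerInfo
	}
	return Request
}

func main() {
	for _, ver := range []string{"", "7.0.5"} {
		err := serverAuth(clientFirstMessage(ver))
		fmt.Printf("clusterVer=%q -> err=%v\n", ver, err)
	}
}
```

Under this model, initialising the version before the watcher connects is enough to make the first message a Request, which matches the direction of the backported fix.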

 

aether runs that got stuck:

  1. aether/2181 - logs: http://supportal.couchbase.com/snapshot/00f716993c837f142ce7be66d795d3a6::0

    1. I was able to SSH into the machine and run the cbindex build successfully. The linked logs were collected before this.

  2. aether/2080 - logs: none available

  3. aether/2179 - logs: http://supportal.couchbase.com/snapshot/8a7321512e5e867a7b0fc1499f37d232::0

 

These experiments used various toy builds based on 7.1.1-3097.

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Activity

Dhruvil Shah October 24, 2022 at 10:29 AM

Thank you for running the perf test!

The output from the above run no longer contains this log line:

20XX-XX-XXTXX:XX:XXX [WARN] watcher:ClientAuth cluster Ver ()/Internal Version () not yet initialised

This confirms that the issue is fixed. Closing the issue.
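The same verification can be automated by scanning a run's output for the warning quoted above. This is a minimal sketch with a hypothetical function name; it matches two fixed fragments of the warning rather than the whole line, on the assumption that the timestamp prefix and the values inside the parentheses may vary between runs.

```go
package main

import (
	"fmt"
	"strings"
)

// hasUninitialisedVersionWarning reports whether any line of the output
// contains the watcher:ClientAuth "not yet initialised" warning.
// Matching two fragments tolerates the timestamp prefix and whatever
// (possibly empty) version values appear in the parentheses.
func hasUninitialisedVersionWarning(output string) bool {
	for _, line := range strings.Split(output, "\n") {
		if strings.Contains(line, "watcher:ClientAuth") &&
			strings.Contains(line, "not yet initialised") {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative inputs only; timestamps use the placeholder form above.
	before := "20XX-XX-XXTXX:XX:XXX [WARN] watcher:ClientAuth cluster Ver ()/Internal Version () not yet initialised\n"
	after := "20XX-XX-XXTXX:XX:XXX [INFO] unrelated line\n"
	fmt.Println(hasUninitialisedVersionWarning(before)) // true
	fmt.Println(hasUninitialisedVersionWarning(after))  // false
}
```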

Devansh Srivastava October 21, 2022 at 1:30 PM

Hey, your requested job has been scheduled: aether/2394 (http://perf.jenkins.couchbase.com/job/aether/2394/)

CB robot October 21, 2022 at 12:30 PM

Build couchbase-server-7.0.5-7621 contains indexing commit b52fce8 with commit message:
https://couchbasecloud.atlassian.net/browse/MB-54025#icft=MB-54025 initialise indexer internal verison

CB robot October 21, 2022 at 12:29 PM

Build couchbase-server-7.0.5-7621 contains indexing commit 103113f with commit message:
https://couchbasecloud.atlassian.net/browse/MB-54025#icft=MB-54025 - update clusterVer, intVer source

Dhruvil Shah October 21, 2022 at 9:48 AM

Can you run the perf tests and confirm that the backports fix the issue?

Fixed

Details


Is this a Regression?

Unknown

Triage

Untriaged


Created October 7, 2022 at 5:40 AM
Updated February 23, 2023 at 3:04 PM
Resolved October 21, 2022 at 9:46 AM