Search: 1K indexes: One index creation failed with i/o timeout

Description

Build: 7.1.0-1566
Test: -test tests/fts/cheshire-cat/test_fts_clusterops_cheshire_cat_coll_crud_freetier.yml -scope tests/fts/cheshire-cat/scope_fts_cheshire_cat_free_tier.yml

  • Cluster with 3 nodes, with kv, n1ql, search, and index services on all the nodes

  • Create 1 bucket, 100 scopes, and 10 collections in each scope

  • Create 2500 GSI indexes (5 on each collection)

  • Load documents on some of the collections

  • Create 1000 indexes: one index (1 partition) on each collection

  • Run queries on each collection

  • Mutate the documents on each collection and wait for all the indexes to process the mutations

  • Run queries on each collection

  • Delete all the indexes
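
The index-creation step above can be sketched as follows. This is a minimal sketch under stated assumptions, not the test's actual code: the bucket name `bucket1`, the `scope_N`/`coll_N` naming for every collection (extrapolated from `scope_78.coll_1`, with 1-based numbering), the `idx_...` index names, and the `type` field are all hypothetical. The payload follows the general shape of an FTS index definition for collections; the code only builds the definitions and does not contact a cluster.

```python
import json

BUCKET = "bucket1"              # assumed bucket name, not from the ticket
NUM_SCOPES = 100
COLLECTIONS_PER_SCOPE = 10


def collection_keyspaces():
    """Yield (scope, collection) names; the naming pattern is an assumption."""
    for s in range(1, NUM_SCOPES + 1):
        for c in range(1, COLLECTIONS_PER_SCOPE + 1):
            yield f"scope_{s}", f"coll_{c}"


def index_definition(scope, collection):
    """Build a single-partition index definition covering one collection."""
    return {
        "type": "fulltext-index",
        "name": f"idx_{scope}_{collection}",    # hypothetical naming scheme
        "sourceType": "gocbcore",
        "sourceName": BUCKET,
        "planParams": {"indexPartitions": 1},   # one partition per index
        "params": {
            "doc_config": {
                "mode": "scope.collection.type_field",
                "type_field": "type",           # assumed type field
            },
            "mapping": {
                "default_mapping": {"enabled": False},
                "types": {
                    f"{scope}.{collection}": {"enabled": True, "dynamic": True}
                },
            },
        },
    }


definitions = [index_definition(s, c) for s, c in collection_keyspaces()]
print(len(definitions))
print(json.dumps(definitions[0], indent=2))
```

Each definition would then be sent as one index-creation request, so a single failed request leaves exactly one collection without an index, as happened here.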

During creation of the 1000 indexes, the index requested on scope_78.coll_1 failed with the error below:

test log:

I do not see any significant information in the FTS log for this request, but here are the logs at the above timestamp.

Logs:
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1635430507/collectinfo-2021-10-28T141509-ns_1%40172.23.100.161.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1635430507/collectinfo-2021-10-28T141509-ns_1%40172.23.100.162.zip
url : https://cb-jira.s3.us-east-2.amazonaws.com/logs/systestmon-1635430507/collectinfo-2021-10-28T141509-ns_1%40172.23.100.163.zip

Environment

None

Link to Log File, atop/blg, CBCollectInfo, Core dump

None

Release Notes Description

None

Activity

Sreekanth Sivasankaran January 25, 2022 at 10:41 AM

Fixes for the issue  would help fix this one as well. 

Abhi Dangeti January 12, 2022 at 12:55 AM

Would you share the logs where you have seen this issue while using builds later than 7.1.0-1643?

CB robot November 4, 2021 at 6:21 PM

Build couchbase-server-7.1.0-1643 contains cbft commit 956e193 with commit message:
: Prefix context to restRequestParser errors

Abhi Dangeti November 3, 2021 at 2:03 PM

I've been looking at so many metaKV issues that I guessed this could be another one. This call in the preparePerms() code path can involve a metaKV fetch:

https://github.com/couchbase/cbft/blob/master/rest_auth.go#L364

I reverted that change in order to continue testing the enforcement of limits (on FTS indexes; we can chat about this separately). Since I thought it could affect this test, I asked for a retest.

But now that I look at the error again ..

It looks like the i/o timeout is between node 172.23.100.161 and 172.23.107.77, which is not even part of the cluster. So possibly the client? If this is the request parser timing out, I believe it's the first time we're seeing this.

Sreekanth Sivasankaran November 3, 2021 at 5:35 AM

May I know which of your commits from the other ticket is supposed to address this timeout?

I'm not sure I can see a metakv fetch here in the preparePerms() call.

Doesn't this look more like a socket read i/o timeout while reading the request contents itself?
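
To illustrate the failure mode suggested here, the sketch below shows a server-side read timing out because the peer stalls partway through sending a request body. It is a Python stand-in using a local socket pair, not cbft's Go request parser; `read_body`, `demo`, the payload, and the 0.2 s timeout are hypothetical names and values chosen for illustration.

```python
import socket


def read_body(conn, expected_len, timeout_s):
    """Read exactly expected_len bytes; fail if the peer stalls past timeout_s."""
    conn.settimeout(timeout_s)
    chunks = []
    remaining = expected_len
    while remaining > 0:
        # recv raises socket.timeout if no data arrives within timeout_s
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before sending the full body")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def demo():
    server_side, client_side = socket.socketpair()
    try:
        body = b'{"type": "fulltext-index"}'
        # The client sends only part of the body and then goes quiet.
        client_side.sendall(body[:10])
        try:
            read_body(server_side, len(body), timeout_s=0.2)
        except socket.timeout:
            return "read timed out"
        return "read completed"
    finally:
        server_side.close()
        client_side.close()


result = demo()
print(result)
```

In this scenario the timeout is raised on the server while it waits for the rest of the request, so the cause lies on the sending side or the network rather than in anything the server does after parsing.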

 

Aside from this, now that you reverted and brought back metakv fetches, aren't we supposed to see the former metakv related timeouts too going forward?

Duplicate
Details

Assignee

Girish Benakappa

Reporter

Is this a Regression?

Unknown

Triage

Untriaged

Story Points

1

Priority

Created November 2, 2021 at 2:25 AM
Updated January 25, 2022 at 10:42 AM
Resolved January 25, 2022 at 10:42 AM