Details
Description
Setup:
- A nine (9) node c6gd.2xlarge AWS system with 20 buckets. 3 KV, 2 query, 2 index, 2 fts.
- Load 35190 docs (replicated 10X travel-sample._default._default with UUID keys)
- Build a Search index on all 20 nodes concurrently (as per "Elixir Performance & Scalability")
I used a modified HTTP benchmarking tool like 'ab' based on tool 'bombardier'
Randomly select a bucket via 125 concurrent clients to to make 1M Search requests
[ec2-user@ip-10-0-0-133 query02]$ ./cb_fts_bench -m POST -H "Content-Type: application/json" -t 15s -c 125 -k -a -u admin:$PASSWORD -n 1000000 -b '{"query": { "query": "+balcony +whales +ritz" }}' http://10.0.1.28:8094/api/index/ts[[SEQ:1:20]]._default.ts[[SEQ:1:20]]_fts_01/query
|
Bombarding http://10.0.1.28:8094/api/index/ts[[SEQ:1:20]]._default.ts[[SEQ:1:20]]_fts_01/query with 1000000 request(s) using 125 connection(s)
|
1000000 / 1000000 [===================================================] 100.00% 16104/s 1m2s
|
Done!
|
Statistics Avg Stdev Max
|
Reqs/sec 16148.31 2623.01 21654.98
|
Latency 7.74ms 6.28ms 580.30ms
|
HTTP codes:
|
1xx - 0, 2xx - 1000000, 3xx - 0, 4xx - 0, 5xx - 0
|
others - 0
|
Throughput: 35.24MB/s
|
HTTP 429 retries 0, pct. 0.000% across 1000000 reqs
|
Ave Sizes
|
resp.body: 1893 bytes (across 1000000 reqs)
|
resp.body.hit[]: 1541 bytes (across 1000000 reqs having hits [100.00%])
|
Other
|
total_hits 10000000, ave 10.000
|
tot_hits_docreads 10000000, ave 10.000
|
The above shows 100% success as in all 2xx codes with good latancies.
Repeat a few times (more 1M Search requests) but adjust the query command
'{"query": { "query": "+balcony +ritz" }}'
|
|
or
|
|
'{"query": { "query": "+balcony" }}'
|
After 3-4 runs the system exhibits hard timeouts (I hit ctrl-C) note the timeout is set to 15 sec. "-t 15s" flag
[ec2-user@ip-10-0-0-133 query02]$ ./cb_fts_bench -m POST -H "Content-Type: application/json" -t 15s -c 125 -k -a -u admin:$PASSWORD -n 1000000 -b '{"query": { "query": "+balcony +whales +ritz" }}' http://10.0.1.28:8094/api/index/ts[[SEQ:1:20]]._default.ts[[SEQ:1:20]]_fts_01/query
|
Bombarding http://10.0.1.28:8094/api/index/ts[[SEQ:1:20]]._default.ts[[SEQ:1:20]]_fts_01/query with 1000000 request(s) using 125 connection(s)
|
375 / 1000000 [>------------------------] 0.04% 8/s 1d9h22m41s 1000000 / 1000000 [=========================================] 100.00% 22085/s 45s
|
Done!
|
Statistics Avg Stdev Max
|
Reqs/sec 8.42 219.19 6322.17
|
Latency 15.00s 369.61us 15.00s
|
HTTP codes:
|
1xx - 0, 2xx - 0, 3xx - 0, 4xx - 0, 5xx - 0
|
others - 500
|
Errors:
|
timeout - 500
|
Throughput: 2.01KB/s
|
HTTP 429 retries 0, pct. NaN% across 0 reqs
|
Ave Sizes
|
resp.body: n/a bytes (across 0 reqs)
|
resp.body.hit[]: n/a bytes (across 0 reqs having hits [ NaN%])
|
Other
|
total_hits 0, ave NaN
|
tot_hits_docreads 0, ave NaN
|
The system became very unresponsive even the UI when I go to the search page merely shows Gateway Timeout
It seems that there are some very odd WARN message in the fts.log file
2022-08-11T18:08:23.955+00:00 [WARN] (GOCBCORE) Pipeline Client 0x40001fe1c0 failed to bootstrap: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":7942,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.1.72:11210","LastDispatchedFrom":"10.0.1.28:47652","LastConnectionID":"e211ae45503c88a0/41a5c677f50a0dba"} -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:24.615+00:00 [WARN] ns_server: retrieve partition seqs: gocbcore_utils: CBPartitionSeqs, fetchClient err: gocbcore_utils: createAgents (1), setup err: agent setup failed, err: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":5711,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.0.42:11210","LastDispatchedFrom":"10.0.1.28:39100","LastConnectionID":"e211ae45503c88a0/7975e2682b4795e2"} -- cbft.RunSourcePartitionSeqs() at ns_server.go:1121
|
2022-08-11T18:08:27.623+00:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:30.623+00:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:31.624+00:00 [WARN] (GOCBCORE) Pipeline Client 0x400053a0e0 failed to bootstrap: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":5645,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.1.72:11210","LastDispatchedFrom":"10.0.1.28:46532","LastConnectionID":"bc5dff67559585a6/444b9ac50706da8f"} -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:31.624+00:00 [WARN] (GOCBCORE) Pipeline Client 0x400053a070 failed to bootstrap: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":10306,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.0.42:11210","LastDispatchedFrom":"10.0.1.28:41644","LastConnectionID":"bc5dff67559585a6/56307db006506e17"} -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:31.624+00:00 [WARN] (GOCBCORE) Pipeline Client 0x400053a000 failed to bootstrap: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":10199,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.0.20:11210","LastDispatchedFrom":"10.0.1.28:50380","LastConnectionID":"bc5dff67559585a6/041ff9f08e481174"} -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:35.294+00:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:38.294+00:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:39.294+00:00 [WARN] (GOCBCORE) Pipeline Client 0x4000454230 failed to bootstrap: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":10108,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.1.72:11210","LastDispatchedFrom":"10.0.1.28:46542","LastConnectionID":"e64d3b08de2d94de/adf3cbb128b938bc"} -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:39.294+00:00 [WARN] (GOCBCORE) Pipeline Client 0x40004541c0 failed to bootstrap: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":18986,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.0.42:11210","LastDispatchedFrom":"10.0.1.28:41660","LastConnectionID":"e64d3b08de2d94de/3825347be8d0730a"} -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:39.294+00:00 [WARN] (GOCBCORE) Pipeline Client 0x4000454070 failed to bootstrap: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":15376,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.0.20:11210","LastDispatchedFrom":"10.0.1.28:50384","LastConnectionID":"e64d3b08de2d94de/22c002e7ed08c2ee"} -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:42.957+00:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:46.957+00:00 [WARN] (GOCBCORE) Pipeline Client 0x4000338540 failed to bootstrap: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":13169,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.0.42:11210","LastDispatchedFrom":"10.0.1.28:39216","LastConnectionID":"3bb9c755d2e793c1/e1a22e1561cae4d6"} -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:47.621+00:00 [WARN] ns_server: retrieve partition seqs: gocbcore_utils: CBPartitionSeqs, fetchClient err: gocbcore_utils: createAgents (1), setup err: agent setup failed, err: ambiguous timeout | {"InnerError":{"InnerError":{"InnerError":{},"Message":"ambiguous timeout"}},"OperationID":"CMD_SASLAUTH","Opaque":"0x6","TimeObserved":13169,"RetryReasons":null,"RetryAttempts":0,"LastDispatchedTo":"10.0.0.42:11210","LastDispatchedFrom":"10.0.1.28:39216","LastConnectionID":"3bb9c755d2e793c1/e1a22e1561cae4d6"} -- cbft.RunSourcePartitionSeqs() at ns_server.go:1121
|
2022-08-11T18:08:50.630+00:00 [WARN] (GOCBCORE) CCCPPOLL: Failed to retrieve CCCP config. ambiguous timeout -- cbgt.GocbcoreLogger.Log() at gocbcore_utils.go:748
|
2022-08-11T18:08:54.868+00:00 [WARN] janitor: JanitorOnce, err: janitor: JanitorOnce errors: 2, []string{"#0: janitor: adding feed, err: feed_dcp_gocbcore: StartGocbcoreDCPFeed, could not prepare DCP feed, name: ts05._default.ts05_fts_01_1f1e2ba304a8de31_4c1c5584, server: http://127.0.0.1:8091, bucketName: ts05, indexName: ts05._default.ts05_fts_01, err: newGocbcoreDCPFeed: error in setting up feed's stream options, err: agent setup failed, err: gocbcore_utils: createAgents (1), setup err: agent setup failed, err: ambiguous timeout | {\"InnerError\":{\"InnerError\":{\"InnerError\":{},\"Message\":\"ambiguous timeout\"}},\"OperationID\":\"CMD_SASLAUTH\",\"Opaque\":\"0x6\",\"TimeObserved\":10166,\"RetryReasons\":null,\"RetryAttempts\":0,\"LastDispatchedTo\":\"10.0.0.42:11210\",\"LastDispatchedFrom\":\"10.0.1.28:60876\",\"LastConnectionID\":\"fa1d034c7329ef90/09d2d78f77089071\"}", "#1: janitor: adding feed, err: feed_dcp_gocbcore: StartGocbcoreDCPFeed, could not prepare DCP feed, name: ts19._default.ts19_fts_01_40870f3600bd713b_4c1c5584, server: http://127.0.0.1:8091, bucketName: ts19, indexName: ts19._default.ts19_fts_01, err: newGocbcoreDCPFeed: error in setting up feed's stream options, err: agent setup failed, err: gocbcore_utils: createAgents (1), setup err: agent setup failed, err: ambiguous timeout | {\"InnerError\":{\"InnerError\":{\"InnerError\":{},\"Message\":\"ambiguous timeout\"}},\"OperationID\":\"CMD_SASLAUTH\",\"Opaque\":\"0x6\",\"TimeObserved\":18986,\"RetryReasons\":null,\"RetryAttempts\":0,\"LastDispatchedTo\":\"10.0.0.42:11210\",\"LastDispatchedFrom\":\"10.0.1.28:41660\",\"LastConnectionID\":\"e64d3b08de2d94de/3825347be8d0730a\"}"} -- cbgt.(*Manager).JanitorLoop() at manager_janitor.go:93
|
I took two (2) cbcollect_info's form the two FTS nodes
node1
/opt/couchbase/bin/cbcollect_info --tmp-dir=/tmp/space /tmp/cbcollect_fts_node1_of_2_gocbcore_sdk_issue
|
|
curl -s --upload-file /tmp/cbcollect_fts_node1_of_2_gocbcore_sdk_issue.zip https://s3.amazonaws.com/customers.couchbase.com/jonstrabala/
|
node2
/opt/couchbase/bin/cbcollect_info --tmp-dir=/tmp/space /tmp/cbcollect_fts_node1_of_2_gocbcore_sdk_issue
|
|
curl -s --upload-file /tmp/cbcollect_fts_node1_of_2_gocbcore_sdk_issue.zip https://s3.amazonaws.com/customers.couchbase.com/jonstrabala/
|
Abhi Dangeti/Thejas Orkombu, just fyi- the registry related panics in regulator are a dup of MB-53358 which is fixed in 7.2.0-1827.