Details
-
Bug
-
Resolution: Fixed
-
Critical
-
6.0.0
-
Cluster: atlas_setupA
OS: CentOS 7
CPU: E5-2680 v3 (48 vCPU)
Memory: 256 GB
Disk: Samsung PM863
-
Untriaged
-
Unknown
Description
Setup and test:
CB build: 6.0.0-1242
3 nodes, data+search on each node
1M docs, Scorch
30 client threads executing fuzzy2 queries
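For anyone trying to reproduce the load outside the perf harness, a minimal sketch of 30 concurrent clients is shown here. It is not the harness used for this ticket. It assumes that "fuzzy2" means a term query with fuzziness 2, that the index is reachable at the standard FTS query endpoint (POST /api/index/<index>/query on port 8094), and that authentication is handled elsewhere. The index name, field name, and terms are hypothetical placeholders.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fuzzy_query_body(term, field, fuzziness=2, size=10, timeout_ms=10000):
    """Build an FTS request body for a fuzzy term query.

    Assumption: "fuzzy2" in this ticket corresponds to fuzziness=2.
    """
    return {
        "query": {"term": term, "field": field, "fuzziness": fuzziness},
        "size": size,
        "ctl": {"timeout": timeout_ms},  # server-side timeout in milliseconds
    }


def post_query(url, body, timeout_s=15.0):
    """POST one query; return True on HTTP 200, False on error or timeout.

    Authentication headers are omitted here and would be needed on a real cluster.
    """
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout_s) as resp:
            resp.read()
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def run_load(terms, send, threads=30):
    """Run send(term) for every term on a pool of worker threads.

    Returns (succeeded, failed) counts.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(send, terms))
    ok = sum(1 for r in results if r)
    return ok, len(results) - ok


if __name__ == "__main__":
    # Hypothetical host, index, and field; replace with real values.
    url = "http://127.0.0.1:8094/api/index/perf_fts_index/query"
    terms = ["example"] * 300
    ok, failed = run_load(
        terms, lambda t: post_query(url, fuzzy_query_body(t, "text"))
    )
    print(f"ok={ok} failed={failed}")
```

A rising failed count while the data service stays healthy would match the timeouts described below.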
Once the load starts, cbft stops responding and requests start timing out. As soon as the load level drops, cbft recovers.
Disk utilization is above 95%, but disk saturation does not appear to be causing the timeouts: the data service works fine and the server stays responsive over ssh. Disk utilization is also quite high with other heavy queries such as fuzzy1 or wild.
On the other hand, the number of files created by cbft grows much faster with fuzzy2.
This is either the cause or a consequence of the issue.
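The file growth can be tracked while the load runs with something like the sketch below. It only counts regular files under a directory tree. The path in the usage section is the default Linux location of the FTS data directory and is an assumption about this cluster, as are the sampling interval and sample count.

```python
import os
import time


def count_files(root):
    """Return the number of files under root, including subdirectories."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def sample_file_counts(root, samples, interval_s, sleep=time.sleep):
    """Take `samples` file counts, sleeping interval_s seconds between them."""
    counts = []
    for i in range(samples):
        counts.append(count_files(root))
        if i < samples - 1:
            sleep(interval_s)
    return counts


if __name__ == "__main__":
    # Assumed default FTS data path; adjust for the actual installation.
    fts_dir = "/opt/couchbase/var/lib/couchbase/data/@fts"
    for n in sample_file_counts(fts_dir, samples=12, interval_s=5):
        print(n)
```

Running it once per query type (fuzzy1, wild, fuzzy2) under the same load gives series that can be compared directly.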
Full comparison:
Logs:
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-atlas-sdk-2683/172.23.99.211.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-atlas-sdk-2683/172.23.99.39.zip
https://s3-us-west-2.amazonaws.com/perf-artifacts/jenkins-atlas-sdk-2683/172.23.99.40.zip