Details
-
Bug
-
Resolution: Fixed
-
Critical
-
6.6.0, Cheshire-Cat
-
Untriaged
-
-
1
-
Unknown
Description
Build : 6.6.0-7748
Test : -test tests/n1ql/test_n1ql_fts_flexindex.yml -scope tests/n1ql/scope_flex_index_12nodes.yml
Scale : 3
Day : 2nd
Seeing fatal errors in the query logs of 172.23.104.21 like the following -
[root@cnt7-qe1 logs]# zgrep -i "2020-06-07T06:" query.log | grep -i Fatal
|
2020-06-07T06:34:01.744-07:00 [Fatal] MetakvGet: Failed to fetch /indexing/info/versionToken from metakv: Get http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
|
2020-06-07T06:34:01.744-07:00 [Fatal] MetakvSet Failed to set /indexing/info/versionToken: Put http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
|
2020-06-07T06:39:01.743-07:00 [Fatal] MetakvGet: Failed to fetch /indexing/info/versionToken from metakv: Get http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
|
2020-06-07T06:39:01.743-07:00 [Fatal] MetakvSet Failed to set /indexing/info/versionToken: Put http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
|
2020-06-07T06:44:01.743-07:00 [Fatal] MetakvGet: Failed to fetch /indexing/info/versionToken from metakv: Get http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
|
2020-06-07T06:44:01.743-07:00 [Fatal] MetakvSet Failed to set /indexing/info/versionToken: Put http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
|
This also seems to affect the ongoing scans
2020-06-07T06:34:05.755-07:00 [Error] GSIC[default/default-1591388641155215948] "ee387171-8d8a-4366-a905-cf8620abe3b9" creating temp file : open /opt/couchbase/var/lib/couchbase/tmp/scan-results123396395655486: too many open files
|
2020-06-07T06:34:05.772-07:00 [Error] GSIC[default/default-1591388641155215948] "afdcbe87-5e5b-41f6-a1a2-be1f9ed2789d" creating temp file : open /opt/couchbase/var/lib/couchbase/tmp/scan-results123396895972229: too many open files
|
2020-06-07T06:34:06.202-07:00 [Error] GSIC[default/default-1591388641155215948] "3d752088-911f-4bae-a080-a8328c5e9e73" creating temp file : open /opt/couchbase/var/lib/couchbase/tmp/scan-results123396132921120: too many open files
|
One other thing observed is the number of connections opened by the grpc client. Seeing too many lines like these in the logs -
_time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #1 created for host: 172.23.96.96:9130
|
_time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #2 created for host: 172.23.96.96:9130
|
_time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #3 created for host: 172.23.96.96:9130
|
_time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #4 created for host: 172.23.96.96:9130
|
_time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #0 created for host: 172.23.104.93:9130
|
_time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #1 created for host: 172.23.104.93:9130
|
_time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #2 created for host: 172.23.104.93:9130
|
_time=2020-06-07T06:34:05.595-07:00 _level=INFO _msg=client: grpc client connection #3 created for host: 172.23.104.93:9130
|
_time=2020-06-07T06:34:05.595-07:00 _level=INFO _msg=client: grpc client connection #4 created for host: 172.23.104.93:9130
|
Seeing 100K occurences of these messages in the logs in a 10 min duration. Not sure if actually it attempts to open the connection those many times.
[root@cnt7-qe1 logs]# cat query.log | grep "grpc client connection" | grep "2020-06-07T06:3" | wc -l
105730
Similar errors were also seen on the other query node - 172.23.104.23 around the same time.
Attachments
Issue Links
- duplicates
-
MB-41585 [N1FTY] Close old gRPC ClientConns on refresh
- Closed