Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-39803

[System Test] : Query logs have fatal errors - socket: too many open files

    XMLWordPrintable

Details

    Description

      Build : 6.6.0-7748
      Test : -test tests/n1ql/test_n1ql_fts_flexindex.yml -scope tests/n1ql/scope_flex_index_12nodes.yml
      Scale : 3
      Day : 2nd

      Seeing fatal errors in the query logs of 172.23.104.21 like the following -

      [root@cnt7-qe1 logs]# zgrep -i "2020-06-07T06:" query.log | grep -i Fatal
      2020-06-07T06:34:01.744-07:00 [Fatal] MetakvGet: Failed to fetch /indexing/info/versionToken from metakv: Get http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
      2020-06-07T06:34:01.744-07:00 [Fatal] MetakvSet Failed to set /indexing/info/versionToken: Put http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
      2020-06-07T06:39:01.743-07:00 [Fatal] MetakvGet: Failed to fetch /indexing/info/versionToken from metakv: Get http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
      2020-06-07T06:39:01.743-07:00 [Fatal] MetakvSet Failed to set /indexing/info/versionToken: Put http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
      2020-06-07T06:44:01.743-07:00 [Fatal] MetakvGet: Failed to fetch /indexing/info/versionToken from metakv: Get http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
      2020-06-07T06:44:01.743-07:00 [Fatal] MetakvSet Failed to set /indexing/info/versionToken: Put http://127.0.0.1:8091/_metakv/indexing/info/versionToken: dial tcp 127.0.0.1:8091: socket: too many open files
      

      This also seems to affect the ongoing scans

      2020-06-07T06:34:05.755-07:00 [Error] GSIC[default/default-1591388641155215948] "ee387171-8d8a-4366-a905-cf8620abe3b9" creating temp file  : open /opt/couchbase/var/lib/couchbase/tmp/scan-results123396395655486: too many open files
      2020-06-07T06:34:05.772-07:00 [Error] GSIC[default/default-1591388641155215948] "afdcbe87-5e5b-41f6-a1a2-be1f9ed2789d" creating temp file  : open /opt/couchbase/var/lib/couchbase/tmp/scan-results123396895972229: too many open files
      2020-06-07T06:34:06.202-07:00 [Error] GSIC[default/default-1591388641155215948] "3d752088-911f-4bae-a080-a8328c5e9e73" creating temp file  : open /opt/couchbase/var/lib/couchbase/tmp/scan-results123396132921120: too many open files
      

      One other thing observed is the number of connections opened by the grpc client. Seeing too many lines like these in the logs -

      _time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #1 created for host: 172.23.96.96:9130
      _time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #2 created for host: 172.23.96.96:9130
      _time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #3 created for host: 172.23.96.96:9130
      _time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #4 created for host: 172.23.96.96:9130
      _time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #0 created for host: 172.23.104.93:9130
      _time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #1 created for host: 172.23.104.93:9130
      _time=2020-06-07T06:34:05.594-07:00 _level=INFO _msg=client: grpc client connection #2 created for host: 172.23.104.93:9130
      _time=2020-06-07T06:34:05.595-07:00 _level=INFO _msg=client: grpc client connection #3 created for host: 172.23.104.93:9130
      _time=2020-06-07T06:34:05.595-07:00 _level=INFO _msg=client: grpc client connection #4 created for host: 172.23.104.93:9130
      

      Seeing 100K occurences of these messages in the logs in a 10 min duration. Not sure if actually it attempts to open the connection those many times.

      [root@cnt7-qe1 logs]# cat query.log | grep "grpc client connection" | grep "2020-06-07T06:3" | wc -l
      105730

      Similar errors were also seen on the other query node - 172.23.104.23 around the same time.

      Attachments

        Issue Links

          No reviews matched the request. Check your Options in the drop-down menu of this sections header.

          Activity

            People

              mihir.kamdar Mihir Kamdar (Inactive)
              mihir.kamdar Mihir Kamdar (Inactive)
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty