MB-57615 (Couchbase Server): Error after header write of chunked response on Prometheus fetch results > 4K

Details

    • Untriaged
    • 0
    • Unknown
    • Analytics Sprint 21

    Description

      Observed in MB-57601 logs.

      In a recent system test run, CBAS logged a WARN on every Prometheus fetch whose result was larger than 4K. The logs contain over 9900 instances of this warning.

      2023-06-23T03:50:37.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-0] Error after header write of chunked response
      2023-06-23T03:50:47.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-3] Error after header write of chunked response
      2023-06-23T03:50:57.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-6] Error after header write of chunked response
      2023-06-23T03:51:07.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-5] Error after header write of chunked response
      2023-06-23T03:51:17.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-8] Error after header write of chunked response
      2023-06-23T03:51:27.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-9] Error after header write of chunked response
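
To gauge how widespread the warning is in a given log, a small script along these lines could count matching lines and report the spacing between them. This is only a sketch: the regular expression is derived from the lines quoted above, the function names are made up for illustration, and the real log may contain variations this pattern does not cover.

```python
import re
from collections import Counter
from datetime import datetime

# Pattern derived from the warning lines quoted in this report.
# Only the timestamp and the executor thread id are expected to vary.
WARN_RE = re.compile(
    r"^(?P<ts>\S+) WARN CBAS\.server\.ChunkedResponse "
    r"\[(?P<thread>[^\]]+)\] Error after header write of chunked response$"
)


def matching_warnings(lines):
    """Yield a regex match for every line that is the chunked-response warning."""
    for line in lines:
        match = WARN_RE.match(line.strip())
        if match:
            yield match


def count_chunked_warnings(lines):
    """Return (total, per-thread Counter) for the warning (hypothetical helper)."""
    per_thread = Counter(m.group("thread") for m in matching_warnings(lines))
    return sum(per_thread.values()), per_thread


def intervals_seconds(lines):
    """Return the gaps, in seconds, between consecutive warnings."""
    stamps = [datetime.fromisoformat(m.group("ts")) for m in matching_warnings(lines)]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Three of the lines quoted above, plus one unrelated line that must be ignored.
SAMPLE = [
    "2023-06-23T03:50:37.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-0] Error after header write of chunked response",
    "2023-06-23T03:50:47.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-3] Error after header write of chunked response",
    "some unrelated log line",
    "2023-06-23T03:50:57.288-07:00 WARN CBAS.server.ChunkedResponse [HttpExecutor(port:8095)-6] Error after header write of chunked response",
]

total, per_thread = count_chunked_warnings(SAMPLE)
print(total)                      # 3
print(intervals_seconds(SAMPLE))  # [10.0, 10.0]
```

On a real log, the same functions can be fed an open file object instead of `SAMPLE`. A steady interval between warnings would be consistent with one warning per periodic stats fetch.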
      

      Also, the request returns a 500 to ns_server. I do not know whether ns_server scrapes otherwise well-formed results that arrive with a 500 status; if it does not, the issue is more severe than the WARN spam in the log.

      EDIT: I can confirm from promtimer that all Analytics stats are missing when the 500 is returned to ns_server, so this is a pretty severe issue.

       

       

      Issue Resolution
      When the Prometheus stats returned from Analytics exceeded 4 KiB, the status code was inadvertently set to 500 (Internal Server Error), and this resulted in a large number of warnings in the Analytics warning log. Couchbase Server discarded these statistics. Analytics now returns a 200 (OK) status code when the size of the Prometheus stats exceeds 4 KiB, so the stats are recorded properly and the warning is no longer logged.
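
The behavior described in the resolution could be checked with something like the sketch below. It is not the fix itself and not part of any Couchbase tooling. The function names, the result labels, and the 4096-byte threshold constant are assumptions made for illustration, and the URL of the Analytics metrics endpoint (plus whatever authentication it needs) is left to the caller.

```python
import urllib.error
import urllib.request

# Assumed threshold: "4 KiB" from the resolution note, taken as 4096 bytes.
FOUR_KIB = 4096


def fetch(url, timeout=10.0):
    """Return (status, body) for a GET, including 4xx/5xx responses.

    Hypothetical helper; the caller supplies the metrics URL.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urllib raises on error statuses; the body is read from the error object.
        return err.code, err.read()


def classify_scrape(status, body):
    """Label a scrape result according to the symptom described in this issue."""
    large = len(body) > FOUR_KIB
    if status == 200:
        return "ok-large" if large else "ok-small"
    if status == 500 and large:
        # Matches the reported symptom: a body over 4 KiB paired with a 500.
        return "matches-MB-57615"
    return "other-failure"


# Offline illustration with synthetic bodies; no network access is needed.
print(classify_scrape(200, b"x" * 100))   # ok-small
print(classify_scrape(500, b"x" * 5000))  # matches-MB-57615
print(classify_scrape(200, b"x" * 5000))  # ok-large
```

Against a live node, `classify_scrape(*fetch(url))` would be expected to report `ok-large` on a fixed build once the stats exceed 4 KiB.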

            People

              Balakumaran.Gopal Balakumaran Gopal
              michael.blow Michael Blow