Uploaded image for project: 'Couchbase Server'
  1. Couchbase Server
  2. MB-31068

GoCB has excessive memcached connects; causing log to wrap prematurely

    XMLWordPrintable

Details

    • Bug
    • Resolution: Fixed
    • Blocker
    • 6.0.0
    • 6.0.0
    • clients
    • None
    • Untriaged
    • Unknown

    Description

      As seen during MB-31053, eventing is (re)connecting to one of the Data service nodes over half a million times in the period of 19 hours - on average 7.6 times per second.

      As a consequence, the duration of the entire memcached log has been reduced to just 19 hours, which is significantly below what is a supportable duration (minimum: 24 hours; desired: 7+ days).

      Example log messages resulting from repeated reconnection:

      2018-08-25T09:33:51.588907Z INFO 123: HELO [{"a":"gocbcore/v7.1.3 gocb/v1.3.7","i":"4ffb38cc27df15a7/8dab845c1b92843d"}] XATTR, XERROR, Select bucket, Tracing [ 172.23.96.168:50508 - 172.23.99.11:11210 (not authenticated) ]
      2018-08-25T09:33:51.589846Z INFO 123: Client 172.23.96.168:50508 authenticated as <ud>@eventing</ud>
      2018-08-25T09:33:51.690733Z INFO 114: HELO [{"a":"gocbcore/v7.1.3 gocb/v1.3.7","i":"b989c5450686d4f4/fc23ebde441f76fc"}] XATTR, XERROR, Select bucket, Tracing [ 172.23.96.168:50510 - 172.23.99.11:11210 (not authenticated) ]
      2018-08-25T09:33:51.692887Z INFO 114: Client 172.23.96.168:50510 authenticated as <ud>@eventing</ud>
      2018-08-25T09:33:51.719859Z INFO 123: HELO [{"a":"gocbcore/v7.1.3 gocb/v1.3.7","i":"d42102ba62d998af/c50ab3c1fed03f59"}] XATTR, XERROR, Select bucket, Tracing [ 172.23.96.168:50512 - 172.23.99.11:11210 (not authenticated) ]
      2018-08-25T09:33:51.720959Z INFO 123: Client 172.23.96.168:50512 authenticated as <ud>@eventing</ud>
      2018-08-25T09:33:51.868894Z INFO 118: HELO [{"a":"gocbcore/v7.1.3 gocb/v1.3.7","i":"50e6a82c57445990/9c248749d8fbba4d"}] XATTR, XERROR, Select bucket, Tracing [ 172.23.96.168:50514 - 172.23.99.11:11210 (not authenticated) ]
      2018-08-25T09:33:51.869967Z INFO 118: Client 172.23.96.168:50514 authenticated as <ud>@eventing</ud>
      

      See node 172.23.99.11, memcached.log for instance:

      # Total log lines:
      $ wc -l memcached.log 
       1188976 memcached.log
      # Number related to eventing / gocbcore connecting:
      $ grep -c 'HELO \[{"a":"gocbcore\|authenticated as <ud>@eventing</ud>' memcached.log
      961491
      

      Note that the FDs are being reused, indicating that the connection is being closed and re-opened - in the case of FD 123 after 131ms.

      Note this was during a rebalance - see original MB-31053 for logs, details of workload.

      For context, the other users are nowhere near the number of eventing - there's definitely something specific to eventing here:

      $ grep 'authenticated as <ud>' memcached.log | cut -d ' ' -f 8-|sort| uniq -c|sort -rn
      519234 <ud>@eventing</ud>
      4751 <ud>Administrator</ud>
        90 <ud>@projector</ud>
        69 <ud>default</ud>
        68 <ud>@ns_server</ud>
      

      Attachments

        Issue Links

          For Gerrit Dashboard: MB-31068
          # Subject Branch Project Status CR V

          Activity

            People

              dhaikney David Haikney (Inactive)
              drigby Dave Rigby (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Gerrit Reviews

                  There are no open Gerrit changes

                  PagerDuty