Description
As seen during MB-31053, eventing is (re)connecting to one of the Data service nodes over half a million times in the period of 19 hours - on average 7.6 times per second.
As a consequence, the duration of the entire memcached log has been reduced to just 19 hours, which is significantly below what is a supportable duration (minimum: 24 hours; desired: 7+ days).
Example log messages resulting from repeated reconnection:
2018-08-25T09:33:51.588907Z INFO 123: HELO [{"a":"gocbcore/v7.1.3 gocb/v1.3.7","i":"4ffb38cc27df15a7/8dab845c1b92843d"}] XATTR, XERROR, Select bucket, Tracing [ 172.23.96.168:50508 - 172.23.99.11:11210 (not authenticated) ]
|
2018-08-25T09:33:51.589846Z INFO 123: Client 172.23.96.168:50508 authenticated as <ud>@eventing</ud>
|
2018-08-25T09:33:51.690733Z INFO 114: HELO [{"a":"gocbcore/v7.1.3 gocb/v1.3.7","i":"b989c5450686d4f4/fc23ebde441f76fc"}] XATTR, XERROR, Select bucket, Tracing [ 172.23.96.168:50510 - 172.23.99.11:11210 (not authenticated) ]
|
2018-08-25T09:33:51.692887Z INFO 114: Client 172.23.96.168:50510 authenticated as <ud>@eventing</ud>
|
2018-08-25T09:33:51.719859Z INFO 123: HELO [{"a":"gocbcore/v7.1.3 gocb/v1.3.7","i":"d42102ba62d998af/c50ab3c1fed03f59"}] XATTR, XERROR, Select bucket, Tracing [ 172.23.96.168:50512 - 172.23.99.11:11210 (not authenticated) ]
|
2018-08-25T09:33:51.720959Z INFO 123: Client 172.23.96.168:50512 authenticated as <ud>@eventing</ud>
|
2018-08-25T09:33:51.868894Z INFO 118: HELO [{"a":"gocbcore/v7.1.3 gocb/v1.3.7","i":"50e6a82c57445990/9c248749d8fbba4d"}] XATTR, XERROR, Select bucket, Tracing [ 172.23.96.168:50514 - 172.23.99.11:11210 (not authenticated) ]
|
2018-08-25T09:33:51.869967Z INFO 118: Client 172.23.96.168:50514 authenticated as <ud>@eventing</ud>
|
See node 172.23.99.11, memcached.log for instance:
# Total log lines:
|
$ wc -l memcached.log
|
1188976 memcached.log
|
# Number related to eventing / gocbcore connecting:
|
$ grep -c 'HELO \[{"a":"gocbcore\|authenticated as <ud>@eventing</ud>' memcached.log
|
961491
|
Note that the FDs are being reused, indicating that the connection is being closed and re-opened - in the case of FD 123 after 131ms.
Note this was during a rebalance - see original MB-31053 for logs, details of workload.
For context, the other users are nowhere near the number of eventing - there's definitely something specific to eventing here:
$ grep 'authenticated as <ud>' memcached.log | cut -d ' ' -f 8-|sort| uniq -c|sort -rn
|
519234 <ud>@eventing</ud>
|
4751 <ud>Administrator</ud>
|
90 <ud>@projector</ud>
|
69 <ud>default</ud>
|
68 <ud>@ns_server</ud>
|
Attachments
Issue Links
For Gerrit Dashboard: MB-31068 | ||||||
---|---|---|---|---|---|---|
# | Subject | Branch | Project | Status | CR | V |
99225,3 | MB-31068: Bump gocbcore version to v7.1.5. | master | manifest | Status: MERGED | +2 | +1 |