Details

Type: Bug
Resolution: Fixed
Priority: Critical
Fix Version/s: 1.5.9
Affects Version/s: 1.4.2, 1.5.0, 1.5.8
Component/s: Core
Labels: None
Story Points: 1

Description

When a query node (or any other HTTP service) goes offline without sending a TCP FIN, the keepalive may not detect the stale TCP connection as intended, because requests are no longer pipelined (as of 1.4.2).

To reproduce, run a continuous workload against 2 query nodes with a fixed pool size. Forcibly take one node down and restart it. Because no TCP FIN is sent on the "crash", the TCP connection is left half-open.

Observe that the workload is unbalanced after this.

Note that this may not be observable without a fixed workload, as new connections would be opened within the pool limit. Also note that the client doesn't detect the connection as stale because it won't schedule other workload on it. Reproducing this therefore requires a fairly specific set of circumstances.
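The detection gap described above can be illustrated with a small sketch. This is not the change that resolved this issue; it only shows one possible way to flag a connection as stale when a keepalive probe stays unanswered for too long, instead of waiting for the OS TCP timeout. The class `IdleProbeTracker`, its method names, and the timeout value are hypothetical and not part of any SDK API.

```java
import java.time.Duration;

/** Hypothetical helper for illustration only; not part of any SDK API. */
final class IdleProbeTracker {
    private final long timeoutNanos;
    private boolean probeOutstanding = false;
    private long probeSentAtNanos = 0L;

    IdleProbeTracker(Duration timeout) {
        this.timeoutNanos = timeout.toNanos();
    }

    /** Record that a keepalive probe was written; keeps the oldest unanswered send time. */
    void onProbeSent(long nowNanos) {
        if (!probeOutstanding) {
            probeOutstanding = true;
            probeSentAtNanos = nowNanos;
        }
    }

    /** Any response from the peer proves the connection is still alive. */
    void onResponse() {
        probeOutstanding = false;
    }

    /** True once a probe has gone unanswered for at least the configured timeout. */
    boolean isStale(long nowNanos) {
        return probeOutstanding && (nowNanos - probeSentAtNanos) >= timeoutNanos;
    }
}
```

A caller would feed this with `System.nanoTime()` readings and close and reopen the connection once `isStale` returns true. Whether that is appropriate depends on how the client schedules keepalives, which this issue does not spell out.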

Workaround
Tune the TCP timeout down to a shorter interval, or enable TCP keep-alive at the OS level.
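For reference, the sketch below shows what requesting TCP keep-alive looks like on a plain `java.net.Socket`. It is an illustration under assumptions: the issue does not say whether the client exposes a hook for this, and the class `KeepAliveSketch` is hypothetical. Setting `SO_KEEPALIVE` only asks the OS to send probes. How soon and how often they are sent is still governed by OS settings (on Linux, for example, the `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl`, and `net.ipv4.tcp_keepalive_probes` sysctls), so those would need tuning as well.

```java
import java.io.UncheckedIOException;
import java.net.Socket;
import java.net.SocketException;

/** Hypothetical helper for illustration only. */
final class KeepAliveSketch {
    private KeepAliveSketch() {}

    /** Requests SO_KEEPALIVE on the socket and reports whether it is now set. */
    static boolean enableKeepAlive(Socket socket) {
        try {
            socket.setKeepAlive(true);
            return socket.getKeepAlive();
        } catch (SocketException e) {
            // Wrap the checked exception so callers need no throws clause.
            throw new UncheckedIOException(e);
        }
    }
}
```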

People

Assignee: Michael Nitschinger

Reporter: Matt Ingenthron

Votes: 0

Watchers: 2

Dates

Created: 30/May/18 2:01 PM

Updated: 24/Apr/20 1:53 PM

Resolved: 11/Jun/18 9:58 AM

Gerrit Reviews

There are no open Gerrit changes

There are 2 closed Gerrit changes

JVMCBC-543: Recover earlier than OS TCP timeout: Gerrit Review:

JVMCBC-543: Recover earlier than OS TCP timeout: Gerrit Review:

connection may not recover until OS TCP timeout
