Session keeps trying to reconnect for a long time with a stale configuration during the "lambda" testing scenario

Description

Test environment:

  • 2-node Vagrant setup with CBS 7.1.0

  • beer-sample bucket

  • SDK with the appropriate C++ client (at least commit 872f7e)

Test steps:

  • Run a simple looping program (either a KV loop or a query loop; the results are the same)

  • Issue kill -s STOP <pid> while the program is sleeping

  • Go to node1 (or node2) and run a rebalance that removes a server

  • Once the rebalance is complete, issue kill -s CONT <pid>

Environment

None

Release Notes Description

None

Attachments

1 attachment (25 Jul 2022, 12:49 PM)

Activity

Sergey Auseyau July 26, 2022 at 6:43 AM

It turned out that the cause of this behaviour was that clustermap notifications are enabled by default.

During a rebalance that removes a node, the server generates many configuration revisions, and only the last couple of them have the removed endpoint dropped from the configuration. So when the SDK process is thawed and resumes, it finds hundreds of configuration update packets that the OS kernel has already delivered and buffered on the socket. Most of them are already stale by then, but we still have to process them one by one to reach the last config, in which the node has been removed. While this processing goes on, the sessions associated with the removed node keep reconnecting. We also cannot force a configuration update, because its response would arrive on the same connection, behind the queued notifications, and would only be processed after all the stale notifications have been handled.

As a fix, I have disabled the clustermap_notification feature for the C++ SDK.

I will also raise this issue at the next team meeting.

Fixed

Details

Created July 25, 2022 at 12:47 PM
Updated July 28, 2022 at 7:26 AM
Resolved July 28, 2022 at 7:26 AM