Details

Type: Task
Resolution: Unresolved
Priority: Major
Fix Version/s: 1.0
Affects Version/s: None
Component/s: cluster-monitor
Labels: None
Epic Link: Node Agent

Description

Currently, in agent/cmd/cbhealthagent/main.go, we shut down the agent when all services exit cleanly, and we abort start-up if a service cannot be created at all. We don't handle the case where a service starts up and then fails: it decrements the WaitGroup, but the counter never reaches zero while the other services are still running, so nothing happens. (In addition, main() only terminates if it receives a SIGINT.)
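The failure mode can be reproduced in isolation. The following is a minimal sketch, not the actual contents of main.go: `waitReturns` and `demo` are hypothetical names, and the two closures stand in for one healthy service and one that fails shortly after start-up.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// waitReturns starts one goroutine per run function, tracks them with a
// WaitGroup only, and reports whether wg.Wait() returned within timeout.
func waitReturns(runs []func() error, timeout time.Duration) bool {
	var wg sync.WaitGroup
	for _, run := range runs {
		wg.Add(1)
		go func(run func() error) {
			defer wg.Done()
			_ = run() // the error is dropped, so nobody learns about the failure
		}(run)
	}

	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// demo runs one service that keeps running and one that fails at once.
func demo() bool {
	stop := make(chan struct{})
	defer close(stop) // lets the healthy goroutine finish after the check

	healthy := func() error { <-stop; return nil }
	crashed := func() error { return errors.New("simulated failure") }

	return waitReturns([]func() error{healthy, crashed}, 100*time.Millisecond)
}

func main() {
	fmt.Println(demo()) // prints "false": the wait is still blocked
}
```

The failed service's `Done()` call lowers the counter from two to one, so the wait stays blocked and the error is never surfaced.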

We should have a way for sub-systems to notify the agent core that they have exited abnormally. The open question is what the agent core should do next. Retrying a few times may be sensible, but what should happen if the error is fatal: go into the Waiting state? Notify the cluster monitor somehow?
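One possible shape for such a notification path is an exit channel plus a bounded restart count. This is a sketch under assumptions, not a proposal for the final design: the `Service` interface, `exitEvent`, `supervise`, and the `flaky` test double are all hypothetical names, and the real agent's service type may look different. What to do with a service that has exhausted its retries (Waiting state, telling the cluster monitor) is deliberately left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// Service is a hypothetical stand-in for an agent sub-system. Run blocks
// until the sub-system exits and returns nil on a clean exit.
type Service interface {
	Name() string
	Run() error
}

// exitEvent is what a sub-system's goroutine reports to the agent core.
type exitEvent struct {
	name string
	err  error // nil means a clean exit
}

// supervise runs every service in its own goroutine and returns once all of
// them have stopped for good. A service that exits with an error is restarted
// up to maxRestarts times; after that its last error is recorded in the
// returned map. Service names are assumed to be unique.
func supervise(services []Service, maxRestarts int) map[string]error {
	events := make(chan exitEvent)
	byName := make(map[string]Service, len(services))

	start := func(s Service) {
		go func() { events <- exitEvent{name: s.Name(), err: s.Run()} }()
	}
	for _, s := range services {
		byName[s.Name()] = s
		start(s)
	}

	restarts := make(map[string]int)
	failed := make(map[string]error)
	for running := len(services); running > 0; {
		ev := <-events
		switch {
		case ev.err == nil:
			running-- // clean exit
		case restarts[ev.name] < maxRestarts:
			restarts[ev.name]++
			start(byName[ev.name]) // abnormal exit, retry
		default:
			failed[ev.name] = ev.err // retries exhausted
			running--
		}
	}
	return failed
}

// flaky fails its first `failures` runs and then exits cleanly.
type flaky struct {
	name     string
	failures int
	runs     int
}

func (f *flaky) Name() string { return f.name }

func (f *flaky) Run() error {
	f.runs++
	if f.runs <= f.failures {
		return errors.New(f.name + ": simulated crash")
	}
	return nil
}

func main() {
	failed := supervise([]Service{
		&flaky{name: "ok"},
		&flaky{name: "recovers", failures: 2},
		&flaky{name: "broken", failures: 100},
	}, 3)
	for name, err := range failed {
		fmt.Printf("%s gave up: %v\n", name, err)
	}
}
```

Unlike a bare WaitGroup, every exit passes through the agent core together with its error, so the core can tell clean shutdowns from crashes and decide per service whether to retry or escalate.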

People

Assignee: Unassigned
Reporter: Marks Polakovs (Inactive)
Votes: 0
Watchers: 2

Dates

Created: 04/Mar/22 5:03 AM
Updated: 08/Apr/22 5:41 AM

Agent: handle sub-systems exiting abnormally