State Machines
State machines are the core orchestration mechanism in UMH-Core. Every component is managed by a finite state machine (FSM) with clearly defined states and transitions. This provides predictable, observable behavior and enables reliable error handling and recovery.
How It Works
UMH-Core uses hierarchical state machines where components build upon each other:
Bridge (formerly Protocol Converter) = Connection + Source Flow + Sink Flow
Flow (DataFlow Component) = Benthos instance with lifecycle management
Benthos Flow = Individual Benthos process with detailed startup phases
Connection = Network probe service (typically nmap-based)
Each component inherits lifecycle states (`to_be_created`, `creating`, `removing`, `removed`) and adds operational states specific to its function. The Agent continuously reconciles desired vs actual state, triggering appropriate transitions based on observed conditions.
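The reconciliation idea can be illustrated with a minimal sketch. This is not UMH-Core's actual code: the `Component` type, the `reconcile` function, and the `"running"`/`"stopped"` desired values are hypothetical names chosen for illustration. Only the lifecycle state names and the `start`/`stop` events come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Lifecycle states named in the text above.
LIFECYCLE_STATES = ("to_be_created", "creating", "removing", "removed")
# Operational states in which a component counts as running (illustrative subset).
RUNNING_STATES = ("starting", "idle", "active", "degraded")


@dataclass
class Component:
    name: str
    desired: str  # hypothetical desired value: "running" or "stopped"
    current: str  # FSM state observed right now


def reconcile(component: Component) -> Optional[str]:
    """Return the event that moves `current` toward `desired`, or None."""
    if component.current in LIFECYCLE_STATES:
        return None  # creation or removal in progress; nothing to start or stop
    if component.desired == "running" and component.current == "stopped":
        return "start"
    if component.desired == "stopped" and component.current in RUNNING_STATES:
        return "stop"
    return None  # already converged, or mid-transition
```

Calling `reconcile` on every tick and feeding the returned event into the component's FSM gives the "desired vs actual" loop described above in its simplest form.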
1 – Redpanda Service
| State | Meaning | Entered by | Transitions out |
| --- | --- | --- | --- |
| stopped | `redpanda` process not running. | stop_done from stopping, or initial create. | start → starting |
| starting | S6 launching broker; health checks pending. | start event from stopped. | start_done → idle; start_failed → stopped |
| idle | Broker healthy, no data for 30 s (default idle window). | start_done or no_data_timeout from active. | data_received → active; degraded → degraded; stop → stopping |
| active | Broker healthy & `BytesIn/OutPerSec` > 0. | data_received from idle. | no_data_timeout → idle; degraded → degraded; stop → stopping |
| ⚠️ degraded | Broker running but ≥1 health-check failing (`disk-space-low`, `cpu-saturated`, etc.). | degraded from idle/active. | recovered → idle; stop → stopping |
| stopping | Graceful shutdown (draining clients). | stop from any running state. | stop_done → stopped |
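The Redpanda states and events listed above map naturally onto a lookup table. The sketch below is only an illustration of that mapping: the names `REDPANDA_TRANSITIONS`, `fire`, and `run` are hypothetical, and the real implementation may be structured differently.

```python
# (current state, event) -> next state, taken from the Redpanda listing above.
REDPANDA_TRANSITIONS = {
    ("stopped", "start"): "starting",
    ("starting", "start_done"): "idle",
    ("starting", "start_failed"): "stopped",
    ("idle", "data_received"): "active",
    ("idle", "degraded"): "degraded",
    ("idle", "stop"): "stopping",
    ("active", "no_data_timeout"): "idle",
    ("active", "degraded"): "degraded",
    ("active", "stop"): "stopping",
    ("degraded", "recovered"): "idle",
    ("degraded", "stop"): "stopping",
    ("stopping", "stop_done"): "stopped",
}


def fire(state: str, event: str) -> str:
    """Apply one event; reject events the listing does not allow in this state."""
    try:
        return REDPANDA_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state!r}") from None


def run(state: str, events: list) -> str:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = fire(state, event)
    return state
```

Rejecting unknown `(state, event)` pairs rather than ignoring them keeps the machine's behavior predictable, which is the property this page emphasizes.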
2 – Container Monitor
| State | Meaning | Entered by | Transitions out |
| --- | --- | --- | --- |
| active | CPU < 85 %, RAM < 90 %, Disk < 90 %. | metrics_all_ok after monitor start or from degraded. | metrics_not_ok → degraded |
| ⚠️ degraded | One of the above limits breached for 15 s. | metrics_not_ok | metrics_all_ok → active |
| monitoring_stopped | Watchdog disabled. | stop_monitoring_done | start_monitoring → monitoring_starting |
| monitoring_starting | Monitor service booting. | start_monitoring | start_monitoring_done → degraded (initial) |
| monitoring_stopping | Monitor shutting down. | stop_monitoring | stop_monitoring_done → monitoring_stopped |
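A sustained-breach check like the one described for the Container Monitor could look roughly as follows. The limits and the 15 s window are the values stated above. The `ContainerMonitor` class, its `observe` method, and the choice to return to `active` on the first healthy sample are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

LIMITS = {"cpu": 85.0, "ram": 90.0, "disk": 90.0}  # percent, from the listing above
BREACH_WINDOW_S = 15.0  # a limit must stay breached this long before degrading


@dataclass
class ContainerMonitor:
    state: str = "degraded"  # start_monitoring_done lands in degraded first
    breach_since: Optional[float] = None  # timestamp of the first breached sample

    def observe(self, now: float, metrics: dict) -> str:
        """Feed one metrics sample (percent values) taken at time `now` (seconds)."""
        breached = any(metrics[name] >= limit for name, limit in LIMITS.items())
        if not breached:
            self.breach_since = None
            self.state = "active"  # metrics_all_ok
        else:
            if self.breach_since is None:
                self.breach_since = now
            if now - self.breach_since >= BREACH_WINDOW_S:
                self.state = "degraded"  # metrics_not_ok
        return self.state
```

Passing the timestamp in explicitly, instead of reading the clock inside `observe`, makes the timing logic easy to exercise without waiting 15 real seconds.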
3 – Agent Monitor
| State | Meaning | Entered by | Transitions out |
| --- | --- | --- | --- |
| active | Agent connected & internal tasks OK. | metrics_all_ok | metrics_not_ok → degraded |
| ⚠️ degraded | Cloud unreachable / auth error / task panic. | metrics_not_ok | metrics_all_ok → active |
| monitoring_stopped | Agent health monitor off. | stop_monitoring_done | start_monitoring → monitoring_starting |
| monitoring_starting | Starting health checks. | start_monitoring | start_monitoring_done → degraded (initial) |
| monitoring_stopping | Halting checks. | stop_monitoring | stop_monitoring_done → monitoring_stopped |
4 – DataFlow Component (Bridge)
Aggregate Bridge FSM
| State | Meaning | Entered by | Transitions out |
| --- | --- | --- | --- |
| stopped | All sub-services stopped. | stop_done or after create. | start → starting |
| starting | Launching source & sink Benthos + connection monitor. | start | start_done → idle; start_failed → starting_failed |
| starting_failed | At least one sub-service failed during start. | start_failed | Manual retry (start) or removal |
| idle | Sub-services healthy, no payload for 30 s. | start_done, no_data_received, recovered | data_received → active; benthos_degraded → degraded; stop → stopping |
| active | Data moving through at least one flow. | data_received | no_data_received → idle; benthos_degraded → degraded; stop → stopping |
| ⚠️ degraded | ≥1 sub-FSM degraded/down (connection lost, flow error). | benthos_degraded | benthos_recovered → idle; stop → stopping |
| stopping | Stopping Benthos + connection monitor. | stop | stop_done → stopped |
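Because a Bridge is composed of a Connection plus a source and a sink flow, its running state can be thought of as a function of the three sub-FSM states. The following sketch shows one way to express the rules above. The function name `aggregate_bridge_state` is hypothetical, and it only covers the case where the Bridge is already running.

```python
HEALTHY_FLOW_STATES = {"idle", "active"}


def aggregate_bridge_state(connection: str, source: str, sink: str) -> str:
    """Derive idle/active/degraded for a running Bridge from its sub-FSM states."""
    flows = (source, sink)
    # At least one sub-FSM degraded or down makes the whole Bridge degraded.
    if connection in ("down", "degraded") or "degraded" in flows:
        return "degraded"
    # Anything else that is not fully operational is outside this sketch.
    if connection != "up" or not set(flows) <= HEALTHY_FLOW_STATES:
        raise ValueError("sub-FSMs are not in an operational state")
    # Data moving through at least one flow means active; otherwise idle.
    return "active" if "active" in flows else "idle"
```

For example, `aggregate_bridge_state("up", "active", "idle")` yields `"active"`, while a lost connection yields `"degraded"` regardless of what the flows report.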
4.1 Connection Service FSM
| State | Meaning |
| --- | --- |
| starting | Probe service launching. |
| up | Target reachable. |
| down | Target unreachable. |
| ⚠️ degraded | Flaky / intermittent responses. |
| stopping | Probe shutting down. |
| stopped | Probe disabled. |
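How "flaky" is detected is not specified here. One plausible approach, shown purely as an assumption, is to look at a sliding window of recent probe results: all reachable means `up`, none reachable means `down`, and a mix means `degraded`. The `ConnectionProbe` class and its window size are hypothetical.

```python
from collections import deque


class ConnectionProbe:
    """Classify a target from the last `window` probe results (assumed heuristic)."""

    def __init__(self, window: int = 5):
        # Old results fall off automatically once the window is full.
        self.results = deque(maxlen=window)

    def record(self, reachable: bool) -> str:
        """Store one probe result and return up, down, or degraded."""
        self.results.append(reachable)
        if all(self.results):
            return "up"
        if not any(self.results):
            return "down"
        return "degraded"  # mixed results: intermittent responses
```

A larger window reacts more slowly but is less likely to flip between `up` and `down` on a single lost probe.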
4.2 Benthos Flow (Source/Sink)

| State | Meaning |
| --- | --- |
| stopped | Service file present, process not running. |
| starting | S6 launched process. |
| starting_config_loading | Benthos parsing YAML pipeline. |
| starting_waiting_for_healthchecks | Pipeline loaded; waiting for plugin health. |
| starting_waiting_for_service_to_remain_running | Stability grace period. |
| idle | Flow running, no msgs for idle window. |
| active | Processing messages. |
| ⚠️ degraded | Flow running but error state (e.g., endpoint retries). |
| stopping | Graceful SIGTERM underway. |
Idle/Active timeout: default 30 s (`DFC_IDLE_WINDOW`).
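The detailed startup phases of a Benthos flow form a linear sequence. The sketch below walks that sequence in the order the states are listed above. That the last phase hands over to `idle` is an assumption based on the `start_done → idle` transition of the Bridge, and the helper names are hypothetical.

```python
# Startup phases in the order they are listed above.
STARTUP_PHASES = [
    "starting",
    "starting_config_loading",
    "starting_waiting_for_healthchecks",
    "starting_waiting_for_service_to_remain_running",
]


def is_starting(state: str) -> bool:
    """True while the flow is in any startup phase."""
    return state in STARTUP_PHASES


def advance(state: str) -> str:
    """Return the state that follows a successfully completed startup phase."""
    index = STARTUP_PHASES.index(state)  # raises ValueError for non-startup states
    if index + 1 < len(STARTUP_PHASES):
        return STARTUP_PHASES[index + 1]
    return "idle"  # assumed hand-over once the grace period has passed
```

Splitting startup into named phases makes it visible where a flow is stuck, for example on config parsing versus waiting for health checks.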
Quick Defaults
| Setting | Default | Variable |
| --- | --- | --- |
| Idle window (Redpanda) | 30 s | `REDPANDA_IDLE_WINDOW` |
| Idle window (Bridge) | 30 s | `DFC_IDLE_WINDOW` |
| Container CPU limit | 85 % | `CONTAINER_CPU_LIMIT` |
| Container RAM limit | 90 % | `CONTAINER_RAM_LIMIT` |
| Container Disk limit | 90 % | `CONTAINER_DISK_LIMIT` |
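As an illustration of how such defaults might be applied, the sketch below overlays optional overrides on the default values. It assumes, without confirmation from this page, that the names are read as environment variables holding plain numbers (seconds or percent). The `load_settings` helper is hypothetical.

```python
import os
from typing import Mapping, Optional

# Default values from the Quick Defaults listing (seconds or percent).
DEFAULTS = {
    "REDPANDA_IDLE_WINDOW": 30.0,
    "DFC_IDLE_WINDOW": 30.0,
    "CONTAINER_CPU_LIMIT": 85.0,
    "CONTAINER_RAM_LIMIT": 90.0,
    "CONTAINER_DISK_LIMIT": 90.0,
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the defaults, overridden by any numeric values found in `env`."""
    # Assumption: overrides arrive as environment variables with plain numbers.
    env = os.environ if env is None else env
    return {name: float(env.get(name, default)) for name, default in DEFAULTS.items()}
```

Accepting the environment as a parameter keeps the function easy to check, for instance `load_settings({"DFC_IDLE_WINDOW": "10"})` shortens only the Bridge idle window.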